Thursday, 9 December 2010

Blog dump 1: Diet Python

I've got a few blog posts hanging around on my PC, so I thought I'd post them even though they're not really finished. Here's the first.

I've talked about this before, but I've been doing some more work on Diet Python (available in my Python Decompiler on gitorious.org).

Diet Python is a subset of Python which is just as expressive. The PyPy project, for example, uses a subset of Python called RPython, which is restricted to exclude the really dynamic stuff, and Google's Go language strikes me as being in a similar spirit. Subsets like RPython exist to keep the syntax and readability of Python while throwing away the bits that make it hard to implement. That's not the goal of Diet Python. In Diet Python we don't really care about readability, and the syntax can be as awkward as we like as long as it's still valid Python. What Diet Python does is throw away the redundant bits of Python that can be replaced with other Python that does exactly the same thing.

As a simple example, a + b does (more or less) the same thing as a.__add__(b), but if we're making a Python interpreter we have to handle both. Diet Python throws away the +, since it's less general than the method call, so Diet Python implementations only have to implement the method call and can forget about the +. Diet Python has a 'compiler' which translates Python into Diet Python, so we can write normal Python (e.g. a + b) and then strip away the fat to get a Diet Python equivalent. Since this Diet Python is a subset of Python, it will also run fine in a standard Python interpreter.
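To make the a + b example concrete, here is a rough sketch of that one translation. It is not the actual Diet Python compiler: it assumes a modern Python 3 (3.9 or later, for ast.unparse) and the standard ast module, and the names AddToMethodCall and to_diet are made up for illustration. It also ignores the fallback to b.__radd__(a) that a real + performs when a.__add__(b) returns NotImplemented, so it is only faithful when the left operand's __add__ can handle the right operand.

```python
import ast


class AddToMethodCall(ast.NodeTransformer):
    """Rewrite `x + y` into `x.__add__(y)`; every other node is left alone.

    Hypothetical helper for illustration, not part of Diet Python itself.
    """

    def visit_BinOp(self, node):
        # Rewrite the operands first so nested additions are handled too.
        self.generic_visit(node)
        if not isinstance(node.op, ast.Add):
            return node
        call = ast.Call(
            func=ast.Attribute(value=node.left, attr="__add__", ctx=ast.Load()),
            args=[node.right],
            keywords=[],
        )
        return ast.copy_location(call, node)


def to_diet(source):
    """Return `source` with every `+` replaced by an explicit __add__ call."""
    tree = AddToMethodCall().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # assumption: Python 3.9 or later


# Ordinary Python in, "diet" Python out.
diet_source = to_diet("total = a + b + c")

# The output is still plain Python, so a standard interpreter can run it.
namespace = {"a": 1, "b": 2, "c": 3}
exec(diet_source, namespace)
```

After this runs, diet_source contains only attribute lookups and calls, and namespace["total"] holds the same value the original a + b + c would have produced for these integers.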

There are two approaches I'm taking with Diet Python. The first is to remain 100% compatible, so that the input and the output have identical semantics and both run the same way in a standard Python interpreter. The other is to make the output as 'pure' as possible, even if we end up doing things which don't work in standard Python but 'should' make sense: for example, adding methods to True and False. In CPython that's not possible, since they're poorly implemented in C, and this would also change their API. As a Python programmer, though, that feels like a bug in CPython, and I see no reason to avoid something just because CPython can't do it. That gives optional translations which, for example, change:

if a:
    print b
else:
    print c

to (a).__if__('print b', 'print c'), which wouldn't work in CPython because of its True and False peculiarities.
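Since CPython won't let us add methods to True and False, the idea can only be tried out with stand-ins. The sketch below assumes two hypothetical classes, DietTrue and DietFalse, whose __if__ methods each run one of the two branches, so the choice is made purely by which object receives the message. The extra namespace argument is my own addition so that the branch strings have somewhere to run, and the example is written for Python 3, so the branches append to a list rather than using the print statement.

```python
class DietTrue:
    """Hypothetical stand-in for True: always runs the first branch."""

    def __if__(self, then_source, else_source, namespace):
        exec(then_source, namespace)


class DietFalse:
    """Hypothetical stand-in for False: always runs the second branch."""

    def __if__(self, then_source, else_source, namespace):
        exec(else_source, namespace)


# Shared namespace for the branch strings (an assumption of this sketch).
env = {"out": []}

# No `if` statement anywhere: method dispatch picks the branch.
DietTrue().__if__("out.append('b')", "out.append('c')", env)
DietFalse().__if__("out.append('b')", "out.append('c')", env)
```

Neither class contains a conditional, which is the point of the translation: an if statement becomes a message send plus two strings.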

Given enough of each type of translation, compatible and incompatible, I hope to be able to get Python code down to a bare minimum of syntax. I believe that minimum to be message sends and strings.

The blog post kind of trails off there. Oh well, better that I'm writing code than blogs anyway ;)
