Monday 2 May 2011

Closures vs. First-class functions vs. Higher-order functions

This is a reply to http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-with-for-loop.html which tries to make the point that closures are a good thing, but ends up not mentioning closures at all in the argument ;) My reply was too long for Blogger, so I've posted it here, since I think it's interesting on its own too.
Note that the examples here are written in Python. (Apologies if any formatting gets mangled; the Blogger post editor is terrible at handling code, and second-guesses me every time.)

As has been stated above, the article uses first-class and higher-order functions, without actually making use of closures.

The difference is that closures contain an environment, which can be modified by the code in the closure. Thus calling a closure over and over with the same arguments won't always give you the same results.

My Java's a little rusty, so I'll give some examples in Python. Let's say we want to sum a list of numbers. The first-class function approach, shown in the article, relies on the concept of a "sum" function for adding values, and a "reduce" function for walking the list:

def sum(x, y):
    return x + y

total = reduce(sum, my_list)

The way we would approach this with closures is to define an "accumulator" closure. This is like the "sum" function, but instead of taking 2 arguments and returning their sum, it takes 1 argument and adds it to its own internal state. We can then use "map" to apply it to my_list:

def make_accumulator():
    # a one-element list, so that the inner function can update the
    # enclosed state (an inner function can't rebind an outer name in
    # Python 2, but it can mutate an object it closes over)
    running_total = [0]
    def acc(a):
        running_total[0] += a
        return running_total[0]
    return acc

accumulator = make_accumulator()

map(accumulator, my_list)

total = accumulator(0)

Python's scoping rules are a little weird, so I'll walk through this. First we create a first-class function object and call it "make_accumulator". Whenever this function is called, it creates 2 objects in its internal namespace: a one-element list called "running_total" (which holds the running total) and a function object called "acc".

Crucially, Python's name resolution works from the inside out: any code can read variables defined in a 'parent' namespace (as long as the name hasn't been overridden in the local namespace), but cannot access any local namespaces defined inside it (e.g. the namespace of an inner function). One wrinkle is that an inner function can only read an enclosing variable, not rebind it (Python 2 has no "nonlocal" statement), which is why "running_total" is a list we mutate in place rather than a number we reassign.
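
For example, here's a quick sketch of the scoping behaviour (a standalone illustration, not part of the accumulator code):

def outer():
    x = 1
    items = []
    def inner():
        print x          # fine: inner can read the enclosing x
        items.append(x)  # fine: mutating an enclosing object also works
        # x = 2          # rebinding x would make it local to inner, so the
        #                # print above would raise an UnboundLocalError
    inner()
    return items

print outer()   # prints 1, then [1]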

Thus "acc" has complete access to the "running_total" variable, and thus the function is free to increment running_total by whatever argument it is passed.
"acc" isn't yet a closure, since the body of "make_accumulator" is also free to change the value of running_total, although in our case it simply returns the "acc" function object.

Next we call "make_accumulator" and bind the function it returns to the variable "accumulator". It is at this point, once "make_accumulator" has finished and we hold on to its result, that "accumulator" becomes a closure rather than a regular function.

In order to be a closure, "acc" functions (like the one bound to "accumulator") need exclusive control over ('to close over') their internal environment, which in this case is "running_total". As long as "make_accumulator" is running this is not the case, since its body is still free to change running_total; once it finishes, the function has sole control over running_total and becomes a closure. If we discard the result, the "acc" closure just gets garbage collected and is useless; but if we bind it, as we do to "accumulator", we have a function with internal state. (Note that calling "make_accumulator" again will create new, independent instances of "running_total" and "acc", so each closure retains complete control of its own running_total.)
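
To see that independence concretely (a quick illustration, not from the original post):

a = make_accumulator()
b = make_accumulator()
a(5)
a(3)
b(10)
print a(0)   # 8  - a's running_total is unaffected by b
print b(0)   # 10 - and vice versa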

With this closure in hand, we then run every value of my_list through it using "map". This returns a list of the intermediate totals; the last of them is actually the answer we want, but we discard the whole list anyway.

To recover the final result we call the closure one more time with an identity value (0 is the identity for addition), which leaves the running total unchanged and returns it, giving us our result.
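
Putting it all together (using a made-up my_list of [1, 2, 3]):

my_list = [1, 2, 3]
accumulator = make_accumulator()
map(accumulator, my_list)    # returns [1, 3, 6], which we throw away
total = accumulator(0)       # adding the identity just reads off the state
print total                  # 6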

Note that closures don't have to return their state when called. For example we could make a closure that returns its argument, but negates it on alternate calls:

def make_flipper():
    # as before, a one-element list lets the inner function update the flag
    is_odd = [True]
    def flipper(x):
        is_odd[0] = not is_odd[0]
        if is_odd[0]:
            return x
        else:
            return -1 * x
    return flipper

f = make_flipper()
print str(map(f, [0, 1, 2, 3, 4, 5]))

This would output "[0, 1, -2, 3, -4, 5]" (since -1 * 0 = 0).

Of course there are parallels to be drawn between closures (functions with internal values) and objects (values with internal functions ('methods')). It's been said that closures are a poor man's objects, and that objects are a poor man's closures. Still, they're another useful tool to have available, especially if they're done in a less clunky way than in Python (where they seem like a useful side-effect of the scoping rules rather than an explicit design decision).
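
To make the parallel concrete, here's a rough class-based equivalent of the accumulator (my own illustration, not something from the linked article):

class Accumulator(object):
    """An object with the same behaviour as the accumulator closure."""
    def __init__(self):
        self.running_total = 0
    def __call__(self, a):
        self.running_total += a
        return self.running_total

accumulator = Accumulator()
map(accumulator, [1, 2, 3])
print accumulator(0)   # 6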

Tuesday 12 April 2011

Python Decompiler Performance

I knew the performance of Python Decompiler was rubbish, but I didn't realise how rubbish! Based on the code analysis I wrote about last time, I've put the node classes in order of frequency and reordered the comparisons made in "thing" to match this. That's all I've done, and it's given a 58% speed increase and a 47% RAM decrease compared to the alphabetical order! Running the tests file on itself now only takes 16 seconds and 69MB of RAM. If I compare it to before I had a go at optimising, back before it used the input stream as a stack, the current code takes 64% less time and 66% less RAM.

Nice!

Code, as always, is on Gitorious :)

Saturday 9 April 2011

Some Python Stats

I realised today that there's a really easy-to-fix bottleneck in my Python Abstract Syntax Tree decompiler. The decompiler looks for "things", and for each "thing" it checks against a list of alternatives to see what it might be (and hence how to decompile it). This list is ordered: it checks against the first rule; if that doesn't match it checks against the next rule, and so on, until it either finds a match or runs out of rules to check against (and throws an error).

This ordering of the checks, known as ordered choice, is (along with greedy matching) what makes Parsing Expression Grammars a deterministic counterpart to Backus-Naur Form.
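
As a rough sketch of what this matching loop amounts to (this isn't the decompiler's actual code, and the rule representation is made up):

def decompile_thing(thing, rules):
    # ordered choice: try each rule in turn; the first match wins
    for matches, decompile in rules:
        if matches(thing):
            return decompile(thing)
    raise ValueError("No rule matches %r" % (thing,))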

There are two quirks of my Python decompiler, however, that can be used to our advantage to make it faster really easily.

Firstly, if the choice were not ordered, as in Backus-Naur Form, there would be only one instance of nondeterminism, which happens for deletion. Thus, as long as the deletion rule comes first, we can actually rearrange the other rules however we like.

The second quirk is that I have implemented the rules pretty much in accordance with the classes of AST nodes used in Python's compiler module. This means that we can predict the chances of each match succeeding by simply parsing existing Python code and counting up the number of times each node occurs.

I've written a quick script to do this counting, which you can find here. To use it, you need to provide a list of Python files for it to check. I did this for my whole operating system by running the command 'find / -name "*.py" > ALL_PYTHON_FILES' which will (after some time and "permission denied" warnings) give you a list called "ALL_PYTHON_FILES". For me this list contains 25,087 files.
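
The real node_counter.py works with the node classes from Python's compiler module, but the idea is simple enough to sketch with the standard ast module instead (the file handling and sampling below are illustrative, not the script's actual interface):

import ast
import random
import sys
from collections import Counter

def count_nodes(list_path, sample_size=None):
    # tally how often each type of AST node appears across the given files
    paths = [line.strip() for line in open(list_path) if line.strip()]
    if sample_size is not None:
        # pick files at random to avoid biasing towards a single project
        paths = random.sample(paths, sample_size)
    counts = Counter()
    for path in paths:
        try:
            tree = ast.parse(open(path).read())
        except (IOError, SyntaxError):
            continue    # skip unreadable or unparsable files
        counts.update(type(node).__name__ for node in ast.walk(tree))
    return counts

if __name__ == '__main__':
    size = int(sys.argv[2]) if len(sys.argv) > 2 else None
    for name, count in count_nodes(sys.argv[1], size).most_common():
        print '%s,%d' % (name, count)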

Whilst the node_counter.py script runs reasonably quickly, I keep getting a segmentation fault with around 23,700 files to go. For this reason, I've also given it an optional second argument, which is the number of files to use from the given list. These are picked at random from throughout the file, to prevent it grabbing everything from the same project and thus biasing the results.

The script outputs a spreadsheet which contains each type of node and the number of times it was found in the files it checked. I used Gnumeric to work out some percentages and collate the results from running it against 1 file, 10 files, 100 files and 1000 files, and generated the following chart (which you can find the source for here).

The green lines are the most accurate, since they're from the sample of 1000 files; the others are there to give an indication of the variance, because I couldn't be bothered to work out the standard deviation.

What this shows us is that we can't guess what kind of node a given "thing" will be, since no type accounts for more than 0.5 (i.e. 50%) of the nodes, but it does show that some types of node are far more common than others. We can see that Name nodes (which represent the "x" in "x = 5", for example) are used all over the place, such that around 30% of the nodes are Names. Getattr is probably the next most used node (it represents the ".foo" in "bar.foo = 8"), and the frequency rapidly decreases until we get to some very niche nodes, like Ellipsis, which don't occur at all. Note that some classes, like "Expr" and "Node", will always show zero, since they're superclasses that aren't directly instantiated by the compiler.

So what does this mean for performance increases? It means that by putting the node types in order of frequency, we can make the most common cases match after only a few attempts, whilst sacrificing the performance of rarely used classes.
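
The reordering itself is then trivial. As a sketch (the rule objects and their "name" attribute are made up for illustration; as noted above, the deletion rule has to stay first):

def reorder_rules(rules, counts):
    # keep deletion first, then sort the rest by observed node frequency
    deletion = [r for r in rules if r.name == 'Deletion']
    others = [r for r in rules if r.name != 'Deletion']
    others.sort(key=lambda r: counts.get(r.name, 0), reverse=True)
    return deletion + others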

Since this script is part of the Python decompiler, it's Public Domain. If it's of any use to you, go ahead and take it :)