Monday 2 May 2011

Closures vs. First-class functions vs. Higher-order functions

This is a reply to http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-with-for-loop.html which tries to make the point that closures are a good thing, but ends up not mentioning closures at all in the argument ;) My reply was too long for Blogger, so I've posted it here, since I think it's interesting on its own too.
Note that the examples here are written in Python, and I've put "...." to represent indentation, since the Blogger post editor is terrible at HTML (it second-guesses me every time, surely if there is a "HTML" tab then the "Composer" tab shouldn't be for HTML? But writing escaped HTML entities just makes them appear verbatim in their escaped form >:( )

As has been stated above, the article uses first-class and higher-order functions, without actually making use of closures.

The difference is that closures contain an environment, which can be modified by the code in the closure. Thus calling a closure over and over with the same arguments won't always give you the same results.

My Java's a little rusty, so I'll give some examples in Python. Let's say we want to sum a list of numbers. The first-class function approach, shown in the article, relies on the concept of a "sum" function for adding values, and a "reduce" function for walking the list:

def sum(x, y):
....return x+y

total = reduce(sum,my_list)

The way we would approach this with closures would be to define an "accumulate" closure. This is like the "sum" function but instead of taking 2 arguments and returning their sum, it takes 1 argument and adds this to its own internal state. We can then use "map" to apply it to my_list:

def make_accumulator():
....running_total = 0
....def acc(a):
........running_total += a
........return running_total
....return acc

accumulator = make_accumulator()

map(accumulator, my_list)

total = accumulator(0)

Python's scoping rules are a little weird, so I'll walk through this. First we create a first-class function object and call it "make_accumulator". Whenever this function is called, it creates 2 objects in its internal namespace; a number object called "running_total" and a function object called "acc".

Crucially, Python's name resolving works from the inside out: any code can access variables defined in a 'parent' namespace (as long as this name hasn't been overridden in the local namespace), but cannot access any local namespaces defined inside it (eg. the namespace of an inner function).

Thus "acc" has complete access to the "running_total" variable, and thus the function is free to increment running_total by whatever argument it is passed.
"acc" isn't yet a closure, since the body of "make_accumulator" is also free to change the value of running_total, although in our case it simply returns the "acc" function object.

Next we call "make_accumulator" and bind the function it returns to the variable "accumulator". It is actually this binding that makes "accumulator" a closure, rather than a regular function!

In order to be a closure, "acc" functions (like the one bound to "accumulator") need exclusive control over ('to close over') their internal environment, which in this case is "running_total". As long as "make_accumulator" is running, this is not the case. Once it finishes then the function obtains complete control over running_total and becomes a closure. If the result is discarded, however, the "acc" closure just gets garbage collected and is useless. However, if we bind it, like we do to "accumulator", we have a function with internal state. (Note that calling "make_accumulator" again will create new, independent instances of "running_total" and "acc", so our closures retain complete control of their own instances of running_total).

With this closure in hand we then run every value of my_list through it using "map". This returns a list of each intermediate result, but we don't care about them (except for the last one) so they're discarded.

To recover the final result we call the closure again, but with an identity value (0 is the identity for addition). This gives us our result.

Note that closures don't have to return their state when called. For example we could make a closure that returns its argument, but negates it on alternate calls:

def make_flipper():
....is_odd=True
....def flipper(x):
........is_odd = not is_odd
........if is_odd:
............return x
........else:
............return -1 * x
....return flipper

f = make_flipper()
print str(map(f,[0,1,2,3,4,5]))

This would output "[0,1,-2,3,-4,5]" (since -1 * 0 = 0)

Of course there are parallels to be made between closures (functions with internal values) and objects (values with internal functions ('methods')). It's been said that closures are a poor man's objects, and that objects are a poor man's closures. Still, they're another useful tool to have available, especially if they're done in a less clunky way than in Python (which seems like a useful side-effect of the scoping rules, rather than an explicit design decision).

Tuesday 12 April 2011

Python Decompiler Performance

I knew the performance of Python Decompiler was rubbish, but I didn't realise how rubbish! Based on the code analysis I wrote about last time, I've put the node classes in order of frequency and reordered the comparisons made in "thing" to match this. That's all I've done, and it's given a 58% speed increase and a 47% RAM decrease compared to the alphabetical order! Running the tests file on itself now only takes 16 seconds and 69MB of RAM. If I compare it to before I had a go at optimising, back before it used the input stream as a stack, the current code takes 64% less time and 66% less RAM.

Nice!

Code, as always, is on Gitorious :)

Saturday 9 April 2011

Some Python Stats

I realised today that there's a really easy-to-fix bottleneck in my Python Abstract Syntax Tree decompiler. The decompiler looks for "things", and for each "thing" it checks against a list of alternatives to see what it might be (and hence how to decompile it). This list is ordered, so it checks against the first rule, if it doesn't match then it checks against the next rule and so on until it either finds a match, or it runs out of rules to check against (and throws an error).

This ordering of the checks, specifically ordered choice, is what defines Parsing Expression Grammars as a deterministic counterpart to Backus-Naur Form, along with greedy matching.

There are two quirks of my Python decompiler, however, that can be used to our advantage to make it faster really easily.

Firstly, if the choice were not ordered, as in Backus-Naur Form, there is only one instance of nondeterminism, which happens for deletion. Thus, as long as deletion comes first, we can actually rearrange the other rules however we like.

The second quirk is that I have implemented the rules pretty much in accordance with the classes of AST nodes used in Python's compiler module. This means that we can predict the chances of each match succeeding by simply parsing existing Python code and counting up the number of times each node occurs.

I've written a quick script to do this counting, which you can find here. To use it, you need to provide a list of Python files for it to check. I did this for my whole operating system by running the command 'find / -name "*.py" > ALL_PYTHON_FILES' which will (after some time and "permission denied" warnings) give you a list called "ALL_PYTHON_FILES". For me this list contains 25,087 files.

Whilst the node_counter.py script runs reasonably quickly, I keep getting a segmentation fault with around 23,700 files to go. For this reason, I've also given it an optional second argument, which is the number of files to use from the given list. These are picked at random from throughout the file, to prevent it grabbing everything from the same project and thus biasing the results.

The script outputs a spreadsheet which contains each type of node and the number of times it was found in the files it checked. I used Gnumeric to work out some percentages and collate the results from running it against 1 file, 10 files, 100 files and 1000 files, and generated the following (which you can find the source for here)

The green lines are the most accurate, since they're from the sample of 1000 files; the others are there to give an indication of the variance, because I couldn't be bothered to work out the standard deviation.

What this shows us is that we can't guess what kind of node a given "thing" will be, since none of these is above 0.5 (ie. 50%), but what it does show is that some types of node are far more common than others. We can see that Name nodes (which represent the "x" in "x = 5", for example) are used all over the place, such that around 30% of the nodes are Names. Getattr is probably the next-most-used node (which represents the ".foo" in "bar.foo = 8") and the frequency rapidly decreases until we get some very niche nodes like Ellipsis which don't occur at all. Note that some classes, like "Expr" and "Node" will always show zero since they're superclasses that aren't directly instantiated by the compiler.

So what does this mean for performance increases? It means that by putting the node types in order of frequency, we can make the most common cases match after only a few attempts, whilst sacrificing the performance of rarely used classes.

Since this script is part of the Python decompiler, it's Public Domain. If it's of any use to you, go ahead and take it :)

Friday 8 April 2011

Dear World

Have you *STILL* not realised that everything you ever put online is up for grabs? Just because someone spouts some bullshit phrases like "privacy settings", "passwords", "secret questions", etc. doesn't change this fact. What we do know is:

1) It's not known if computer security is even possible in theory.

2) Even if computer security turns out to be possible, then implementing it correctly is still a difficult problem.

3) Even if computer security is possible, and someone manages to implement it correctly, you have no idea if they're lying unless you have full access to all parts of the code and environment.

4) Even if computer security is possible, and someone manages to implement it correctly, and this can be verified, then you sure as hell shouldn't give them any stuff you care about, because they'll put in in a system that's theoretically sound, verified, correct and proven to be able to stop you ever getting it back. In short, you'd be fucked.

Whilst all software fails the first test, and most software fails the second test, Dropbox also fails the third and fourth. Don't use it, or anything like it, for anything private. If you still want to upload stuff then there are plenty of Free Software programs for doing so, ie. they pass tests 3 and 4.

Yes, I know that banks, credit card companies and the like send transactions over the Internet, but that doesn't mean that doing so is secure. It means that doing so is cheap. So cheap, in fact, that they can afford to pay for insurance to cover all of the fraud that happens on the system every day.

Update: DropBox are now being investigated for false advertising. What's the false claim they made? That they're secure. Specifically, DropBox claimed that their employees cannot access any of the data stored in their system, which turns out to be complete bollocks. The investigation is being made since DropBox has provided a simple, quick, insecure system but presented it as a simple, quick, secure system; whilst companies attempting to create real secure systems, which inevitably end up less simple and quick than DropBox, have been left in the dust by DropBox.

The moral of the story is to run your own clustered filesystem, or if speed isn't your main concern then use FreeNet.

Abstraction Fail

Loading Javascript in a Web page. Pretty high-level stuff. Isn't it fun to live in a world without endianness, memory allocation, GOTO, compilation, packets, out-of-order message arrival, unreliable streams, network addresses, filesystems and all of that other low-level crap? We can float in our clouds high above it all :)

Except when Facebook Connect doesn't work in IE because of an OS bug in the networking stack. The network layer in Windows XP can throw away incoming data if it is a) "chunked" (ie. split into sections) and b) compressed. Since missing data causes the decompression to fail, this results in compressed data being sent to the browser, which subsequently ignores it as incomprehensible gibberish. Thus Facebook Connect didn't work. Bloody Microsoft >:(

If you've been hit by this, there's no solution I know of. As a workaround, you may be able to try swapping from HTTP to HTTPS or HTTPS to HTTP in the URL you're referencing, but of course that will only work if a) the server you're contacting supports HTTP and HTTPS (Facebook does, thankfully), b) the one you switch to doesn't trigger this bug as well and c) you don't give a crap about using SSL.

Saturday 2 April 2011

PyMeta Stacks

As you may know if you've been following my blog, I've been working on a project for a few years called, rather glamorously, "python decompiler". I've released it into the Public Domain and you can find it on gitorious.

The idea is that compiling software (translating it from one form to another) is a mathematical operation, and that for every mathematical operation we would like to know its inverse. Version 2 of Python contains a compiler, which creates Abstract Syntax Trees (ASTs) out of Python code, but there is no reverse transformation to be found. That's what python decompiler does; it takes an AST and produces Python code that will compile into that AST, thus compile(decompile(compile(code))) is the same as compile(code). We can't quite make decompile(compile(code)) match the original code since the compiler throws away information ((((((like redundant brackets)))))).

So how does it work? I've used the PyMeta library, a Python implementation of the OMeta pattern matching system. This is applied to the AST, and pattern-matches against AST node objects. The only difficulty with doing this is that a OMeta is designed for pattern-matching against an iterable object, such as a string of text, a list of objects, etc. and I want to use it to pattern-match recursively inside objects, preferably without blowing those objects apart into a serialised form beforehand.

The original way I did this was to instantiate a new pattern matcher on every child object, so for example the Python code "1 + 2 * 3 / 4 - 5" will produce an AST a bit like "Sub(Add(Const(1),Div(Mul(Const(2),Const(3)),Const(4)),Const(5))". The classical way to pattern-match such a tree is to make a rule that matches "Sub(A,B)" where "A" and "B" are themselves other rules, and so on. This recursion of rules is great for digging out structures like the AST I've written out as text above, however in Python we've got a structured system of objects that a) doesn't need constructing (it's already structured) and b) has encapsulated innards, which makes pattern-matching against the structure tedious (although Python's introspection makes it reasonable, at least). Since Python is a dynamic language we would also like to avoid having to look at the inside of objects, since that would make the code less flexible than if it implies the contents. By making new pattern matchers, rather than defining new rules, we get arbitrary nesting ability and don't have to look inside objects (they look inside themselves, recursively). The drawback to this is that every single node in the AST needs its own pattern matcher, which takes ages to run (since OMeta pattern matchers are bootstrapped from the ground up in themselves) and eats up tons of RAM.

I received a couple of emails about python decompiler the other day, so it seems that people may be playing with it after all. To that end, I decided to have a bit of a think about improving its efficiency, and realised that I could get the required recursion if I turn the input stream into a stack. This is suprisingly easy to do, and straightforward to use (as long as you make sure the stack will be clean when you're finished, regardless of if your rules match or not!). We add the following function to a newly constructed grammar (which is a class in PyMeta):

def ins(self, val):
...self.input.data.insert(self.input.position+1, val)
...self.input.tl = None
...self.input.memo = {}
grammar.ins = ins

(I've written "..." to represent indentation. Silly blogger HTML editor.) 5 lines is all it takes, but now we can push on to a pattern matcher's input stream from within the grammar. We do this by calling Python code from within a rule, by wrapping it as !(code). Thus we just have to call !(self.ins(a)) and "a", whatever it is, will be the next thing on the input stream, so we can match against it. This turns the input stream into a stack, and makes OMeta recursive whilst only needing 1 pattern matcher object.

I've now converted python decompiler to use this form. I've run some quick tests by getting python decompiler to compile, decompile and recompile its own test file, and check the results for validity. The previous method managed, using the "time" utility and KDE's system monitor, take 44 seconds and 202MB of RAM. This is a completely stupid amount of resources for text processing, even in an interpreted language. After the change it now takes 38 seconds and 129MB RAM. That's not a great speed improvement (14%), but the memory usage is much better (a 36% reduction). I'm sure I can get this down even more with a little playing :)

Saturday 22 January 2011

Enigma Number 1628

Was reading through New Scientist and thought I'd have a go at their "Enigma" puzzle, which I've always glossed over since they're pretty difficult maths. However, since I've got a computer I thought I'd see what the answer is anyway.

The question is this: raising a number to the power of itself is like 5 to the power of 5, or 5**5 in Python; a 'reverse' number is a number with its digits backwards, so 45 is the reverse of 54; a number that ends with another means that the least significant digits of the former are the same as all of the digits of the latter in the same order. Given this, what is the 2 digit number x such that x raised to the power of x ends with x, the reverse of x raised to the reverse of x ends with the reverse of x, the reverse of x raised to the power of x ends with the reverse of x and the digits in x aren't the same?

Translating this into Python is quite simple. We start with all of the possible answers, ie. the numbers 00-99, which we get via range(100). Next we can split them into a pair of the "tens and units" as they used to say in primary school. We do this with [(a/10,a%10) for a in range(100)]. Next we can put these back into a list of numbers by undoing the separation of the digits, like this [x*10+y for x,y in [(a/10,a%10) for a in range(100)]]. Next we want to put our filters on this. Namely, the 2 digits x and y cannot be equal, so x!=y; the number raised to itself, (x*10+y)**(x*10+y), must ends with itself, ie. ((x*10+y)**(x*10+y))%100==x*10+y; the same must be true for the reverse, where we just swap x and y; and the reverse to the power of the number, (y*10+x)**(x*10+y), must end in the reverse, y*10+x, so ((y*10+x)**(x*10+y))%100==y*10+x. Putting it all together we get a list of answers:

[x*10+y for x,y in [(a/10,a%10) for a in range(100)] if x!=y and ((x*10+y)**(x*10+y))%100==(x*10+y) and ((y*10+x)**(y*10+x))%100==(y*10+x) and ((y*10+x)**(x*10+y))%100==(y*10+x)]

Easy :)

PS: The answers are 16, 61 and 57