Monday, 2 May 2011

Closures vs. First-class functions vs. Higher-order functions

This is a reply to http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-with-for-loop.html which tries to make the point that closures are a good thing, but ends up not mentioning closures at all in the argument ;) My reply was too long for Blogger, so I've posted it here, since I think it's interesting on its own too.
Note that the examples here are written in Python, and I've put "...." to represent indentation, since the Blogger post editor is terrible at HTML (it second-guesses me every time, surely if there is a "HTML" tab then the "Composer" tab shouldn't be for HTML? But writing escaped HTML entities just makes them appear verbatim in their escaped form >:( )

As has been stated above, the article uses first-class and higher-order functions, without actually making use of closures.

The difference is that closures contain an environment, which can be modified by the code in the closure. Thus calling a closure over and over with the same arguments won't always give you the same results.

My Java's a little rusty, so I'll give some examples in Python. Let's say we want to sum a list of numbers. The first-class function approach, shown in the article, relies on the concept of a "sum" function for adding values, and a "reduce" function for walking the list:

def sum(x, y):
....return x+y

total = reduce(sum,my_list)

The way we would approach this with closures would be to define an "accumulate" closure. This is like the "sum" function but instead of taking 2 arguments and returning their sum, it takes 1 argument and adds this to its own internal state. We can then use "map" to apply it to my_list:

def make_accumulator():
....running_total = 0
....def acc(a):
........running_total += a
........return running_total
....return acc

accumulator = make_accumulator()

map(accumulator, my_list)

total = accumulator(0)

Python's scoping rules are a little weird, so I'll walk through this. First we create a first-class function object and call it "make_accumulator". Whenever this function is called, it creates 2 objects in its internal namespace; a number object called "running_total" and a function object called "acc".

Crucially, Python's name resolving works from the inside out: any code can access variables defined in a 'parent' namespace (as long as this name hasn't been overridden in the local namespace), but cannot access any local namespaces defined inside it (eg. the namespace of an inner function).

Thus "acc" has complete access to the "running_total" variable, and thus the function is free to increment running_total by whatever argument it is passed.
"acc" isn't yet a closure, since the body of "make_accumulator" is also free to change the value of running_total, although in our case it simply returns the "acc" function object.

Next we call "make_accumulator" and bind the function it returns to the variable "accumulator". It is actually this binding that makes "accumulator" a closure, rather than a regular function!

In order to be a closure, "acc" functions (like the one bound to "accumulator") need exclusive control over ('to close over') their internal environment, which in this case is "running_total". As long as "make_accumulator" is running, this is not the case. Once it finishes then the function obtains complete control over running_total and becomes a closure. If the result is discarded, however, the "acc" closure just gets garbage collected and is useless. However, if we bind it, like we do to "accumulator", we have a function with internal state. (Note that calling "make_accumulator" again will create new, independent instances of "running_total" and "acc", so our closures retain complete control of their own instances of running_total).

With this closure in hand we then run every value of my_list through it using "map". This returns a list of each intermediate result, but we don't care about them (except for the last one) so they're discarded.

To recover the final result we call the closure again, but with an identity value (0 is the identity for addition). This gives us our result.

Note that closures don't have to return their state when called. For example we could make a closure that returns its argument, but negates it on alternate calls:

def make_flipper():
....is_odd=True
....def flipper(x):
........is_odd = not is_odd
........if is_odd:
............return x
........else:
............return -1 * x
....return flipper

f = make_flipper()
print str(map(f,[0,1,2,3,4,5]))

This would output "[0,1,-2,3,-4,5]" (since -1 * 0 = 0)

Of course there are parallels to be made between closures (functions with internal values) and objects (values with internal functions ('methods')). It's been said that closures are a poor man's objects, and that objects are a poor man's closures. Still, they're another useful tool to have available, especially if they're done in a less clunky way than in Python (which seems like a useful side-effect of the scoping rules, rather than an explicit design decision).