Tuesday 11 August 2009

A Decent Compressed Filesystem At Last?

I have a 2.4GHz CPU, a 250GB hard drive and I want to store 29GB of zipped music I just downloaded. The music is in a variety of ProTracker formats, so what's the best way to store it?

I know for a fact that tracker songs are very easy to compress, since the format originates on computer systems like the Amiga where any compression more sophisticated than Run Length Encoding takes away valuable clock cycles which are often needed elsewhere. Such uncompressed, or crudely compressed, data is ripe for extra compression (hence why it is transported in a zip archive).

Actually playing this music is now nontrivial though, since whilst all of my music players support ProTracker modules, not all of them support reading them from zip files. The obvious thing to do is decompress them, but since trackers are highly compressible, this means that the 29GB archive will certainly become even more vast when decompressed, and that's just a waste of storage space. The solution is to use a compressed filesystem.

For those who are stuck on shit operating systems, I'll give you a little insight into how your computer works. Let's say we're running Microsoft Windows, and we insert a CD into our CDROM drive. In "My Computer" we may see our CDROM is called "D", and if we double click on it we see the things stored on it. However, we then go back to My Computer and right click on D, then tell it to eject. We take out the CD and close the drive. Now we tell it to eject again, and the drive opens. So, what is "D"? It's our CD, since we saw the files on it, but it's also the drive, since we can tell it to do things when our CD is nowhere in sight. Which is it really? Well, that depends completely upon what you want to do, since Windows is a very poor system indeed.

Let's imagine a similar thing on a UNIX system like Debian. Our CDROM will be something like /dev/hda. We can send it commands, for example "eject /dev/hda" (or do the same thing in whichever graphical environment you happen to use). Can we access the files on a disc through /dev/hda? No. What we can do, however, is take the data given to us by the drive, and reconstruct it somewhere. This is what we mean by mounting the filesystem. The filesystem is the particular way the ones and zeroes represent our files and folders (CDs use the standard ISO9660, which is often extended with certain proprietary formats from Microsoft to allow longer filenames. Ironic, considering that the original format only had short 8.3 filenames to make sure it would work on Microsoft's POS crapware), whilst mounting it means making it available to peruse. Here we can choose where we want to see things, so we can run a command like "mount /dev/hda /our_cd". Now if we go to the folder /our_cd the previous contents will be hidden (until the CD is unmounted) and the contents of the disc are accessible. /our_cd is a different file to /dev/hda. The same is true of hard drives.

In fact, filesystems can be anything. The FUSE (Filesystems in USErspace) driver allows regular programs to be written which can be accessed like filesystems. This is how the Wikipedia filesystem works (mount it somewhere and that folder becomes filled with files for each article), how the GMail filesystem works (data is stored online as a series of emails in a GMail account, which can be retrieved from anywhere and are automatically converted back to their original state), and many others. There are several FUSE filesystems which transparently compress and decompress their contents, ie. everything copied into the filesystem is sent to a compression program and saved somewhere, and everything read from the filesystem is sent through a decompression program before it reaches the destination. Thus any program, as long as it can load files, can load compressed files, and any program, as long as it can save files, can save compressed files.
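The "compress on the way in, decompress on the way out" idea can be sketched in a few lines of Python. To be clear, this is not FUSE and not how any of the real compressing filesystems are implemented: it's a toy in-memory store (the CompressedStore class and its method names are made up for illustration) using zlib from the standard library, so that the caller only ever sees the original bytes.

```python
import zlib


class CompressedStore:
    """Toy stand-in for a compressing filesystem (hypothetical, in-memory only)."""

    def __init__(self):
        self._blobs = {}  # filename -> compressed bytes

    def write(self, name, data):
        # Everything saved is compressed before it is stored
        self._blobs[name] = zlib.compress(data)

    def read(self, name):
        # Everything loaded is decompressed before the caller sees it
        return zlib.decompress(self._blobs[name])

    def stored_size(self, name):
        return len(self._blobs[name])


store = CompressedStore()
song = b"AAAAAAAAAAAAAAAA" * 1000  # highly repetitive, a bit like tracker data
store.write("song.mod", song)
print(len(song), "bytes in,", store.stored_size("song.mod"), "bytes stored")
```

Anything calling read() gets the original data back without knowing that compression happened, which is the whole point.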

Seems cool, but until recently the only ones I'd tested were pretty dire, most noticeably CompFUSEd. It was INCREDIBLY slow and memory hungry, so it was not worth using, and that was when only a few MB were put into it.

However, FuseCompress has recently been added to Debian, and I'm trying it out for these tracker modules. Populating the filesystem is taking a while, since I'm having to decompress all 122 thousand songs (that's how we want them to be read and therefore written), then move them into the filesystem where they're recompressed (although using LZO this time, which is damned fast). All I can say is thank Guido for Python, since it makes automating such things a breeze :)
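For the curious, the automation is roughly the following shape. This is a minimal sketch rather than the actual script: the function name unpack_into and both directory arguments are placeholders, and it assumes the compressed filesystem is already mounted at mount_point so that ordinary file moves are all that's needed.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_into(zip_dir, mount_point):
    """Extract every .zip under zip_dir and move its files into mount_point."""
    moved = 0
    for archive in sorted(Path(zip_dir).rglob("*.zip")):
        # Unzip into a scratch folder first, so the songs are written
        # to the compressed filesystem in their plain, playable form
        with tempfile.TemporaryDirectory() as scratch:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(scratch)
            for item in sorted(Path(scratch).rglob("*")):
                if item.is_file():
                    target = Path(mount_point) / item.relative_to(scratch)
                    target.parent.mkdir(parents=True, exist_ok=True)
                    shutil.move(str(item), str(target))
                    moved += 1
    return moved
```

Files with the same name in different archives would overwrite each other here, so a real run would want some per-archive subfolder scheme.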

I thought I'd finish with a little introduction to compression, and what it actually is. Compression is completely based on Claude Shannon's Information Theory (Shannon also showed how switches can represent Boolean logic, ie. allowing a physical way of building Alan Turing's theoretical "computers"). Information, measured in bits, is irreducible: you can't throw away any bits without losing some information. Compression which does this is called "lossy" compression, for example the Vorbis audio codec, but the algorithms are crafted in such a way that information is only thrown away when it is imperceptible to us (in the case of Vorbis, we can't hear the difference). The key thing to know is that each bit of information is not necessarily mapped one-to-one with each bit of whatever material is being used to store it (eg. magnetic domains on a hard drive). A full bit of information is only needed when there is a 50/50 chance of the next bit being 0 or 1, and this is only true for random sequences, or those which appear random. A sequence like "AAAAAAAAAAAAAAAAAAAA" is not random (each bit isn't independent of the previous ones), and thus does not require all of the 160 bits that are being used to store it. Information theory gives us a lower limit, saying how many bits are REQUIRED to store the given information, so we want our algorithms to get as close to this limit as possible.
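To put a rough number on that lower limit, here's a small Python sketch of Shannon's entropy formula. It makes a big simplifying assumption, which is that each symbol is treated as independent of its neighbours, so it only notices how often each letter occurs and not any patterns between letters. The function name entropy_bits is just something I've made up for the example.

```python
from collections import Counter
from math import log2


def entropy_bits(text):
    """Estimate total bits needed, treating each symbol as independent."""
    counts = Counter(text)
    total = len(text)
    # Shannon entropy per symbol: sum of p * log2(1/p) over the symbols seen
    per_symbol = sum((n / total) * log2(total / n) for n in counts.values())
    return per_symbol * total


print(entropy_bits("A" * 20))   # only one symbol ever appears
print(entropy_bits("AB" * 10))  # two symbols, equally likely
```

Under this simple model twenty "A"s carry no information at all beyond the length, whilst the alternating "AB" string comes out at one bit per letter, even though it's obviously predictable. A smarter model that looks at neighbouring symbols would give a lower figure, which is why the limit always depends on what you assume about the data.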

A really simple type of compression is Run Length Encoding. This looks for repetition and replaces it with multiplication. For example, the sequence "ABCBAAAAAAAAABBCCCCCCCCDBDDDDB" could be compressed to "ABCB9ABB8CDB4DB". Our algorithm here is simply "if you find more than 2 of the same letter in a group then replace that group with its size followed by the letter", so a stream of "AAAAAAAAA" becomes "9A". To decompress this we use the algorithm "If you find a number, put that many of the next letter".
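Here's a minimal Python sketch of that exact scheme. It assumes the input never contains digits (otherwise the decompressor couldn't tell a count from data), and the function names are my own.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Replace runs of 3 or more identical letters with count + letter."""
    out = []
    for letter, run in groupby(text):
        n = len(list(run))
        out.append(f"{n}{letter}" if n > 2 else letter * n)
    return "".join(out)


def rle_decode(text):
    """Expand every 'count + letter' pair; other letters pass through."""
    return re.sub(r"(\d+)(\D)", lambda m: m.group(2) * int(m.group(1)), text)


print(rle_encode("A" * 9))    # → 9A
print(rle_decode("4DB"))      # → DDDDB
```

Real run length encoders work on bytes and need some way of escaping the counts, but the principle is the same.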

There is a slightly more general form of this, where instead of grouping similar things, we use a pointer. The pointer is a number which means "I am the same as whatever is this far behind me". In this way "AAAAAAAAA" can become "A12345678", where each number is telling us to get the next value from this far back (they all point back to the first A), however pointers can also point to pointers, so we could just as easily put "A11111111", since each of those "1"s becomes an "A", so it's perfectly valid for the next one along to point to it. This is easily compressible with the runlength encoding seen above (which is just a subset of this form of compression), but is more powerful. For example "ABABABABABABABABABABABA" cannot be compressed with runlength encoding, but using pointers we can compress it to "AB222222222222222222222". Now this is easily compressible. This works for any size of pattern, although since it is a search operation it can get quite slow for large search spaces, so files are usually split into more manageable chunks first.
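Decompressing the pointer notation is very little code. This Python sketch only handles the single-digit pointers 1 to 9 used in the examples here, and assumes the data itself contains no digits; pointer_decode is a made-up name. As far as I know, practical schemes in this family (LZ77 and friends) store a distance and a length together rather than one pointer per symbol.

```python
def pointer_decode(text):
    """Expand digits 1-9 as 'copy the symbol this many places back'."""
    out = []
    for symbol in text:
        if symbol in "123456789":
            # Look back through what has already been decoded, which
            # is why a pointer can happily point at another pointer
            out.append(out[-int(symbol)])
        else:
            out.append(symbol)
    return "".join(out)


print(pointer_decode("A12345678"))
print(pointer_decode("A11111111"))
```

Both calls give the same nine "A"s, since every pointer ends up resolving to the first one.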

A final form of compression I'd like to mention involves binary trees. Let's say we made a survey of the occurrences of every letter in a file. We could say, for example, that "e" was the most common, followed by "a", followed by "s" and so on. Now we can compress these from their usual 8 bits to a much more compact form. First we define a binary tree, that is a tree where every non-leaf node (branch junction) has a left and a right branch. If we meet a "0" we will go down the left branch and if we meet a "1" we will go down the right branch. Our tree will begin with two branches, and we can stick "e" at the end of the right branch. On the left branch we put another node with two children, on the right we put "a" and on the left we put another node with two children. On the right of this we put "s" and on the left another node with two children, and so on. Now we can replace each letter by the path we must take in our tree to reach it (with our tree we know that "1" marks the end of a letter). Every time we find an "e" we simply put a "1", since that's how we get to "e" from the top of the tree (1 means go right). This saves 7/8 of the space every time. Every time we find an "a" we replace it with "01", which saves us 3/4 of the space, an "s" with "001", and so on. By constructing optimised trees (which is once again a search operation) we can get really good compression ratios.
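The lopsided tree described above boils down to "the k-th most common letter gets k zeroes followed by a one", which makes for a short Python sketch. The function names and the three-letter ranking are just for illustration, and this is not an optimised tree: Huffman coding is the usual way of building one of those from the actual letter counts.

```python
def build_codes(ranking):
    """Give the k-th most common letter the path '0' * k + '1'."""
    return {letter: "0" * k + "1" for k, letter in enumerate(ranking)}


def encode(text, codes):
    return "".join(codes[letter] for letter in text)


def decode(bits, codes):
    letters = {path: letter for letter, path in codes.items()}
    out, path = [], ""
    for bit in bits:
        path += bit
        if bit == "1":  # in this tree a "1" always marks the end of a letter
            out.append(letters[path])
            path = ""
    return "".join(out)


codes = build_codes("eas")  # most common letter first
print(codes)
print(encode("sea", codes))
```

Because no letter's path is the start of another letter's path, the bits can be run together with no separators and still be decoded unambiguously.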

Anyway, rant over since my script has finished moving all of the "A"s :D

Tuesday 4 August 2009

The Ugliest Hack I've Written So Far

raise :i ::= :a ?(a.__class__ == Raise) => 'raise '+', '.join([t[0] for t in [[e] for n,e in enumerate([a.expr3,a.expr2,a.expr1]) if e is not None or any([a.expr1,a.expr2,a.expr3][-(n+1):])] if (t[0] is not None and t.__setitem__(0,t[0].rec(i))) or (t[0] is None and t.__setitem__(0, 'None')) or True][::-1])

This is a 1-line PyMeta rule which means the following:

Define a rule named "raise" with an amount of indentation "i", which applies to anything, which we'll call "a", as long as "a" is a type of 'Raise'. Upon finding such a thing we should output a string 'raise ' followed by the first item of every list in the set of singleton lists of "a"'s attributes 'expr3', 'expr2' and 'expr1' when reversed which is either not equal to "None" or else comes after a non-None attribute, if the element of these lists is not None and swapping the first element for its contents recursively at the same indentation level, or if it is None then replacing it with the string "None".

If I were writing this normally it would be something much cleaner like:

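# 'raise' is a reserved word, so the function needs a different name.
# Raise is assumed to be the AST node class being matched by the rule.
def raise_(i, a):
  if a.__class__ == Raise:
    attribs = [a.expr3, a.expr2, a.expr1]
    to_keep = []
    not_end = False
    for att in attribs:
      if att is not None:
        # Found the last real attribute; keep everything before it too
        not_end = True
      if not_end:
        # Gaps become the string 'None', the rest is rendered recursively
        to_keep.append('None' if att is None else att.rec(i))
    to_keep.reverse()
    return 'raise ' + ', '.join(to_keep)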

but the default PyMeta grammar only allows a single line of Python in the output. Whilst the point of OMeta is that I can subclass and rewrite it to work in whatever way I want, I can't get the hang of subclassing grammars yet, and hence this mind-bending, yet at least partially elegant, functional approach.

Tuesday 26 May 2009

Wish I had some free time

Just trying to publish stuff that's been in draft form for months, whether it's finished or not :P

Lectures are finally over for this semester (mine went on a week longer than everyone else's it seems) and my exams start next week (I only have 3, as opposed to the usual 6!). However, I've got projects overdue and some slipping off the radar altogether because of my stubborn insistence on doing things correctly. I didn't get very far at all with my second Numerical and Computational Physics assignment, since I can't get my head around the Runge Kutta method of approximating functions based on their derivative and some initial conditions. I did manage to use them during an assessment/exam for the same module (after attempting to use Microsoft Word and Excel, then giving up at the hopeless confusion they caused me and instead downloading and installing Gnumeric and Abiword which, whilst allowing me to actually get some bloody work done without flashing any fancy-yet-utterly-fucking-useless-because-I-don't-know-what-it-does bling at me, still took up about half an hour of my exam), but a spreadsheet is a horrible way to do any kind of programming, so my disgust at entering hard-coded functions was drowned out by the limited use of the actual tool.
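For reference, the classic fourth-order Runge Kutta method fits in a few lines of Python. This is a generic textbook sketch and not my assignment: the test problem (dy/dt = y with y(0) = 1, whose exact answer at t = 1 is e) and the function names are picked purely for illustration.

```python
from math import exp


def rk4_step(f, t, y, h):
    """Advance y by one step of size h, given the derivative f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def solve(f, t0, y0, t_end, steps):
    """Start from the initial condition (t0, y0) and step up to t_end."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


approx = solve(lambda t, y: y, 0.0, 1.0, 1.0, 100)
print(approx, exp(1))
```

Each step samples the slope four times (at the start, twice in the middle and at the end) and takes a weighted average, which is where the accuracy comes from.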

Monday 27 April 2009

Experiment Dump

Over the past few years I've accrued a lot of failed, successful or otherwise abandoned experimental programs in my /home/chris/Files/Documents/Play/Programming directory (yes, I am somewhat OCD about file organisation). Since they might be interesting to some people, as they were obviously so interesting to me that I wanted to write them, I've started to upload them to FreeWebs. Here's a list, along with a brief description.

MaxInt.java - This is a simple test showing one of the reasons I hate Java. Java stores integer numbers in a fixed amount of space, 32 bits. 32 bits can be in one of 2^32 unique combinations, which Java divides down the middle using two's complement. The combination 00000000000000000000000000000000 is taken to be zero. The combination above, 00000000000000000000000000000001, represents 1, and so on up to 01111111111111111111111111111111, which is 2147483647. The other half all start with a 1: 11111111111111111111111111111111 represents -1, 11111111111111111111111111111110 represents -2 and so on down to 10000000000000000000000000000000, which is -2147483648. The problem with this can be shown if you try to add 1 on to the biggest number, which gives 10000000000000000000000000000000, but that is the combination for the smallest number, so Java thinks that 2147483647 + 1 = -2147483648, which in my opinion is a fail. To add insult to injury, Java doesn't allow applications to compile unless they handle every checked exception they come into contact with, including those that will never be thrown, yet Java is perfectly happy to let its own failures pass without comment, causing debug headaches.

To run this just compile it (for example with "javac MaxInt.java"), then run it (for example with "java MaxInt").
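If you don't have a Java compiler handy, the wraparound can be imitated in Python, whose integers grow as needed and so never overflow by themselves. Java's int is a 32-bit two's complement value; the helper name to_int32 below is made up, and all it does is keep the low 32 bits and reinterpret them.

```python
def to_int32(n):
    """Keep only the low 32 bits of n and read them as two's complement."""
    return (n + 2**31) % 2**32 - 2**31


MAX_INT = 2**31 - 1  # 2147483647, the same value as Java's Integer.MAX_VALUE

print(MAX_INT + 1)            # → 2147483648, what we'd like to get
print(to_int32(MAX_INT + 1))  # → -2147483648, what a 32-bit int gives
```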

JavaGnucleon.tar.bz2 - This is a simple board game along the lines of Atoms on the Amiga. Players take it in turns to click on squares to "add an atom to them". Anyone can click on an empty square (indicated by a 0), but once a square has an atom in then it is owned by that player (and changes colour) and only that player can click on it from then on. Once a square gets as many atoms in it as it has nearest neighbours (not including diagonals) then it explodes, sending one atom to each neighbour and claiming them for the player. Chain reactions can occur if a square explodes and sends an atom to a neighbouring square, giving it enough to explode, and so on.
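The exploding rule is easier to see in code than in words. This is a fresh Python sketch of the rules as described, not a port of the Java source, and it fills in two details by assumption: a square that empties itself by exploding goes back to being unowned, and a safety cap stops the chain reaction running forever on a saturated board (where a real game would already have been won).

```python
from collections import deque


def neighbours(rows, cols, r, c):
    """Orthogonal neighbours of (r, c) that are on the board."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return [(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def add_atom(counts, owners, r, c, player, max_explosions=10000):
    """Add one atom for player at (r, c), then resolve any chain reaction."""
    if owners[r][c] not in (None, player):
        raise ValueError("square belongs to another player")
    rows, cols = len(counts), len(counts[0])
    counts[r][c] += 1
    owners[r][c] = player
    pending = deque([(r, c)])
    while pending and max_explosions > 0:
        r, c = pending.popleft()
        near = neighbours(rows, cols, r, c)
        if counts[r][c] < len(near):
            continue  # not enough atoms to explode
        max_explosions -= 1
        counts[r][c] -= len(near)
        if counts[r][c] == 0:
            owners[r][c] = None  # assumption: emptied squares are up for grabs
        for nr, nc in near:
            counts[nr][nc] += 1
            owners[nr][nc] = player  # exploding claims the neighbours
        pending.extend(near)    # neighbours may now be able to explode
        pending.append((r, c))  # and so might this square, if atoms remain
```

Corners only have 2 neighbours and edges 3, which is what makes them the interesting places to start.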

package-installer.tar.bz2 - This is a non-functional Java GUI for a package management tool I was working on a couple of years ago. It's similar to APTonCD, but as far as I know predates it a little.

AppPrefs.tar.bz2 - This is a non-functional Python/GTK GUI for choosing GNOME's default applications. It shows how the default browser could be chosen, giving a textual description of each one's unique features (ie. what makes it different) along with a screenshot. The idea is that users don't need to know or remember the names of the applications, they can read the description and look at the screenshot to see if it's the one they were looking for (these days Synaptic can show screenshots, which is awesome :D ). Also, I wanted to get rid of the IMHO broken idea of associating applications with filenames, such as files ending in ".mp3" and so on. File types should be determined using magic or libmagic, and users shouldn't have to care about the implementation. They should just be able to say "Music" or "Spreadsheet".
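To show what "using magic" means, here's a tiny Python sketch that guesses a broad category from the first few bytes of a file rather than from its name. It is not libmagic's API, the category names are my own, and the table is nowhere near complete (the real magic database is enormous, and an Ogg file isn't always music), but the signatures listed are the genuine ones for those formats.

```python
# (signature at the start of the file, category shown to the user)
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "Image"),  # PNG signature
    (b"OggS", "Music"),               # Ogg container, eg. Vorbis audio
    (b"ID3", "Music"),                # MP3 starting with an ID3v2 tag
    (b"PK\x03\x04", "Archive"),       # zip local file header
]


def guess_kind(data):
    """Return a user-facing category based on the leading bytes of data."""
    for signature, kind in MAGIC:
        if data.startswith(signature):
            return kind
    return "Unknown"


print(guess_kind(b"OggS\x00\x02 rest of the file..."))
```

Renaming a file makes no difference to this kind of check, which is exactly why it beats looking at ".mp3".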

Bouncy.py - A very simple Python script which makes a square bounce around the screen based on some very dodgy Physics.

Some Nice Things

I've not posted for a while, due to a mixture of an increasing workload, the ability to let off a constant barrage of my thoughts to Identi.ca rather than build them up into a blog post, and my constant disdain for Web-based apps.

So what do I want to blog about? Nothing particularly structured, just some stuff that I find interesting. Keep in mind though, that my definition of interesting includes the fact that 12cm optical discs have increased their storage capacity by 2 orders of magnitude in the 27 years from the CD to the BluRay, whilst in the same time frame the capacity of 3 1/2" hard drives has gone up by roughly 5 orders of magnitude. (I'm writing an essay on Optical Data Storage for a Physics module :) )

For those of you who may remember Deluxe Paint on AGA capable Amigas I can heartily recommend that you check out Grafx2, which seems to work on pretty much every OS and has recently been added to Debian, so you can install it by ticking "grafx2" in any package manager, it will be downloaded and installed along with everything it depends on :) Doesn't seem to do animation yet, as far as I can tell, which is a shame.


Also recently added to Debian is Closed World Machine, cwm. This is pretty special, since it takes cutting edge computer knowledge representation as used by the Semantic Web, and makes it accessible via a tool similar to UNIX's (and of course GNU's) classic sed tool. For example, you can use a command like "cwm --rdf inputfile1.rdf inputfile2.rdf --n3 inputfile3.n --rdf --think --pipe > output.rdf" to take all of the knowledge from the files inputfile1.rdf, inputfile2.rdf and inputfile3.n (in RDF-XML and Notation3 formats), compare the knowledge they contain, and dump all of the new knowledge it can infer into the RDF-XML file output.rdf. For example, inputfile1.rdf could contain statements that Chris Warburton is a student, Chris Warburton has a website http://www.freewebs.com/chriswarbo and that Chris Warburton has a brother David Warburton. inputfile2.rdf could say that Brothers are related and that Brothers share a Mother. inputfile3.n could say that David Warburton has a blog at http://fun-chips.blogspot.com and David Warburton has a mother Cheryl Warburton. cwm would then combine these and the output file would contain deductions such as David Warburton is related to a student, http://www.freewebs.com/chriswarbo is run by a student and Chris Warburton has a mother Cheryl Warburton.
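The kind of deduction going on can be imitated in a few lines of Python. This has nothing to do with how cwm itself is written and doesn't touch RDF or Notation3 at all: facts are plain (subject, relation, object) tuples, the two brother rules are hard-coded, and the function name infer is made up. It just keeps applying the rules until no new facts turn up.

```python
def infer(facts):
    """Apply two hand-written rules repeatedly; return only the new facts."""
    known = set(facts)
    while True:
        new = set()
        for (x, rel, y) in known:
            if rel == "brother":
                # Rule 1: brothers are related
                new.add((x, "related", y))
                # Rule 2: brothers share a mother
                for (y2, rel2, m) in known:
                    if y2 == y and rel2 == "mother":
                        new.add((x, "mother", m))
        if new <= known:
            return known - set(facts)
        known |= new


facts = {
    ("Chris", "brother", "David"),
    ("David", "mother", "Cheryl"),
}
for fact in sorted(infer(facts)):
    print(fact)
```

The difference with a real reasoner is that the rules are data too, loaded from the same kind of files as the facts, rather than being baked into the program.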


This is pretty cool, since it commoditises the previously tricky area of RDF access, allowing it to be scripted, for example in the backend of Web sites, in the same way that Imagemagick has done to images (eg. for thumbnailing).


Pretty cool. Anyway, it's getting late so I should get some sleep now.


I'm going to post some of my programming experiments soon, so look out for them :)


Tuesday 24 March 2009

BBC News comments are broken

Tried to post on this http://news.bbc.co.uk/2/hi/uk_news/7955205.stm but it failed :( Didn't want to lose it though.


The harrowing trend to notice amongst government statements on these issues of technology, privacy and civil liberties is the focus on the meaningless technology arguments rather than the important freedom related ones.


To me the idea of "a profiling tool which examines a child's behaviour and social background to identify potential child offenders" makes a sickening mockery of the notions of innocent until proven guilty, freedom of speech and expression and equality. I don't care if it's encrypted or 'secure', or how much such a thing would cost, it simply shouldn't exist in the first place!


The same sidestepping of the main topic can be seen in most of these stories, even across the world. There was an article posted recently about Australia's Internet blacklist, and whether it is an offence to the human right to Free Speech. The conclusion was that such a blacklist might slow down the Internet, and wouldn't stop everything, which once again I don't much care about.


Technology is advancing ferociously, and will continue to do so. Making important decisions based on technological issues sets an unnerving precedent. In the Australia example, in a few years' or decades' time I'm sure Internet latencies will be so low that such a blacklist would be unnoticeable. Going by the misdirected conclusions of that article, the blacklist should then go ahead, since the technological issues raised will have been fixed.


In the case of these databases, if technological advances such as quantum entanglement cryptography fix the security concerns, and supercomputer-esque processing power and storage are available for pennies, does this mean that all such databases should be made? Of course it doesn't, yet that is the argument being put forth by the government.


I call to reject any spin-ridden arguments based on petty implementation details and keep the focus on where it matters, the reasons for and against even contemplating the possible existence of such systems.


Friday 6 February 2009

Learned Helplessness in Computing?

I know I should be revising, seeing that my Atomic and Laser Physics exam is mere hours away, but I ended up on another Wikipedia trek, and came across the article on learned helplessness. Reading through it, I found that I could make many connections with the currently depressing state of computing, and attribute it to the complexity and proprietaryness of software.

Learned helplessness is a much-studied psychological phenomenon, where a subject gives up trying to change their situation. An example cited consists of three groups of dogs; group A is a control group and are put into harnesses and left for the duration of the experiment; groups B and C are put into harnesses but are also given unpleasant electric shocks. Each group B dog has a lever in front of it which does nothing when activated, whereas each group C dog has a lever which turns off the shocks to that dog and one of the group B dogs. The dogs in group C learn that the lever turns off their shocks, and they use it whenever they start to get shocked. Group B dogs, however, learn that their lever does nothing, whilst their shocks seem to stop randomly (remember, each B dog is paired to a C dog's lever, so the B dogs don't know why their shocks stop).

After this stage of the experiment all of the dogs are put into part two, where they are unharnessed in a pen divided in two by a small partition. The half of the floor with a dog on is electrified, whilst the half without is normal. Dogs from groups A and C would hop over the partition, away from the electricity and thus away from the pain. They don't know that the other side's not electrified, but they have a go and find that it's not. The dogs from group B, however, just lie down on the electrified floor and whimper, as they are repeatedly electrocuted. They could hop over the partition, but don't bother trying. These dogs become depressed.

The conclusion of the experiment is that the sense of control is very important. Dogs in group B and group C got exactly the same shocks (since they were both controlled by group C's levers), but only group B got depressed. Essentially, they learned that nothing they did would stop the electricity, it just stopped randomly. They then applied this knowledge to the second situation and took the shocks, rather than trying the new possibility of jumping over the divide.

This can be seen in people, where some parents can end up neglecting their babies since they 'learn' that the child doesn't stop crying whether they give it attention or not, and thus ignore it, thinking they are helpless to stop its cries.

The psychological explanation for this is that the depressed subjects, in an attempt to rationalise the seemingly random lack of control, think of it as an inevitability ("Babies cry"), blame themselves ("I'm making it cry") and think of it as pervasive ("I'm a bad parent"). This learned helplessness digs a psychological hole which is notoriously difficult to break out of, and even causes feedback loops, for example a neglected child will cry more and have more problems than that of an attentive parent, thus reinforcing the "I'm a bad parent" and "I'm making it cry" beliefs. In fact, even knowledge of learned helplessness can make things worse, since it can act as a confirmation of the helplessness ("You've told yourself that you're helpless when you're actually not." "See? I TOLD you I was a bad parent!") and others can end up blaming the condition for things rather than the person ("It's not your fault that your baby's ill, you've learned to be helpless at looking after it." "Yes, you should probably take it away since I'm too learned-helpless to look after it.")

So, aside from knowing more being awesome, how does this apply to anything I'm interested in? Well I couldn't stop contrasting the explanations with computing. The dominant computing platform these days is Microsoft Windows which, although all software has bugs, seems to be full of them. A lot of these bugs are user interface related, where the required action to achieve the desired task is non-obvious, or a seemingly obvious action produces an unexpected result (which includes 'crashes', where a program disappears without the user telling it to). Although anyone more involved in software development would view these as bugs in the software which should be reported and fixed, frequently less technical users (which is the vast majority) view such things as inevitable ("Computers crash"), as their fault ("I made it crash") and pervasive ("I'm bad with computers"). Just look at the currently running adverts for the Which? PC Guide: A bunch of regular people saying how their computers keep messing up, and then an offer of a guide to show them how it's all their fault because they're doing it wrong.

Since I write software, I would say that the Which? PC Guide is a complete hack: It's fixing something in the wrong place. A broken piece of software should not be fixed by telling each and every user how to work around the broken bits, the software should be fixed so that nobody ever experiences those issues again. However, since it's proprietary software, nobody is allowed to fix it other than Microsoft (although there are numerous other hacks to work around the broken bits, some of which have created an entire industry, such as firewalls, anti-virus/spyware/adware programs, etc.).

The majority of computer users, however, do not think like me, since I am a group C dog: I know how to fix things. In fact, in human experiments into learned helplessness, it was found that people could concentrate more and solve problems more quickly in the presence of an annoying and distracting noise if they had a button which could turn it off, than those subjected to the noise without such a button, EVEN WHEN THE BUTTON WASN'T PRESSED. So on a Free Software system, where I know that it is possible for me to fix something if I truly wanted to, I don't get depressed, however on a proprietary system I frequently get annoyed, angry, irritated, etc. when the software behaves in undesirable ways.

For example, clicking a link that says "Download this program" in Idiot Exploiter 8 doesn't download the program, it just gives a subtle message under the toolbar that Internet Explorer has "protected" me by preventing the program from downloading and that I should click it to change that, and when clicked presents a menu with the option to download the program (how is this any different to the previous promise of a download?), which when clicked brings up a box asking if I want to save the program or run it, so I click run and when it's downloaded I get a warning saying that programs can do stuff to the computer, do I want to run it? I click run again (how is this any different to the previous promise of running the program?) and Windows pops up a message saying that the program is doing stuff, do I want to allow it to continue? I press continue and FINALLY get to the "first step" of the installer.

On Debian I could give a similar example that I can't get the Gdebi package installer to work, which means that I have to save packages and install them with the command "dpkg -i package_filename.deb", which can result in a broken setup if the newly installed package depends on other stuff, which means I need to install the stuff to fix it with "apt-get -f install" and press "y" to confirm it. This may seem annoying, but I know that if I wanted to fix it badly enough then I could, and would even be encouraged to do so (after all, Gdebi works perfectly well in Ubuntu).

Whenever my wireless card messes up on Debian, on the other hand, I get incredibly frustrated and annoyed, and often need to walk away from my laptop and have a break, since I feel completely powerless over it. The wireless firmware I use is proprietary, since Broadcom don't tell anyone the language that their wifi chips speak (although clean-room reverse engineering of a Free Software replacement in Italy seems to be showing some promise), so even though I'm running completely Free Software applications on a Free Software kernel with Free Software drivers (in my case Linux), and can look at the code at any time to see what it's doing and possibly fix any problems, when it comes to my wireless card the disconnects are seemingly random, as I have no way of inspecting the firmware since it is proprietary. I therefore feel helpless to stop it disconnecting, and can't remedy the situation in any way other than disabling the Wifi, unloading the driver, reloading the driver, enabling the Wifi and trying to reconnect. If that doesn't work then all I can do is to try it again. In fact, I've even written a little script which does all of that whenever I run "restart-wireless". It's so bad that the developers of NetworkManager, the (currently) best network control system on Linux, do the same thing. If NetworkManager's running and I get disconnected then I see the wireless network icon disappear and the wifi LED turn off. After a few seconds the Wifi LED comes back on, the Wifi icon comes back and it tries to connect. If it doesn't work then it happens again. It's depressing.

So there's one reason I think computing is in the sorry state that it is: people are being conditioned to think that "computers crash" (which conveniently keeps the cost of quality control down), that they aren't "using them properly" (which conveniently keeps the costs of having good designers down) and that they're destined to always be "clueless with everything computer-related" (which conveniently keeps people upgrading stuff they don't need to on the advice of the 'experts' selling the upgrades). This was caused by the proprietary software world, since with Free Software the volunteers, users, developers, companies and organisations which make and sell it actively encourage all users to "hop the partition" and 'scratch their own itches', since it results in less work and more Free software for those doing the encouraging. Whether it was intentional or not is debatable (never assign to malice that which can be explained by (in?)competence).

This unfortunately means that people like my Mum have some kind of internal off switch which is activated by the word "computer", so that when things like the broadband packages they are paying for are discussed, a sentence like "This one has a limit on how much we can do per month, so they'll charge more if we go over it, but this one doesn't" is met with a response such as "Well you know I don't understand these things" (which is the exact same sentence used for every attempt at explaining something, no matter how basic). It makes me *REALLY* frustrated when people don't bother to apply mental skills which even five year olds possess, simply because they know computers are involved. Discuss the exact same thing with phone contracts, or even the price of meat per kilo, and they'll readily discuss the merits of each option, and even go into the small print, but with computers they've learned to be helpless, and thus think they have no control over anything related, and feel much more comfortable being extorted by monthly bills twice the size of what they could have, where the value calculations are worked out by someone else, than they do with having to confront some computer-related thinking for a few minutes.

Another big cause of computer-helplessness is a genuine problem with computing today, Free Software or not. Empirical evidence does say that just the presence of control, whether or not it is used, is the important bit (like my access to the source code for everything I use), but it's still a chore to actually make use of that control.

As an example, a few years ago the Nautilus file manager changed so that icons got bounding boxes. This meant that before the change if I clicked in a transparent corner of a circular icon then nothing would get selected, but after the change the circular icon would be selected because I'd clicked within the bounding box. This is a good thing usability-wise, but I was rather annoyed with the way it interfered with my specific setup. I had cut out images and assigned them as icons to the various folders in my Home folder, stretched them rather large and arranged them manually so that they filled the Nautilus window without their visible parts overlapping, so that clicking the visible parts would select the icon, whilst clicking on a transparent part would 'fall through' and select the one visible below. I was very proud of this, and it had taken quite a while to do. Then, after an update, all of the icons got bounding boxes and thus clicks in transparent areas no longer fell through, making selecting and double-clicking things unusable. I had to make all of the icons small again, and arrange them in a grid, destroying the previous awesomeness. I took it upon myself a few months ago to bring back the no-bounding-box Nautilus as a well-buried option, so I got the source code to the most recent version of Nautilus and looked through the version control history to find out when the change was made which added the bounding boxes (I think this is where it changed http://svn.gnome.org/viewvc/nautilus?view=revision&revision=9123 ) and replaced that section in the latest code with the old code, and it worked. However, this took a few days, since I've done very little C programming and never used GObject with C before, and I didn't even have to write any code (it was just copypasta). If I wanted to fix every bug I find this way it would take an intractable amount of time, even though I can fix any bug I want to individually.
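To make the difference between the two behaviours concrete, here is a rough sketch in Python. It is not Nautilus's code (that is C and GObject); every name in it (make_circle_icon, hit_bounding_box, hit_alpha, pick) is made up for illustration, and it assumes an icon is just a grid of alpha values.

```python
def make_circle_icon(size):
    """Return a size x size alpha mask: 255 inside an inscribed circle, 0 outside."""
    radius = size / 2.0
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            # Measure from the centre of each pixel to the centre of the icon.
            dx = x + 0.5 - radius
            dy = y + 0.5 - radius
            row.append(255 if dx * dx + dy * dy <= radius * radius else 0)
        mask.append(row)
    return mask


def hit_bounding_box(mask, x, y):
    """New behaviour: any click inside the icon's rectangle counts as a hit."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0])


def hit_alpha(mask, x, y):
    """Old behaviour: only clicks on a non-transparent pixel count as a hit."""
    return hit_bounding_box(mask, x, y) and mask[y][x] > 0


def pick(icons, x, y, hit_test):
    """icons is a list of (name, left, top, mask), topmost first.

    Return the name of the first icon the click hits, or None.
    """
    for name, left, top, mask in icons:
        if hit_test(mask, x - left, y - top):
            return name
    return None


# A circular icon stacked on top of a fully opaque square one (hypothetical names).
icons = [
    ("circle", 0, 0, make_circle_icon(10)),
    ("square", 0, 0, [[255] * 10 for _ in range(10)]),
]
```

With this setup, pick(icons, 0, 0, hit_alpha) lets a click in the circle's transparent corner fall through to the icon underneath, whilst pick(icons, 0, 0, hit_bounding_box) selects the circle itself.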

There looks to be some promising stuff going on to rectify this at the Viewpoints Research Institute, an organisation funded by the US government with awesome Computer Scientists like Alan Kay. One of their aims is to "reinvent" computing, which basically involves making (yet another) computer system, but one which is as understandable (and hence small) as possible. They're aiming for a complete, working system in under 20,000 lines of code (for comparison, Windows has around 40,000,000), and have got some nice tools like their "Combined Object Lambda Architecture" programming system, which aims to be written in itself and be able to compile down as far as FPGAs (ie. rewiring the microchips themselves to represent the program), and OMeta which allows very compact and easy to understand programming language implementations (for example they have an almost-complete (missing "try"/"catch" and "with") Javascript interpreter which is only 177 lines of code), which, like their COLA, is written in itself. This allows COLA-based implementations of other languages to make their system, with new languages so easy to define that each part can be made in a tailor-made language, even being redefined in places where it makes things more comprehensible.

Hopefully having more understandable and approachable code will mean it is easier to find and fix bugs, so that nobody has to experience them for long. It might also help to reduce the number of people who teach themselves to be helpless at computing, although as for the ones who are already learned-helpless it will take a lot of effort on their part to break out of it, which won't be helped by proprietary companies trying to dress up their shit code as some kind of magical snake oil which cannot be obtained from anywhere else, or be written by mere mortals (which GNU set out to disprove with UNIX and has done a pretty fine job), and the media displaying binary all over the place whenever the internals of computers are mentioned.

OK, I think I should carry on revising now, as I've gone off on a bit of a rant, but damn it my blog's still not boring or crap! :P

Friday 30 January 2009

Retarded = backwards

It seems that our literally retarded government is mulling over the idea of a "broadband tax" of 20 quid per year for everyone with broadband (which is actually everyone in the country, since they also want universal broadband access) which will be given to the "music industry" and the "film industry".

With this in mind, perhaps now is the time we can finally recover our failing Ice House economy with Ice House taxes on all fridges? Our TV channels and film studios can receive tax on VHS tapes and DVDs? Our dwindling equine propulsion industry could benefit from a tax on all internal combustion engines, electric motors, petrol and diesel. Our whale oil industry could benefit from a tax on light bulbs, fluorescent tubes and LEDs. The typewriter industry could have a tax on all computer sales, as can gramophone makers. Whilst we're at it, why not charge for Wikipedia and give all of the proceeds to the desperate Encyclopedia Britannica?

In fact, to encourage these ideas I propose that everyone who uses a fridge, a video or DVD player, any form of engine-powered travel, electric lighting or computer should be labelled as a criminal, since it is completely within my rights to make such accusations when there is no legal basis at all for them. Actually, criminal is too light a word; after all, some crimes are legitimate in certain circumstances when they're the lesser of two evils. The label I propose therefore must be something that's never legitimate, so that juries can have their minds made up for them beforehand rather than having to go through that tedious business of deciding guilt (which, after all that effort, might not even get me the result I want!). They should be called some kind of word which implies rape, murder and that sort of thing... how about serial killer? Yeah, that works. OK, now on with the spreading of my message with twisted logic, brainwashing and of course my all time favourite, the outright lie.

Here's one to get started with:

"You wouldn't strangle a toddler

You wouldn't stab a pregnant woman in the womb

Refrigeration is murder

Murder is a crime

Don't let the serial killers get away with it

Copyright the Respected Icehouse Association of America"

Or how about a more subtle brainwashing?

"He's the, kind of man that makes edits in-place,

He kills post men and shits on their face,

He wanks in the sandwiches you leave in the icebox,

He makes documents using a word processor, what a fucking cock,

He's a, Nerdy Nigel, a Nerdy Nigel

Nerdy Nigel word processes his documents

(Copyright the British Typewritographic Institute)"

Or, perhaps, we should actually EMBRACE technological innovation? ESPECIALLY innovation which looks set to destroy the crumbling monopolies of the 20th century's "music industry" (ie. the few rich sods who decide that you don't want to listen to the vast majority of bands and thus give millions to Britney Spears whilst decent acts end up working in McDonalds) and "film industry" (ie. the few rich sods who decide that potentially good, inventive ideas are too risky to back compared to more sequels of the same old shit). Giving them such a "tax" is not only COMPLETELY disruptive to the economy, but offers NEGATIVE incentive for them to do stuff, since their income could come straight from the tax without wasting any money doing any of that 'making stuff' kerfuffle.

Labour need a firm kick to the teeth for all of the bullshit they're shovelling over us. They're as conservative as the Conservatives, leaving the Liberal Democrats as the only viable way out (and that's still rather tenuous). The problem with the Lib Dems is that they don't seem to have any morals either, not in a 'Fuck you, I'm in charge' Labour way, but in an 'OK as long as you vote for us' way. I think forming policies specifically to cater to a minority so that they'll vote for you, regardless of how it affects the majority, isn't a particularly good thing (here I'm not using minority in an ethnic sense, but referring to things like their stance on legalising cannabis. Whilst this is obviously backed wholeheartedly by that minority who abuse cannabis, its impact on the majority of the population is far from clear cut).

On a side note, I really really really wish that Java dies a quick death. It's a fucking terrible language, restrictive, verbose, full of boilerplate bollocks, full of glaring errors which, for some ungodly reason, are standard, making every Java environment broken (either in the sense that they're nonstandard but work properly, or they follow the standard but are full of fucking retarded shit like 2147483647 + 1 = -2147483648 (which *IS* an error, as anyone over the age of about 5 can tell you)). Object oriented my arse. What the hell are these "base types" then? What about methods? What about classes? Fuck off Java.
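For anyone wondering where that number comes from, here is a small sketch in Python of what a signed 32-bit two's complement integer (which is what a Java int is) does when it runs out of room. The helper name wrap_int32 is made up for this example, and Python's own integers are arbitrary precision, so the wrapping has to be simulated by hand.

```python
def wrap_int32(n):
    """Reduce n to the range of a signed 32-bit two's complement integer."""
    # Shift into 0..2**32-1, wrap around with the modulo, then shift back.
    return (n + 2**31) % 2**32 - 2**31


INT_MAX = 2**31 - 1  # 2147483647, the largest value a 32-bit signed int can hold

# Python just keeps counting, because its ints grow as needed.
unbounded = INT_MAX + 1

# A fixed-width 32-bit int wraps round to the most negative value instead.
wrapped = wrap_int32(INT_MAX + 1)

print(unbounded)
print(wrapped)
```

The first line printed is the answer a five year old would give; the second is the wrapped value quoted above.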

The End

Sunday 25 January 2009

PubSubClient back in development

I've been messing around with my pubsubclient library again, and the issue with publishing now appears to be fixed (either thanks to me or thanks to the ejabberd team if it was a bug in the server).

Anyway, it's working again, and it's now an opportunity to get more reply handlers and documentation written. Since I'm currently in the middle of my exam period this obviously should not be a full-steam-ahead effort, but I'm going to keep chipping away at the TODO list (which is now formalised into a not-yet-up-to-date SPEC-COMPLIANCE file). It's now just a case of putting in the work, since there's no hard thought involved; it's just extracting data from XML and putting it into sensible, pure Python representations.
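As a rough idea of what "extracting data from XML" means here, this is a minimal sketch using only the standard library. It is not the actual pubsubclient code: the function name parse_items_reply, the returned dictionary layout and the sample stanza are all invented for illustration, and only the pubsub namespace is taken from the XMPP publish-subscribe spec.

```python
import xml.etree.ElementTree as ET

# Namespace used by XMPP publish-subscribe payloads.
PUBSUB_NS = "http://jabber.org/protocol/pubsub"

# A made-up reply of the kind a server might send when asked for a node's items.
SAMPLE_REPLY = """
<iq type='result' from='pubsub.example.org' id='items1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <items node='blog'>
      <item id='a1'/>
      <item id='b2'/>
    </items>
  </pubsub>
</iq>
"""


def parse_items_reply(xml_text):
    """Turn an items reply into a plain dict, or None if it has no items element."""
    root = ET.fromstring(xml_text)
    items = root.find("{%s}pubsub/{%s}items" % (PUBSUB_NS, PUBSUB_NS))
    if items is None:
        return None
    return {
        "node": items.get("node"),
        # Only direct children of <items> are considered.
        "item_ids": [item.get("id") for item in items.findall("{%s}item" % PUBSUB_NS)],
    }


reply = parse_items_reply(SAMPLE_REPLY)
print(reply)
```

The point of returning a plain dict is that whatever calls the handler never has to touch the XML itself.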

The website is still around at http://pubsubclient.sourceforge.net/ and the code is still on GitHub at http://github.com/Warbo/pubsubclient/tree/master

Enjoy :)