Showing posts with label rant. Show all posts

Tuesday, 11 August 2009

A Decent Compressed Filesystem At Last?

I have a 2.4GHz CPU, a 250GB hard drive and I want to store 29GB of zipped music I just downloaded. The music is in a variety of ProTracker formats, so what's the best way to store it?

I know for a fact that tracker songs are very easy to compress, since the format originated on computers like the Amiga, where any compression more sophisticated than Run Length Encoding would take away valuable clock cycles that were often needed elsewhere. Such uncompressed, or crudely compressed, data is ripe for extra compression (hence its being transported in a zip archive).

Actually playing this music is now nontrivial, though: whilst all of my music players support ProTracker, not all of them support reading modules from zip files. The obvious thing to do is decompress them, but since tracker files are highly compressible, the 29GB archive would grow to something even more vast when decompressed, and that's just a waste of storage space. The solution is to use a compressed filesystem.

For those who are stuck on shit operating systems, I'll give you a little insight into how your computer works. Let's say we're running Microsoft Windows, and we insert a CD into our CDROM drive. In "My Computer" we may see our CDROM is called "D", and if we double click on it we see the things stored on it. However, we then go back to My Computer and right click on D, then tell it to eject. We take out the CD and close the drive. Now we tell it to eject again, and the drive opens. So, what is "D"? It's our CD, since we saw the files on it, but it's also the drive, since we can tell it to do things when our CD is nowhere in sight. Which is it really? Well, that depends completely upon what you want to do, since Windows is a very poor system indeed.

Let's imagine a similar thing on a UNIX system like Debian. Our CDROM will be something like /dev/hda. We can send it commands, for example "eject /dev/hda" (or do the same thing in whichever graphical environment you happen to use). Can we access the files on a disc through /dev/hda? No. What we can do, however, is take the data given to us by the drive and reconstruct it somewhere. This is what we mean by mounting the filesystem. The filesystem is the particular way the ones and zeroes represent our files and folders (CDs use the standard ISO9660, often extended with certain proprietary formats from Microsoft to allow longer filenames. Ironic, considering that the original format only had short 8.3 filenames to make sure it would work on Microsoft's POS crapware), whilst mounting it means making it available to peruse. Here we can choose where we want to see things, so we can run a command like "mount /dev/hda /our_cd". Now if we go to the folder /our_cd the previous contents will be hidden (until the CD is unmounted) and the contents of the disc are accessible. /our_cd is a different file from /dev/hda. The same is true of hard drives.

In fact, filesystems can be anything. The FUSE (Filesystem in Userspace) driver allows regular programs to be written which can be accessed like filesystems. This is how the Wikipedia filesystem works (mount it somewhere and that folder becomes filled with files for each article), how the GMail filesystem works (data is stored online as a series of emails in a GMail account, which can be retrieved from anywhere and are automatically converted back to their original state), and many others. There are several FUSE filesystems which transparently compress and decompress their contents, ie. everything copied into the filesystem is sent to a compression program and saved somewhere, and everything read from the filesystem is sent through a decompression program before it reaches the destination. Thus any program, as long as it can load files, can load compressed files, and any program, as long as it can save files, can save compressed files.

Seems cool, but until recently the only ones I'd tested were pretty dire, most noticeably CompFUSEd. It was INCREDIBLY slow and memory hungry, and not worth using, and that was with only a few MB put into it.

However, FuseCompress has recently been added to Debian, and I'm trying it out for these tracker modules. Populating the filesystem is taking a while: I'm having to decompress all 122 thousand songs (since that's how we want them to be read, and therefore written), then move them into the filesystem, where they're recompressed (using LZO this time, which is damned fast). All I can say is thank Guido for Python, since it makes automating such things a breeze :)
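The script itself is nothing special. A minimal sketch of the unpack-and-move step might look like this (the paths, layout and function name are hypothetical; the real script also has to cope with broken archives and the odd non-zip file):

```python
import zipfile
from pathlib import Path

def unpack_archives(archive_dir, dest_dir):
    """Extract every zip under archive_dir into dest_dir.

    When dest_dir is a FuseCompress mount point, everything written
    there gets recompressed transparently by the filesystem.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(archive_dir).rglob("*.zip")):
        # One folder per archive, named after the zip file itself.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest / archive.stem)
```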

I thought I'd finish with a little introduction to compression, and what it actually is. Compression is completely based on Claude Shannon's Information Theory (also the basis of using switches to represent Boolean logic, ie. allowing a physical way of building Alan Turing's theoretical "computers"). Information, measured in bits, is irreducible: you can't throw away any bits without losing some information. (Compression which does throw bits away is called "lossy" compression, for example the Vorbis audio codec; there the algorithms are crafted so that information is only thrown away when the difference is imperceptible to us, in the case of Vorbis because we can't hear it.) The key thing to know is that each bit of information is not necessarily mapped one-to-one to each bit of whatever material is being used to store it (eg. magnetic domains on a hard drive). A bit of information is needed whenever there is a 50/50 chance of the next stored bit being 0 or 1, but that is only the case for sequences which are random, or appear random. A sequence like "AAAAAAAAAAAAAAAAAAAA" is not random (each character isn't independent of the previous one), and thus does not require all of the 160 bits being used to describe it. Information theory gives us a lower limit on how many bits are REQUIRED to store given information, so we want our algorithms to approach this limit as closely as possible.
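Shannon's measure is easy to compute for yourself. A quick sketch in Python, assuming each symbol is independent of the others:

```python
import math
from collections import Counter

def entropy_bits(data):
    """Shannon entropy in bits per symbol, treating symbols as independent."""
    counts = Counter(data)
    total = len(data)
    # Each symbol with probability p carries log2(1/p) bits of surprise.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(entropy_bits("AAAAAAAAAAAAAAAAAAAA"))  # 0.0 -- no surprise, no information
print(entropy_bits("the quick brown fox"))   # close to 4 bits per character
```

Note that this per-symbol measure treats every character as independent, so it misses patterns like "ABABAB..."; spotting those is exactly what the pointer-based schemes described further on are for.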

A really simple type of compression is Run Length Encoding. This looks for repetition and replaces it with multiplication. For example, the sequence "ABCBAAAAAAAAABBCCCCCCCCDBDDDDB" could be compressed to "ABCB9ABB8CDB4DB". Our algorithm here is simply "if you find more than 2 of the same letter in a group then replace that group with its size followed by the letter", so a stream of "AAAAAAAAA" becomes "9A". To decompress this we use the algorithm "If you find a number, put that many of the next letter".
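That algorithm is simple enough to write out directly. This toy version assumes letters-only input and runs no longer than 9, exactly as in the example above:

```python
from itertools import groupby

def rle_compress(text):
    """Runs of 3 or more identical letters become '<count><letter>'.
    Assumes no digits in the input and no run longer than 9."""
    out = []
    for char, group in groupby(text):
        run = len(list(group))
        out.append(f"{run}{char}" if run > 2 else char * run)
    return "".join(out)

def rle_decompress(text):
    """Invert: a digit means 'put that many of the next letter'."""
    out = []
    i = 0
    while i < len(text):
        if text[i].isdigit():
            out.append(text[i + 1] * int(text[i]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(rle_compress("ABCBAAAAAAAAABBCCCCCCCCDBDDDDB"))  # ABCB9ABB8CDB4DB
```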

There is a slightly more general form of this, where instead of grouping similar things, we use a pointer. The pointer is a number which means "I am the same as whatever is this far behind me". In this way "AAAAAAAAA" can become "A12345678", where each number is telling us to get the next value from this far back (they all point back to the first A), however pointers can also point to pointers, so we could just as easily put "A11111111", since each of those "1"s becomes an "A", so it's perfectly valid for the next one along to point to it. This is easily compressible with the runlength encoding seen above (which is just a subset of this form of compression), but is more powerful. For example "ABABABABABABABABABABABA" cannot be compressed with runlength encoding, but using pointers we can compress it to "AB222222222222222222222". Now this is easily compressible. This works for any size of pattern, although since it is a search operation it can get quite slow for large search spaces, so files are usually split into more manageable chunks first.
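The decompression side of this pointer scheme is only a few lines. This is a toy sketch of the idea; a real LZ77-style codec encodes lengths and distances properly rather than single digits:

```python
def expand_pointers(encoded):
    """Decode the toy pointer scheme: a digit d means 'copy the symbol
    that ended up d positions behind this one'."""
    out = []
    for ch in encoded:
        if ch.isdigit():
            # Pointers may point at pointers: we copy the already-decoded
            # output, not the encoded input.
            out.append(out[len(out) - int(ch)])
        else:
            out.append(ch)
    return "".join(out)

print(expand_pointers("A12345678"))               # AAAAAAAAA
print(expand_pointers("A11111111"))               # AAAAAAAAA
print(expand_pointers("AB" + "2" * 21))           # ABABABABABABABABABABABA
```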

A final form of compression I'd like to mention involves binary trees. Let's say we made a survey of the occurrences of every letter in a file. We could say, for example, that "e" was the most common, followed by "a", followed by "s" and so on. Now we can compress these from their usual 8 bits to a much more compact form. First we define a binary tree, that is, a tree where every non-leaf node (branch junction) has a left and a right branch. If we meet a "0" we go down the left branch and if we meet a "1" we go down the right branch. Our tree begins with two branches, and we stick "e" at the end of the right branch. On the left branch we put another node with two children: on its right we put "a" and on its left another node with two children. On the right of this we put "s" and on the left another node with two children, and so on. Now we can replace each letter by the path we must take through the tree to reach it (with our tree we know that a "1" marks the end of a letter). Every time we find an "e" we simply put a "1", since that's how we get to "e" from the top of the tree (1 means go right), which saves 7/8 of the space every time. Every time we find an "a" we replace it with "01", which saves us 3/4 of the space, an "s" with "001", and so on. By constructing optimised trees (which is once again a search operation) we can get really good compression ratios.
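The tree described here is a "comb", so each letter's code is just some zeroes followed by a one. A sketch of encoding and decoding with it (a real Huffman coder builds a frequency-balanced tree, but the decoding principle is identical):

```python
def make_comb_code(letters_by_frequency):
    """The most common letter gets '1', the next '01', then '001', etc."""
    return {letter: "0" * i + "1" for i, letter in enumerate(letters_by_frequency)}

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    reverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:   # a '1' always ends a codeword in this tree
            out.append(reverse[current])
            current = ""
    return "".join(out)

code = make_comb_code("eas")
print(code)                  # {'e': '1', 'a': '01', 's': '001'}
print(encode("sea", code))   # 001101
print(decode("001101", code))  # sea
```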

Anyway, rant over since my script has finished moving all of the "A"s :D

Friday, 6 February 2009

Learned Helplessness in Computing?

I know I should be revising, seeing that my Atomic and Laser Physics exam is mere hours away, but I ended up on another Wikipedia trek, and came across the article on learned helplessness. Reading through it, I found that I could make many connections with the currently depressing state of computing, attributable to the complexity and proprietary nature of software.

Learned helplessness is a much-studied psychological phenomenon, where a subject gives up trying to change their situation. An example cited consists of three groups of dogs; group A is a control group and are put into harnesses and left for the duration of the experiment; groups B and C are put into harnesses but are also given unpleasant electric shocks. Each group B dog has a lever in front of it which does nothing when activated, whereas each group C dog has a lever which turns off the shocks to that dog and one of the group B dogs. The dogs in group C learn that the lever turns off their shocks, and they use it whenever they start to get shocked. Group B dogs, however, learn that their lever does nothing, whilst their shocks seem to stop randomly (remember, each B dog is paired to a C dog's lever, so the B dogs don't know why their shocks stop).

After this stage of the experiment all of the dogs are put into part two, where they are unharnessed in a pen divided in two by a small partition. The half of the floor the dog is put on is electrified, whilst the other half is normal. Dogs from groups A and C hop over the partition, away from the electricity and thus away from the pain. They don't know that the other side's not electrified, but they have a go and find that it's not. The dogs from group B, however, just lie down on the electrified floor and whimper as they are repeatedly shocked. They could hop over the partition, but don't bother trying. These dogs become depressed.

The conclusion of the experiment is that the sense of control is very important. Dogs in group B and group C got exactly the same shocks (since they were both controlled by group C's levers), but only group B got depressed. Essentially, they learned that nothing they did would stop the electricity, it just stopped randomly. They then applied this knowledge to the second situation and took the shocks, rather than trying the new possibility of jumping over the divide.

This can be seen in people, where some parents can end up neglecting their babies since they 'learn' that the child doesn't stop crying whether they give it attention or not, and thus ignore it, thinking they are helpless to stop its cries.

The psychological explanation for this is that the depressed subjects, in an attempt to rationalise the seemingly random lack of control, think of it as an inevitability ("Babies cry"), blame themselves ("I'm making it cry") and think of it as pervasive ("I'm a bad parent"). This learned helplessness digs a psychological hole which is notoriously difficult to break out of, and even causes feedback loops, for example a neglected child will cry more and have more problems than that of an attentive parent, thus reinforcing the "I'm a bad parent" and "I'm making it cry" beliefs. In fact, even knowledge of learned helplessness can make things worse, since it can act as a confirmation of the helplessness ("You've told yourself that you're helpless when you're actually not." "See? I TOLD you I was a bad parent!") and others can end up blaming the condition for things rather than the person ("It's not your fault that your baby's ill, you've learned to be helpless at looking after it." "Yes, you should probably take it away since I'm too learned-helpless to look after it.")

So, aside from knowing more being awesome, how does this apply to anything I'm interested in? Well I couldn't stop contrasting the explanations with computing. The dominant computing platform these days is Microsoft Windows which, although all software has bugs, seems to be full of them. A lot of these bugs are user interface related, where the required action to achieve the desired task is non-obvious, or a seemingly obvious action produces an unexpected result (which includes 'crashes', where a program disappears without the user telling it to). Although anyone more involved in software development would view these as bugs in the software which should be reported and fixed, frequently less technical users (which is the vast majority) view such things as inevitable ("Computers crash"), as their fault ("I made it crash") and pervasive ("I'm bad with computers"). Just look at the currently running adverts for the Which? PC Guide: A bunch of regular people saying how their computers keep messing up, and then an offer of a guide to show them how it's all their fault because they're doing it wrong.

Since I write software, I would say that the Which? PC Guide is a complete hack: It's fixing something in the wrong place. A broken piece of software should not be fixed by telling each and every user how to work around the broken bits, the software should be fixed so that nobody ever experiences those issues again. However, since it's proprietary software, nobody is allowed to fix it other than Microsoft (although there are numerous other hacks to work around the broken bits, some of which have created an entire industry, such as firewalls, anti-virus/spyware/adware programs, etc.).

The majority of computer users, however, do not think like me, since I am a group C dog: I know how to fix things. In fact, in human experiments into learned helplessness, it was found that people could concentrate more and solve problems more quickly in the presence of an annoying and distracting noise if they had a button which could turn it off, than those subjected to the noise without such a button, EVEN WHEN THE BUTTON WASN'T PRESSED. So on a Free Software system, where I know that it is possible for me to fix something if I truly wanted to, I don't get depressed, however on a proprietary system I frequently get annoyed, angry, irritated, etc. when the software behaves in undesirable ways.

For example, clicking a link that says "Download this program" in Idiot Exploiter 8 doesn't download the program, it just gives a subtle message under the toolbar that Internet Explorer has "protected" me by preventing the program from downloading and that I should click it to change that, and when clicked presents a menu with the option to download the program (how is this any different to the previous promise of a download?), which when clicked brings up a box asking if I want to save the program or run it, so I click run and when it's downloaded I get a warning saying that programs can do stuff to the computer, do I want to run it? I click run again (how is this any different to the previous promise of running the program?) and Windows pops up a message saying that the program is doing stuff, do I want to allow it to continue? I press continue and FINALLY get to the "first step" of the installer.

On Debian I could give a similar example: I can't get the Gdebi package installer to work, which means that I have to save packages and install them with the command "dpkg -i package_filename.deb", which can result in a broken setup if the newly installed package depends on other stuff, which means I need to install the stuff to fix it with "apt-get -f install" and press "y" to confirm it. This may seem annoying, but I know that if I wanted to fix it badly enough then I could, and would even be encouraged to do so (after all, Gdebi works perfectly well in Ubuntu).

Whenever my wireless card messes up on Debian, on the other hand, I get incredibly frustrated and annoyed, and often need to walk away from my laptop and have a break, since I feel completely powerless over it. The wireless firmware I use is proprietary, since Broadcom don't tell anyone the language that their wifi chips speak (although clean-room reverse engineering of a Free Software replacement in Italy seems to be showing some promise), so even though I'm running completely Free Software applications on a Free Software kernel with Free Software drivers (in my case Linux), and can look at the code at any time to see what it's doing and possibly fix any problems, when it comes to my wireless card the disconnects are seemingly random, as I have no way of inspecting the firmware since it is proprietary. I therefore feel helpless to stop it disconnecting, and can't remedy the situation in any way other than disabling the Wifi, unloading the driver, reloading the driver, enabling the Wifi and trying to reconnect. If that doesn't work then all I can do is try again. In fact, I've even written a little script which does all of that whenever I run "restart-wireless". It's so bad that the developers of NetworkManager, the (currently) best network control system on Linux, do the same thing. If NetworkManager's running and I get disconnected then I see the wireless network icon disappear and the wifi LED turn off. After a few seconds the Wifi LED comes back on, the Wifi icon comes back and it tries to connect. If it doesn't work then it happens again. It's depressing.
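I haven't published that script, but it's roughly this shape (the interface name wlan0 and the b43 driver module are assumptions for my particular hardware, and the real thing needs root and some error handling):

```python
import subprocess

# Down the interface, unload the driver, reload it, bring it back up.
RESTART_STEPS = [
    ["ifconfig", "wlan0", "down"],
    ["rmmod", "b43"],
    ["modprobe", "b43"],
    ["ifconfig", "wlan0", "up"],
]

def restart_wireless(run=subprocess.check_call):
    """Run each step in order; pass a stub as `run` to dry-run instead."""
    for step in RESTART_STEPS:
        run(step)
```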

So there's one reason I think computing is in the sorry state that it is: people are being conditioned to think that "computers crash" (which conveniently keeps the cost of quality control down), that they aren't "using them properly" (which conveniently keeps the cost of having good designers down) and that they're destined to always be "clueless with everything computer-related" (which conveniently keeps people upgrading stuff they don't need to on the advice of the 'experts' selling the upgrades). This was caused by the proprietary software world, since with Free Software the volunteers, users, developers, companies and organisations which make and sell it actively encourage all users to "hop the partition" and 'scratch their own itches', since it results in less work and more Free software for those doing the encouraging. Whether it was intentional or not is debatable (never assign to malice that which can be explained by (in?)competence).

This unfortunately means that people like my Mum have some kind of internal off switch which is activated by the word "computer", so that an explanation like "This broadband package has a limit on how much we can do per month, so they'll charge more if we go over it, but this one doesn't" is met with the response "Well you know I don't understand these things" (the exact same sentence used for every attempt at explaining something, no matter how basic). It makes me *REALLY* frustrated when people don't bother to apply mental skills which even five year olds possess, simply because they know computers are involved. Discuss the exact same thing with phone contracts, or even the price of meat per kilo, and they'll readily discuss the merits of each option, and even go into the small print; but with computers they've learned to be helpless, think they have no control over anything related, and feel much more comfortable being extorted by monthly bills twice the size of what they could have, where the value calculations are worked out by someone else, than they do confronting some computer-related thinking for a few minutes.

Another big cause of computer-helplessness is a genuine problem with computing today, Free Software or not. Empirical evidence does say that just the presence of control, whether or not it is used, is the important bit (like my access to the source code for everything I use), but it's still a chore to actually make use of that control.

As an example, a few years ago the Nautilus file manager changed so that icons got bounding boxes. This meant that before the change if I clicked in a transparent corner of a circular icon then nothing would get selected, but after the change the circular icon would be selected because I'd clicked within the bounding box. This is a good thing usability-wise, but I was rather annoyed with the way it interfered with my specific setup. I had cut out images and assigned them as icons to the various folders in my Home folder, stretched them rather large and arranged them manually so that they filled the Nautilus window without overlapping, so that clicking the visible parts would select the icon, whilst clicking on a transparent part would 'fall through' and select one visible below. I was very proud of this, and it had taken quite a while to do. Then, after an update, all of the icons got bounding boxes and thus clicks in transparent areas no longer fell through, making selecting and double-clicking things unusable. I had to make all of the icons small again, and arrange them in a grid, destroying the previous awesomeness. I took it upon myself a few months ago to bring back the no-bounding-box Nautilus as a well-buried option, so I got the source code to the most recent version of Nautilus and looked through the version control history to find out when the change was made which added the bounding boxes (I think this is where it changed http://svn.gnome.org/viewvc/nautilus?view=revision&revision=9123 ) and replaced that section in the latest code with the old code, and it worked. However, this took a few days, since I've done very little C programming and never used GObject with C before, and I didn't even have to write any code (it was just copypasta). If I want to fix every bug I find it would take an intractable amount of time, even though I can fix any bug I want to individually.

There looks to be some promising stuff going on to rectify this at the Viewpoints Research Institute, an organisation funded by the US government with awesome Computer Scientists like Alan Kay. One of their aims is to "reinvent" computing, which basically involves making (yet another) computer system, but one which is as understandable (and hence small) as possible. They're aiming for a complete, working system in under 20,000 lines of code (for comparison, Windows has around 40,000,000), and have got some nice tools like their "Combined Object Lambda Architecture" programming system, which aims to be written in itself and be able to compile down as far as FPGAs (ie. rewiring the microchips themselves to represent the program), and OMeta which allows very compact and easy to understand programming language implementations (for example they have an almost-complete (missing "try"/"catch" and "with") Javascript interpreter which is only 177 lines of code), which, like their COLA, is written in itself. This allows COLA-based implementations of other languages to make their system, with new languages so easy to define that each part can be made in a tailor-made language, even being redefined in places where it makes things more comprehensible.

Hopefully having more understandable and approachable code will mean it is easier to find and fix bugs, so that nobody has to experience them for long. It might also help to reduce the number of people who teach themselves to be helpless at computing, although as for the ones who are already learned-helpless it will take a lot of effort on their part to break out of it, which won't be helped by proprietary companies trying to dress up their shit code as some kind of magical snake oil which cannot be obtained from anywhere else, or be written by mere mortals (which GNU set out to disprove by rewriting UNIX, and has done a pretty fine job of it), and the media displaying binary all over the place whenever the internals of computers are mentioned.

OK, I think I should carry on revising now, as I've gone off on a bit of a rant, but damn it my blog's still not boring or crap! :P

Wednesday, 12 November 2008

The Importance of Transparency

I know some people who read this can't be arsed with the technical posts, but I do use this blog to tell the world, including my friends, what I'm up to, so please bear with me :)

People > Data > Code > Hardware

That sequence represents two things. Firstly, if those are treated as arrows, it shows how computer programs are generally used. A person inputs some data and the code does something with the data by running on the hardware. Another way of looking at it is as an inequality. People are more than data, data are more than code and code is more than hardware.

Hardware is a lump of plastic, silicon, germanium, steel, etc. It only exists to run code, therefore code > hardware.

Code only exists to manipulate data, whether those data are numbers in a calculation, images to be displayed, music to be played, messages to be sent, etc. Therefore data > code.

Data is only kept around because it is of use to people. Despite our best efforts, hardware and software cannot appreciate the humour of a LOLCAT image. Therefore people > data.

This relationship can be seen in many areas. If I have the most awesome server ever, nobody gives a crap if my Web site is crap. Google's search engine started life on incredibly underpowered, unreliable hardware, but nobody noticed because the code was redundant and reliable. Mugshot.org may be coded better than Myspace.com, but nobody uses Mugshot and there are far more data in Myspace.

Hardware doesn't matter, so code should be as cross-platform as possible. I have Linux running on my desktops, my laptops and my 'phone. Windows will only run on x86 and x86-64 machines, which means no phones, no PDAs and very few embedded devices like set-top boxes and games consoles. If code is cross-platform then users don't need to give a shit about hardware, which makes life a hell of a lot easier.

Code doesn't matter as much as data, so as much as possible should be in standardised, implementable, documented formats. The spreadsheets I write in OpenOffice.org also work fine in Gnumeric and have live copies saved on Google Spreadsheets. Spreadsheets made in Microsoft Office 2007 can only be opened in Microsoft Office 2007, since everyone else's attempts at compatibility are flawed. If data is openly standardised then users don't need to give a shit about software, which makes life a hell of a lot easier.

This just leaves people and data, which are the only things that are important (code and hardware are just tools used by people to manipulate data).

Using the examples above, I can save a spreadsheet on my desktop and access it from anywhere in the world via Google Spreadsheets in the browser on my phone. The proprietary alternative is to only be able to use Microsoft Office 2007, which requires Microsoft Windows, which requires x86/64 hardware. A very cosy position to be in for Microsoft, but for the vast majority of the world who are not Microsoft employees, why give up so much? This isn't just a feature argument either, since Microsoft could make a browser-based spreadsheet system. The argument is WHY DO I HAVE TO WAIT FOR MICROSOFT? If you hand someone the keys to your data, you should expect to be taken for a very long ride, at the end of which you might not even have that data any more.

The same goes for Facebook and other proprietary applications. (Free Software doesn't always use standard formats, but the formats are at least documented to some small degree in the code. Proprietary apps give you no code.)

END COMMUNICATION

Sunday, 7 September 2008

Services, integration, communication and standards

These days it is possible to do pretty much anything over the Internet. There's eBay, online banking, PayPal, Flickr, OpenStreetMap, OpenDesktop, Email, chat, forums, Wikis, Facebook, scrobbling, blogs, etc. The big problem, in my mind, is that of technological walled gardens.

A walled garden is an area which cannot be accessed without permission, usually by registering some form of account. Facebook is a classic example of a walled garden. Walls are useful, since having everyone's bank accounts available without restriction would be a problem to say the least. A technological walled garden would be an enclosed, restricted area which can only be accessed via certain technological means. Technological walled gardens are often simpler to implement than open systems, but often the reason the garden operator does this is because they see this as a way to run a dot-com or Web-2.0 business.

Let's take an example: Yahoo! Mail, Windows Live Mail and Gmail, which are all walled gardens in the classical sense, in that an account is needed and the login details must be provided in order to access anything. The first two, however, are also technological walled gardens. Mechanisms to send, retrieve, check and manage email have been around for decades, from "get my mail" (POP3) and "send my mail" (SMTP) to the more sophisticated "synchronise this lot of email with that lot" (IMAP), and are well defined, standardised, understood and implemented; yet in order to access Yahoo! Mail or Windows Live Mail you still need to log in via their website, because they don't use any of these standards. Gmail supports them, which is how I can use Evolution and Kmail to manage my Gmail account. Yahoo and Microsoft specifically disable them (I know Yahoo used to allow POP3 and SMTP access; when they stopped, I moved away from them), with the reasoning that Evolution and Kmail don't display adverts, whereas their websites do. Here the interoperability and standardisation desired by customers (if it wasn't used then there'd be no point disabling it, since nobody would go unexposed to the adverts and the POP/SMTP/IMAP server load would be zero) is sacrificed in order to force adverts onto users who don't want them. This doesn't even touch upon the flexibility of using an email client (screen readers and other accessibility for the disabled, offline access, complete choice of interface (including Web), etc.).

That is the major reason why I refuse to join Facebook, MySpace, etc. I cannot buy or download Facebook's software and run it on my own machine, and even if I managed to write my own there would be no way to make it talk to Facebook's own servers. Since the entire point of Facebook is the stuff in their database, this would make my software useless. Hence Facebook have created a technological walled garden: if I joined Facebook then I would be sending any data I entered into a black hole, as far as accessing it on my terms is concerned.

Last.fm is better, since although their server/database software is a trade secret (as far as I can tell), the Audio Scrobbler system they use to gather data is completely documented and has many implementations, many of which are Free Software (including the official player). The contents of their database are also available, and don't even require an account (I have tried to think of ways to submit similar artists without an account, such as submitting two tracks at a time and building the data from that, but I suppose spam would be too much of a problem). Only the artist recommendations/similarity, tags and things are available, but that's the entire reason I use it; fuck the site with all of its Flash, poor social networking, confusing messaging system and stuff, that's all fluff around the periphery of the useful information. Essentially last.fm is like Gmail: I can't get the code which runs it, but I can get my data in and out in standardised ways which can be implemented with Free Software. I could make my own server which synchronises with their database via the available protocols, and thus get out everything that I put in.

Now, the issue of synchronisation is interesting. How do you keep two things synchronised? There are a few different approaches and each has its place:

Known, unchanging, unmoving data

Here plain HTML can be used, ie. a web page. This is fine for people, and an application needing the data can simply have it copied in once. An example would be an "about" page.

Unknown, unchanging, unmoving data

Here HTML can still be used, but since the data is not known beforehand it can be hard for an application to get anything useful from it. RDFa can be used inside the HTML to label each piece of information, so an application only needs to be told what to find and it will look through the labels until it finds it, regardless of page structure, layout, etc. An example would be a scientific paper.

Changing data which is checked from time to time

Here RSS or Atom can be used. These formats allow changes to be listed in the file. Atom is an IETF standard (RFC 4287), while RSS 1.0 is a dialect of RDF, which means labelled data is possible. An example would be a changelog.
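A changelog feed like this can be consumed without a browser at all. A minimal sketch using Python's stdlib XML parser on an inline Atom feed (the feed contents are made up for the example):

```python
import xml.etree.ElementTree as ET

# A minimal Atom (RFC 4287) feed, inline so the example needs no network.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Changelog</title>
  <entry><title>v1.1: fixed crash</title><updated>2009-08-01T00:00:00Z</updated></entry>
  <entry><title>v1.0: first release</title><updated>2009-07-01T00:00:00Z</updated></entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"  # ElementTree's namespace prefix syntax

def entry_titles(feed_text):
    """Return the title of each entry, in the order the feed lists them."""
    root = ET.fromstring(feed_text)
    return [e.find(ATOM + "title").text for e in root.findall(ATOM + "entry")]

print(entry_titles(FEED))  # → ['v1.1: fixed crash', 'v1.0: first release']
```

An application polls the feed, compares entries against what it saw last time, and acts on the new ones; no human and no browser involved.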

Changing data which is accessed on every change

Here XMPP PubSub can be used. There is no checking for updates, since the source pushes any changes out to subscribers as they are made. This doesn't use a file, it uses a stream. This is what my library is designed to accomplish. An example would be a blog.
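On the wire this push model is just XML stanzas. Here is a sketch of the publish stanza shape described in XEP-0060, built with the stdlib; the node name and payload are hypothetical, and a real client would send this over an authenticated XMPP stream rather than printing it:

```python
import xml.etree.ElementTree as ET

def publish_stanza(node, title):
    """Build an XMPP pubsub publish stanza (XEP-0060 shape).
    The node name and Atom payload here are hypothetical examples."""
    iq = ET.Element("iq", type="set", id="pub1")
    pubsub = ET.SubElement(iq, "pubsub",
                           xmlns="http://jabber.org/protocol/pubsub")
    publish = ET.SubElement(pubsub, "publish", node=node)
    item = ET.SubElement(publish, "item")
    entry = ET.SubElement(item, "entry",
                          xmlns="http://www.w3.org/2005/Atom")
    ET.SubElement(entry, "title").text = title  # the pushed payload
    return ET.tostring(iq, encoding="unicode")

print(publish_stanza("blog/posts", "New post"))
```

The server relays this to every subscriber the moment it arrives, which is what removes the polling.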

Two-way communication and instruction execution

Here systems such as JOLIE can be used, overlaying protocols like SOAP. This can be used for dynamically generated data like database queries and searches, as well as for sending instructions such as "Approve Payment". An example would be a shop.
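As a rough sketch of what such an instruction looks like at the protocol level, here is a hypothetical "Approve Payment" operation wrapped in a SOAP 1.1 envelope; the operation name and shop namespace are made up, and JOLIE or a SOAP library would normally generate this for you:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def approve_payment_envelope(order_id):
    """Wrap a hypothetical ApprovePayment instruction in a SOAP 1.1 envelope.
    The operation name and shop namespace are invented for illustration."""
    ET.register_namespace("soap", SOAP_NS)
    env = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(env, "{%s}Body" % SOAP_NS)
    op = ET.SubElement(body, "ApprovePayment",
                       xmlns="http://example.com/shop")  # hypothetical service
    ET.SubElement(op, "orderId").text = str(order_id)
    return ET.tostring(env, encoding="unicode")

print(approve_payment_envelope(42))
```

The shop's server unwraps the body, executes the instruction, and replies with a similarly structured result.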

Notice that the first technology, HTML, is the only one which needs a Web browser to access it. RDFa, Atom and RSS are all structured in a way that allows applications to handle them directly; no human is needed, and thus many more possibilities open up. XMPP can also carry structured data, since RDF, Atom and RSS can all be sent over XMPP, allowing machines to handle the data on an as-needed basis, which makes things more scalable. JOLIE covers a range of protocols which are all inherently machine-focused: they execute instructions. This might be a "buy this laptop" instruction when a button is pressed, or a "search for this" instruction when a query is entered.

These technologies allow data and communication to break free of implementation and visualisation. The next Facebook doesn't have to be a centralised Web site; it can be a distributed system with many servers run by different people interacting with each other to allow a scalable, unowned network, like the Web but made of pure information without the overhead of layout, styles, widgets or interfaces. This is called the Semantic Web. All of the visualisation, interface and style can be swappable and implemented where it is wanted, for instance as desktop applications on users' machines, or as Web sites running on various servers. There is no reason why, in an interconnected world, I should have to visit Facebook.com in order to interact with Facebook.

Except, of course, that Facebook wants to force unwanted adverts onto their customers.

Friday, 5 September 2008

Efficient vs Approachable (choose your phrase carefully)

I am very interested in human-computer interaction (HCI), and I like to follow the trends and experiment with new innovations, plus I try wherever possible to commend developers who make different and interesting things, regardless of whether they are any good or not, since experimentation is key to finding better ways of doing things than we have now. After all, the same processes which turned slime into us are responsible for birth defects and fatal genetic diseases; you can't have the good without the bad.

I find a big problem with the phrase "easy to use", since it covers many varied situations. "Easy to use" and "hard to use" can both describe the same thing at the same time, since the words used are so vague; "use" covers a lot of ground, as do "easy" and "hard". Therefore I try where possible to use more specific words. The most common ones I use are efficient/inefficient and approachable/confusing. Why are those more descriptive? Here's an example: Compare the Vi text editor to the Leafpad text editor.

Vi is easy to use since its keyboard commands mean you never have to take your hands off the keyboard. All formatting, editing and control functions can be accessed the same way, via a few button presses. This could also be called efficient.

Vi is hard to use since it doesn't have any buttons and nothing is labelled. The whole interface must be learned and you have to look in the manual just to find out how to quit. This could be described as confusing.

Leafpad is easy to use since buttons are labelled and organised in a menu. Different tasks are given different places, structuring the interface. This could also be called approachable.

Leafpad is hard to use since it requires a whole graphical environment just to start. To make selections and issue commands you need to constantly swap your hands between the keyboard and the mouse. Some things take longer to do than others, because they're buried in menus. This could also be called inefficient.

As you can see, both can be described as easy and hard at the same time, even though those words are mutually exclusive. That means the words easy and hard are wrong to use. Whilst we can't really draw much meaning from saying "Vi is easy but hard", we can say "Vi is efficient but confusing". Likewise we can say "Leafpad is approachable but inefficient". Both have their advantages and their disadvantages. To someone who spends a lot of time in a text editor, for example a programmer or a journalist, it would be sensible to invest the time to learn Vi. However, for a general audience Leafpad would probably be more appropriate, thus the default text editor on a mass market operating system should be Leafpad rather than Vi. The same argument can be used for Blender, since its own peculiarities make it very efficient for someone who knows what they're doing, but make it confusing and unwieldy for someone who doesn't want or need to spend the time learning the interface. The only difference here is that I don't know anything as capable as Blender to contrast it with.

Regardless of what experienced users and developers may prefer, the approachable option should always be the default. Saying "I prefer XYZ" doesn't particularly matter when you know how to use a package manager. The defaults should be chosen to suit those who don't know how to use a package manager, or even what one is, since those are the users who will be using the defaults no matter what (because they know of no alternatives). Please make their lives easier.

The end.

Tuesday, 2 September 2008

On Responsiveness

Modern computer systems are multitasking, multiprocessor, multithreaded powerhouses. Why does an application's user interface become unresponsive when it is doing a task? This seems like a bad design decision on the part of the toolkit makers. Programming with a toolkit like Qt or GTK+ is already event-driven, so why doesn't the UI keep itself up to date?

For the record, AmigaOS gave its user interface a higher priority than applications, so that if the user moved the mouse then no matter what the Amiga was doing it would stop, update the cursor position, then carry on; the same for buttons being hovered over and pressed, etc. This makes a 20-year-old machine *feel* more responsive than the current state of the art. My 2-year-old laptop with a 2-month-old desktop system makes me ANGRY at scrollbars for not moving. That's just not on.
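The standard remedy, whatever the toolkit, is to keep slow work off the thread that runs the event loop. A toolkit-agnostic sketch in Python; the polling loop below stands in for a GTK/Qt main loop, which would use idle callbacks rather than sleeps:

```python
import queue
import threading
import time

def slow_job():
    time.sleep(0.1)  # stand-in for a long computation
    return "job done"

def run_main_loop(job, ticks=100):
    """Run job on a worker thread while the 'UI' loop keeps spinning."""
    results = queue.Queue()
    threading.Thread(target=lambda: results.put(job()), daemon=True).start()
    for _ in range(ticks):  # each pass is a chance to redraw / handle input
        try:
            return results.get_nowait()
        except queue.Empty:
            time.sleep(0.01)  # the loop never blocks on the job itself
    return None

print(run_main_loop(slow_job))  # → job done
```

Because the loop only ever peeks at the queue, the scrollbars keep moving no matter how long the job takes.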

Wednesday, 28 May 2008

Won't somebody *PLEASE* think of the people?

I used to think that the 1984 connotations of the UK's new "Ministry of Justice" were just down to some completely dense marketing/PR/hip-and-trendy/waste-of-space type person, however I am inclined to rethink that and agree with the sky-is-falling crowd more with every day of BBC headlines I read. Here's the latest, from the if-we-say-it's-stopping-paedophiles-then-anybody-against-it-is-obviously-a-paedophile-and-Sun-readers-can-torch-their-house department.

Basically, it is outlawing the output of sad, lonely people who sit in front of computers or sketchbooks and draw or photo-manipulate images, if those images portray children in a pornographic way.

Now, personally I have nothing against child pornography. At this point I'll probably have to tell a lot of people to calm down and read on, because the media has a habit of indoctrinating armies of bigots who take pride in voluntary ignorance. OK, I'll continue. I have nothing against child porn as information: bits on a computer disk or network, lines and pigment on a page or canvas, magnetic fluctuations on a tape, etc. Child porn is bad because of the method used to make it: the destruction and exploitation of innocent people's lives by sadistic individuals more concerned with satisfying their own twisted primal instincts than with telling those instincts to shut up so that they can spend their short time on this planet making a useful contribution to the human condition. Exploitation of anyone is bad. Destroying lives and shattering people's minds is bad. Brainwashing is bad. It is all incredibly terrible. HOWEVER, demonising the ends rather than the means is completely wrong, in the same way that demonising clothes is wrong even though they can be made in sweatshops.

The point I am making is that laws are arbitrary. Laws have been made, and continue to be made, to satisfy the morals of those with power (hopefully these days that is the people, though that doesn't mean the people cannot be brainwashed). For this reason it is crucial to make the distinction that laws are NOT the same as morals. This is made clear both by those who are prosecuted for acts which can be universally accepted as right, and by those whose lives are spent corrupting, destroying and exploiting others, but who cannot be stopped by any laws.

So, if laws are not morals then it is crucial for people to have a set of independent morals in their lives. This is important as it is needed to shape the law as times change (since laws are a human invention, they are not inherent properties of the universe which always hold true; they are instead in a constant state of revision which we must all try to steer asymptotically closer to a "true" set of laws, preferably from different directions (so that wrong turns are avoided where possible)). Therefore, disregarding the law and what it says completely, what is wrong with filming children being sexually abused? Well, a hell of a lot, as I have already said. Now what, disregarding the law and what it says completely, is wrong with drawing children being sexually abused? As far as I am concerned, nothing, since the reason child porn is so abhorrent is the acts that are performed, not their depiction, and thus minus the acts themselves there are just left some disgusting, depraved, sickening images. It is just as bad for children to be sexually abused in some remote shed in the middle of a forest where nobody outside those involved ever knows as it is for it to be photographed or filmed. The real world does not follow cyberspace's "pics or it didn't happen" philosophy.

Notice that I said I find such images sickening, depraved and disgusting. That is based on my own set of values, since I am in absolutely no way into watching children being exploited like that, but those are my VALUES, not my MORALS. My morals are a question of right and wrong, with a lot of grey areas, whereas my VALUES are rankings of things that are important to me. Mandating that everyone must use Free Software would go completely against my morals, even though I value it. Values change over one's life and vary wildly between people, but morals are pretty similar worldwide (once they have been separated from values, which many people confuse). They usually embody things like killing is bad, robbing stuff is bad, rape is bad, etc. and these are fashioned into laws across the globe.

The problem comes when an easily influenced population takes the laws as their morals, and argues points based on the current law being 100% correct. This is where issues like copyright extension come up, since the morals that brought about copyright were that stuff should belong to everyone, but in order to get people to make stuff they should be compensated by a short-term monopoly (which was seen as a bad thing, but acceptable since it was a short-term bad for a long-term good). This can be seen in statements like "to promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries" in the US constitution. However, over time these laws became what is 'right', rather than the thinking behind them, and thus copyright terms now usually run over 50 years in most areas and can top a hundred in others. How does allowing someone to make their entire livelihood from a single work made when they are young "promote the progress of science and useful arts"? The answer is that it doesn't, and retroactive copyright extension is even more ludicrous (if the purpose of granting copyright for a time is to encourage people to make more stuff then a) how can extending it now encourage somebody to have made more stuff 50 years ago? and b) how can extending it now encourage somebody to make more stuff if they are already dead?). However, copyright extension gets passed because it takes the current law, seen as 100% good, and extends it, which logically makes it 500% good. The same goes for child porn. Current laws say child porn is bad, which is a good thing since it discourages those who may abuse children, but extensions like the one currently being proposed take the thinking away and treat the law as 100% good, so that this new proposal makes it 500% good.

Now, I've spent a few hours writing this already and don't really have much more to say, and I suspect that those who are so inclined will already be out wielding pitchforks rather than reading this far anyway, so I shall simply end it here and add some inappropriate labels.

Saturday, 23 February 2008

Some Slashdot musings.....

This was originally a comment of mine on Slashdot, in reply to someone saying that Microsoft's real customers are the NSA, FBI, CIA, etc., but I thought I'd put it here:

Spying is most effective when it targets the largest number of people. Windows is currently the most used desktop OS, thus is the most likely target. If that changes then it's no skin off their [NSA, FBI, etc.] nose, thus there's no value to Microsoft in going out of their way to support such 'customers' (apart from direct bribes, of course, such as, for example, disregarding certain anti-trust conviction requirements/punishments).

Personally I'd say their customers are Time Warner, Universal, EMI, etc. Those guys can choose Microsoft's Media Player/Centre/codecs/DRM or take their business elsewhere. By coaxing them into the Windows fold Microsoft has the ability to deliver a desirable (in mass audience terms) 'product' which, due to exclusive contracts, nobody else can offer, ie. buy Windows and you can get HD quality Hollywood films and CD quality big name music streamed to your machine. Linux can't offer that, Apple are currently closest (lagging behind on the movies) but wouldn't be in Microsoft's world, and nobody else can even *try* to compete, since it is not a technical problem, it is a legal one, and legalities can be arbitrarily created (there may be a chance in emerging economies, for example if a Linux program/distro/vendor set up a deal with Bollywood or far eastern film industries).

Establishing such services as standard, ie. 'That's what computers do', would cement Windows more firmly than any amount of closed-yet-reverse-engineerable protocols or formats would. Just look at WINE. The only way to stop interoperability is by making it illegal using measures such as the DMCA (which would be a cruel irony considering Microsoft's current issues in the EU), and labelling anyone attempting to deliver such newly established 'basic functionality' in an alternative fashion a 'pirate'.

These are measures to keep Windows users consensually enslaved because, after all, slaves don't have to worry about where their next meal is coming from. Some are waking up to the fact that their guaranteed meals aren't considered fit for kebab meat by those not locked in, but the vast majority have been brought up knowing that their masters know best and don't dare lose their meagre lot in life for the unknown promise of more.

I avoid Windows like the plague, and I also despise the increasingly prevalent 'content consumer' view of human beings (eyeballs, wallets, page hits, bums on seats (although that one's been around for a very long time)). I find it divisive, since to me the idea of 'user generated content' being a surprising phenomenon makes no sense considering that every piece of 'content' out there has been made by someone.

To me the best way to stop Microsoft, spread Linux (or some other Free Software system), and in turn make the world a less depressing place (ie. beating armchair culture, since TV is a depressant (which is why I don't have one); YouTube has the potential to go one way or the other, but currently it is looking rather sad, so it's just as well I don't have Flash) is to push the creation skills of Free Software. On Windows you might be able to stream more videos than it is possible to watch in a lifetime, but on Linux every tool you could want for drawing, animating, video editing, writing, music creation, 3D modelling, programming, etc. is right there, available out of the box, for the same price as the OS: free. Experimenting with programming, for example, is enjoyable on Ubuntu since Python is installed by default and any libraries I might want are a Synaptic click away. On Windows (going the 'official' way) I need to fork out for each and every language compiler/IDE I want to try, have to scour the web for awkward-to-install libraries and pay again and again for simple little plugins and time savers. Forget 'Developers, developers, developers, developers': Windows is designed as a conduit for selling crap software to dumb users purposefully kept in the dark.

Fostering a creative, imaginative, educated but most of all proactive population is a worthy goal to work towards, rather than locking down every useful innovation to ensure a revenue stream from some very tired, apathetic eyeballs with wallets to some useless FUD campaigners in order to pay the lawyers used against those very same 'consumers'.