Sunday, 7 September 2008

Services, integration, communication and standards

These days it is possible to do pretty much anything over the Internet. There's eBay, online banking, PayPal, Flickr, OpenStreetMap, OpenDesktop, Email, chat, forums, Wikis, Facebook, scrobbling, blogs, etc. The big problem, in my mind, is that of technological walled gardens.

A walled garden is an area which cannot be accessed without permission, usually by registering some form of account. Facebook is a classic example of a walled garden. Walls are useful, since having everyone's bank accounts available without restriction would be a problem to say the least. A technological walled garden would be an enclosed, restricted area which can only be accessed via certain technological means. Technological walled gardens are often simpler to implement than open systems, but more often the garden operator builds one because they see it as a way to run a dot-com or Web 2.0 business.

Let's take an example: Yahoo! Mail, Windows Live Mail and Gmail. All three are walled gardens in the classical sense: an account is needed and the login details must be provided in order to access anything. The first two, however, are also technological walled gardens. Mechanisms to send, retrieve, check and manage email have been around for decades, from "get my mail" (POP3) and "send my mail" (SMTP) to the more sophisticated "synchronise this lot of email with that lot" (IMAP), and they are well defined, standardised, understood and implemented. Yet in order to access Yahoo! Mail or Windows Live Mail you still need to log in via their website, because they don't offer any of these standards. Gmail supports them, which is how I can use Evolution and KMail to manage my Gmail account. Yahoo and Microsoft specifically disable them (I know Yahoo used to allow POP3 and SMTP access; when they stopped I moved away from them) with the reasoning that Evolution and KMail don't display adverts, whereas their websites do. Here the interoperability and standardisation desired by customers is sacrificed in order to force adverts onto users who don't want them. (If nobody used these protocols there would be no point disabling them, since nobody would be avoiding the adverts and the POP/SMTP/IMAP server load would be zero.) This of course doesn't even touch upon the flexibility of using an email client: screen readers and other accessibility for the disabled, offline access, complete choice of interface (including Web), etc.
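To make the point about standards concrete, here is a minimal sketch of "get my mail" using Python's standard poplib and email modules. The host name, the open_mailbox helper and the FakeMailbox class are all made up for illustration (the fake only exists so the sketch runs without a network); the point is that the same few lines would work against any provider that speaks POP3.

```python
import poplib
from email import message_from_bytes


def open_mailbox(host, user, password):
    """Log in to a real POP3 server over SSL (not called in the demo below)."""
    mailbox = poplib.POP3_SSL(host)   # e.g. "pop.example.org", a made-up host
    mailbox.user(user)
    mailbox.pass_(password)
    return mailbox


def list_subjects(mailbox):
    """Return the Subject header of every message in a POP3 mailbox.

    `mailbox` is anything offering poplib.POP3's `stat` and `top` methods.
    """
    count, _size = mailbox.stat()
    subjects = []
    for number in range(1, count + 1):            # POP3 numbers messages from 1
        _resp, lines, _octets = mailbox.top(number, 0)   # headers, no body lines
        headers = message_from_bytes(b"\r\n".join(lines))
        subjects.append(headers["Subject"])
    return subjects


class FakeMailbox:
    """Hypothetical stand-in for a server, so the sketch runs offline."""

    def __init__(self, messages):
        self._messages = messages

    def stat(self):
        return len(self._messages), sum(len(m) for m in self._messages)

    def top(self, which, howmuch):
        raw = self._messages[which - 1]
        return b"+OK", raw.split(b"\r\n"), len(raw)


demo = FakeMailbox([
    b"Subject: Hello\r\nFrom: a@example.org\r\n\r\nFirst body",
    b"Subject: Walled gardens\r\nFrom: b@example.org\r\n\r\nSecond body",
])
print(list_subjects(demo))
```

Swapping FakeMailbox for open_mailbox(...) is the only change needed to talk to a real server, which is exactly the freedom a Web-only mail service takes away.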

That is the major reason why I refuse to join Facebook, MySpace, etc. I cannot buy or download Facebook's software and run it on my own machine, and even if I managed to write my own there would be no way to make it talk to Facebook's own servers. Since the entire point of Facebook is the stuff in their database, this would make my software useless. Hence Facebook have created a technological walled garden: if I joined Facebook then I would be sending any data I entered into a black hole as far as accessing it on my terms is concerned.

Last.fm is better, since although their server/database software is a trade secret (as far as I can tell), the Audioscrobbler system they use to gather data is completely documented and has many implementations, many of which are Free Software (including the official player). The contents of their database are also available, and don't even require an account (I have tried to think of ways to submit similar artists without an account, such as submitting two tracks at a time and building the data from that, but I suppose spam would be too much of a problem). Only the artist recommendations/similarity, tags and things are available, but that's the entire reason I use it, fuck the site with all of its Flash, poor social networking, confusing messaging system and stuff, that's all fluff around the periphery of the useful information. Essentially last.fm is like Gmail: I can't get the code which runs it, but I can get my data in and out in standardised ways which can be implemented with Free Software. I could make my own server which synchronises with their database via the available protocols, and thus get out everything that I put in.

Now, the issue of synchronisation is interesting. How do you keep two things synchronised? There are a few different approaches and each has its place:

Known, unchanging, unmoving data

Here HTML can be used, i.e. a web page. This is fine for people, and for applications needing that data it can simply be copied into the application once. An example would be an "about" page.

Unknown, unchanging, unmoving data

Here HTML can still be used, but since the data is not known beforehand it can be hard for an application to get anything useful from it. RDFa can be used inside the HTML to label each piece of information, thus an application only needs to be told what to find and it will look through the labels until it does, regardless of page structure, layout, etc. An example would be a scientific paper.
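As a rough illustration of "look through the labels regardless of page structure", here is a minimal sketch using Python's standard html.parser. It is not a full RDFa processor (it ignores prefixes, about, content and so on), and the two sample pages and the dc:title / dc:creator labels in them are made up for the example.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect the text inside every element that has a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}   # property name -> text collected so far
        self._open = []   # [tag, property, depth] for labelled elements still open

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._open.append([tag, prop, 1])
            self.found.setdefault(prop, "")
        elif self._open and self._open[-1][0] == tag:
            self._open[-1][2] += 1        # same tag nested inside a labelled one

    def handle_data(self, data):
        if self._open:
            self.found[self._open[-1][1]] += data

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            self._open[-1][2] -= 1
            if self._open[-1][2] == 0:
                self._open.pop()


def extract(html):
    """Return {property: text} for one page."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.found.items()}


# Two hypothetical pages: same labelled data, completely different layout.
PAGE_A = """<html><body>
<h1 property="dc:title">Walled Gardens</h1>
<p>By <span property="dc:creator">A. Author</span></p>
</body></html>"""

PAGE_B = """<div><em property="dc:creator">A. Author</em>
<table><tr><td property="dc:title">Walled Gardens</td></tr></table></div>"""

print(extract(PAGE_A) == extract(PAGE_B))
```

The application only asks for "dc:title" or "dc:creator"; whether the page uses headings, tables or anything else makes no difference.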

Changing data which is accessed once at a time

Here RSS or Atom can be used. This allows changes to be listed in a file, which a reader fetches whenever it wants to check for updates. Atom is an IETF standard (RFC 4287), while RSS 1.0 is a dialect of RDF, which means labelled data is possible. An example would be a changelog.
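A minimal sketch of what a feed reader does with such a file, using only Python's standard library: parse an Atom feed and keep the entries updated since the last check. The sample feed, its ids and the entries_since helper are made up for illustration, and the sketch assumes every entry carries an updated timestamp (which Atom requires).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "{http://www.w3.org/2005/Atom}"   # the Atom XML namespace

# A made-up changelog feed, newest entry first.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example changelog</title>
  <id>urn:example:changelog</id>
  <updated>2008-09-07T10:00:00Z</updated>
  <entry>
    <title>Version 1.2 released</title>
    <id>urn:example:changelog:1.2</id>
    <updated>2008-09-07T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Version 1.1 released</title>
    <id>urn:example:changelog:1.1</id>
    <updated>2008-09-05T09:30:00Z</updated>
  </entry>
  <entry>
    <title>Version 1.0 released</title>
    <id>urn:example:changelog:1.0</id>
    <updated>2008-08-30T18:00:00Z</updated>
  </entry>
</feed>"""


def parse_time(text):
    """Parse an Atom timestamp; a trailing 'Z' means UTC."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def entries_since(feed_xml, last_seen):
    """Return titles of entries updated after `last_seen`, oldest first."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        updated = parse_time(entry.findtext(ATOM + "updated"))
        if updated > last_seen:
            fresh.append((updated, entry.findtext(ATOM + "title")))
    return [title for _when, title in sorted(fresh)]


last_check = datetime(2008, 9, 1, tzinfo=timezone.utc)
print(entries_since(SAMPLE_FEED, last_check))
```

The reader has to remember when it last looked and fetch the whole file again each time, which is the cost of this approach.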

Changing data which is accessed on every change

Here XMPP PubSub can be used. This means that there is no checking for updates since the source will push any changes out to subscribers when they are made. This doesn't use a file, it uses a stream. This is what my library is designed to accomplish. An example would be a blog.
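The push model can be sketched without any networking at all. The following is not XMPP and not my library's API; it is a hypothetical in-process stand-in (the Node class and its method names are invented) which only shows the shape of publish-subscribe: subscribers register once, and every later change is delivered to them without anybody polling.

```python
class Node:
    """A made-up, in-process stand-in for a PubSub node."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        """Register a function to be called with (node name, item) on each publish."""
        self._subscribers.append(callback)

    def publish(self, item):
        """Push `item` to every current subscriber; return how many were notified."""
        for callback in self._subscribers:
            callback(self.name, item)
        return len(self._subscribers)


inbox_a, inbox_b = [], []
blog = Node("blog/posts")

blog.subscribe(lambda node, item: inbox_a.append(item))
blog.publish("First post")            # only the first reader is subscribed so far

blog.subscribe(lambda node, item: inbox_b.append((node, item)))
blog.publish("Second post")           # both readers are notified

print(inbox_a)
print(inbox_b)
```

In real XMPP PubSub the callbacks are replaced by stanzas sent down each subscriber's open stream, but the division of labour is the same: the source does the telling, the subscribers never have to ask.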

Two-way communication and instruction execution

Here systems such as JOLIE can be used, overlaying protocols like SOAP. This can be used for dynamically generated data like database queries and searches, as well as for sending instructions such as "Approve Payment". An example would be a shop.
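To show what an instruction like "Approve Payment" might look like on the wire, here is a minimal sketch that wraps an operation in a SOAP 1.1 envelope and unpacks it again at the other end, using only Python's standard library. The shop namespace, the operation name and its parameters are invented for the example, and a real service would describe them in its own interface definition.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SHOP_NS = "urn:example:shop"                            # hypothetical service namespace


def qname(namespace, local):
    """Build ElementTree's '{namespace}local' form of a qualified name."""
    return "{%s}%s" % (namespace, local)


def build_request(operation, **params):
    """Wrap an operation and its parameters in a SOAP envelope; return XML text."""
    envelope = ET.Element(qname(SOAP_NS, "Envelope"))
    body = ET.SubElement(envelope, qname(SOAP_NS, "Body"))
    call = ET.SubElement(body, qname(SHOP_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, qname(SHOP_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Return (operation, parameters) from a SOAP envelope, as the receiver would."""
    body = ET.fromstring(xml_text).find(qname(SOAP_NS, "Body"))
    call = body[0]                                   # first element inside Body
    operation = call.tag.split("}", 1)[1]            # strip the '{namespace}' part
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("ApprovePayment", orderId=1234, amount="499.00")
print(message)
print(read_request(message))
```

Nothing here cares whether the instruction came from a button on a Web page or a desktop application, which is the point: the message is the interface.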

Notice that the first technology, HTML, is the only one which needs a Web browser to access. RDFa, Atom and RSS are all structured in a way that allows applications to handle them directly; no human is needed and thus many more possibilities are available. XMPP can also be structured, since RDF, Atom and RSS can all be sent over XMPP, allowing machines to handle the data, but doing so on an as-needed basis which makes things more scalable. JOLIE covers a range of protocols which are all inherently machine-focused: they are executing instructions. This might be a "buy this laptop" instruction when a button is pressed, or a "search for this" instruction when a query is entered.

These technologies allow data and communication to break free of implementation and visualisation. The next Facebook doesn't have to be a centralised Web site; it can be a distributed system with many servers run by different people interacting with each other to allow a scalable, unowned network, like the Web but made of pure information without the overhead of layout, styles, widgets or interfaces. This is called the Semantic Web. All of the visualisation, interface and style can be swappable and implemented where it is wanted, for instance as desktop applications on users' machines, or as Web sites running on various servers. There is no reason why, in an interconnected world, I should have to visit Facebook.com in order to interact with Facebook.

Except, of course, that Facebook wants to force unwanted adverts onto their customers.
