Tuesday, 14 December 2010

Blog Dump 6: The Web of Code

The original World Wide Web consists of big chunks of text, called pages, which, by virtue of some ill-defined, external set of rules, somehow manage to contain knowledge. The pages can point to other places on the Web for whatever reason, and to a person this is great. However, to a machine it's just a bunch of numbers with no discernible meaning.

Mixed in with this knowledge are various well-defined, machine-understandable properties which originally denoted the presentation of the text (e.g. 'b' for a bold bit, 'i' for an italic bit, etc.) but which gradually changed to instead merely split it up in various ways (e.g. 'em' for a bit that needs emphasising, 'strong' for a bit which is important, etc.). The knowledge in the text, however, is still lost to the machine processing it.
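To see how little the markup actually gives away, here's a quick Python sketch (purely illustrative, using the standard library's HTML parser): the machine can happily recover which bits are 'strong' or 'em', but the fact the sentence states is completely invisible to it.

```python
from html.parser import HTMLParser

class StructureOnly(HTMLParser):
    """Collects (enclosing tag, text) pairs - the only thing the markup tells a machine."""
    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tags
        self.findings = []    # (tag, text) pairs we managed to extract

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        enclosing = self.open_tags[-1] if self.open_tags else None
        self.findings.append((enclosing, data))

page = "<p>Water boils at <strong>100 degrees Celsius</strong> at sea level.</p>"
parser = StructureOnly()
parser.feed(page)
print(parser.findings)
# [('p', 'Water boils at '), ('strong', '100 degrees Celsius'), ('p', ' at sea level.')]
# The structure is machine-readable; the fact about water is not.
```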

This means that a person has to be involved somewhere, requiring a load of unfortunate individuals to navigate through the shit user interface of the Web, reading everything they find, until they get the bit of knowledge they were after.

These days we have search engines which slightly lessen the burden, since we can jump to places which we, as people, have reason to believe are pretty close to our destination and thus shouldn't require much navigating from. Still, though, the machines don't know what's going on.

In the Web of Data we entirely throw away the concept of a page, since it's irrelevant to our quest for knowledge. Text can be confined to pages, but knowledge can't; knowledge is pervasive, interconnected, predicated and higher-order, and in general we can't confine a description of anything to a single unit without reference to other things, the descriptions of which reference other things, and so we end up with our 'page' on one thing actually containing the entire sum of all knowledge. Since there's only one sum of all knowledge, we don't need to use a concept like 'page' which implies that there is more than one.

With the artificial limit of pages done away with we can put stuff anywhere, as long as we use unique names (Uniform Resource Identifiers, URIs) to make sure we don't get confused about which bits are talking about what. Now we've got a machine-readable, distributed, worldwide database of knowledge: that's the Web of Data.
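To make that concrete, here's a toy sketch in Python (my own illustration; the URIs are made up, and real Web of Data systems use RDF stores rather than a Python set, but the shape is the same): knowledge lives as subject-predicate-object triples whose parts are URIs, so statements made anywhere can be pooled and queried without any notion of a page.

```python
# A toy triple store: every fact is (subject, predicate, object), each part named by a URI.
facts = {
    ("http://example.org/alice", "http://example.org/knows",     "http://example.org/bob"),
    ("http://example.org/bob",   "http://example.org/livesIn",   "http://example.org/paris"),
    ("http://example.org/paris", "http://example.org/locatedIn", "http://example.org/france"),
}

def ask(subject=None, predicate=None, obj=None):
    """Return every fact matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in facts
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Anyone, anywhere, can state something new about Alice - no single page 'owns' her description.
facts.add(("http://example.org/alice", "http://example.org/livesIn", "http://example.org/paris"))

print(ask(predicate="http://example.org/livesIn"))
```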

At this point many short-sighted people think that the next step is to rewrite the old Web on top of the Web of Data, so that both humans and machines can understand it and work in harmony. These people are, of course, sooooo 20th century.

Machines aren't intelligent (yet), so there's no way we could make a serious moral argument that they are our slaves. Therefore, why aren't they doing everything? There should be no relevant information kept back from the machine, and nothing it outputs should contain any new knowledge which can't already be found on the Web of Data. If we want to refine what it programmatically generates, we should do so by adding new information to the Web of Data until it knows what we want, and thus nobody else need specify that data again.

To me, as a programmer, there is an obvious analogy to be made:

The original coding system consists of big chunks of text, called files, which, by virtue of some well-defined, external set of rules, somehow manage to contain computation. The files can import other files in the system for whatever reason, and to a person this is great. However, to a machine it's just a bunch of calculations with no discernible meaning.

Mixed in with this computation are various well-defined, machine-understandable properties which originally denoted the representation of the data (e.g. 'int' for a 32-bit integer, 'double' for a 64-bit floating-point number, etc.) but which gradually changed to instead merely split it up in various ways (e.g. 'class' for a bit that contains related parts, 'module' for a bit which is self-contained, etc.). The computation in the text, however, is still lost to the machine processing it.
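The same point in miniature (again just my own sketch, in Python): the declarations below tell the machine exactly how the data is represented and grouped, but nothing about what the computation is for.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: int    # the machine knows this is an integer...
    value: float      # ...and this is a floating-point number

def average(readings: list[Reading]) -> float:
    # The machine can check the types and run the arithmetic, but it has
    # no idea that this summarises sensor measurements, or when calling it
    # would be a sensible thing to do.
    return sum(r.value for r in readings) / len(readings)

print(average([Reading(1, 20.5), Reading(2, 21.0)]))   # 20.75
```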

This means that a person has to be involved somewhere, requiring a load of unfortunate individuals to navigate through the shit user interface of the system, reading everything they find, until they get the bit of calculation they were after.

These days we have search engines which slightly lessen the burden, since we can jump to places which we, as people, have reason to believe are pretty close to our destination and thus shouldn't require much navigating from. Still, though, the machines don't know what's going on.

In the Web of Code we entirely throw away the concept of a file, since it's irrelevant to our quest for computation. Text can be confined to files, but computation can't; computation is pervasive, interconnected, predicated and higher-order, and in general we can't confine a serialisation of anything to a single unit without reference to other things, the serialisations of which reference other things, and so we end up with our 'file' on one thing actually containing the entire sum of all computation. Since there's only one sum of all computation, we don't need to use a concept like 'file' which implies that there is more than one.

With the artificial limit of files done away with we can put stuff anywhere, as long as we use unique names (Uniform Resource Identifiers, URIs) to make sure we don't get confused about which bits are talking about what. Now we've got a machine-readable, distributed, worldwide database of computation: that's the Web of Code.
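What might that look like to use? Here's a deliberately hypothetical Python sketch (none of these names exist anywhere; the 'web' is faked with a local dictionary standing in for HTTP): a definition is published under a URI, resolved on demand, and memory only caches what has been resolved.

```python
from functools import lru_cache

# Hypothetical: the Web of Code, faked with a local dictionary.
# A real system would dereference each URI over the network and verify the result.
WEB_OF_CODE = {
    "http://example.org/code/factorial":
        "def factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)\n",
}

@lru_cache(maxsize=None)            # memory is just a cache of resolved definitions
def resolve(uri):
    source = WEB_OF_CODE[uri]       # stand-in for fetching the URI
    namespace = {}
    exec(source, namespace)         # turn the published text back into a callable
    return namespace[uri.rsplit("/", 1)[-1]]

factorial = resolve("http://example.org/code/factorial")
print(factorial(5))                 # 120
```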

At this point many short-sighted people think that the next step is to rewrite the old coding system on top of the Web of Code, so that both humans and machines can understand it and work in harmony. These people are, of course, sooooo 20th century.

Machines aren't intelligent (yet), so there's no way we could make a serious moral argument that they are our slaves. Therefore, why aren't they doing everything? There should be no relevant information kept back from the machine, and nothing it outputs should contain any new calculation which can't already be found in the Web of Code. If we want to refine what it programmatically generates, we should do so by adding new information to the Web of Code until it knows what we want, and thus nobody else need specify that process again.

What Is The Web of Code?

The Web of Code is code like any other. However, the operations it performs are not on memory; they are on things in the Web of Code. Memory is just a cache. The Web of Code is as high-level and sparse as possible, describing only what it needs to and no more. If we want to alert the user then we alert the user; we do not want to display rectangles and render glyphs, so we do not display rectangles and render glyphs. These are low-level details which can be worked out through reasoning and search on the Web of Code.
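As a final hypothetical sketch (my own, in Python): the program states the intent and stops; how the alert actually reaches the user is left to whatever handlers the surrounding system has discovered, and could be swapped without touching the high-level code.

```python
alert_handlers = []

def handles_alerts(func):
    """Register one low-level way of realising the high-level 'alert' intent."""
    alert_handlers.append(func)
    return func

def alert(user, message):
    # High-level code stops here: no rectangles, no glyphs.
    for handler in alert_handlers:
        handler(user, message)

@handles_alerts
def print_to_console(user, message):
    # One possible realisation, chosen by the environment rather than the caller.
    print(f"[to {user}] {message}")

alert("alice", "Your build has finished.")
```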
