From: www.itworld.com
October 20, 2006
I remember once walking around the shelves of an enormous trade-only
food store. The size of the containers on the shelves created a weird
sensation. I felt as if I had been deposited onto the movie set of a
low-budget science fiction film. A film featuring shrunken humans doing
battle with regular-sized (but now enormous) mayonnaise containers. Come
to think of it, it could also have been a movie dealing with
regular-sized humans doing battle with genuinely enormous mayonnaise
containers.
This would have been closer to the real world but somehow a lot less fun
to think about.
In food distribution, it is no surprise that the raw material - be it
mayonnaise or any other foodstuff - is the same stuff regardless of the
size of the containers or the energy/complexity fabric of the
distribution network. Enormous containers or regular containers,
supertankers or egg cups, mayonnaise is always mayonnaise.
One of my most cherished analogies is that enterprise computing is a
glorified form of a distribution problem. How to get data (mayonnaise)
from producers (humans/applications) to consumers (humans/applications)
in the best way possible. The word 'best' here being a holder for an
enterprise defined metric - examples include least cost, fastest
end-to-end time, highest throughput, etc.
In the real world, we are completely comfortable with the idea that
although the raw material is the same, the size of the containers and
the machinery used to move the containers changes dramatically with
scale. In the world of bits and bytes, we are less comfortable with this
notion. Do the same scaling issues from the world of atoms apply in the
distribution networks we create?
Clearly, some of the scaling issues do not apply. Geography, for example,
is a major driver for the field of physical logistics. Moving stuff from
place to place involves moving things physically from one location on
the planet to another. In the world of bits, geography ceases to be a
defining factor. Two 'places' in virtual space can be right next to each
other in physical space but miles apart in other ways.
The most important yardstick for distance in the logistics of digital
data is not geography; it is semantics. Two chunks of data are
logistically in the same place in virtual space if they have the same
meaning. Data payloads that share semantics flow friction free, distance
free in the virtual world.
Why does Microsoft Excel flow so easily from machine to machine in an
accounting practice? Because all machines therein are, logistically
speaking, in the same semantic place - they share an understanding of
Excel.
What about documents? Do they flow friction free in virtual space?
Definitely not. The semantics of a Microsoft Word document are not
shared by an OpenOffice document or by a FrameMaker document or by one
of the many variations of an HTML document. Documents are a logistical
problem in virtual space. In contrast to spreadsheets, where Microsoft
Excel dominates, no single application has come to dominate the document
space.
What about invoices? Do they flow friction free in virtual space?
Definitely not. Invoices are another logistical problem in virtual
space. Again, unlike spreadsheets, no single application has come to
dominate the space.
Now, it is not exactly news that application independent data logistics
- also known as data interchange - is a problem. We know how to solve
it. We solve it by creating standardized containers so that stuff can
flow up and down the logistic networks of our data flows.
Unfortunately, there is a problem here. A problem best illustrated by
analogy. You would never dream of stuffing a catering style barrel of
mayonnaise in your fridge. Such containers are aimed at a different part
of the logistics network. You need something tailored to your
environment that is compatible with the higher logistical levels but
suited for your particular purposes.
Similarly, when it comes to data interchange, you need something
suitable for use at whatever scale in the logistics network you are
working with. If you are, for example, a technical writer, you want to
work with a document model suitable for your needs, not for the needs of
all technical writers on the planet. If you are processing invoices, you
want to work with an invoice model suitable for your needs, not a model
that caters for the needs of all possible users of invoices in the world
of commerce.
This is the nub of the problem with electronic data interchange. We have
a marked tendency to mix up our logistical levels. We create standards
for addressing the interchange problem; DocBook and ebXML are examples
that spring to mind. These are, for good reasons, generalized to cover a
wide variety of uses. However, instead of creating small,
fit-for-purpose subsets or compatible variants for use at different levels in
the logistical network, we attempt to use the interchange format
directly.
The result, if you are an author working with DocBook or an
invoice-generating software program working with ebXML, is that you have
to worry about levels of the logistic network way above the one you are
working in. It is like scooping some mayonnaise from a barrel of the
stuff wedged into your fridge. It's the right stuff but the container is
all wrong.
We need to learn from the patterns of logistics in the real world. We
need to create containership models that are fit for purpose, tailored
to different levels in the data logistics network, with clean routes to
pump payloads from one container type into another.
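
As a rough sketch of what such a route between container types might
look like for invoices, consider the Python below. Everything in it is
an assumption made for illustration: the two invoice models, their field
names, and the conversion functions are hypothetical and are not taken
from ebXML or any other real standard. The small model holds only what
one business needs; the larger one carries the extra context a wider
network might demand.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LocalInvoice:
    """Small container: only what one hypothetical business needs."""
    number: str
    customer: str
    total: Decimal


@dataclass(frozen=True)
class InterchangeInvoice:
    """Larger container: fields a wider network might require (hypothetical)."""
    number: str
    buyer_name: str
    seller_name: str
    currency: str
    total: Decimal


def to_interchange(inv: LocalInvoice, seller_name: str, currency: str) -> InterchangeInvoice:
    """Move a payload up a level, supplying context the local level leaves implicit."""
    return InterchangeInvoice(
        number=inv.number,
        buyer_name=inv.customer,
        seller_name=seller_name,
        currency=currency,
        total=inv.total,
    )


def to_local(inv: InterchangeInvoice) -> LocalInvoice:
    """Move a payload down a level, dropping fields the local level does not use."""
    return LocalInvoice(number=inv.number, customer=inv.buyer_name, total=inv.total)
```

The mayonnaise (invoice number, customer, total) is the same in both
jars; only the container changes, and the conversion functions are the
only place that has to know about both sizes.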
There is no 'one size fits all' container for data or indeed any one
vertical market form of data. I think it is about time we stopped trying
to create them. It's about time we embraced the notion that different
containers for data have their place at different points in the data
logistics network.
What we need to focus on, as producers and consumers at different parts
of this network, is the semantics of the data we need to move from
producers to consumers. It is about time we separated, in our minds, the
mayonnaise from the size of the jar it sits in. The former is the real
product; the latter is an artifact of the logistics network. This same
separation of concerns can, and in my opinion should, be more prominent
in data logistics than it currently is.
There are certain portents that such a thing may appear on the horizon
some time soon, most probably Trojan-horsed into our collective
consciousness inside the harmless-looking moniker 'Web Services'.
In geek parlance, I see a scale-free, services-oriented architecture
based on asynchronous, stateful XML messaging. Some day it will
hopefully have a name that is easier to pronounce.