Part of my hometown (Dublin, Ireland) is known by its postal address:
Dublin 4. Dublin 4, or D4 as it is affectionately known, is a decidedly
up-market part of town, making an address in D4 a sought-after thing.
People from D4 go to the opera, ship table napkins in from Paris, and
manicure their pet poodles. You get the picture.
Now, an Irish politician well versed in the arts of finances and
taxation famously described Dublin 4, not as a place, but as a state of
mind. D4 is an attitude, an approach, a way of seeing, and a way of
wearing Versace. No document, in the best tradition of logical
positivism, splits the universe into things that are D4 or not D4.
Things slip in and out of D4. The D4 concept evolves, mutates, it even
physically moves. Yet it remains D4.
The metaphysics of D4 is a subject somebody should write a book about,
preferably Robert Pirsig. While he is at it, he could look at some
other hard-to-pin-down concepts that are more a state of mind than
physical realities. Like HTML and XML, for example.
Let's start with HTML. HTML is famously short on details regarding how
rendering should proceed and how the final rendered outcome of an HTML
document should appear. In a very deep and important sense, the
correctness of HTML is a function of how it looks in a Web browser. The
Web browser actually defines "correctness" for HTML.
Worryingly, the definition of "correct" for HTML is fluid. It evolves,
mutates, and it even physically moves as reference implementations of
browsers move. Yet HTML somehow remains HTML. Even though what we think
of as HTML in, say, Internet Explorer 6 is very different from what we
think of as HTML in, say, Lynx or Amaya.
The trouble starts, of course, when documents in HTML -- whatever that
means -- move from one environment to another -- one word processor to
another, one browser to another, and so on. All over the world,
unsuspecting semi-technical people say "just send us HTML" or "our
systems work with HTML" only to find that the markup hits the fan down
in engineering, where people actually have to work with the data.
Along comes XML. As well as generalizing the markup solution of HTML to
cover data in general rather than simply display-oriented data, it rids
the world of the "correctness" problem that HTML suffers. XML formally
states what is and is not correct. No longer can there be any doubt
about what is and what is not a correct document. The definition
of "correct" is not fluid. It does not evolve or mutate. It is fixed in
stone.
Or is it? Where do I get off saying that XML is a state of mind? Two
things. The first is what I call that old "may-or may-not problem".
The second is the "semantic food chain problem".
The "may-or may-not problem"
In numerous places throughout the XML standard, an XML processor "may-
or may-not" do something. These may seem harmless enough, but they are
far from harmless. Every time a developer hits a "may-or may-not"
feature, software must cater to the presence/absence of the feature.
Chain a bunch of loosely coupled XML processing systems together and
even simple "may-or may-not" differences snowball into surprising
changes to the underlying XML data.
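To make that concrete, here is a minimal sketch in Python. One real
instance of "may-or may-not": a non-validating XML processor may, but
need not, read an external DTD, so attribute defaults declared there
may or may not show up in the data. The sketch simulates that choice
rather than exercising any particular parser's behaviour.
EXTERNAL_DTD_DEFAULTS, process, and the reads_external_dtd flag are
names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a declaration such as
#   <!ATTLIST price currency CDATA "EUR">
# living in an external DTD that a processor may or may not read.
EXTERNAL_DTD_DEFAULTS = {"price": {"currency": "EUR"}}


def process(xml_text, reads_external_dtd):
    """Simulate one pipeline stage: parse the document, then re-serialize it.

    A stage that "reads the external DTD" fills in defaulted attributes;
    a stage that does not leaves the document as written.
    """
    root = ET.fromstring(xml_text)
    if reads_external_dtd:
        for elem in root.iter():
            defaults = EXTERNAL_DTD_DEFAULTS.get(elem.tag, {})
            for name, value in defaults.items():
                if name not in elem.attrib:
                    elem.set(name, value)
    return ET.tostring(root, encoding="unicode")


ORDER = "<order><price>10.00</price></order>"

# Two stages, both playing by the rules, handed the same input.
seen_by_a = process(ORDER, reads_external_dtd=True)
seen_by_b = process(ORDER, reads_external_dtd=False)

print(seen_by_a)
print(seen_by_b)
print("same data?", seen_by_a == seen_by_b)
```

Stage A passes a price with a currency downstream; stage B passes one
without. Neither is wrong, and whatever sits next in the chain has to
cope with both.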
Interoperability -- one of the main raisons d'être of XML -- suffers
badly. Owing to the "may-or may-not problems", each component
subsystem can be 100% compliant with the XML standard and yet face
significant interoperability problems. In a world where "correctness"
means reliable, end-to-end communication of XML messages (in
e-commerce, for example), you can see why this poses a problem.
The "semantic food chain problem"
I used to wonder why some of the biggest IT vendors on the planet were
falling all over themselves to achieve 100% compliance with the basic
XML specifications. I had assumed that XML would be an
embrace-and-extend battleground like HTML, or SQL before it. This did
not happen.
Why not? Did all these fiercely competitive forces suddenly decide to
be best buddies? Of course not. I had missed something very important.
100% compliance with the XML standard has become commoditized, thus
nullifying it as a commercial battleground. Not only that, but, owing
to the "may-or may-not" problem, locking a customer in while
preaching 100% XML compliance remains entirely possible. However, the
main reason the vendors have not fought over the details of XML
compliance is that the real money is elsewhere.
XML standardizes (with some warts) a basic level of information
interchange. It does so mostly at the syntax level but also partly at
the semantic level (i.e., what the data means). However, the vendors
know full well that the true meaning of the data is dictated by the
systems that process it, not by the syntax of the data. In other
words, "correct" in the context of XML import/export has very little to
do with the syntax of the XML and everything to do with how the
sending/receiving system reacts to the data.
This part of the system is proprietary. The XML is "correct" if it gets
in and out of the proprietary systems without surprises -- not because
the XML standard prescribes it as correct. True correctness of XML
evolves, mutates, and physically moves as fashions and money move. Just
like Dublin 4.