Suppose you are traveling on the Trans-Siberian Railway. Suppose
something bad happens to the locomotive just east of Moscow and you are
told that it could be many days before you reach the next stop to
stretch your legs. So you sit back in your cabin, resigning yourself to
a long wait.
You need something to amuse yourself, but all you have at your disposal
is the attentions of a smiling XML technologist you share the cabin
with. So you search for a topic of conversation that will keep both of
you from going mad with boredom, a good meaty topic that will run and
run.... Should you find yourself in such a situation, I have a topic of
conversation that fits the bill.
Suppose I show you this XML fragment:
<weight>12</weight>
Now I ask you the question, "What type is the piece of information known
as 'weight'?"
I am not exaggerating when I say that this simple-sounding question
opens up vistas of debate that could fill a library with dissertations.
The concept of "type" is a real doozy: a harmless-looking four-letter
word that packs a massive punch! More than enough to keep the
conversation flowing from Moscow to Vladivostok.
One possible answer is that "weight" is a variable of type "string"
with the value "12". Another possible answer is that weight is a
variable of type "positive integer" with the value 12.
Now let us make a little modification and ask the same question: "What
type is the piece of information known as 'weight'?"
<weight>+0012</weight>
Is it the string "+0012" or is it still the positive integer 12?
I suspect you can see the slippery slope we are on here. The
"twelveness" of the weight variable is an interpretation of the
underlying characters "+0012" in the XML document, a form of semantic
Polaroid through which we are seeing the XML. At its lowest level, XML
has but one single, universal type -- string. An XML document is first
and foremost a string; all else is layers of interpretation.
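To make that concrete, here is a minimal sketch in Python using the
standard library's xml.etree.ElementTree module. The function name
read_weight is my own invention for illustration. The parser hands over
characters and nothing more; the twelveness only shows up when we choose
to call int() on them.

```python
import xml.etree.ElementTree as ET

def read_weight(fragment):
    """Return the element's raw text and one possible integer reading of it."""
    element = ET.fromstring(fragment)
    raw = element.text        # all the XML parser itself hands over: a string
    return raw, int(raw)      # int() is an interpretation we layer on top

for fragment in ("<weight>12</weight>", "<weight>+0012</weight>"):
    raw, number = read_weight(fragment)
    print(repr(raw), "read as", number)
```

The two fragments yield different strings but the same integer, which is
exactly the gap between the document and its interpretation.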
The question of types in XML boils down to this: Is the twelveness of
the weight variable something that XML should be intimately concerned
with or should that be left to higher levels of processing?
If XML gets involved in such things, it affects not only XML but also
all its surrounding courtiers and handmaidens such as XPath, XLink,
XQuery, and XSLT. In order for them to act in concert on the
"twelveness" of the weight variable, they all need to share the same
set of types.
In order to do this, something has to establish "twelveness" as a
concept that everyone can agree on. W3C XML Schema does this with its
set of data types. Then, the twelveness of the variable "weight" needs
to be exposed in a form that XPath, XQuery, and the rest can see. The
Post-Schema-Validation Infoset (PSVI) does this.
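As a rough illustration of what "agreeing on twelveness" involves, the
Python sketch below approximates the idea behind a schema type such as
positive integer: a rule for which character strings are acceptable,
plus a rule for which value each one denotes. The function name
as_positive_integer and the exact rules are my own simplification, not
a schema validator and not the PSVI.

```python
import re

# Assumed lexical rule: an optional "+" followed by decimal digits.
_LEXICAL_FORM = re.compile(r"\+?[0-9]+")

def as_positive_integer(text):
    """Map a character string to a positive integer value, or raise ValueError."""
    candidate = text.strip(" \t\r\n")    # ignore surrounding XML whitespace
    if not _LEXICAL_FORM.fullmatch(candidate):
        raise ValueError(f"not a valid lexical form: {text!r}")
    value = int(candidate)
    if value < 1:
        raise ValueError(f"not positive: {text!r}")
    return value

print(as_positive_integer("12"), as_positive_integer("+0012"))
```

Every tool that wants to compare or sort weights has to apply the same
two rules; that shared agreement is what the schema data types supply.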
Unfortunately, nailing down "twelveness" comes at significant cost, as
anyone who has read the W3C XML Schema documents and PSVI specification
will tell you. So much so that some are inclined to think that maybe all
this data typing in the bowels of XML just isn't worth it.
The strongest form of this argument is that all this data typing stuff
is not really in keeping with the ethos of XML. Moreover, if what you
want is tight universal agreement of data types and a tight API for
talking to strongly typed data, XML is not a good place to start. Try
ASN.1 or Java Object Serialization!
Let's not get caught up in that debate here. Let's ask a more fundamental
question. Did XML succeed *precisely* because it did not have strong
data typing or *in spite* of not having strong data typing?
To get an answer to that question, we will need more time than a mere
Trans-Siberian Railway crossing, however slow, would afford. I can see
the advertisement now:
Wanted: XML geek to argue merits of data typing on trip to Alpha
Centauri. Strong constitution and sense of humor required.