Creating a schema in an XML application is always a fun experience,
regardless of the schema language being used. It is best performed near
a window because it helps to be able to stare out of one during the
inevitable waits for inspiration to call on you.
Inspiration is required because dreaming up names and classification
systems for names is, quite simply, a hard problem[1].
For each and every tag name in your schema, you will aspire to the
taxonomic sui generis: the definitive, indisputable mot juste. Your tag
names will oscillate from the laconic to the loquacious. No ontological
stone will be left unturned in your search for the perfect execution of
truth by tagging!
After a while, you will give up like everyone else who has ever tried
this path to perfection. When this happens to you (it will, trust me),
it helps to be near a Web browser so you can read a most excellent
classification of animals[2], which Jorge Luis Borges attributes to an
ancient Chinese encyclopedia. It goes like this:
1. those that belong to the Emperor,
2. embalmed ones,
3. those that are trained,
4. suckling pigs,
5. mermaids,
6. fabulous ones,
7. stray dogs,
8. those included in the present classification,
9. those that tremble as if they were mad,
10. innumerable ones,
11. those drawn with a very fine camel hair brush,
12. others,
13. those that have just broken a flower vase,
14. those that from a long way off look like flies.
There! Doesn't that make you feel better?
As this ancient example beautifully illustrates, classifying and naming
things is hard enough. XML modelers, however, must suffer a further level
of complexity. This extra complexity is called *change*. How would
animal classification work if, in the memetic analog of Stephen Jay
Gould's evolution by punctuated equilibria[3], new forms suddenly emerged
and demanded to be tagged in the data?
That would make things even more complex, right? The trouble is that
this is exactly what happens in real-world applications of XML. Business
requirements change and, with them, data changes shape and processes
need to evolve, die, or be replaced.
In this fluid situation, the "hard-wired" nature of XML schemata can
present a real stumbling block to successful system evolution.
The XML world, perhaps thanks to its SGML heritage, is predisposed to
thinking of validation as something that happens *before* data
processing really starts: a sort of "please wash your hands" prelude to
the main business process.
This form of thinking about validation results in XML schemas that try
to capture all the constraints on the data up front: a sort of
all-or-nothing validation on which everything hangs. Two things follow
from that. Firstly, the schemata get large and complex as they try to
pack in as much validation as possible in one fell swoop. Secondly, the
software to process data conforming to the schema gets more and more
complex[4].
The only constant is change. Given that, does it make sense to try to
treat validation as a one-off action? An action to be performed at a
single point in time, prior to most of the data processing? Would it not
make more sense to think of validation as a process? A sequence of
actions, performed through time, kanban by kanban[5], as data flows from
one form to another through a business process?
Thinking about it this way has the wonderful effect of decomposing the
validation into pieces, each of which stands alone and each of which is
small, tidy, and easy to evolve. Then we can move on to worrying about
how we classify the validation processes. Best get near a window for
that one too.
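To make "validation as a process" a little more concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree and a made-up order document. The element names (order, customer, item, shipTo), the check functions, and the step names in PIPELINE are all hypothetical illustrations, not taken from any real schema or product. Each check is a small standalone function, and each step of the business process runs only the checks it needs, at the moment the data reaches that step.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A validation stage: one small, standalone check over a parsed document.
# It returns a list of problems; an empty list means the check passed.
Check = Callable[[ET.Element], List[str]]


def has_customer(doc: ET.Element) -> List[str]:
    # Hypothetical rule: an order must name a customer.
    return [] if doc.find("customer") is not None else ["missing <customer>"]


def items_have_quantities(doc: ET.Element) -> List[str]:
    # Hypothetical rule: every item needs a positive whole-number qty.
    problems = []
    for item in doc.iter("item"):
        qty = item.get("qty", "")
        # str.isdigit() rejects "", "-1" and "2.5"; zero is rejected separately.
        if not qty.isdigit() or int(qty) == 0:
            problems.append(f"item {item.get('sku', '?')}: bad qty {qty!r}")
    return problems


def has_ship_to(doc: ET.Element) -> List[str]:
    # Hypothetical rule that only matters once the order reaches dispatch.
    return [] if doc.find("shipTo") is not None else ["missing <shipTo>"]


# Hypothetical business process: each step lists only the checks it needs.
PIPELINE: Dict[str, List[Check]] = {
    "order-entry": [has_customer, items_have_quantities],
    "dispatch": [has_ship_to],
}


def validate_at(step: str, doc: ET.Element) -> List[str]:
    """Run the checks registered for one process step, in order."""
    return [problem for check in PIPELINE[step] for problem in check(doc)]


order = ET.fromstring(
    '<order><customer>ACME</customer><item sku="A1" qty="3"/></order>'
)
print(validate_at("order-entry", order))  # []
print(validate_at("dispatch", order))     # ['missing <shipTo>']
```

The same order is acceptable at order entry but not yet ready for dispatch, which is exactly the kind of time-dependent judgment a single up-front schema struggles to express. Supporting a new process step here means registering a few more small checks rather than reworking one large schema.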
NOTES
[1] http://www.itworld.com/nl/xml_prac/01242002/
[2] http://www.multicians.org/thvv/borges-animals.html
[3] http://www.edge.org/documents/ThirdCulture/g-Pt.1Intro.html
[4] http://www.itworld.com/nl/xml_prac/04042002/
[5] http://www.promodel.com/glossary/kanban.asp