"All our documents have at least one author. That is always true, surely?"
"There might be no authors at all! It depends on where the document is
in the workflow. Sometimes there is no author until the last minute."
"Actually, come to think of it, we have purposely published documents
without an author in the past."
"Ah, not really. It depends on how you look at it. A document with an
author field that just happens to be empty or a document with no author field."
"You are just splitting hairs."
"No I'm not. There is an important difference!"
"What about where we put the author information? Sometimes we need to
put author information before the publication date, sometimes after. We have
always had a mix in our published documents. How will we support that?"
"I don't see how we can capture everything we need about how that 200
page landscape table is formatted. We don't have enough layout control."
"If we used SGML rather than XML, some of our problems would be easier
to solve."
"No. What we really need is to dump XML XSD and go back to DTD.
We never should have listened to that content management consultant."
"You are both wrong! Relax NG is obviously the way to go..."
"What we need to do is get management to pay for top-of-the-range authoring
tools. That will solve our problems."
Sitting in front of Master Foo, using some of the poorest attempts at relaxed
seating positions Master Foo had ever seen, were four highly animated employees
of a manufacturing company from the valley below Pentimenti Mountain. To attract
their attention - and stop the cacophony that was at this stage beginning to
disturb the Koi in the fishpond - Master Foo tapped his green tea cup with a
USB key.
"Gentlemen", he began, raising both palms to face his visitors,
"Please can you slow down and speak one at a time? My poor old ears are
not what they used to be, and neither is my ability to follow N simultaneous
dialogs."
The din subsided.
"All I have heard in the last five minutes is the four of you uttering
the phrase 'controlled vocabulary' over and over again. You do not seem to be
getting anywhere. Which, presumably, is the reason for this visit?"
"Yes, Master Foo.", the spokesman said. "We have made the journey
up Pentimenti Mountain to seek your guidance regarding how best to structure
our mountain of technical documentation. It has become clear to us that we should
be able to codify the structure of our documents and thereby increase the automation
we can apply. For example, we foresee being able to automatically generate web
pages, DVD-based libraries and so forth from a single master set of documents."
"Hmmm. Ok. Tell me more."
"Well, we have been doing some research and have had some external advice
and..."
"Yes?"
"Well, it looks as if we can only take fully advantage of all the great
tools out there - especially the XML tools - if we first formally describe the
structure of our documents. But..."
"But the 'structure' which you feel sure exists is proving difficult to
pin down? Difficult to reach consensus on?"
"Yes. That is it exactly. Every time we think we have it sorted, some
exception pops up that breaks the structural rules we are trying to create."
Master Foo's eyes widened, silently imploring the spokesperson to think it
through for himself.
"Hang on a minute. Master Foo. Just now you said we were all using the
phrase 'controlled vocabulary'. I never said that. We - none of us - said that.
What do you mean? I'm confused."
A quick look around at his colleagues confirmed the spokesperson's belief that
they, too, were confused.
Master Foo sighed ever so slightly, his face coming to rest with a faint hint
of a smile.
"Tell me. This 'structure' you seek for your documents what inheres within
that concept? What is its fundamental nature?"
"Well, when we say our documents have 'structure' we mean that they have
chunks that can be named. We believe that by naming those chunks we can better
automate the processing of large sets of documents."
"Yes. Chunks that can have names. Indeed. Anything else?"
"Well, we start at the top of the document and give the big chunks we
see names. We then look at those chunks and generally, they can be broken into
smaller pieces which themselves can have names. And so on. We build up a hierarchy
of chunks that way."
"Ah. And that hierarchy has a definable order? Author's names appearing
before the document title. Chapters before sections. Chapters inside parts,
that sort of thing?"
"Right!", the spokesperson exclaimed. With one look at Master Foo's
facial expression, his enthusiasm turned to concern.
"Wrong.", replied Master Foo. "Yes documents have structure
if by 'structure' you mean that they have identifiable chunks and those chunks
can be named to the benefit of the enterprise. But, it does not follow that
there is a natural, simple hierarchical order in which those chunks will occur
in a set of documents. It is one of those ideas that is very appealing in theory
but does not work well in practice."
"But we have all these great tools we can use to model those hierarchical
structures. Things like DTDs and XSDs and Relax NGs. What about those, Master
Foo?", one of the quartet asked.
"Using any of those tools, it is very easy to start with a simple hierarchical
structure. It will work fine for the documents from the middle of the structural
bell curve. However, over time, as you look at more and more documents, you
find yourself having to allow more and more document chunks in more and more
different places in more and more orders. Eventually your structure can degenerate
into a set of names. Named chunks that can all occur essentially anywhere, in
any order, within your documents. In extreme cases, the so-called structure
reduces to a mere..."
"Controlled vocabulary!", the spokesperson said.
"Precisely", replied Master Foo, reaching for his tea cup.
A shocked silence descended.
"Have we toiled in vain Master Foo?".
"No. You have arrived at a very important place. A large repository of
document dukkha opens up before you. Go now. Revisit your hierarchical document
structures. Be not afraid of open content models. Most importantly of all,
look at Schematron and ask yourself what parts of your structural rules
are better expressed outside of a hierarchical schema rather than within it.
Embrace the controlled vocabulary. Do not fight it. It is stronger than you
are and it has Murphy's Law on its side."
The quartet took their leave. Silent now, their brows furrowed an inch deeper
than when they had arrived.