
Master Foo on structured documents

ITworld 12/17/2007

Sean McGrath, ITworld.com

"All our documents have at least one author. That is always true, surely?"

"There might be no authors at all! It depends on where the document is in the workflow. Sometimes there is no author until the last minute."


"Actually, come to think of it, we have purposely published documents without an author in the past."

"Ah, not really. It depends on how you look at it. Is it a document with an author field that just happens to be empty, or a document with no author field at all?"

"You are just splitting hairs."

"No, I'm not. There is an important difference!"
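The difference being argued over, between an author field that is empty and an author field that is absent, is easy to see once it is written down. What follows is a minimal, illustrative sketch in Python using the standard `xml.etree.ElementTree` module; the element names `doc`, `title` and `author` and the helper `author_status` are hypothetical, not taken from any real document schema.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: one carries an author element that happens to be
# empty, the other has no author element at all.
EMPTY_AUTHOR = "<doc><title>Pump manual</title><author/></doc>"
NO_AUTHOR = "<doc><title>Pump manual</title></doc>"


def author_status(xml_text: str) -> str:
    """Classify a document's author information as 'absent', 'empty' or 'present'."""
    root = ET.fromstring(xml_text)
    author = root.find("author")  # first direct child named 'author', or None
    if author is None:
        return "absent"  # the field does not exist at all
    if not (author.text or "").strip():
        return "empty"  # the field exists but has no content (yet)
    return "present"


print(author_status(EMPTY_AUTHOR))  # empty
print(author_status(NO_AUTHOR))     # absent
```

Note the explicit `is None` test: an ElementTree element with no children is falsy, so a plain `if not author` would wrongly lump the empty field in with the missing one.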

"What about where we put the author information? Sometimes we need to put author information before the publication date, sometimes after. We have always had a mix in our published documents. How will we support that?"

"I don't see how we can capture everything we need about how that 200-page landscape table is formatted. We don't have enough layout control."

"If we used SGML rather than XML, some of our problems would be easier to solve."

"No. What we really need is to dump W3C XML Schema (XSD) and go back to DTDs. We never should have listened to that content management consultant."

"You are both wrong! RELAX NG is obviously the way to go..."

"What we need to do is get management to pay for top-of-the-range authoring tools. That will solve our problems."

Sitting in front of Master Foo, using some of the poorest attempts at relaxed seating positions Master Foo had ever seen, were four highly animated employees of a manufacturing company from the valley below Pentimenti Mountain. To attract their attention - and stop the cacophony that was by now beginning to disturb the koi in the fishpond - Master Foo tapped his green tea cup with a USB key.

"Gentlemen," he began, raising both palms to face his visitors.

"Please, could you slow down and speak one at a time? My poor old ears are not what they used to be, and neither is my ability to follow N simultaneous dialogs."

The din subsided.

"All I have heard in the last five minutes is the four of you uttering the phrase 'controlled vocabulary' over and over again. You do not seem to be getting anywhere. Which, presumably, is the reason for this visit?"

"Yes, Master Foo," the spokesperson said. "We have made the journey up Pentimenti Mountain to seek your guidance on how best to structure our mountain of technical documentation. It has become clear to us that we should be able to codify the structure of our documents and thereby increase the automation we can apply. For example, we foresee being able to automatically generate web pages, DVD-based libraries and so forth from a single master set of documents."

"Hmmm. Ok. Tell me more."

"Well, we have been doing some research and have had some external advice and..."

"Yes?"

"Well, it looks as if we can only take full advantage of all the great tools out there - especially the XML tools - if we first formally describe the structure of our documents. But..."

"But the 'structure' which you feel sure exists is proving difficult to pin down? Difficult to reach consensus on?"

"Yes. That is it exactly. Every time we think we have it sorted, some exception pops up that breaks the structural rules we are trying to create."

Master Foo's eyes widened, silently imploring the spokesperson to think it through for himself.

"Hang on a minute, Master Foo. Just now you said we were all using the phrase 'controlled vocabulary'. I never said that. None of us said that. What do you mean? I'm confused."

A quick look around at his colleagues confirmed the spokesperson's belief that they, too, were confused.

Master Foo sighed ever so slightly, his face coming to rest with a faint hint of a smile.

"Tell me. This 'structure' you seek for your documents: what inheres within that concept? What is its fundamental nature?"

"Well, when we say our documents have 'structure' we mean that they have chunks that can be named. We believe that by naming those chunks we can better automate the processing of large sets of documents."

"Yes. Chunks that can have names. Indeed. Anything else?"

"Well, we start at the top of the document and give the big chunks we see names. We then look at those chunks and generally, they can be broken into smaller pieces which themselves can have names. And so on. We build up a hierarchy of chunks that way."

"Ah. And that hierarchy has a definable order? Authors' names appearing before the document title. Chapters before sections. Chapters inside parts, that sort of thing?"

"Right!" the spokesperson exclaimed. With one look at Master Foo's facial expression, his enthusiasm turned to concern.

"Wrong," replied Master Foo. "Yes, documents have structure, if by 'structure' you mean that they have identifiable chunks and those chunks can be named to the benefit of the enterprise. But it does not follow that there is a natural, simple hierarchical order in which those chunks will occur across a set of documents. It is one of those ideas that is very appealing in theory but does not work well in practice."
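To make the failure concrete, here is a minimal Python sketch of a strict, ordered content model, roughly what a DTD declaration such as `<!ELEMENT doc (title, author*, pubdate, body)>` expresses. The model, the element names and the helper `matches_sequence` are hypothetical illustrations, and the matcher only handles the two cardinalities shown; it is not how any real validator is implemented. It accepts the "middle of the bell curve" document and rejects the exception where the author follows the publication date.

```python
import xml.etree.ElementTree as ET

# Hypothetical strict model: children must appear in exactly this order.
# Cardinality '1' means exactly one; '*' means zero or more.
STRICT_MODEL = [("title", "1"), ("author", "*"), ("pubdate", "1"), ("body", "1")]


def matches_sequence(root: ET.Element, model) -> bool:
    """Return True if root's children follow the ordered model exactly."""
    tags = [child.tag for child in root]
    i = 0
    for name, card in model:
        if card == "1":
            if i >= len(tags) or tags[i] != name:
                return False
            i += 1
        else:  # '*': greedily consume consecutive occurrences
            while i < len(tags) and tags[i] == name:
                i += 1
    return i == len(tags)  # leftover children mean the model was violated


typical = ET.fromstring("<doc><title/><author/><pubdate/><body/></doc>")
exception = ET.fromstring("<doc><title/><pubdate/><author/><body/></doc>")
print(matches_sequence(typical, STRICT_MODEL))    # True
print(matches_sequence(exception, STRICT_MODEL))  # False
```

Greedy matching is safe here because this particular model is deterministic (no element name could match two different positions), which is also a requirement DTD content models impose. Accommodating the exception means loosening the model, and that is the first step down the road described next.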

"But we have all these great tools we can use to model those hierarchical structures. Things like DTDs and XSDs and RELAX NG schemas. What about those, Master Foo?" one of the quartet asked.

"Using any of those tools, it is very easy to start with a simple hierarchical structure. It will work fine for the documents from the middle of the structural bell curve. However, over time, as you look at more and more documents, you find yourself having to allow more and more document chunks in more and more different places in more and more orders. Eventually your structure can degenerate into a set of names. Named chunks that can all occur essentially anywhere, in any order, within your documents. In extreme cases, the so-called structure reduces to a mere..."

"Controlled vocabulary!" the spokesperson said.

"Precisely," replied Master Foo, reaching for his tea cup.
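The degenerate end state Master Foo describes can be sketched just as briefly. In DTD terms it resembles a content model such as `<!ELEMENT doc (title | author | pubdate | body)*>`: any of the names, in any order, any number of times. The Python below goes one step further and checks only that every element name in the document, at any depth, comes from an allowed set. The vocabulary and the helper `unknown_names` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: order and nesting are no longer
# constrained, only which element names may appear.
VOCABULARY = {"doc", "title", "author", "pubdate", "body", "table", "para"}


def unknown_names(root: ET.Element, vocabulary) -> set:
    """Return element names used anywhere in the document that are not allowed."""
    return {el.tag for el in root.iter()} - set(vocabulary)


doc = ET.fromstring(
    "<doc><pubdate/><title/><body><para/><author/></body><sidebar/></doc>"
)
print(unknown_names(doc, VOCABULARY))  # {'sidebar'}
```

Everything about position, order and nesting has dropped out; the only remaining rule is membership in the list of names.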

A shocked silence descended.

"Have we toiled in vain, Master Foo?"

"No. You have arrived at a very important place. A large repository of document dukkha opens up before you. Go now. Revisit your hierarchical document structures. Be not afraid of open content models. Most importantly of all, look at Schematron and ask yourself which parts of your structural rules are better expressed outside a hierarchical schema than within it. Embrace the controlled vocabulary. Do not fight it. It is stronger than you are and it has Murphy's Law on its side."
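Schematron takes the rule-based approach: instead of a grammar describing allowed orderings, it states assertions that must hold in a given context. In real Schematron such a check would be a `rule` whose `context` is an XPath pattern such as `doc[@stage='published']//author`, containing an `assert` whose `test` is `normalize-space(.)`. The Python below only mimics that idea (context selector, assertion, message) and is not the Schematron API. The `stage` attribute and the rule itself are hypothetical, borrowed from the workflow debate earlier: a published document may omit its author entirely, but must not carry an empty author field. Because the rule says nothing about position, the author element may appear before or after anything else.

```python
import xml.etree.ElementTree as ET


def published_authors(root: ET.Element) -> list:
    # Context: author elements anywhere inside a document whose (hypothetical)
    # stage attribute is 'published'. Drafts select nothing, so they always pass.
    return list(root.iter("author")) if root.get("stage") == "published" else []


# Each rule: (context selector, assertion on each selected element, message).
RULES = [
    (
        published_authors,
        lambda el: bool((el.text or "").strip()),
        "A published document may omit <author>, but must not contain an empty one.",
    ),
]


def validate(root: ET.Element, rules) -> list:
    """Return the messages of all failed assertions; an empty list means valid."""
    failures = []
    for select, test, message in rules:
        for el in select(root):
            if not test(el):
                failures.append(message)
    return failures


draft = ET.fromstring('<doc stage="draft"><title/><author/></doc>')
published = ET.fromstring('<doc stage="published"><author/><title/></doc>')
anonymous = ET.fromstring('<doc stage="published"><title/></doc>')
print(validate(draft, RULES))           # []
print(len(validate(published, RULES)))  # 1
print(validate(anonymous, RULES))       # []
```

A co-occurrence rule like this one, where what is allowed depends on an attribute value elsewhere in the document, is awkward to state in a purely hierarchical grammar, which is the kind of rule Master Foo suggests moving outside the schema.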

The quartet took their leave, silent now, their brows furrowed an inch deeper than when they had arrived.

Sean McGrath is CTO of Propylon. He is an internationally acknowledged authority on XML and related standards. He served as an invited expert to the W3C's Expert Group that defined XML in 1998. He is the author of three books on markup languages published by Prentice Hall. Visit his site at: http://seanmcgrath.blogspot.com.
