ITworld.com
The Real Cost of XML Tags

XML IN PRACTICE --- 04/04/2002



Oscar Wilde famously quipped that a cynic is someone who knows the cost of everything but the value of nothing. If this is true (and if I have my inference logic the right way round), then it can be concluded that all schema writers are idealists. They seem to know the value of everything (how to tag it) and the cost of nothing (money consumed by tags).




Do you sense a short fuse burning? A sense that the writer is not at peace with the universe in general and XML schema writers in particular? You are darned right my fuse is burning. For weeks now, I have been cutting code to do some XML data processing that would have been much easier if only the schema creator had shown some understanding of the true cost of XML tagging.

"Ah", I hear you say, "You are being selfish again. You want all the XML to be perfect for your specific needs. Typical becubicled, besandled, bearded, belligerent, bellicose, back-end system development type."

Well, yes, of course I would like my life made as easy as possible, but hear this: Where do you think all the money goes on XML markup? In the Total Cost of Ownership calculations to do with XML tags, where are the real costs?

Did you say, "Data modeling/schema creation"?

Bzzzzzt. Thank you for playing, but that is not the correct answer.

The bulk of the costs are attributable to each and every lowly XML tag accumulated in data processing. Every time a developer writes an XPath expression, a SAX handler, or weaves a DOM NodeList, he or she is contributing to the XML tags' cost of ownership. Every time a developer backs off from cutting code because of the sheer complexity of the XML structure being manipulated, you are accumulating costs.

Where does the complexity come from? How complicated can processing a few cuddly little tags peppered with some good old PCDATA be?

Complexity Comes from the Schemas
A schema (be it DTD, XSD, or RNG) makes saying, "At this point, A or B or C or D can occur zero or more times", very easy. Sounds harmless, but look at it from a programmer's perspective. At this point in the document structure, elements A to D can occur any number of times in any order. Let's keep things simple by restricting each element to two possible values and counting each element just once. That means that my "business logic" has to deal with 2 x 2 x 2 x 2 = 16 main branches at this point in the structure. Now add in another element E with the same two possible values. The main branches in my business logic now number 32.

Simply put, my software's so-called "state space" grows exponentially with the number of things that can occur at a given point in a document structure. The number of possible occurrences at any given point in the structure is directly related to the schema.
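To put rough numbers on that growth, here is a minimal Python sketch. It assumes a deliberately simplified model that is not spelled out in the text above: each element appears exactly once and takes one of two values. The function name count_branches is made up for illustration. Real content models that allow repetition in any order are larger still.

```python
from itertools import product


def count_branches(element_count, values_per_element=2):
    """Count distinct value combinations, assuming each element appears once.

    This is a simplified model: it ignores repetition and ordering,
    both of which would make the real state space bigger.
    """
    combinations = product(range(values_per_element), repeat=element_count)
    return sum(1 for _ in combinations)


if __name__ == "__main__":
    # Each extra element doubles the number of branches to handle.
    for n in range(1, 6):
        print(n, "elements ->", count_branches(n), "branches")
```

Under these assumptions the counts run 2, 4, 8, 16, 32: every element the schema permits at a given point doubles the work waiting for the developer.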

Anything that grows exponentially is bad news for software development (except of course caffeine levels). Introducing non-terminals into the schema can significantly reduce state space explosion. By non-terminals, I mean nodes that serve to scaffold sub-structures rather than to carry their own semantic meaning.

An example common in many document-centric DTDs is "list". What is a list but a holder of list elements? Most developers would hate to see the holding list element go from the schema, as it gives them a hook to cleanly snag where a list starts and ends. When manipulating XML as a tree structure, it gives developers a single point to treat as a holdall for the list.
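The difference shows up quickly in code. The sketch below uses Python's standard xml.etree.ElementTree module on two tiny made-up documents; the element names doc, para, list and item are assumptions chosen for illustration, not taken from any particular DTD. With the wrapper, one lookup finds each list. Without it, the code has to infer list boundaries from runs of adjacent item elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, identical except for the <list> wrapper.
WITH_WRAPPER = (
    "<doc><para>Intro</para>"
    "<list><item>one</item><item>two</item></list>"
    "<para>Outro</para></doc>"
)
WITHOUT_WRAPPER = (
    "<doc><para>Intro</para>"
    "<item>one</item><item>two</item>"
    "<para>Outro</para></doc>"
)


def lists_with_wrapper(xml_text):
    """Each <list> element is a ready-made handle on one list."""
    root = ET.fromstring(xml_text)
    return [[item.text for item in lst.findall("item")]
            for lst in root.iter("list")]


def lists_without_wrapper(xml_text):
    """No wrapper: rebuild list boundaries from runs of sibling <item>s."""
    root = ET.fromstring(xml_text)
    lists, current = [], None
    for child in root:
        if child.tag == "item":
            if current is None:      # first item of a new run
                current = []
                lists.append(current)
            current.append(child.text)
        else:
            current = None           # any other tag ends the run
    return lists


if __name__ == "__main__":
    print(lists_with_wrapper(WITH_WRAPPER))
    print(lists_without_wrapper(WITHOUT_WRAPPER))
```

Both functions recover the same single list, but the second one carries state between siblings and only handles items that sit directly under the root. That extra bookkeeping is the cost the wrapper tag saves.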

In data-centric XML, the "record" tag is an example of a non-terminal element. What is a record but a sequence of repeating fields? If the repetition can be worked out automatically, then isn't the record tag superfluous? Again, developers would hate to see the record tag go.

So, intermediate "non-terminal" tags are good and tags in general are bad (i.e., the fewer tags the better for everyone). Think of tags as global variables. The professor would tell you to minimize them because of the thorny logic and spaghetti code they foster. I look at SAX XML processing code and what do I see? Spaghetti!
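For a taste of how the spaghetti starts, here is a small sketch using Python's standard xml.sax module. The document shape (a catalog of record elements, each with a title) and the class name TitleCollector are invented for illustration. Because a SAX handler only sees one event at a time, it has to remember where it is with flags, and every extra tag the schema allows tends to add another flag and another branch.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of <title> elements that occur inside <record>."""

    def __init__(self):
        super().__init__()
        self.in_record = False   # state flag: inside a <record>?
        self.in_title = False    # state flag: inside a record's <title>?
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "record":
            self.in_record = True
        elif name == "title" and self.in_record:
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title" and self.in_title:
            self.titles.append("".join(self.buffer))
            self.in_title = False
        elif name == "record":
            self.in_record = False


def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


SAMPLE = (
    b"<catalog><title>Catalog heading</title>"
    b"<record><title>First</title></record>"
    b"<record><title>Second</title></record></catalog>"
)

if __name__ == "__main__":
    print(collect_titles(SAMPLE))
```

Two flags are manageable. The trouble is that the flags behave exactly like the global variables mentioned above: every callback reads and writes them, and their number grows with the schema.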

In XML land, not only are the equivalent of "global variables" created with wild abandon, but their creators often see fit to invoice based on the number they create for you. An unfortunate schism exists in XML software development between the team that develops the schema and the team processing the XML that conforms to the schema. Too often, these are not the same teams.

This state of affairs does not encourage schema design that takes into account the needs of the software development phase, which, after all, is where most of the money will be spent.

What Can Be Done?
I remember an example of lateral thinking in a book by Edward De Bono. A company suspected of water pollution applied for permission to draw fresh water from the same stream it was pumping effluent into. Permission was granted on condition that the water intake occur downstream from the effluent discharge.

Stretching the analogy to its breaking point, those who wish to create schemas must work at a software development level with the XML they themselves have modeled. That will teach them the real cost of XML tagging. Then they will be grumpy too. Harrumph!

 




Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.