ITworld.com
XML is Just a Tree...Not!
 
 

XML IN PRACTICE --- 01/17/2002



"XML is just a tree" is, perhaps, the most potent half-truth of early-21st-century white-paper executive summaries. The primary abstraction fostered by the XML family of standards is, indeed, that most information has a primary hierarchical structure, modeled in plain text using things called "tags". The de jure reality, however, is somewhat different, although the tree abstraction does approximate the de facto reality quite nicely.




Firstly, XML instances are not trees rooted at a single root element. One level above that element is needed to house any stuff that precedes or follows it. Stuff before the top-level start-tag is called the "prolog". Stuff after the top-level end-tag is called the "epilog", although that term is colloquial: the XML standard does not actually give this part a name.
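To see the extra level concretely, here is a minimal sketch using Python's standard-library xml.dom.minidom, whose Document node sits above the root element and holds the prolog and epilog items as siblings of it. The sample document, the style.xsl href, the trailing-pi target, and the around_root helper are all invented for illustration. Note that the XML declaration itself is not reported as a node.

```python
from xml.dom import Node, minidom

# Hypothetical document with stuff before and after the root element.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<!-- prolog comment -->
<root><child/></root>
<!-- epilog comment -->
<?trailing-pi some data?>
"""

# Friendly names for the node types we expect at the top level.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "pi",
    Node.DOCUMENT_TYPE_NODE: "doctype",
}


def around_root(text):
    """Split the Document node's children into (before, root tag, after)."""
    doc = minidom.parseString(text)
    kids = list(doc.childNodes)       # children of the Document, not the root
    root = doc.documentElement        # the element most people call "the tree"
    i = kids.index(root)
    before = [KIND.get(n.nodeType, str(n.nodeType)) for n in kids[:i]]
    after = [KIND.get(n.nodeType, str(n.nodeType)) for n in kids[i + 1:]]
    return before, root.tagName, after


if __name__ == "__main__":
    print(around_root(DOC))
```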

The prolog can bite you when you realize that DOCTYPE declarations, XSL stylesheet processing instructions, and character encoding declarations are all part of the prolog. The epilog can bite you if you wish to concatenate XML instances into a single stream of data and split them apart later. Because comments and processing instructions are allowed both after the root element of one document and before the root element of the next, it is not possible to tell whether such constructs are part of the epilog of one XML document or part of the prolog of the next!
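The concatenation problem can be demonstrated in a few lines. In this sketch (Python standard library only; the element names, the comment text, and the well_formed helper are hypothetical), two different pairs of well-formed documents concatenate to exactly the same stream, so no splitter can know which document the comment originally belonged to. The stream as a whole is not itself well-formed, since it has two root elements.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def well_formed(text):
    """True if text parses as a single well-formed XML document."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


COMMENT = "<!-- which document owns me? -->"

# Two different (hypothetical) ways the same stream could have been produced.
split_a = ("<a/>" + COMMENT, "<b/>")  # comment in the epilog of the first
split_b = ("<a/>", COMMENT + "<b/>")  # comment in the prolog of the second

stream_a = "".join(split_a)
stream_b = "".join(split_b)

if __name__ == "__main__":
    print(stream_a == stream_b)                         # True
    print([well_formed(p) for p in split_a + split_b])  # [True, True, True, True]
    print(well_formed(stream_a))                        # False
```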

So, the XML "tree" is actually rooted one level further up than most people conceptualize it, in order to accommodate the prolog and the epilog.

But this is only the first respect in which the XML "tree" differs from what most people imagine. XML is actually *two* tree structures, nested perfectly one inside the other. The one we all know and love is called the "logical structure": the one composed of start-tags, end-tags, attributes, and data content. The other one, less well known and considerably less well loved, is called the "physical structure" and is composed of "entities".

The entity structure allows you to assemble a logical tree from a collection of physical pieces, typically files. References to entities begin with an "&" and end with a ";", with the entity name in between. "amp" and "lt" are two simple examples of entities that are predefined by the XML standard. Here is another one:

<!DOCTYPE foo SYSTEM "foo.dtd" [
  <!ENTITY bar SYSTEM "bar.xml">
]>
<foo> &bar; </foo>
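A real parser performs the substitution for you. Python's ElementTree does not load external entities such as bar.xml, so this sketch substitutes an internal entity of the same name whose replacement text is a small, invented fragment (<baz>hello</baz>); the &bar; reference is still replaced by parsed content, which is the part being illustrated.

```python
import xml.etree.ElementTree as ET

# Internal entity standing in for the external bar.xml (assumption for runnability).
DOC = """<!DOCTYPE foo [
  <!ENTITY bar "<baz>hello</baz>">
]>
<foo>&bar;</foo>"""

root = ET.fromstring(DOC)

if __name__ == "__main__":
    # The reference is gone; its replacement text became real child elements.
    print(ET.tostring(root))  # b'<foo><baz>hello</baz></foo>'
```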

The upshot of this structure, from the parser's perspective, is that the contents of the file bar.xml are spliced into the document in place of the "&bar;" reference. But here is the thing: the file bar.xml can itself contain entity references, which can point to files containing further entity references, and so on, all in a perfectly nested tree structure.
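The nesting of the physical structure can be modelled with a toy. The following sketch is not an XML parser: it treats a dictionary as a hypothetical file system, maps entity names to file names by hand instead of reading ENTITY declarations, and finds references with a regular expression, ignoring details such as references inside comments or CDATA sections. It does show the two properties described above: references are replaced by file contents recursively, and the resulting physical structure is a tree. It also rejects recursive references, which the XML standard likewise forbids.

```python
import re

# Hypothetical in-memory "file system": file name -> file contents.
FILES = {
    "foo.xml": "<foo> &bar; </foo>",
    "bar.xml": "<bar>&baz;</bar>",
    "baz.xml": "<baz>leaf &amp; more</baz>",
}

# Hypothetical entity declarations: entity name -> file it refers to
# (standing in for <!ENTITY bar SYSTEM "bar.xml"> and friends).
ENTITIES = {"bar": "bar.xml", "baz": "baz.xml"}

# Entities the XML standard predefines; these are left untouched here.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Rough approximation of a general entity reference: &name;
REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def splice(file_name, stack=()):
    """Replace each &name; in file_name with the spliced contents of its file."""
    if file_name in stack:
        chain = " -> ".join(stack + (file_name,))
        raise ValueError("recursive entity reference: " + chain)
    text = FILES[file_name]

    def expand(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)  # keep &amp;, &lt;, etc. as they are
        return splice(ENTITIES[name], stack + (file_name,))

    return REF.sub(expand, text)


def entity_tree(file_name):
    """Return the physical structure as (file_name, [child subtrees]).

    Assumes the references are acyclic (splice() is the one that checks).
    """
    names = [n for n in REF.findall(FILES[file_name]) if n not in PREDEFINED]
    return (file_name, [entity_tree(ENTITIES[n]) for n in names])


if __name__ == "__main__":
    print(splice("foo.xml"))
    print(entity_tree("foo.xml"))
```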

These two trees, the logical and the physical, are far from equal in the XML world. The logical structure dominates most people's conceptualization of XML; the entity structure is more IT-architect fodder. Those who do discover the entity structure feel both a temptation to use it and a sense of fear: "How come this stuff isn't more widely used? Is there a gotcha lurking four levels deep in this entity theory?"

Personally, I believe the entity structure should be eschewed. Firstly, entities are based on a "declare before use" model, which is not the way the Web works. Secondly, they are tightly bound to DTDs in an unpleasant way. What has the entity structure got to do with validation? Exactly! Thirdly, nobody seems to want what the entity structure has to offer. More than one developer of my acquaintance has looked at it and said "Nah! Thanks, but no thanks." Fourthly, XInclude, once it is cut down to size, will, I believe, provide most of the benefits with none of the syntactic baggage.
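The "declare before use" point can be seen with Python's ElementTree: the same &bar; reference parses only when a DOCTYPE, which must precede the root element, declares the entity. The parses helper and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """True if ElementTree accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The DOCTYPE (and its entity declaration) comes before the root element.
DECLARED = '<!DOCTYPE foo [ <!ENTITY bar "hello"> ]><foo>&bar;</foo>'
# The same reference with no declaration anywhere in front of it.
UNDECLARED = "<foo>&bar;</foo>"

if __name__ == "__main__":
    print(parses(DECLARED))    # True
    print(parses(UNDECLARED))  # False
```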

The benefits and drawbacks I see in XInclude as currently formulated will be the subject of a future article.

 



Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.