ITworld.com
Lies, Damned Lies and XML Markup

XML IN PRACTICE --- 06/13/2002



XML tags tell you what data actually means. XML tags turn mere data into bright shiny information nuggets just dripping with juicy semantics. The truth is out there -- in the data where it belongs -- thanks to XML.




Truth is, some of the most egregious lies I have ever seen were expressed in terms of fully validating, XML 1.0 compliant XML documents. Dripping with tags? Yes. Dripping with truth? Er, no.

Perhaps the most common form of XML lies is known in the trade as tag abuse [1]. Tag abuse occurs whenever a tag is used in such a way that it meets all the rules of XML yet does not fulfill the role intended by the application designer. WYSIWYG editing environments are a common cause of tag abuse.

Let's say that you, as an author, wish to emphasize a word by putting it in italics. Something like this:

<p>Insert the widget <Emphasis>very carefully</Emphasis>.</p>

You ask your wonderful XML-aware editing tool what tags are valid at this point in the document and it pops up a list containing these options:

ReplaceableBattery
AmphibiousLandingCraft
Table
Emphasis

You notice that the ReplaceableBattery tag renders its contents in italics and, since it is first on the list and quickest to insert, you use it! Your document now contains this:

<p>Insert the widget <ReplaceableBattery>very carefully</ReplaceableBattery>.</p>

As an author, do you care? Using the first tag that comes to hand makes your life easier, the document prints okay and looks fine on the Web site....
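To see why no tool complains, here is a minimal Python sketch. It is only an illustration: the ALLOWED_CHILDREN table is a hypothetical stand-in for a real schema (it is not a DTD or a validating parser), and the is_valid function is invented for this example. Under those assumptions, the honest markup and the abused markup are indistinguishable to a structural check.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a real schema:
# each element name maps to the set of child elements it may contain.
ALLOWED_CHILDREN = {
    "p": {"Emphasis", "ReplaceableBattery", "AmphibiousLandingCraft", "Table"},
    "Emphasis": set(),
    "ReplaceableBattery": set(),
    "AmphibiousLandingCraft": set(),
    "Table": set(),
}


def _check(element):
    """Recursively confirm every element is known and sits in an allowed place."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return False
    return all(child.tag in allowed and _check(child) for child in element)


def is_valid(xml_text):
    """Return True if the text is well-formed and obeys ALLOWED_CHILDREN."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return _check(root)


honest = "<p>Insert the widget <Emphasis>very carefully</Emphasis>.</p>"
abused = ("<p>Insert the widget "
          "<ReplaceableBattery>very carefully</ReplaceableBattery>.</p>")

print(is_valid(honest))  # True
print(is_valid(abused))  # True: structurally fine, semantically a lie
```

The check can say where a tag is allowed to appear. It has no way of knowing whether "very carefully" is a battery.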

In my first decade as a markup geek, I confess I was on the engineering side of this argument. In a word, I was horrified. Tag abuse used to drive me crazy! What is it with those authors?!? They should get with the program and start marking up the data using the tags we engineers properly make available. No shortcuts!

Then I started to write books. Boy did that change my perspective on the problem! As a writer, I find XML -- indeed, any form of structured authoring -- a real pain. It gets in the way! When I am in full flow, trying to squeeze sensibly structured English out of my brain, the last thing I want is an interface that beeps at me and insists I select from long lists of available tags. Half of the time, I would not be in a position to pick a tag if I tried. Why? Because writing is a creative process. As the words flow through my fingers, I do not have a comprehensive ontological map of the territory. It's just words and ideas. The names of the right tags will be obvious, but only after the content has come into existence. Not before and certainly not during the content creation process. I used to abuse tags with the best of 'em!

The engineers, not the authors, need to get with the program and realize that XML markup cannot be an impediment to the creation of content. If it is, it will be subverted.

There is another reason -- this time a linguistic one -- that causes lies to creep into XML tagging. Natural languages, such as English, complement vocabularies with grammars. In that sense, they are much like XML applications, which also complement vocabularies (tags) with grammars (schemas). Ever wonder why the most commonly used constructs in English break the rules of English grammar?

Humans are fond of what George Kingsley Zipf called the Principle of Least Effort [2]. Basically, we humans will break rules left, right, and center in order to make communication easier. Whether in natural languages (English, French) or artificial languages (UBL, DocBook), the result is the same: We break the rules of grammars to make our lives easier. Ergo, tag abuse happens, so deal with it.

A third form of lies through markup is also a manifestation of human nature but of a different kind. What if, having searched the list of available tags in your schema, you cannot find one that suits your needs? Given the typically high cost of modifying schemata, authoring environments, downstream processes, etc., the temptation to just pick a tag that is "close enough" is very great. Indeed, in some environments where XML data capture is invoiced based on character counts plus tags, there can be an economic impetus to just pick a valid tag and get on with it.

It is amusing to watch schema designers lulled into a false sense of accomplishment as requests for schema changes to their applications dwindle. More often than not, it is not that the tags are perfect and comprehensive, but that the tag users have found ways around them. Deluded engineers think all is well because the documents pass the validating XML parser.
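Since a clean validation report proves nothing here, one option is to audit what the tags actually contain. The Python sketch below is an assumption-laden illustration, not an established technique from any XML toolkit: KNOWN_BATTERIES, the sample manual, and the suspicious_elements function are all hypothetical. It simply lists the contents of a given tag that do not look like anything the tag was designed to hold.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of values a ReplaceableBattery element should hold.
KNOWN_BATTERIES = {"AA", "AAA", "CR2032"}


def suspicious_elements(xml_text, tag, known_values):
    """Return the text of every <tag> element whose content is not a known value."""
    root = ET.fromstring(xml_text)
    return [
        (el.text or "").strip()
        for el in root.iter(tag)
        if (el.text or "").strip() not in known_values
    ]


# Hypothetical document: one honest use of the tag and one abuse of it.
doc = """<manual>
<p>Fit a <ReplaceableBattery>CR2032</ReplaceableBattery> cell.</p>
<p>Insert the widget <ReplaceableBattery>very carefully</ReplaceableBattery>.</p>
</manual>"""

print(suspicious_elements(doc, "ReplaceableBattery", KNOWN_BATTERIES))
# ['very carefully']
```

A check like this only works for tags whose legitimate contents can be enumerated or pattern-matched. For free-text tags, reading a sample of real documents is probably the only honest audit.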

Oftentimes, if there is doubt as to the correctness of a tag in a particular context, the best thing to do is not enter a tag at all. Unfortunately, we humans typically prefer positive action. We like to pick a tag, and any old tag will do, because any is better than none. It's in our nature, which is unfortunate. With apologies to Wittgenstein, that which we cannot tag, we should pass over in silence.

NOTES

[1] SDATA, the Society for the Definitive Abolition of Tag Abuse: http://www.ucc.ie/sdata/

[2] http://pespmc1.vub.ac.be/ASC/PRINCI_EFFOR.html

 




Copyright © Computerworld, Inc. All rights reserved
