XML and the document format mind bender

March 16, 2007, 01:01 PM, ITworld.com

XML has been around now, in its final, fully fledged form, for more years than I care to remember. Having played a small part in its original creation, I feel old just thinking back that far.

Explaining the whys and wherefores of XML to non-technologists and technologists alike has always been an interesting challenge. One could be forgiven for thinking that the value proposition has been fully thrashed out by now. Either you believe in the value proposition or you do not. Either you are applying XML sensibly in your business or you are not. Surely such matters would be well and truly baked at this point?

Not so. Not by a long shot, unfortunately. Here is the problem in a nutshell: it is really hard to explain to non-technical folk why keeping your information in XML is not - in itself - a guarantee that any sizable benefits will accrue.

As I have said before in this column [1] and elsewhere, any old nasty, crufty, effectively-proprietary-silo of information goo can be 100% XML compliant. Being in XML lifts information one small step up the information ladder - it is no longer completely opaque outside the four walls of the application that created it. However, it is quite a small step up what is quite a long ladder. Would I prefer to start with XML rather than a non-XML format for most document-centric IT tasks? Yes. For sure. Does XML - in and of itself - make it straightforward to move data from one application to another or to automate document processing? No. No it does not. It can, but it is not an automatic side effect of using XML.

Another nutshell (this is the mind bending one): information can be utterly, utterly application-specific and still 100% XML compliant. Your ability to work with data outside of the application that created it is an optional - highly desirable, but optional - attribute of an XML-based system. It can easily be the case that one proprietary application from one vendor is the only realistic tool for manipulating your XML data. It can easily be the case that without the application in question, the value of the data is significantly diminished and the rationale for using XML in the first place greatly reduced.
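
To make the point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. Both documents are hypothetical and describe the same memo; the element and attribute names in the second one (doc, r, k, s) are invented to stand in for an application's private vocabulary. An XML parser accepts both without complaint, but a generic tool that does not know that vocabulary can only recover meaning from the first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, self-describing markup: element names carry the meaning.
SELF_DESCRIBING = """<memo>
  <to>Board</to>
  <subject>Q3 budget</subject>
  <body>Please review the attached figures.</body>
</memo>"""

# Hypothetical application-specific markup: equally well-formed XML, but what
# k="31" or s="2" means lives only in the code of the application that wrote it.
APP_SPECIFIC = """<doc v="7">
  <r k="31" s="2">Board</r>
  <r k="42" s="5">Q3 budget</r>
  <r k="60" s="1">Please review the attached figures.</r>
</doc>"""


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML. This checks syntax only, not meaning."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def find_recipient(text: str):
    """A generic tool's best guess: look for a child element literally named 'to'."""
    root = ET.fromstring(text)
    el = root.find("to")
    return el.text if el is not None else None


for label, doc in [("self-describing", SELF_DESCRIBING), ("app-specific", APP_SPECIFIC)]:
    print(label, is_well_formed(doc), find_recipient(doc))
# self-describing True Board
# app-specific True None
```

Well-formedness is the only property the two documents share; everything that makes the first one useful to an outside tool comes from agreed-upon names, not from XML itself.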

Many non-technical (and some technical) folk have difficulty understanding this fact. A common conversation goes something like this:

Slightly technical senior manager person who reads a lot of trade press: "We should move all our documents to XML because all sorts of great things will become possible... If you have time, I can walk you through the benefits..."

Non-technical senior decision maker: "That sounds great, but according to the blurb I read, our new word processor/DTP/Web editing tool stores all its information in XML and/or seamlessly imports/exports XML. So we get all these good things you mention for free as part of our next application upgrade? Excellent!"

Why is this a mind bender now? Because we are on the verge of a world in which all mainstream document-centric tools do XML natively. Most of them will store their files natively in XML. So if the word processor/DTP tools you know and love all do XML natively, why do you need to do anything at all to benefit from XML?

Explaining the flaws here is left as an exercise for the reader. Figuring out how to explain the important issues raised to non-technical senior management in the course of an elevator ride is left as a suggestion for a Ph.D. thesis.

It is entirely possible to have all the benefits of XML and still use user-friendly, commodity off-the-shelf tools. However, doing so - especially with complex document-oriented information - requires something more than just slapping an XML label on the file format.

Having your cake and eating it too is what initiatives like ODF and XHTML are all about. To understand their significance, you need to understand what interoperability really is and how you go about creating it. You need to understand what it is realistic to expect a mere file format to do and what it is not. You need to understand the areas where XML's mantra of separating content from presentation has its practical limits [2] and how those limits are typically encountered at the boundaries where file format stops and application behavior starts. You need to understand that there is value in an XML file format even if application independence is not a goal. You need to understand that if application independence is a goal, it is really hard to write down in English what an application actually does to information - especially when WYSIWYG word processors are used to author and edit it. Sometimes, meaning inheres in the running application code, not in the file format. You need to understand micro-formats and how semantics can be layered on top of information that is (erroneously) not considered "structured" by an entire generation of developers and IT architects.
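
The micro-format idea can be sketched in a few lines of Python. The fragment below is ordinary XHTML that a browser or XHTML editor treats as plain text in a paragraph, but its class values vcard, fn and org follow the hCard micro-format convention, so a consumer that knows the convention can pull out a name and an organization. This is a simplified illustration that handles only those two properties; the person and company are made up, and a real hCard parser has many more rules.

```python
import xml.etree.ElementTree as ET

# Ordinary XHTML; the extra semantics ride on agreed-upon class names (hCard).
# "Jane Doe" and "Example Ltd" are hypothetical.
XHTML = """<div xmlns="http://www.w3.org/1999/xhtml">
  <p>Contact:
    <span class="vcard">
      <span class="fn">Jane Doe</span>,
      <span class="org">Example Ltd</span>
    </span>
  </p>
</div>"""


def classes(el):
    """Split an element's class attribute into a list of class names."""
    return (el.get("class") or "").split()


def extract_hcards(root):
    """Collect 'fn' and 'org' values from every element marked class="vcard"."""
    cards = []
    for el in root.iter():  # iter() walks all elements, whatever their namespace
        if "vcard" in classes(el):
            card = {}
            for child in el.iter():
                for cls in classes(child):
                    if cls in ("fn", "org") and child.text:
                        card[cls] = child.text.strip()
            cards.append(card)
    return cards


print(extract_hcards(ET.fromstring(XHTML)))
# [{'fn': 'Jane Doe', 'org': 'Example Ltd'}]
```

Nothing in the file format changed: it is still just XHTML. The extra meaning exists only because producer and consumer agree on the class names.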

If this stuff interests you, I recommend starting with "Anatomy of Interoperability" [3]. If this stuff does not interest you, at least take this piece of advice away with you: saying "it is in XML" is essentially equivalent to saying nothing. Any statement of the form "It is in XML, therefore..." is a non sequitur and needs to be questioned.

You need to look a level or two deeper if the real value proposition of XML (and it is real) is to be realized in your organization.

[1] http://open.itworld.com/nl/xml_prac/10042001/pf_index.html
[2] http://open.itworld.com/nl/xml_prac/07252002/
[3] http://www.robweir.com/blog/2007/02/anatomy-of-interoperability.html#links