From: www.itworld.com
June 2, 2008
First up, a scope warning for this article. This is the first part of a two-part piece about XML. Here I focus on uses of XML in areas such as application configuration files, the exchange of structured, machine-oriented data, that sort of thing. In the second part, I discuss XML in document-centric applications such as content/document management and Web publication systems.
Much has been written and continues to be written about the "angle bracket tax". Now let us start by calling a spade a spade. XML is not a silver bullet and if you indiscriminately spray it over your application space you can get into trouble. No amount of pretty-printing an XML file containing a SOAP message will make it look pretty to an application developer's eyes. No amount of pretty-printing a complex Ant script or a CFML script will make the conditional logic that these things often contain easy to read or easy to process programmatically.
For many applications of XML you will come across, there appears to be a better non-XML solution possible. For any given data representation requirement you as an application developer/designer might have, there is a "better" syntax than XML for representing it. XML is sub-optimal for everything, or so it sometimes seems.
In my opinion, that is not a weakness of XML, it is a key strength. A strength that, if used wisely, pays significant dividends. However, it must be used wisely to be effective.
The most important thing is to ensure that you use XML to solve the parsing problems that you do not want to take on yourself. Tagging data can really cut down on the amount of work you have to do, but only if the tags are in the right places. For example, the following markup does not really help you process a name in your application:
    <name>Sean Mc Grath</name>
The problem is that your application must do the tricky part - splitting the first name from the second name. This would be much more useful:
    <name>
      <first>Sean</first>
      <second>Mc Grath</second>
    </name>
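To make the difference concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (the choice of Python and the variable names are assumptions made for illustration, not part of the original article). With the flat form the application has to guess where the first name ends; with the tagged form the XML parser has already done that work.

```python
import xml.etree.ElementTree as ET

# Flat form: the parser hands back one string and the application must guess.
flat = ET.fromstring("<name>Sean Mc Grath</name>")
# Splitting on the first space is only a heuristic; it happens to work here
# because the first name is a single word.
guess_first, guess_second = flat.text.split(" ", 1)

# Tagged form: the markup itself says where each part begins and ends.
tagged = ET.fromstring(
    "<name><first>Sean</first><second>Mc Grath</second></name>"
)
first = tagged.findtext("first")
second = tagged.findtext("second")

print(first, "|", second)
```

The heuristic in the flat case breaks as soon as a first name contains a space, which is exactly the kind of parsing problem the tags are meant to take off your hands.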
The second most important thing to do is to avoid complicating your life with complicated XML processing APIs. There are times when you absolutely must use event-oriented parsing techniques like SAX, but most of the time you don't. Life is much easier if you load up the XML document and "walk" it node by node, or "pull" parts of it token by token using pull parsing.
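As a rough illustration of the two easier styles, the sketch below uses Python's standard library: `ElementTree` for loading and walking a document, and `XMLPullParser` for pull parsing. The sample document and the variable names are hypothetical, and other languages offer equivalent APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, reusing the name example from above.
DOC = (
    "<people><name><first>Sean</first>"
    "<second>Mc Grath</second></name></people>"
)

# Style 1: load the whole document, then walk it node by node.
root = ET.fromstring(DOC)
walked = [(el.tag, (el.text or "").strip()) for el in root.iter()]

# Style 2: pull parsing. The application feeds data in and asks for
# events when it is ready for them, rather than receiving callbacks.
parser = ET.XMLPullParser(events=("start", "end"))
parser.feed(DOC)
parser.close()
pulled = [(event, el.tag) for event, el in parser.read_events()]

print(walked)
print(pulled)
```

In both styles the control flow stays in your own code, which tends to be easier to follow than state spread across SAX callback handlers.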
The third most important thing to do is to consider internationalization. Now if you are happy to live in a US-ASCII world, this doesn't apply to you but for everyone else, listen up. Detecting and properly handling character encoding is hard and ugly. XML - for all its sub-optimality - provides a workable framework in which to handle character encoding without too much heartache. Believe me, you do not want to end up re-inventing yet another Unicode encoding detection algorithm.
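A small sketch of what that framework buys you, again assuming Python's standard `xml.etree.ElementTree`: the same logical document is serialized in two different encodings, each declared in its XML declaration, and the parser decodes both to the same text with no detection code in the application. The accented name is an illustrative value, not taken from the article.

```python
import xml.etree.ElementTree as ET

body = "<name>Se\u00e1n</name>"  # illustrative non-ASCII content

# The same document, stored as two different byte sequences.
utf8_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>' + body
).encode("utf-8")
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + body
).encode("iso-8859-1")

# The parser reads the declaration and decodes accordingly.
from_utf8 = ET.fromstring(utf8_bytes).text
from_latin1 = ET.fromstring(latin1_bytes).text

print(utf8_bytes != latin1_bytes, from_utf8 == from_latin1)
```

The bytes on disk differ, but the application sees identical Unicode text either way.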
The fourth most important thing is to create a schema and use it for validating your XML files. I have lost count of the number of XML applications I have encountered where the developers use a non-validating parse of the XML and then proceed to hand-code basic structure rules that a schema language could easily handle. My own preference is Relax NG but DTDs are still very useful and, if it is the only choice open to you, XSD is better than nothing.
Speaking of Relax NG, another important thing to do is to not limit yourself to XML syntax as your application grows. Now what does that mean? Well, as your application grows, the density and complexity of the markup you want to apply in the data files will most likely grow too. There may come a point where you see a better non-XML syntax for capturing the data. There may come a point where you are very confident about the future direction of your application and are willing to take on the task of developing a custom parser for your data.
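As a hedged sketch of letting a schema carry the structure rules, the example below validates the two name documents against a tiny Relax NG schema. It assumes the third-party `lxml` library is installed (Python's standard library has no validating parser), and the schema itself is a hypothetical one written for this illustration.

```python
from lxml import etree  # third-party; assumed to be installed

# Hypothetical Relax NG schema (XML syntax): a <name> must contain
# exactly one <first> followed by exactly one <second>.
RNG = b"""<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="first"><text/></element>
  <element name="second"><text/></element>
</element>"""

schema = etree.RelaxNG(etree.fromstring(RNG))

good = etree.fromstring(
    b"<name><first>Sean</first><second>Mc Grath</second></name>"
)
bad = etree.fromstring(b"<name>Sean Mc Grath</name>")

good_ok = schema.validate(good)  # structure matches the schema
bad_ok = schema.validate(bad)    # missing <first> and <second>

print(good_ok, bad_ok)
```

The application code contains no structural checks of its own; changing the rules means editing the schema rather than the program.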
If/when this happens, do not throw away the XML notation. Instead, treat the XML as the verbose, "compiled" form of your data files. Use the XML form to allow you to more easily inter-operate with other systems and more easily develop tool chains that can leverage XSLT, XQuery, Author/Edit systems and so on.
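One way to picture this arrangement is sketched below. The `key: value` compact notation and the function names `compile_to_xml` and `decompile` are invented purely for illustration; a real application would have its own custom syntax and parser. The point is only that the compact form stays the authoring format while the XML form remains available to other tools.

```python
import xml.etree.ElementTree as ET

def compile_to_xml(compact: str) -> ET.Element:
    """Compile hypothetical 'key: value' lines into a <name> element."""
    root = ET.Element("name")
    for line in compact.strip().splitlines():
        key, _, value = line.partition(":")
        child = ET.SubElement(root, key.strip())
        child.text = value.strip()
    return root

def decompile(root: ET.Element) -> str:
    """Turn the XML form back into the compact notation."""
    return "\n".join(f"{child.tag}: {child.text}" for child in root)

compact = "first: Sean\nsecond: Mc Grath"
xml_form = ET.tostring(compile_to_xml(compact), encoding="unicode")
round_trip = decompile(ET.fromstring(xml_form))

print(xml_form)
```

Because the conversion runs in both directions, people can read and write the compact form while XSLT, XQuery and similar tool chains work on the XML form.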
Relax NG compact syntax is a nice example of this idea. I write my schemas in RNC but automatically convert to/from XML using trang. I convert into XML so that I can batch process hundreds of schemas using XSLT and XQuery tool chains. I convert into compact syntax for authoring and reading.
In fact, XQuery provides another example of this "compiled" idea. XQuery is a non-XML syntax for the most part but it has a full XML representation known as XQueryX. I write XQueries using XQuery native syntax but use XQueryX for batch query processing. I also use XQueryX when I want to use XQuery to query repositories of XQuery expressions. (That might hurt your head a little but think about it and I think you will see the power underlying this "compile to XML from compact syntaxes" idea.)
The Semantic Web provides another example, and an interesting case of extending the concept even further[8]. Firstly, if you are going to play with RDF, N3 is so much nicer than the XML equivalent and can easily be converted back and forth to the XML notation. Secondly, I am seeing increasing evidence that the "semantic shadows" concept I wrote about some time ago is gathering steam. That is, keep your data structures optimal for the problem at hand and "compile" down to the more general, but also more unwieldy, triple-oriented representation.
In conclusion, it is real easy to see a better way to do something once it has been done once. A better Java? Everyone has an opinion. A better SQL? Everyone has an opinion. It is good to have opinions but it's also good to avoid Monday morning quarterback syndrome. XML's sub-optimality in many ways is your friend, not your enemy. It is a prairie dog, not an exquisitely tuned Amazonian butterfly, with all the survival attributes of the former but not the beauty of the latter. Use it wisely and it will serve you well. In part 2, I tackle the vexed question of XML in document-centric applications such as content/document management and Web publication systems.