XML: How to get the benefits without the heartache, part 1

By Sean McGrath, ITworld.com | Development, XML

First up, a scope warning for this article. This is the first part of a two-part piece about XML. Here I focus on uses of XML in areas such as application configuration files, the exchange of structured, machine-oriented data, that sort of thing. In the second part, I discuss XML in document-centric applications such as content/document management and Web publication systems.

Much has been written, and continues to be written, about the "angle bracket tax". Let us start by calling a spade a spade. XML is not a silver bullet, and if you spray it indiscriminately over your application space you can get into trouble. No amount of pretty-printing will make an XML file containing a SOAP message look pretty to an application developer's eyes. No amount of pretty-printing will make the conditional logic that a complex Ant script or CFML script often contains easy to read or easy to process programmatically.

For many of the applications of XML you will come across, a better non-XML solution appears to be possible. For any given data representation requirement you might have as an application developer or designer, there is a "better" syntax than XML for representing it. XML is sub-optimal for everything, or so it sometimes seems.

In my opinion, that is not a weakness of XML; it is a key strength, and one that pays significant dividends. It must be used wisely, however, to be effective.

The most important thing is to use XML to solve the parsing problems that you do not want to take on yourself. Tagging data can really cut down on the amount of work you have to do, but only if the tags are in the right places. For example, the following markup does not really help you process a name in your application:


<name>Sean Mc Grath </name>


The problem is that your application must do the tricky part - splitting the first name from the second name. This would be much more useful:


<name> <first>Sean </first> <second>Mc Grath </second> </name>
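
As a rough sketch of the difference, assuming Python and its standard xml.etree.ElementTree module (neither is mentioned in the article itself), the finer-grained markup lets the parser hand over each part of the name directly, while the flat version leaves the application guessing where the surname starts:

```python
import xml.etree.ElementTree as ET

# Flat markup: the application has to split the name itself.
flat = ET.fromstring("<name>Sean Mc Grath </name>")
tokens = flat.text.split()  # three tokens; which ones form the surname?

# Finer-grained markup: the parser has already done the splitting.
doc = "<name> <first>Sean </first> <second>Mc Grath </second> </name>"
name = ET.fromstring(doc)

# Only whitespace trimming is left for the application to do.
first = name.findtext("first").strip()
second = name.findtext("second").strip()

print(tokens)          # ['Sean', 'Mc', 'Grath']
print(first, "|", second)  # Sean | Mc Grath
```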


The second most important thing is not to complicate your life with complicated XML processing APIs. There are times when you absolutely must use event-oriented parsing techniques like SAX, but most of the time you don't. Life is much easier if you load the XML document and "walk" it node by node, or "pull" parts of it token by token using pull parsing.
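
The two easier styles can be sketched as follows. This is only an illustration, assuming Python's standard xml.etree.ElementTree module: fromstring stands in for loading and walking the document, and iterparse stands in for pull-style parsing, where the application asks for the next event in a loop instead of registering callbacks.

```python
import io
import xml.etree.ElementTree as ET

doc = b"<name> <first>Sean </first> <second>Mc Grath </second> </name>"

# Tree walking: load the whole document, then visit the nodes one by one.
root = ET.fromstring(doc)
walked = [(child.tag, child.text.strip()) for child in root]

# Pull-style parsing: the loop below requests events from the parser,
# so control stays with the application rather than with callbacks.
pulled = []
for event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    if elem.tag in ("first", "second"):
        pulled.append((elem.tag, elem.text.strip()))

print(walked)
print(pulled)
```

Both lists end up holding the same tag and text pairs; the tree walk is simpler for small documents, while the pull loop avoids holding on to parts of a large document that the application does not need.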

The third most important thing is to consider internationalization. If you are happy to live in a US-ASCII world, this doesn't apply to you, but everyone else, listen up. Detecting and properly handling character encoding is hard and ugly. XML, for all its sub-optimality, provides a workable framework in which to handle character encoding without too much heartache. Believe me, you do not want to end up inventing yet another Unicode encoding detection algorithm.
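
A minimal sketch of what that framework buys you, again assuming Python's standard xml.etree.ElementTree module: the same document is serialized in two encodings, each declaring its encoding in the XML prolog, and the parser decodes both without any detection code in the application. The sample name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# "Se\u00e1n" is "Sean" with an acute accent on the "a" (a non-ASCII character).
body = "<name><first>Se\u00e1n</first></name>"

# Each byte string carries its own encoding declaration.
latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")

# The bytes differ on the wire, but the parser reads the declaration
# and returns the same Unicode text for both.
decoded = [ET.fromstring(raw).findtext("first") for raw in (latin1, utf8)]

print(latin1 != utf8)
print(decoded[0] == decoded[1])
```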

The fourth most important thing is to create a schema and use it to validate your XML files. I have lost count of the number of XML applications I have encountered where the developers use a non-validating parse of the XML and then proceed to hand-code basic structure rules that a schema language could easily handle. My own preference is RELAX NG, but DTDs are still very useful and, if it is the only choice open to you, XSD is better than nothing.
