ITworld.com
I CDATA, Do You CDATA?

XML IN PRACTICE --- 02/08/2001



Mark Johnson

Sometimes, you want the XML parser to leave your text alone. XML requires the ampersand ('&') and less-than ('<') be represented by the general entities &amp; and &lt;, respectively. This restriction can make for tedious typing and for XML that is hard to read and harder to write. Putting the sentence that states this rule into an XML file would require you to encode it like this:

"XML requires the ampersand ('&amp;') and less-than ('&lt;') be represented by the general entities &amp;amp; and &amp;lt;, respectively."

Pretty bad, huh? It gets worse. Imagine you want something looking like XSLT in your XML parser output. For example, this XSLT template implements a new tag '<tag/>' that formats everything within it in bold code font:

<!-- Format contents of "tag" as a tag -->
<xsl:template match="tag"> <b><code>&lt;<xsl:apply-templates/>&gt;</code></b> </xsl:template>

But imagine writing a document to show this XSLT rule in the output document, just as it looks above. It would have to be encoded like this:

&lt;!-- Format contents of "tag" as a tag --&gt;
&lt;xsl:template match="tag"&gt; &lt;b&gt;&lt;code&gt;&amp;lt;&lt;xsl:apply-templates/&gt;&amp;gt;&lt;/code&gt;&lt;/b&gt; &lt;/xsl:template&gt;

Yuck! You can see that the encoding requirement makes for pretty awkward XML.
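
If you would rather not do that encoding by hand, most languages can do it for you. Here is a minimal sketch in Python, assuming the standard library's xml.sax.saxutils module; the variable names are just for illustration. Note that escape() also encodes '>' in addition to '&' and '<'.

```python
from xml.sax.saxutils import escape, unescape

# The XSLT rule as it should look to the reader of the output document.
rule = ('<xsl:template match="tag"> <b><code>&lt;<xsl:apply-templates/>&gt;'
        '</code></b> </xsl:template>')

# escape() replaces '&' first, then '<' and '>', so entity references
# that are already in the text get encoded a second time.
encoded = escape(rule)
print(encoded)

# unescape() reverses the encoding, which is what a parser effectively does.
decoded = unescape(encoded)
```

The printed text is what you would paste into the XML file; decoded should come back identical to rule.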

Fortunately, an easy XML trick called a "CDATA section" gives you a temporary reprieve from the & and < encoding rules. (I mean, &amp; and &lt;.) A CDATA section starts with the delimiter '<![CDATA[' and ends with the delimiter ']]>'. It can occur anywhere in an XML document that character data can occur. So, the XSLT rule above can be encoded as:

<![CDATA[
<!-- Format contents of "tag" as a tag -->
<xsl:template match="tag"> <b><code>&lt;<xsl:apply-templates/>&gt;</code></b> </xsl:template>
]]>

Much better!

The CDATA section tells the XML processor to pass through anything inside, verbatim: no entity substitution, no whitespace processing. The XML processor doesn't parse what's inside a CDATA section, except to look for the CDATA section's closing delimiter ']]>'. So, you can include text just as you want it to appear to processors downstream from the XML parser.
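
To see the pass-through behavior for yourself, here is a small sketch that assumes Python's standard xml.etree.ElementTree parser; the element name p and the sample text are made up for the illustration. The same entity reference is expanded outside a CDATA section and left alone inside one.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, '&amp;' and '<b>' are just characters.
inside = ET.fromstring("<p><![CDATA[&amp; stays as typed, and so does <b>]]></p>")

# Outside a CDATA section, the parser replaces the entity reference.
outside = ET.fromstring("<p>&amp; becomes an ampersand</p>")

print(inside.text)
print(outside.text)
```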

Data inside a CDATA section is just plain character data. The XML parser clips the text out of the CDATA section, pastes the enclosed text block into its output, and then "forgets" a CDATA section ever existed. To programs using XML parsers, CDATA sections are indistinguishable from any other block of text. So you can't, for example, write an XSLT rule that matches only text in CDATA sections. CDATA sections are just a notational convention for temporarily disabling XML's input parsing.
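
Here is a quick sketch of that "forgetting", again assuming Python's xml.etree.ElementTree; the element name code and the sample text are only illustrative. The same text is written once in a CDATA section and once with entity references, and the program using the parser sees no difference.

```python
import xml.etree.ElementTree as ET

# The same character data, written two ways.
with_cdata = ET.fromstring("<code><![CDATA[if (a < b && b < c)]]></code>")
with_entities = ET.fromstring("<code>if (a &lt; b &amp;&amp; b &lt; c)</code>")

# Both parse to one plain string; nothing records which notation was used.
same_text = with_cdata.text == with_entities.text
print(with_cdata.text)
print(same_text)
```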

One final point: Don't confuse a CDATA section:

<![CDATA[Hello, XML!]]>

with using the CDATA keyword in a DTD ATTLIST:

<!ATTLIST Address City CDATA #REQUIRED>

or with #PCDATA in an element definition:

<!ELEMENT Address (#PCDATA)>

The three notations are completely separate concepts.

Now, a pop quiz: How could you represent the string '<![CDATA[Hello, XML!]]>' in an XML document? If you understand the following answer (which is just one way to do it), then you understand CDATA sections:

<![CDATA[<![CDATA[Hello, XML!]]>]]&gt;
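
If you want to check a quiz answer rather than take my word for it, one way is to hand it to a parser. The sketch below assumes Python's xml.etree.ElementTree and a wrapper element named doc. It tries two answers: one that ends the CDATA section early and writes the trailing ']]&gt;' as ordinary character data, and one that splits the ']]>' across two CDATA sections. The to_cdata helper is hypothetical, written just for this example.

```python
import xml.etree.ElementTree as ET

target = "<![CDATA[Hello, XML!]]>"

def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Answer one: end the section at the first ']]>', then escape the rest.
answer_one = "<doc><![CDATA[<![CDATA[Hello, XML!]]>]]&gt;</doc>"

# Answer two: let the helper split the delimiter across two sections.
answer_two = "<doc>" + to_cdata(target) + "</doc>"

text_one = ET.fromstring(answer_one).text
text_two = ET.fromstring(answer_two).text
print(text_one == target, text_two == target)
```

Either way, the parser glues the pieces back together, and the program downstream sees only the target string.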

 

Mark Johnson is president of Elucify Technical Communications, a Colorado-based training and consulting company dedicated to clarifying novel or complex ideas through clear explanation and examples.

Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.