ITworld.com
The XML Declaration

XML IN PRACTICE --- 04/19/2001



Mark Johnson

In last week's newsletter, I presented the types of entities that can be found in an XML document. This week, I'll explain the "XML declaration", which should occupy the first line of every XML file. As in last week's newsletter, I'll set off XML vocabulary in *asterisks*, to make it stand out.

Every XML file should start with an *XML declaration*, which indicates several pieces of information that an XML-processing program uses to parse the file. The XML declaration indicates that a document is XML, what *XML version* the document uses, the *encoding* for the document, and whether the document is *standalone*. (I'll explain what these things mean in a moment.) A typical XML declaration might look like this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

The *XML version* number indicates the version of the XML spec to which the document conforms. The only currently valid version number is "1.0", since that's the only official version of the XML specification. The version number is the only mandatory attribute of the XML declaration; in other words, the minimum XML declaration looks like this:

<?xml version="1.0"?>
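
To see what a parser actually receives from these two declarations, here is a minimal sketch using the expat parser bundled with Python. The function name read_declaration and the `<doc/>` root element are assumptions added for illustration; expat itself reports the standalone value as 1, 0, or -1 (omitted), which the sketch maps back to "yes", "no", or None.

```python
from xml.parsers import expat


def read_declaration(document: bytes) -> dict:
    """Return the version, encoding, and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat passes standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)  # True marks this as the final chunk of input
    return found


# The two declarations shown above, each followed by a hypothetical empty root element.
typical = b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?><doc/>'
minimal = b'<?xml version="1.0"?><doc/>'

print(read_declaration(typical))
print(read_declaration(minimal))
```

For the minimal declaration, the encoding and standalone entries come back as None, which reflects that both parts were simply left out.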

A document's *encoding* tells the program processing the XML document how to interpret the bytes in the file. A character encoding defines how sequences of one or more bytes map to characters. XML handles character sets in a general manner, since it was designed to be international. The XML specification mentions the following encoding names, and spells out which character sets they encode:

Character Set              Strings
-------------              -------
Unicode (ISO/IEC 10646)    UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4
ISO 8859                   ISO-8859-1 .. ISO-8859-9
JIS X-0208-1997            ISO-2022-JP, Shift_JIS, EUC-JP

XML processors typically recognize other encodings, too. Plain ASCII needs no special support, for example, because every ASCII file is also valid UTF-8. Encoding names are case-insensitive by definition. Most commercial products should be able to handle ASCII, ISO-8859-1, the UTF encodings, and probably some of the JIS encodings. The Annotated XML Specification recommends choosing one of those.
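
The effect of the encoding declaration on how bytes are interpreted can be sketched with Python's standard xml.etree.ElementTree module. The `<word>` element and its content are made up for illustration. The sketch relies on the rule that a document with no encoding declaration (and no byte order mark) is read as UTF-8.

```python
import xml.etree.ElementTree as ET

# The single byte 0xE9 is "é" in ISO-8859-1, but it is not valid UTF-8 on its own.
body = b"<word>caf\xe9</word>"

# With the encoding declared, the parser maps 0xE9 to the character "é".
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
text = ET.fromstring(declared).text
print(text)

# With no encoding declared, the parser assumes UTF-8 and rejects the stray byte.
undeclared = b'<?xml version="1.0"?>' + body
try:
    ET.fromstring(undeclared)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

The same bytes are either a valid document or a parse error, depending only on what the declaration says about them.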

The *standalone* document declaration (SDD) is the third possible part of the XML declaration. It indicates whether the document contents can be fully interpreted without getting information from elsewhere. Certain declarations in an external DTD (for example, entity declarations or default attribute values) can affect the document's content when an XML processing program reads it. For example, if your document uses an entity defined in an external file, then the document isn't "standalone": the XML processor has to read and use the external DTD to properly interpret the document contents. The value of the declaration must be either 'yes' (if the document itself contains all of the data needed to interpret it) or 'no'. Like the encoding, the standalone declaration is optional.
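
The rules covered so far (version is mandatory, encoding and standalone are optional, and standalone accepts only 'yes' or 'no') can be sketched as a small helper that assembles a declaration. The name build_declaration is hypothetical, and the sketch does not check the version or encoding values. The parts are emitted in the order the XML specification requires: version, then encoding, then standalone.

```python
from typing import Optional


def build_declaration(version: str = "1.0",
                      encoding: Optional[str] = None,
                      standalone: Optional[str] = None) -> str:
    """Assemble an XML declaration; only the version is mandatory."""
    if standalone not in (None, "yes", "no"):
        raise ValueError("standalone must be 'yes' or 'no'")
    # The order is fixed: version, then encoding, then standalone.
    parts = [f'version="{version}"']
    if encoding is not None:
        parts.append(f'encoding="{encoding}"')
    if standalone is not None:
        parts.append(f'standalone="{standalone}"')
    return "<?xml " + " ".join(parts) + "?>"


print(build_declaration())
print(build_declaration(encoding="ISO-8859-1", standalone="no"))
```

Called with no arguments, the helper produces the minimum declaration; with both optional parts supplied, it reproduces the typical declaration shown earlier.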

As a final note, you'll notice that I said every XML file "should" start with an XML declaration. That's because an XML declaration is optional. The XML specification doesn't absolutely require the declaration, since a great deal of SGML and HTML already exists as well-formed (or nearly well-formed) XML. Absolutely requiring the XML declaration would have made these otherwise-compliant legacy files non-well-formed. Therefore, the specification's authors recommend the XML declaration instead of requiring it. Tim Bray, one of the XML specification editors, says, "You should definitely use an XML declaration unless you have a *really* good reason not to." In addition, many popular XML parsers treat the absence of a declaration as an error; so if you don't have a declaration, your file won't parse, much less validate.
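
How a particular parser treats a missing or misplaced declaration is easy to check. The sketch below does so for the expat-based parser in Python's standard library only; other parsers may behave differently, as noted above. The helper name parses and the `<note>` documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Report whether Python's bundled parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


no_declaration = b"<note>hello</note>"
declaration_first = b'<?xml version="1.0"?><note>hello</note>'
declaration_late = b' <?xml version="1.0"?><note>hello</note>'  # leading space

print(parses(no_declaration))
print(parses(declaration_first))
print(parses(declaration_late))
```

With this parser, the document without a declaration is accepted, while the one whose declaration is preceded by a space is rejected: when a declaration is present, it has to be the very first thing in the file.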

 

Mark Johnson is president of Elucify Technical Communications, a Colorado-based training and consulting company dedicated to clarifying novel or complex ideas through clear explanation and examples.


Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.