Mark Johnson
XML was designed so that instances of XML (XML files, XML messages, etc.)
can be "self-describing". A document can carry its own DTD, which describes
the document's internal structure. The parser uses that DTD to chop the
input stream up into identifiable pieces (identifiable via the DTD, that
is), and then do useful things with those pieces.
For example, here's a little XML document containing both a document
type declaration (everything inside the <!DOCTYPE>) and the document
content itself:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Artists [
<!-- This is a document type definition (DTD) for Artists -->
<!ELEMENT Artists (Artist)*>
<!ELEMENT Artist (Name,Born,Died)>
<!ATTLIST Artist Continent (AF|AN|AS|AU|EU|NA|SA) #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Born (#PCDATA)>
<!ELEMENT Died (#PCDATA)>
]>
<Artists>
<Artist Continent="EU">
<Name>Van Gogh, Vincent</Name>
<Born>1853</Born>
<Died>1890</Died>
</Artist>
<Artist Continent="SA">
<Name>Kahlo, Frida</Name>
<Born>1907</Born>
<Died>1954</Died>
</Artist>
</Artists>
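To see a parser picking up the declarations from the <!DOCTYPE> block, here is a minimal sketch in Python using the standard library's expat bindings. It is an illustration, not part of the original example: the names DOCUMENT and read_internal_dtd are hypothetical, the document is shortened to one artist, and expat is a non-validating parser, so it reports the declarations but does not check the document against them.

```python
import xml.parsers.expat as expat

# Shortened copy of the article's example (one artist, DTD comment dropped).
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Artists [
<!ELEMENT Artists (Artist)*>
<!ELEMENT Artist (Name,Born,Died)>
<!ATTLIST Artist Continent (AF|AN|AS|AU|EU|NA|SA) #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Born (#PCDATA)>
<!ELEMENT Died (#PCDATA)>
]>
<Artists>
<Artist Continent="EU">
<Name>Van Gogh, Vincent</Name>
<Born>1853</Born>
<Died>1890</Died>
</Artist>
</Artists>
"""


def read_internal_dtd(text):
    """Return (declared element names, element names seen in the content)."""
    declared = []
    seen = []
    parser = expat.ParserCreate()
    # Called once for each <!ELEMENT ...> declaration in the DTD.
    parser.ElementDeclHandler = lambda name, model: declared.append(name)
    # Called for each start tag in the document content.
    parser.StartElementHandler = lambda name, attrs: seen.append(name)
    parser.Parse(text, True)
    return declared, seen
```

Calling read_internal_dtd(DOCUMENT) returns two lists: the element types the DTD declares, and the elements the parser actually met in the content. Comparing them is a crude stand-in for what a validating parser does far more thoroughly.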
Here the document type declaration (<!DOCTYPE>) contains the entire
document type definition, in the square brackets known as the internal
subset. But what if you had a few thousand of these documents and you
wanted to change the DTD? You'd have to edit every document to update
the DTD.
For this reason, and many others, keeping DTDs separate from the
documents they describe can be very useful. When many documents must
conform to the same DTD, you can move the DTD out of the XML documents
and share a single copy among them. If you need to change the DTD, you
change it in one place, and every file that references it picks up the
new version.
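The trick is to use the SYSTEM keyword in the document type declaration
to point to an external DTD. The keyword is followed by a system
identifier (a URI, here a relative file name) that tells the parser where
to find the DTD. The chunk of DTD that comes from that file is called the
"external subset" in SGML lingo.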
Our example above can be separated into two files. First, there's the
DTD file "Artists.dtd":
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a document type definition (DTD) for Artists -->
<!ELEMENT Artists (Artist)*>
<!ELEMENT Artist (Name,Born,Died)>
<!ATTLIST Artist Continent (AF|AN|AS|AU|EU|NA|SA) #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Born (#PCDATA)>
<!ELEMENT Died (#PCDATA)>
And here's the resulting XML file, using the SYSTEM keyword to identify
the external DTD:
<?xml version="1.0"?>
<!-- Here's the document type declaration, pointing to the external DTD: -->
<!DOCTYPE Artists SYSTEM "Artists.dtd">
<!-- Here's the XML: -->
<Artists>
<Artist Continent="EU">
<Name>Van Gogh, Vincent</Name>
<Born>1853</Born>
<Died>1890</Died>
</Artist>
<Artist Continent="SA">
<Name>Kahlo, Frida</Name>
<Born>1907</Born>
<Died>1954</Died>
</Artist>
</Artists>
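How a parser gets from the SYSTEM identifier to the DTD text can be sketched with Python's standard expat bindings. This is a hedged illustration under stated assumptions, not the author's method: the names DTD_FILES and read_external_dtd are hypothetical, the DTD is held in a dictionary instead of being read from disk, and the document is shortened to one artist. Expat does not validate. It only reports what it reads.

```python
import xml.parsers.expat as expat

# Stand-in for the file system: system identifier -> DTD text (assumption).
DTD_FILES = {
    "Artists.dtd": """<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Artists (Artist)*>
<!ELEMENT Artist (Name,Born,Died)>
<!ATTLIST Artist Continent (AF|AN|AS|AU|EU|NA|SA) #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Born (#PCDATA)>
<!ELEMENT Died (#PCDATA)>
""",
}

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Artists SYSTEM "Artists.dtd">
<Artists>
<Artist Continent="SA">
<Name>Kahlo, Frida</Name>
<Born>1907</Born>
<Died>1954</Died>
</Artist>
</Artists>
"""


def read_external_dtd(text, dtd_files):
    """Return (system identifiers requested, element names declared)."""
    requested = []
    declared = []
    parser = expat.ParserCreate()
    # Ask expat to read the external subset unless standalone="yes".
    parser.SetParamEntityParsing(
        expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    parser.ElementDeclHandler = lambda name, model: declared.append(name)

    def load_external(context, base, system_id, public_id):
        # Expat hands over the identifier that follows the SYSTEM keyword.
        requested.append(system_id)
        # The sub-parser inherits the handlers set on the parent parser.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(dtd_files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = load_external
    parser.Parse(text, True)
    return requested, declared
```

Calling read_external_dtd(DOCUMENT, DTD_FILES) shows the parser asking for "Artists.dtd" and then reporting the same element declarations it would have found in an internal subset. Any number of documents could be passed through with the same DTD_FILES entry, which is the sharing this arrangement is meant to allow.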
The <!DOCTYPE> line in the XML document above tells the parser to read
the DTD from Artists.dtd, so the DTD now lives in a separate file from
the XML. In future newsletters, I'll explore additional reasons for
separating the two, and show you some tricks for making DTDs more
flexible and extensible.