Mark Johnson
A Document Type Definition (DTD), as you probably know, constrains which
elements and attributes may appear in an XML document and how they
relate to one another. A DTD expresses rules about the order of
elements, which elements may contain other elements, the possible values
of attributes, and so on. A program that uses XML can apply a DTD to an
XML document to ensure that the document's contents follow those rules.
For example, imagine your veterinarian has a brand-new clinic
management system that uses XML. Now imagine you have a balding cat.
The veterinarian diagnoses the cat with alopecia and enters the
diagnosis into the system. The clinic management system creates the
following XML document and sends it (as a message) to a billing system
somewhere on the network:
<Episode ID = "5" PATIENT = "804591">
  <Date>2001-06-22T14:31:00.000-06:00</Date>
  <Diagnoses>
    <Diagnosis>
      <Code>
        <ICD9>704.00</ICD9>
      </Code>
      <Desc>Alopecia, unspecified baldness</Desc>
    </Diagnosis>
  </Diagnoses>
</Episode>
This document describes an "episode of care" for your cat, including
the date of the episode, the ICD9 code for the diagnosis (ICD9 is a
standardized vocabulary for diagnoses), and an English description of
the condition.
The billing system can use a DTD to ensure that an incoming Episode
follows the rules for Episodes. The DTD for an Episode might look
something like this:
<!ELEMENT Episode (Date, Diagnoses)>
<!ATTLIST Episode
  ID CDATA #REQUIRED
  PATIENT CDATA #REQUIRED>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Diagnoses (Diagnosis)*>
<!ELEMENT Diagnosis (Code,Desc?)>
<!ELEMENT Code (ICD9)>
<!ELEMENT ICD9 (#PCDATA)>
<!ELEMENT ICD10 (#PCDATA)>
<!ELEMENT Desc (#PCDATA)>
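As a rough sketch of how a billing system might apply these declarations, the Java example below turns on DTD validation in the standard JAXP parser and checks two documents. The class name EpisodeValidator, the isValid helper, and the two sample strings are assumptions made for illustration. The DTD is embedded as an internal subset only to keep the example self-contained, and the sample document wraps ICD9 in a Code element, as the Diagnosis declaration requires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration; not part of any real billing system.
class EpisodeValidator {

    // The Episode DTD, embedded as an internal subset.
    static final String DOCTYPE = String.join("\n",
        "<!DOCTYPE Episode [",
        "<!ELEMENT Episode (Date, Diagnoses)>",
        "<!ATTLIST Episode",
        "  ID CDATA #REQUIRED",
        "  PATIENT CDATA #REQUIRED>",
        "<!ELEMENT Date (#PCDATA)>",
        "<!ELEMENT Diagnoses (Diagnosis)*>",
        "<!ELEMENT Diagnosis (Code,Desc?)>",
        "<!ELEMENT Code (ICD9)>",
        "<!ELEMENT ICD9 (#PCDATA)>",
        "<!ELEMENT ICD10 (#PCDATA)>",
        "<!ELEMENT Desc (#PCDATA)>",
        "]>");

    // A document that follows the rules for Episodes.
    static final String GOOD_EPISODE = String.join("\n",
        "<Episode ID='5' PATIENT='804591'>",
        "  <Date>2001-06-22T14:31:00.000-06:00</Date>",
        "  <Diagnoses>",
        "    <Diagnosis>",
        "      <Code><ICD9>704.00</ICD9></Code>",
        "      <Desc>Alopecia, unspecified baldness</Desc>",
        "    </Diagnosis>",
        "  </Diagnoses>",
        "</Episode>");

    // A document that omits the required Date element.
    static final String BAD_EPISODE =
        "<Episode ID='5' PATIENT='804591'><Diagnoses/></Episode>";

    // Returns true only if the document is well-formed and valid against the DTD.
    static boolean isValid(String episodeXml) {
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations are reported here as recoverable errors.
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + "\n" + episodeXml)));
        } catch (Exception e) {
            return false; // not well-formed, or the parser could not be configured
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println("good episode valid? " + isValid(GOOD_EPISODE));
        System.out.println("bad episode valid?  " + isValid(BAD_EPISODE));
    }
}
```

A real system would more likely reference the DTD as an external file through a SYSTEM identifier in the DOCTYPE declaration; the validation step itself would look much the same.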
The Episode DTD defines the structure of the XML document: how elements
nest, which attributes are required, and how many times each element
may occur. However, the DTD "language" isn't powerful enough for data
like this, for several reasons:
- No way to constrain data. For example, there's no way to ensure
that ICD9 diagnostic codes match a specific format.
- No primitive data types. There's no way to indicate that the Episode
attributes ID and PATIENT are numbers.
- No richer data types, such as dates. To a DTD, the content of the
Date element is just text.
- Non-XML syntax. A DTD is written not in XML but in its own
peculiar syntax, so there's one more language to learn and one
more thing to get wrong.
- Weak namespace support. DTDs predate XML namespaces and are not
namespace-aware.
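Because ID and PATIENT are declared as CDATA and Date and ICD9 as #PCDATA, any text at all in those positions passes DTD validation. The checks therefore fall to application code. The Java sketch below shows what such hand-written checks might look like. The class and method names are hypothetical, and the ICD9 pattern (three digits, optionally followed by a dot and one or two digits) is a simplified assumption, not the official definition of the code format.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

// Hypothetical application-level checks for constraints the DTD cannot express.
class EpisodeChecks {

    // Assumed, simplified shape for an ICD9 code such as 704.00.
    static final Pattern ICD9_SHAPE = Pattern.compile("\\d{3}(\\.\\d{1,2})?");

    // Data constraint: does the code match the assumed format?
    static boolean looksLikeIcd9(String code) {
        return ICD9_SHAPE.matcher(code).matches();
    }

    // Primitive type: can the attribute value be read as a 32-bit integer?
    static boolean isInteger(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Richer type: is the text an ISO-8601 date and time with a UTC offset?
    static boolean isTimestamp(String value) {
        try {
            OffsetDateTime.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ICD9 704.00 ok?   " + looksLikeIcd9("704.00"));
        System.out.println("PATIENT 804591 ok? " + isInteger("804591"));
        System.out.println("Date ok?          " + isTimestamp("2001-06-22T14:31:00.000-06:00"));
        System.out.println("ID 'five' ok?     " + isInteger("five"));
    }
}
```

Every rule of this kind lives outside the DTD, so each program that consumes Episodes has to reimplement it and keep it in sync with the others.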
Next week, you'll see an example of XML Schema, recently made a W3C
Recommendation. XML Schema solves the problems listed above and more,
but at the cost of considerable complexity, which is one of the problems
XML was supposed to solve.