Mark Johnson
In 1999, the W3C (World Wide Web Consortium) began an effort to create
a standard "XML Schema" language to supersede DTDs. XML Schema was
recently (2 May 2001) made into a W3C Recommendation. You can read the
specification at the W3C Web site listed in the resource list below.
Beyond what DTDs provide, XML Schema offers several useful
features:
- A large set of basic scalar data types
- Sophisticated user-defined data types
- Namespace support
- Extensibility
- XML syntax (instead of "DTD syntax")
Last week, I showed you a DTD for a diagnostic message in a healthcare
system and discussed its drawbacks. Here it is again:
<!ELEMENT Episode (Date, Diagnoses)>
<!ATTLIST Episode
    ID      CDATA #REQUIRED
    PATIENT CDATA #REQUIRED>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Diagnoses (Diagnosis)*>
<!ELEMENT Diagnosis (Code,Desc?)>
<!ELEMENT Code (ICD9)>
<!ELEMENT ICD9 (#PCDATA)>
<!ELEMENT ICD10 (#PCDATA)>
<!ELEMENT Desc (#PCDATA)>
The DTD constrains the structure of the input, but it provides no type
information and no type system. It's written in a funky non-XML syntax,
and its namespace support (if namespaces were being used) is limited.
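To make the missing type information concrete, here is a made-up instance document (the values are purely illustrative, not from any real system). As far as I can tell, a validating parser would accept it against the DTD above, because every element is in the right place, even though none of the values makes sense:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: structurally valid per the DTD, but the
     content is nonsense. CDATA and #PCDATA accept any text at all. -->
<Episode ID="not-a-number" PATIENT="???">
  <Date>sometime last week</Date>
  <Diagnoses>
    <Diagnosis>
      <Code>
        <ICD9>banana</ICD9>
      </Code>
    </Diagnosis>
  </Diagnoses>
</Episode>
```

The DTD has no way to say that ID must be a number, that Date must be a date, or that an ICD9 code has a particular format.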
Now here is the XML Schema document that replaces the DTD:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://www.elucify.com/Newsletters/NL21"
            xmlns="http://www.elucify.com/Newsletters/NL21"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Define Episode -->
  <xsd:element name="Episode" type="EpisodeType"/>

  <!-- Define complex types used in Episode -->
  <xsd:complexType name="EpisodeType">
    <xsd:sequence>
      <xsd:element ref="Date"/>
      <xsd:element name="Diagnoses" type="DiagnosesType"/>
    </xsd:sequence>
    <xsd:attribute name="PATIENT" type="xsd:positiveInteger"/>
    <xsd:attribute name="ID" type="xsd:positiveInteger"/>
  </xsd:complexType>

  <xsd:complexType name="DiagnosesType">
    <xsd:sequence>
      <xsd:element name="Diagnosis" type="DiagnosisType"
                   minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="DiagnosisType">
    <xsd:sequence>
      <xsd:element ref="ICD9"/>
      <xsd:element ref="Desc" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Defines format of an ICD9 code -->
  <xsd:simpleType name="ICD9Type">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}\.\d{1,2}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Define elements -->
  <xsd:element name="Date" type="xsd:dateTime"/>
  <xsd:element name="Desc" type="xsd:string"/>
  <xsd:element name="ICD9" type="ICD9Type"/>
</xsd:schema>
More complicated? Sure. But look at what you gain for that additional
complexity. The user-defined "ICD9Type" simple type lets you control the
contents of the string by matching it against a regular expression. You
can control the number of occurrences of an element with minOccurs and
maxOccurs. Many simple types, like positiveInteger and dateTime, are
predefined, and new simple types can be derived from them. The entire
schema is written in XML and, thus, is parsable by a standard XML
parser. And since they're XML, schemas can be extended and composed
with one another.
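For comparison, here is a sketch of an instance document that I would expect to validate against the schema above. The values and the "nl" prefix are my own illustrative choices. One wrinkle worth noticing: the schema does not set elementFormDefault="qualified", so, as I read the rules, the locally declared elements (Diagnoses and Diagnosis) carry no namespace, while the globally declared ones (Episode, Date, ICD9, Desc) belong to the target namespace:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance; all values are made up for illustration. -->
<nl:Episode xmlns:nl="http://www.elucify.com/Newsletters/NL21"
            ID="1001" PATIENT="42">
  <!-- xsd:dateTime: must be a real date and time, not free text -->
  <nl:Date>2001-05-14T09:30:00</nl:Date>
  <!-- Local elements: unqualified, because elementFormDefault is unset -->
  <Diagnoses>
    <Diagnosis>
      <!-- ICD9Type: three digits, a dot, then one or two digits -->
      <nl:ICD9>250.01</nl:ICD9>
      <nl:Desc>Example description (illustrative)</nl:Desc>
    </Diagnosis>
  </Diagnoses>
</nl:Episode>
```

Change ID to "abc", Date to "last week", or ICD9 to "25.1", and a schema-aware validator should reject the document. The DTD could not catch any of those.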
XML Schema is not easy to learn in its entirety. It's a huge data
modeling language that came out of a contentious development process.
But now that it's a W3C Recommendation, you can expect to start seeing
it everywhere. For example, the emerging standards for Web Services are
converging on XML Schema (often abbreviated XSD, for XML Schema
Definition) as the standard schema language for data types.
Fortunately, getting up to speed on the basics of XML Schema is not
that hard. Check out the resources below to start learning about XSD
today.