Mark Johnson
The XML specification states that XML documents must be "well-formed"
to be considered XML. The specification also talks about "valid"
documents. Of course, an XML document might be "well-formed" without
being "valid".
A well-formed document complies with all of the well-formedness
constraints in the XML 1.0 specification document. These constraints
include things such as:
- Tag nesting may not overlap. For example, <a><b></a></b> is not
well-formed, because the "b" tag does not close before its
enclosing "a" tag.
- Special characters, such as < and &, must be represented
as "entities", which keeps the XML parser from getting
confused. You've probably seen entities in HTML. They look
like: &amp;.
- All references to external information must be resolved. For
example, other files included in any XML file must be present at
the time of parsing.
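The first two constraints above are easy to try out for yourself. Here is a
minimal sketch in Python, assuming the standard library's
xml.etree.ElementTree parser; the helper name is_well_formed is made up for
this illustration and is not part of any library. The parser raises
ParseError when a document breaks a well-formedness constraint.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML (hypothetical helper, not a library call)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejects anything that breaks a well-formedness constraint.
        return False
    return True


# Overlapping tags: "b" does not close before its enclosing "a".
overlapping = is_well_formed("<a><b></a></b>")
# Properly nested tags.
nested = is_well_formed("<a><b></b></a>")
# A bare ampersand in text is not allowed...
bare_ampersand = is_well_formed("<a>fish & chips</a>")
# ...but the same text written with an entity is fine.
escaped_ampersand = is_well_formed("<a>fish &amp; chips</a>")

print(overlapping, nested, bare_ampersand, escaped_ampersand)
```

Note that this parser checks only well-formedness. It does not validate a
document against a DTD.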
A valid document matches the grammar defined in its Document Type
Definition (DTD). A DTD describes the structure an XML file must have
in order to be valid. A DTD optionally appears at the top of an XML
file and describes the valid tag names, the order of the tags, the
allowed attribute values, and so on.
A file might be well-formed, but still not comply with the rules
described in the DTD. Valid files, however, are well-formed and match
the DTD-defined grammar. All valid documents are well-formed, but not
vice versa. Non-validated documents need not even have Document Type
Definitions.
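To make the difference concrete, here is a rough sketch of a document that
is well-formed but not valid. The "note" document and its DTD are invented
for this example. Python's standard library parser does not validate, so
the order check at the end is a hand-written stand-in for the single DTD
rule "note (to, body)", not real DTD validation; a validating parser would
perform that check for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD says "note" contains "to" and then "body",
# but the document body lists them in the opposite order.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><body>Hello</body><to>Ann</to></note>"""

# This parses without error, so the document is well-formed.
root = ET.fromstring(DOC)

# Stand-in for one DTD rule only: compare the child order by hand.
expected_children = ["to", "body"]
actual_children = [child.tag for child in root]
matches_dtd_order = actual_children == expected_children

print("well-formed: yes; children in DTD order:", matches_dtd_order)
```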
You can read the XML 1.0 specification for yourself at:
http://www.w3.org/TR/1998/REC-xml-19980210
For more on validation and DTDs, see:
http://faq.oreillynet.com//XML/fetch.pl?CompanyID=414&ContentID=174&FaqID=149&word=writing%20a%20dtd&faq_template=http://faq.oreillynet.com//XML/searchfaq.html&topic=&back_refr=http://faq.oreillynet.com//XML/&topicname=SGML/HTML%20authors