XML for the absolute beginner
Modeling information structure in XML
So far, we've looked at XML as a way of representing data as
human-readable documents, and we've spent some time discussing
formatting. But XML's real power is in its ability to represent
information structure -- how various pieces of information relate to
one another -- in much the same way a database might.
Structured documents of the type we've been looking at have the
property that all of their elements nest inside one another, as in
Listing 5 above. Instead of looking at a document as a file, though,
consider what happens if we look at the structure of the tags as a
tree:

Figure 3. The recipe represented as a tree structure
The figure above shows the recipe as a tree of document tags. The child
nodes of a document nest within the parent node. What if there were a
way to automagically convert an XML document into a tree of
objects in a programming language -- like, oh, say, Java
maybe? And what if these objects all had properties that could be set
and retrieved -- such as the list of each element's children, the text
each object contained, and so on? Wouldn't that be interesting?
The Document Object Model (DOM) Level 1 Recommendation (see
Resources), created by a W3C committee, describes
a set of language-neutral interfaces capable of representing any
well-formed XML or HTML document.
With the DOM, HTML and XML documents can be manipulated as objects,
instead of just as streams of text. In fact, from the DOM point of
view, the document is the object tree, and the XML, HTML, or
what have you is simply a persistent representation of that tree.
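To make the "document is the object tree" idea concrete, here is a minimal sketch in Java. It rests on a few assumptions: it uses the JAXP parser bundled with current JDKs (not necessarily the parser used elsewhere in this article), and the recipe markup and the class name DomTreeSketch are hypothetical stand-ins for illustration. It parses a small document and prints its elements as an indented tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeSketch {

    // Hypothetical stand-in for a recipe document; the element names are assumed.
    static final String RECIPE =
        "<Recipe><Name>Toast</Name><Ingredients>"
        + "<Ingredient>Bread</Ingredient><Ingredient>Butter</Ingredient>"
        + "</Ingredients></Recipe>";

    // Parse XML text into a DOM Document. Checked parser exceptions are
    // rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Append one line per element node, indented two spaces per level.
    // Text nodes are skipped; they have no children, so recursion stops there.
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth + 1, out);
        }
    }

    // Convenience wrapper: parse the text and return the element outline.
    static String outlineOf(String xml) {
        StringBuilder out = new StringBuilder();
        outline(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(outlineOf(RECIPE));

        // Each node also exposes its content: here, the text inside <Name>.
        Node name = parse(RECIPE).getElementsByTagName("Name").item(0);
        System.out.println(name.getFirstChild().getNodeValue());
    }
}
```

Run as written, this prints Recipe at the left margin, Name and Ingredients indented beneath it, the two Ingredient elements indented one level further, and finally the text Toast. The calls used on the nodes (getChildNodes, getNodeName, getFirstChild, getNodeValue) all come from the DOM Level 1 interfaces.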
The availability of the DOM makes it much simpler to read and write
structured document files, since standard HTML and XML parsers are
written to produce DOM trees. If these objects have GUI
representations, it's easy to see how to create an application that
reads structured document files (XML or HTML), lets the user edit the
structure visually, and then saves it in its original format. Programs
that interface with existing Web sites become much easier to write,
because once the document is parsed, you're working with objects native
to your programming language.
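The read-edit-save cycle described above can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than a definitive implementation: the class name DomRoundTripSketch and the Recipe and Ingredient element names are hypothetical, and serialization goes through the JDK's javax.xml.transform API, since DOM Level 1 itself does not define a way to write a tree back out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomRoundTripSketch {

    // Parse the XML, append one <Ingredient> to the root element, and
    // return the edited tree serialized as XML text again.
    static String addIngredient(String xml, String ingredient) {
        try {
            // 1. Read: text becomes a tree of objects.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // 2. Edit: work with objects, not with strings.
            Element added = doc.createElement("Ingredient");
            added.appendChild(doc.createTextNode(ingredient));
            doc.getDocumentElement().appendChild(added);

            // 3. Save: an identity transform writes the tree back out.
            Transformer serializer =
                TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String original = "<Recipe><Ingredient>Bread</Ingredient></Recipe>";
        System.out.println(addIngredient(original, "Butter"));
    }
}
```

A visual editor would replace step 2 with user-driven changes, but the shape of the program stays the same: parse, manipulate the tree, write it back.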
One of the earliest popular uses for the Document Object Model is
Dynamic HTML, where client-side scripts manipulate and display (and
redisplay) an HTML document in response to user actions. Dynamic HTML
manipulates the client-side document in terms of the scripting
language's binding to the DOM structure of the document being
displayed. For instance, a <BUTTON> object might,
when clicked, reorder a table on the same page by sorting the
<TR> (table row) nodes on a particular column.
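In a browser, the row sort would be written in a scripting language against the browser's DOM binding. The same tree manipulation can be sketched in Java, assuming a well-formed table with uppercase TABLE, TR, and TD element names; the class TableSortSketch and its method names are hypothetical, and getTextContent comes from a later DOM level than Level 1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableSortSketch {

    // Text of the given cell in a row, or "" if the row is too short.
    static String cellText(Element row, int column) {
        Node cell = row.getElementsByTagName("TD").item(column);
        return cell == null ? "" : cell.getTextContent();
    }

    // Reorder the table's <TR> children by the text of one column.
    static void sortRows(Element table, int column) {
        List<Element> rows = new ArrayList<>();
        NodeList children = table.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "TR".equals(child.getNodeName())) {
                rows.add((Element) child);
            }
        }
        rows.sort(Comparator.comparing((Element row) -> cellText(row, column)));
        // appendChild moves a node that is already in the tree, so
        // re-appending in sorted order reorders the rows in place.
        for (Element row : rows) {
            table.appendChild(row);
        }
    }

    // Parse a table, sort it, and return the sorted column joined by commas.
    static String sortedColumn(String tableXml, int column) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(tableXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse table", e);
        }
        Element table = doc.getDocumentElement();
        sortRows(table, column);
        List<String> values = new ArrayList<>();
        NodeList rows = table.getElementsByTagName("TR");
        for (int i = 0; i < rows.getLength(); i++) {
            values.add(cellText((Element) rows.item(i), column));
        }
        return String.join(",", values);
    }

    public static void main(String[] args) {
        String table = "<TABLE>"
            + "<TR><TD>pear</TD><TD>3</TD></TR>"
            + "<TR><TD>apple</TD><TD>1</TD></TR>"
            + "<TR><TD>fig</TD><TD>2</TD></TR>"
            + "</TABLE>";
        System.out.println(sortedColumn(table, 0)); // prints apple,fig,pear
    }
}
```

The sort never touches the markup as text. It only rearranges nodes, which is the point of scripting against the DOM rather than against a stream of characters.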
But aside from all this browser-document-Web technology, the DOM
provides a common way of accessing general data structures from
structured documents. Any language that has a binding (that
is, a specific set of interfaces that implement the DOM in that
language) can use XML as an interface for storing, retrieving, and
processing generic hierarchical (and even nonhierarchical) object
structures.
How DOM and XML work together
The DOM opens the door to using XML as the lingua franca of
data interchange on the Internet, and even within applications. Tim
Berners-Lee, discussed earlier and commonly known as the "inventor
of the World Wide Web," says that, these days, it's important to
understand that if a system you're designing survives, it will someday
be used as a module in another system. So it's best to design it that
way from the start. The DOM is completely described in IDL, the
Interface Definition Language used in CORBA, so it's connected to
existing software interoperation standards.
Let's think a moment about how DOM with XML would be useful in
programming a database system. First, represent your database schema
as a set of DOM objects. Want a document that describes that schema? No
problem: write it out as XML. Use XSL to format the XML as HTML and
you've got a complete, browseable schema reference that's always up to
date. Want to automatically construct SQL for updating your relational
database from a record set coming into your system? Just traverse your
database's (DOM) schema tree, matching up the names of the columns from
the record set with those of the schema, and build an SQL UPDATE
statement as you go. What's that you say? The schema has changed, and
the record set you've received doesn't match up with the new schema?
You can write code to handle that, or present the user with error
messages that state exactly what's wrong. You might even be able to use
XSL to refactor the DOM tree of your record set into something matching
the new schema.
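The schema-driven UPDATE described above might look something like the following sketch. Everything specific here is an assumption made for illustration: the schema markup (a table element with column children and a key attribute), the class name SchemaUpdateSketch, and the choice to emit "?" placeholders for use with a prepared statement instead of inlining values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SchemaUpdateSketch {

    // Hypothetical schema document; this markup is assumed, not standard.
    static final String SCHEMA =
        "<table name=\"recipe\">"
        + "<column name=\"id\" key=\"true\"/>"
        + "<column name=\"title\"/>"
        + "<column name=\"servings\"/>"
        + "</table>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema", e);
        }
    }

    // Walk the schema's columns, match them against the record's column
    // names, and build a parameterized UPDATE statement as we go.
    static String buildUpdate(String schemaXml, Map<String, String> record) {
        Element table = parse(schemaXml).getDocumentElement();
        List<String> assignments = new ArrayList<>();
        List<String> keys = new ArrayList<>();
        Set<String> unmatched = new TreeSet<>(record.keySet());

        NodeList columns = table.getElementsByTagName("column");
        for (int i = 0; i < columns.getLength(); i++) {
            Element column = (Element) columns.item(i);
            String name = column.getAttribute("name");
            if (!unmatched.remove(name)) {
                continue; // the record carries no value for this column
            }
            if ("true".equals(column.getAttribute("key"))) {
                keys.add(name + " = ?");
            } else {
                assignments.add(name + " = ?");
            }
        }

        // Anything left over is in the record but not in the schema:
        // report exactly which columns are wrong.
        if (!unmatched.isEmpty()) {
            throw new IllegalArgumentException(
                "Record columns not in schema: " + unmatched);
        }
        if (assignments.isEmpty() || keys.isEmpty()) {
            throw new IllegalArgumentException(
                "Record needs at least one key and one non-key column");
        }
        return "UPDATE " + table.getAttribute("name")
            + " SET " + String.join(", ", assignments)
            + " WHERE " + String.join(" AND ", keys);
    }

    public static void main(String[] args) {
        Map<String, String> record = new java.util.LinkedHashMap<>();
        record.put("id", "7");
        record.put("title", "Toast");
        // prints: UPDATE recipe SET title = ? WHERE id = ?
        System.out.println(buildUpdate(SCHEMA, record));
    }
}
```

Because the statement is derived from the schema tree, a schema change shows up as a precise error (the names of the unmatched columns) rather than as a malformed query.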
Finally, it's time to start programming in Java! In the next section,
we're going to examine the Java bindings of the DOM and see how to use
the DOM in a Java program.
XML and Java: General XML resources
- "XML, Java, and the Future of the Web" by Jon Bosak. The paper that started it all, at least from a Java programmer's point of view. Definitely worth a read, even if it's a bit dated. Jon is commonly considered to be the father of XML. Funny how all of these technologies seem to have paternity
http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- "Media-Independent Publishing: Four Myths about XML" by Jon Bosak
http://metalab.unc.edu/pub/sun-info/standards/xml/why/4myths.htm
- Robin Cover's XML-SGML site is, according to my SGML buddies, the bible of XML resources
http://www.oasis-open.org/cover/
- The W3C's XML resource page lets you cheer from the sidelines as XML technology proposals develop into recommendations, or join in the fray on their active mailing lists
http://www.w3.org/XML/
- OASIS, the Web site of the Organization for the Advancement of Structured Information Standards, offers general news and information about XML
http://www.oasis-open.org
- The Graphics Communications Association, host of the XTech '99 conference (March 11 to 13, 1999, San Jose, CA) and the upcoming XML Europe '99 conference in Granada, Spain, (April 26 to 30, 1999) has a Web site packed with XML information
http://www.gca.org/
- XML.com is great for watching trends and digging up XML news
http://www.xml.com
- Textuality hosts Tim Bray's site. Check it out for a look at the "big picture" of how XML fits into the structured document universe -- and for a look at Lark, Tim's nonvalidating XML processor
http://www.textuality.com/
- The XML FAQ
http://www.ucc.ie/xml/
- IBM's XML Web site is an outstanding supplement to alphaWorks
http://www.software.ibm.com/xml/index.html
- "XML and Java: The Perfect Pair" by Ken Sall (Internet.com, November 1998) provides information about XML, Java, and why these two are a match made in heaven
http://wdvl.com/Authoring/Languages/XML/Java/index.html
- Generally Markup, Richard Lander's Web site may be of interest to you if you haven't yet read enough about markup languages
http://pdbeam.uwaterloo.ca/~rlander/
- The Mulberry Technologies Web site is a good resource for commercial training in XML, as well as general XML and SGML consulting by seasoned SGML experts
http://www.mulberrytech.com
- The Web Developer's Virtual Library Series on XML offers good summaries of various XML technologies, as well as annotated indices of XML software
http://wdvl.com/Software/XML
- Microsoft's Site Builder Network provides a series of articles called "Extreme XML," one of which appears in the following link. While some of it focuses on Microsoft-only, Windows-only technology, there's still some great stuff here
http://www.microsoft.com/sitebuilder/magazine/xml.asp
- Webmonkey has a good series of articles introducing readers to XML. The index is at
http://www.hotwired.com/webmonkey/xml/?tw=xml
- "What the ?xml!" by L.C. Rees offers an interesting take on XML and why it's necessary -- nicely written and entertaining to boot
http://www.geocities.com/SiliconValley/Peaks/5957/wxml.html
- ArborText's white paper, "XML for Managers," is guaranteed to inform even the most pointy-haired of bosses
http://www.arbortext.com/Think_Tank/XML__Resources/XML_for_Managers/xml_for_managers.html
- "The XML Revolution" by Dan Connolly is a quick backgrounder on XML (Nature)
http://helix.nature.com/webmatters/xml.html
- W3C's CSS page will get you started learning about CSS
http://www.w3.org/Style/CSS/
- "Cascading Style Sheets: Designing for the Web" by Håkon Wium Lie and Bert Bos (Addison-Wesley, 1997). Sample chapters from the book appear at
http://www.awl.com/cseng/titles/0-201-41998-X/liebos/
- The W3C's XSL page
http://www.w3.org/Style/XSL/
- Read (and comment on) the W3C's XSL Working Draft (currently dated December 16, 1998)
http://www.w3.org/TR/WD-xsl
- "The Extensible Style Language: Styling XML Documents" (WebTechniques Magazine) XSL tutorial information and examples
http://www.webtechniques.com/features/1999/01/walsh/walsh.shtml
- Microsoft's XML and XSL tutorial site is especially interesting because of the recent release of client-side XSL in Internet Explorer 5.0. Extensive and excellent
http://www.microsoft.com/xml
- If you're still using IE 4.0, you can still experiment with XML, using Microsoft's internal DOM
http://www.microsoft.com/xml/articles/xmlmodel.asp
- If you want to experiment with XSL, try downloading IBM's LotusXSL. It's all Java, and for the time being, it's free
http://www.alphaworks.ibm.com/tech/LotusXSL
- Or, you can try James Clark's XT XSL engine, downloadable from
http://www.jclark.com/xml/xt.html
- "XTech '99: Java and the XML wave" by Mark Johnson (JavaWorld, April 1999) offers the most current information on the contest
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xtech.html
- The definitive description of SAX is available online. You can also download free SAX software here
http://www.megginson.com/SAX/index.html
- The W3C information page for the Document Object Model appears on the W3C site
http://www.w3c.org/DOM/
- Among other things, you'll find the W3C Recommendation for DOM Level 1
http://www.w3.org/TR/REC-DOM-Level-1/
- The Java bindings for DOM, for both XML and HTML, are in this Recommendation appendix
http://www.w3.org/TR/REC-DOM-Level-1/java-language-binding.html
- A great DOM tutorial by William Robert Stanek appears on PC Magazine Online in "Object-Based Web Design." This tutorial includes a discussion of using DOM with IDL, CORBA's Interface Definition Language
http://www8.zdnet.com/pcmag/pctech/content/17/13/tf1713.001.html
- The Dynamic HTML Resource page contains several links to DHTML articles
http://www.hotwired.com/webmonkey/dynamic_html/?tw=dynamic_html
- Epicentric, Inc.
http://www.epicentric.com
- More XML (and other Java) technology than you can shake a stick at is available at IBM's alphaWorks
http://alphaworks.ibm.com
- Version 2 of IBM's excellent XML parser package, xml4j, is available for download. This package includes several parsers, both validating and nonvalidating
http://www.alphaworks.ibm.com/tech/xml4j
- See also IBM's exciting Bean Markup Language project, which uses XML to represent and manipulate JavaBeans
http://www.alphaworks.ibm.com/tech/bml
- Another free Java XML parser was written by the indefatigable James Clark; download it at
http://www.jclark.com/xml/xp/index.html
- XEENA is IBM alphaWorks's DTD-guided XML editor. You want it, you need it, you gotta have it
http://www.alphaworks.ibm.com/tech/xeena
- Mozilla.org is the open source community's effort to extend the Netscape source code. Find out about it at
http://www.mozilla.org
- Information about XML and CSS in Mozilla appears at
http://www.mozilla.org/rdf/doc/xml.html
- You can read about Sun's XML and Java initiatives at
http://www.sun.com/990310/java_xml.jhtml
- In addition, Java Project X includes source code downloadable from
http://developer.java.sun.com/developer/earlyAccess/xml/index.html
- ArborText has a suite of sophisticated tools for editing SGML, XML, and XSL
http://www.arbortext.com/Products/products.html
- Oracle8i from Oracle Corporation uses XML inside the Oracle core
http://www.oracle.com/xml/
- Download Oracle's free XML for Java parser
http://technet.oracle.com/direct/3xml.htm
- Microsoft's Internet Explorer 5.0, released this month, implements part of the XSL spec. You can find it on Microsoft's Web site -- and also just about anywhere else
http://www.microsoft.com/windows/ie/default.htm
- You can also download a beta release of Microsoft's XML Notepad editor (limited to running only on Microsoft Windows)
http://www.microsoft.com/xml/notepad/download.asp
- Vervet Logic of Bloomington, IN, has announced XML <PRO>, a commercial XML editor
http://www.vervet.com/
- Majix, to transform XML to HTML via XSL, is available at
http://www.tetrasix.com/
- If your French is rusty, you might want to try the English-language site at
http://www.tetrasix.com/english/default.htm
- Read about the history of HTML here. It's part of an online book, so there's no telling for how long it will be available
http://ei.cs.vt.edu/~wwwbtb/hardcopy/book/chap4/origins.html
- The two chapters listed below (of the book "HTML Unleashed" by Rick Darnell, et al.) also cover some of the technical background of these languages.
- SGML history
http://www.webreference.com/dlab/books/html/3-2.html
- XML history (such as it is)
http://www.webreference.com/dlab/books/html/38-0.html
- Nothing to do on Friday night? Why not read up on the history of SGML? Charles Goldfarb, considered by many to be the "father of SGML," reminisces publicly at
http://www.sgmlsource.com/Goldfarb/history/index.htm
- Useful XML and SGML information appears at Goldfarb's Web site, including a comprehensive XML book list
http://www.sgmlsource.com
http://www.bluestone.com
http://www.patents.ibm.com/patlist?icnt=US&patent_number=5860073
http://www.b4uby.com/granny/gsoup.htm
http://75music.org/best/docs/keepers.htm