From: www.itworld.com
XML for the absolute beginner
by Mark Johnson
February 5, 2001
Modeling information structure in XML
So far, we've looked at XML as a way of representing data as
human-readable documents, and we've spent some time discussing
formatting. But XML's real power is in its ability to represent
information structure -- how various pieces of information relate to
one another -- in much the same way a database might.
Structured documents of the type we've been looking at have the
property that all of their elements nest inside one another, as in
Listing 5 above. Instead of looking at a document as a file, though,
consider what happens if we look at the structure of the tags as a
tree:

Figure 3. The recipe represented as a tree structure
The figure above shows the recipe as a tree of document tags. The child
nodes of a document nest within the parent node. What if there were a
way to automagically convert an XML document into a tree of
objects in a programming language -- like, oh, say, Java
maybe? And what if these objects all had properties that could be set
and retrieved -- such as the list of each element's children, the text
each object contained, and so on? Wouldn't that be interesting?
The Document Object Model (DOM) Level 1 Recommendation (see
Resources), created by a W3C committee, describes
a set of language-neutral interfaces capable of representing any
well-formed XML or HTML document.
With the DOM, HTML and XML documents can be manipulated as objects,
instead of just as streams of text. In fact, from the DOM point of
view, the document is the object tree, and the XML, HTML, or
what have you is simply a persistent representation of that tree.
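As a preview of that idea, here is a minimal sketch in Java that parses a small XML string into a DOM tree and prints the element names with indentation that mirrors the nesting. It uses the JAXP parser classes bundled with current JDKs. The class name DomTreeSketch, its helper methods, and the tiny recipe document are assumptions made for illustration; they are not the article's own listing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomTreeSketch {
    // Hypothetical stand-in document; not the article's recipe listing.
    static final String RECIPE_XML =
        "<Recipe><Name>Example Soup</Name><Ingredients>"
        + "<Ingredient>water</Ingredient><Ingredient>salt</Ingredient>"
        + "</Ingredients></Recipe>";

    // Parse XML text into a DOM Document, wrapping checked parser exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Return one line per element, indented two spaces per nesting level.
    static String outline(Element element, int depth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getTagName()).append('\n');
        // Walk the children; text nodes are skipped, child elements recurse.
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                out.append(outline((Element) child, depth + 1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(RECIPE_XML);
        System.out.print(outline(doc.getDocumentElement(), 0));
    }
}
```

Running main prints Recipe at the left margin, with Name and Ingredients indented beneath it and the two Ingredient elements one level deeper: the same parent-and-child shape as the tree in the figure.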
The availability of the DOM makes it much simpler to read and write
structured document files, since standard HTML and XML parsers are
written to produce DOM trees. If these objects have GUI
representations, it's easy to see how to create an application that
reads structured document files (XML or HTML), lets the user edit the
structure visually, and then saves it in its original format. Programs
that interface with existing Web sites become much easier to write,
because once the document is parsed, you're working with objects native
to your programming language.
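The read, edit, and save cycle described above might look something like the following sketch, minus the GUI. It parses a string, adds one element through the DOM interfaces, and writes the tree back out as XML using the JDK's identity Transformer. The class DomRoundTripSketch, the method addIngredient, and the element names are hypothetical, chosen only to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomRoundTripSketch {
    // Parse the XML, append an <Ingredient> child to the root, and serialize it again.
    static String addIngredient(String xml, String name) {
        try {
            // Read: text in, object tree out.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Edit: work with objects rather than with a stream of characters.
            Element ingredient = doc.createElement("Ingredient");
            ingredient.setTextContent(name);
            doc.getDocumentElement().appendChild(ingredient);

            // Save: a Transformer with no stylesheet copies the tree to the output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String before = "<Ingredients><Ingredient>water</Ingredient></Ingredients>";
        System.out.println(addIngredient(before, "salt"));
    }
}
```

A real editor would put a user interface between the parse and the save, but the shape of the program stays the same: the document on disk is just the persistent form of the tree being edited.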
One of the earliest popular uses for the Document Object Model is
Dynamic HTML, where client-side scripts manipulate and display (and
redisplay) an HTML document in response to user actions. Dynamic HTML
manipulates the client-side document in terms of the scripting
language's binding to the DOM structure of the document being
displayed. For instance, a <BUTTON> object might,
when clicked, reorder a table on the same page by sorting the
<TR> (table row) nodes on a particular column.
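In a browser this sort would be written in a scripting language, but the DOM operations are the same in any binding. The sketch below uses Java to reorder the rows of a small table by the text of one column. It assumes a well-formed table with lowercase tr and td tags, since an XML parser treats tag names as case-sensitive, and it compares cell text as strings. The class TableSortSketch and its methods are illustrative names, not part of any browser or DOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableSortSketch {
    // Parse well-formed markup and return its root element (assumed to be <table>).
    static Element parseTable(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse table", e);
        }
    }

    // Collect the direct <tr> children of the table in document order.
    static List<Element> rows(Element table) {
        List<Element> rows = new ArrayList<>();
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "tr".equals(((Element) n).getTagName())) {
                rows.add((Element) n);
            }
        }
        return rows;
    }

    // Text of the given <td> in a row, or "" if the row has too few cells.
    static String cellText(Element row, int column) {
        NodeList cells = row.getElementsByTagName("td");
        return column < cells.getLength() ? cells.item(column).getTextContent() : "";
    }

    // Sort rows by one column; appendChild moves a node already in the tree.
    static void sortRows(Element table, int column) {
        List<Element> sorted = rows(table);
        sorted.sort(Comparator.comparing(row -> cellText(row, column)));
        for (Element row : sorted) {
            table.appendChild(row);
        }
    }

    // Read one column back out, top to bottom.
    static List<String> columnValues(Element table, int column) {
        List<String> values = new ArrayList<>();
        for (Element row : rows(table)) {
            values.add(cellText(row, column));
        }
        return values;
    }

    public static void main(String[] args) {
        Element table = parseTable("<table>"
            + "<tr><td>pear</td><td>3</td></tr>"
            + "<tr><td>apple</td><td>7</td></tr>"
            + "<tr><td>fig</td><td>5</td></tr></table>");
        sortRows(table, 0);
        System.out.println(columnValues(table, 0));
    }
}
```

Because the rows are moved rather than copied, no markup is regenerated: the script changes the tree, and the display follows the tree.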
But aside from all this browser-document-Web technology, the DOM
provides a common way of accessing general data structures from
structured documents. Any language that has a binding (that
is, a specific set of interfaces that implement the DOM in that
language) can use XML as an interface for storing, retrieving, and
processing generic hierarchical (and even nonhierarchical) object
structures.
How DOM and XML work together
The DOM opens the door to using XML as the lingua franca of
data interchange on the Internet, and even within applications. Tim
Berners-Lee, discussed earlier and commonly known as the "inventor
of the World Wide Web," says that, these days, it's important to
understand that if a system you're designing survives, it will someday
be used as a module in another system. So it's best to design it that
way from the start. The DOM is completely described in IDL, the
Interface Definition Language used in CORBA, so it's connected to
existing software interoperation standards.
Let's think a moment about how DOM with XML would be useful in
programming a database system. First, represent your database schema
as a set of DOM objects. Want a document that describes that schema? No
problem: write it out as XML. Use XSL to format the XML as HTML and
you've got a complete, browseable schema reference that's always up to
date. Want to automatically construct SQL for updating your relational
database from a record set coming into your system? Just traverse your
database's (DOM) schema tree, matching up the names of the columns from
the record set with those of the schema, and build an SQL UPDATE
statement as you go. What's that you say? The schema has changed, and
the record set you've received doesn't match up with the new schema?
You can write code to handle that, or present the user with error
messages that state exactly what's wrong. You might even be able to use
XSL to refactor the DOM tree of your record set into something matching
the new schema.
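To make the schema idea a little more concrete, here is a rough sketch of the UPDATE-building step. It assumes an invented schema format (a table element with a name attribute, containing column elements, one or more of them marked key="true") and a record set reduced to a single map of column names to values. The class SchemaUpdateSketch and everything in it are hypothetical; a real system would have its own schema vocabulary and would bind the values through JDBC.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaUpdateSketch {
    // Invented schema vocabulary, used only for this illustration.
    static final String SCHEMA_XML =
        "<table name=\"recipes\">"
        + "<column name=\"id\" key=\"true\"/>"
        + "<column name=\"title\"/>"
        + "<column name=\"servings\"/>"
        + "</table>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schema", e);
        }
    }

    // Traverse the schema tree and build a parameterized UPDATE for the record.
    static String buildUpdate(String schemaXml, Map<String, String> record) {
        Element table = parseRoot(schemaXml);
        Set<String> known = new LinkedHashSet<>();
        List<String> assignments = new ArrayList<>();
        List<String> keys = new ArrayList<>();

        NodeList columns = table.getElementsByTagName("column");
        for (int i = 0; i < columns.getLength(); i++) {
            Element column = (Element) columns.item(i);
            String name = column.getAttribute("name");
            known.add(name);
            if (!record.containsKey(name)) {
                continue; // the record does not touch this column
            }
            if ("true".equals(column.getAttribute("key"))) {
                keys.add(name + " = ?");
            } else {
                assignments.add(name + " = ?");
            }
        }

        // Report exactly which incoming column the schema does not know about.
        for (String field : record.keySet()) {
            if (!known.contains(field)) {
                throw new IllegalArgumentException(
                    "record column '" + field + "' is not in the schema");
            }
        }
        if (assignments.isEmpty() || keys.isEmpty()) {
            throw new IllegalArgumentException(
                "record needs at least one key column and one value column");
        }
        return "UPDATE " + table.getAttribute("name")
            + " SET " + String.join(", ", assignments)
            + " WHERE " + String.join(" AND ", keys);
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "7");
        record.put("title", "Soup");
        System.out.println(buildUpdate(SCHEMA_XML, record));
    }
}
```

The sketch emits placeholders rather than literal values, so the statement can be handed to a prepared statement. A record with a column the schema lacks fails with a message naming that column, which is the "state exactly what's wrong" case mentioned above.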
Finally, it's time to start programming in Java! In the next section,
we're going to examine the Java bindings of the DOM and see how to use
the DOM in a Java program.
Resources
There are so many XML resources on the Web that I've had to categorize them. The first section here is the most useful, since the documents are either high-level summaries or excellent link sites. Apologies to anyone who was omitted.
XML and Java: General XML resources
- "XML, Java and the Future of the Web." by Jon Bosak. The paper that started it all, at least from a Java programmer's point of view. Definitely worth a read, even if it's a bit dated. Jon is commonly considered to be the father of XML. Funny how all of these technologies seem to have paternity
http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- "Media-Independent Publishing: Four Myths about XML" by Jon Bosak
http://metalab.unc.edu/pub/sun-info/standards/xml/why/4myths.htm
- Robin Cover's XML-SGML site is, according to my SGML buddies, the bible of XML resources
http://www.oasis-open.org/cover/
- The W3C's XML resource page lets you cheer from the sidelines as XML technology proposals develop into recommendations, or join in the fray on their active mailing lists
http://www.w3.org/XML/
- OASIS, the Web site of the Organization for the Advancement of Structured Information Standards, offers general news and information about XML
http://www.oasis-open.org
- The Graphics Communications Association, host of the XTech '99 conference (March 11 to 13, 1999, San Jose, CA) and the upcoming XML Europe '99 conference in Granada, Spain, (April 26 to 30, 1999) has a Web site packed with XML information
http://www.gca.org/
- XML.com is great for watching trends and digging up XML news
http://www.xml.com
- Textuality hosts Tim Bray's site. Check it out for a look at the "big picture" of how XML fits into the structured document universe -- and for a look at Lark, Tim's nonvalidating XML processor
http://www.textuality.com/
- The XML FAQ
http://www.ucc.ie/xml/
- IBM's XML Web site is an outstanding supplement to alphaWorks
http://www.software.ibm.com/xml/index.html
XML and Java
Tutorials and training
Cascading Style Sheets
Extensible Style Language (XSL)
Upcoming XSL contest
Though the details aren't yet worked out, Sun Microsystems will soon announce a call for proposals for a $30,000 grant to develop a client-side processor for full XSL implementation in Mozilla. It will also announce, in conjunction with Adobe, a contest (first prize $40,000, second prize $20,000) to develop a pure-Java, server-side processor of the entire XSL language, to format XML to PDF (Adobe's document format). Keep watching the Java Developer Connection (requires free registration), and Mozilla sites for the eventual announcements.
Simple API for XML (SAX)
Document Object Model (DOM)
Dynamic HTML
Software
- Epicentric, Inc.
http://www.epicentric.com
- More XML (and other Java) technology than you can shake a stick at is available at IBM's alphaWorks
http://alphaworks.ibm.com
- Version 2 of IBM's excellent XML parser package, xml4j, is available for download. This package includes several parsers, both validating and nonvalidating
http://www.alphaworks.ibm.com/tech/xml4j
- See also IBM's exciting Bean Markup Language project, which uses XML to represent and manipulate JavaBeans
http://www.alphaworks.ibm.com/tech/bml
- Another free Java XML parser was written by the indefatigable James Clark. Download it at
http://www.jclark.com/xml/xp/index.html
- XEENA is IBM alphaWorks's DTD-guided XML editor. You want it, you need it, you gotta have it
http://www.alphaworks.ibm.com/tech/xeena
- Mozilla.org is the open source community's effort to extend the Netscape source code. Find out about it at
http://www.mozilla.org
- Information about XML and CSS in Mozilla appears at
http://www.mozilla.org/rdf/doc/xml.html
- You can read about Sun's XML and Java initiatives at
http://www.sun.com/990310/java_xml.jhtml
- In addition, Java Project X includes source code downloadable from
http://developer.java.sun.com/developer/earlyAccess/xml/index.html
- ArborText has a suite of sophisticated tools for editing SGML, XML, and XSL
http://www.arbortext.com/Products/products.html
- Oracle8i from Oracle Corporation uses XML inside the Oracle core
http://www.oracle.com/xml/
- Download Oracle's free XML for Java parser
http://technet.oracle.com/direct/3xml.htm
- Microsoft's Internet Explorer 5.0, released this month, implements part of the XSL spec. You can find it on Microsoft's Web site -- and also just about anywhere else
http://www.microsoft.com/windows/ie/default.htm
- You can also download a beta release of Microsoft's XML Notepad editor (limited to running only on Microsoft Windows)
http://www.microsoft.com/xml/notepad/download.asp
- Vervet Logic of Bloomington, IN, has announced XML <PRO>, a commercial XML editor
http://www.vervet.com/
- Majix, to transform XML to HTML via XSL, is available at
http://www.tetrasix.com/
- If your French is rusty, you might want to try the English-language site at
http://www.tetrasix.com/english/default.htm
History
Miscellaneous links
- Bluestone Software has recently made a splash with pure-Java XML application servers, and a freely downloadable Swing package called XwingML
http://www.bluestone.com
- Everyone (except Microsoft) is pretty freaked out about the US Patent Office awarding Microsoft a patent for certain kinds of functionality in style sheets. What happens with this patent, and its impact on developing technology, remains to be seen. Judge for yourself by reading the patent at
http://www.patents.ibm.com/patlist?icnt=US&patent_number=5860073
- The title of the sample recipe is actually the title of a very funny song by William Bolcom. Similar recipes may be found at
http://www.b4uby.com/granny/gsoup.htm
- The song appears on a compact disc (with other odd songs) available from the Public Radio Music Source at
http://75music.org/best/docs/keepers.htm
JavaWorld