From: www.itworld.com

XML for the absolute beginner

by Mark Johnson

February 5, 2001

 

Modeling information structure in XML

So far, we've looked at XML as a way of representing data as
human-readable documents, and we've spent some time discussing
formatting. But XML's real power is in its ability to represent
information structure -- how various pieces of information relate to
one another -- in much the same way a database might.


Structured documents of the type we've been looking at have the
property that all of their elements nest inside one another, as in
Listing 5 above. Instead of looking at a document as a file, though,
consider what happens if we look at the structure of the tags as a
tree:









Figure 3. The recipe represented as a tree structure





The figure above shows the recipe as a tree of document tags. The child
nodes of a document nest within the parent node. What if there were a
way to automagically convert an XML document into a tree of
objects in a programming language -- like, oh, say, Java
maybe? And what if these objects all had properties that could be set
and retrieved -- such as the list of each element's children, the text
each object contained, and so on? Wouldn't that be interesting?


The Document Object Model (DOM) Level 1 Recommendation (see Resources), created by a W3C committee, describes
a set of language-neutral interfaces capable of representing any
well-formed XML or HTML document.


With the DOM, HTML and XML documents can be manipulated as objects,
instead of just as streams of text. In fact, from the DOM point of
view, the document is the object tree, and the XML, HTML, or
what have you is simply a persistent representation of that tree.
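
To make the "document is the object tree" idea concrete, here is a minimal sketch in Java (only a preview; the Java bindings get proper treatment in the next section). It assumes the JAXP parser classes (javax.xml.parsers) that ship with current Java runtimes, which are a convenience for obtaining a DOM tree and not part of the DOM Recommendation itself. The recipe markup and the class name DomOutline are invented for illustration and are not taken from Listing 5.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomOutline {
    /** Parses XML text into a DOM Document (the root of the object tree). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Appends one indented line per element, recursing through child nodes. */
    static void outline(Node node, int depth, StringBuilder out) {
        int childDepth = depth;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            childDepth = depth + 1;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), childDepth, out);
        }
    }

    public static void main(String[] args) {
        // Hypothetical recipe markup, made up for this sketch.
        String xml = "<Recipe><Name>Soup</Name><Ingredients>"
                + "<Ingredient>Water</Ingredient><Ingredient>Salt</Ingredient>"
                + "</Ingredients></Recipe>";
        StringBuilder out = new StringBuilder();
        outline(parse(xml), 0, out);
        System.out.print(out);
    }
}
```

Running main prints the element names indented by nesting depth, which is the same tree as the figure, only drawn sideways.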


The availability of the DOM makes it much simpler to read and write
structured document files, since standard HTML and XML parsers are
written to produce DOM trees. If these objects have GUI
representations, it's easy to see how to create an application that
reads structured document files (XML or HTML), lets the user edit the
structure visually, and then save it in its original format. Programs
that interface with existing Web sites become much easier to write,
because once the document is parsed, you're working with objects native
to your programming language.
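
The read, edit, and save cycle described above can be sketched roughly as follows. This is one possible approach under stated assumptions, not the only one: DOM Level 1 defines no load or save operations, so the sketch borrows the JAXP parser and the javax.xml.transform identity transformer to get text in and out. The element names and the class name RoundTrip are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class RoundTrip {
    /** Parses the XML, appends an Ingredient element to the root, and writes the tree back out as text. */
    static String addIngredient(String xml, String name) {
        try {
            // Read: text becomes a tree of objects.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Edit: work with nodes, not with strings.
            Element item = doc.createElement("Ingredient");
            item.appendChild(doc.createTextNode(name));
            doc.getDocumentElement().appendChild(item);

            // Save: an identity transform serializes the tree back to XML.
            Transformer writer = TransformerFactory.newInstance().newTransformer();
            writer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            writer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Ingredients><Ingredient>salt</Ingredient></Ingredients>";
        System.out.println(addIngredient(xml, "pepper"));
    }
}
```

A visual editor would replace the hard-coded edit in the middle with whatever the user does on screen; the reading and saving steps stay the same.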


One of the earliest popular uses for the Document Object Model is
Dynamic HTML, where client-side scripts manipulate and display (and
redisplay) an HTML document in response to user actions. Dynamic HTML
manipulates the client-side document in terms of the scripting
language's binding to the DOM structure of the document being
displayed. For instance, a <BUTTON> object might,
when clicked, reorder a table on the same page by sorting the
<TR> (table row) nodes on a particular column.
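
In a browser that sort would be written in a scripting language, but the DOM calls involved are the same in any binding. Here is a rough Java sketch of the row-sorting step, assuming a well-formed, XHTML-style table with lowercase tr and td tags whose rows are direct children of the table element. The class name RowSorter and the sample data are made up, and the comparison is plain string ordering.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowSorter {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the text of the given cell, or "" if the cell is missing or empty. */
    static String cellText(Element row, int column) {
        NodeList cells = row.getElementsByTagName("td");
        if (column >= cells.getLength()) {
            return "";
        }
        Node text = cells.item(column).getFirstChild();
        return (text == null || text.getNodeValue() == null) ? "" : text.getNodeValue();
    }

    /** Reorders the table's tr children by the text in one column. */
    static void sortRows(Element table, final int column) {
        // Collect the rows first; the child NodeList is live and changes as nodes move.
        List<Element> rows = new ArrayList<>();
        NodeList children = table.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "tr".equals(child.getNodeName())) {
                rows.add((Element) child);
            }
        }
        rows.sort(Comparator.comparing((Element row) -> cellText(row, column)));
        // appendChild moves a node that is already in the tree, so this reorders in place.
        for (Element row : rows) {
            table.appendChild(row);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<table><tr><td>pear</td><td>3</td></tr>"
                + "<tr><td>apple</td><td>7</td></tr></table>");
        Element table = doc.getDocumentElement();
        sortRows(table, 0);
        NodeList rows = table.getElementsByTagName("tr");
        for (int i = 0; i < rows.getLength(); i++) {
            System.out.println(cellText((Element) rows.item(i), 0));
        }
    }
}
```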


But aside from all this browser-document-Web technology, the DOM
provides a common way of accessing general data structures from
structured documents. Any language that has a binding (that
is, a specific set of interfaces that implement the DOM in that
language) can use XML as an interface for storing, retrieving, and
processing generic hierarchical (and even nonhierarchical) object
structures.


How DOM and XML work together

The DOM opens the door to using XML as the lingua franca of
data interchange on the Internet, and even within applications. Tim
Berners-Lee, discussed earlier and commonly known as the "inventor
of the World Wide Web," says that, these days, it's important to
understand that if a system you're designing survives, it will someday
be used as a module in another system. So it's best to design it that
way from the start. The DOM is completely described in IDL, the
Interface Definition Language used in CORBA, so it's connected to
existing software interoperation standards.


Let's think a moment about how DOM with XML would be useful in
programming a database system. First, represent your database schema
as a set of DOM objects. Want a document that describes that schema? No
problem: write it out as XML. Use XSL to format the XML as HTML and
you've got a complete, browseable schema reference that's always up to
date. Want to automatically construct SQL for updating your relational
database from a record set coming into your system? Just traverse your
database's (DOM) schema tree, matching up the names of the columns from
the record set with those of the schema, and build an SQL UPDATE
statement as you go. What's that you say? The schema has changed, and
the record set you've received doesn't match up with the new schema?
You can write code to handle that, or present the user with error
messages that state exactly what's wrong. You might even be able to use
XSL to refactor the DOM tree of your record set into something matching
the new schema.
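
Here is one way the "traverse the schema tree and build an UPDATE" idea might look in Java. Everything specific in it is an assumption made for the sketch: the schema format (a table element with column children, and key="true" marking key columns), the class name UpdateBuilder, and the choice to emit a parameterized statement with ? placeholders rather than embedding values in the SQL text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UpdateBuilder {
    /** Parses a (hypothetical) schema document and returns its root table element. */
    static Element parseSchema(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema", e);
        }
    }

    /**
     * Walks the schema's columns, matches them against the record by name, and
     * returns an UPDATE with placeholders. Values are appended to params in
     * placeholder order (SET values first, then key values).
     */
    static String buildUpdate(Element table, Map<String, String> record, List<String> params) {
        StringBuilder set = new StringBuilder();
        StringBuilder where = new StringBuilder();
        List<String> keyParams = new ArrayList<>();
        Set<String> unmatched = new LinkedHashSet<>(record.keySet());

        NodeList columns = table.getElementsByTagName("column");
        for (int i = 0; i < columns.getLength(); i++) {
            Element column = (Element) columns.item(i);
            String name = column.getAttribute("name");
            if (!record.containsKey(name)) {
                continue; // column not supplied by this record
            }
            unmatched.remove(name);
            if ("true".equals(column.getAttribute("key"))) {
                if (where.length() > 0) where.append(" AND ");
                where.append(name).append(" = ?");
                keyParams.add(record.get(name));
            } else {
                if (set.length() > 0) set.append(", ");
                set.append(name).append(" = ?");
                params.add(record.get(name));
            }
        }
        // The record names columns the schema no longer has: say exactly which ones.
        if (!unmatched.isEmpty()) {
            throw new IllegalArgumentException("Record columns not in schema: " + unmatched);
        }
        if (set.length() == 0 || where.length() == 0) {
            throw new IllegalArgumentException("Need at least one value column and one key column");
        }
        params.addAll(keyParams);
        return "UPDATE " + table.getAttribute("name") + " SET " + set + " WHERE " + where;
    }

    public static void main(String[] args) {
        Element schema = parseSchema("<table name=\"recipes\">"
                + "<column name=\"id\" key=\"true\"/><column name=\"title\"/></table>");
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "7");
        record.put("title", "Soup");
        List<String> params = new ArrayList<>();
        System.out.println(buildUpdate(schema, record, params) + " " + params);
    }
}
```

The mismatch check is the "error messages that state exactly what's wrong" part: a record carrying a column the schema doesn't know about is rejected by name instead of producing bad SQL.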


Finally, it's time to start programming in Java! In the next section,
we're going to examine the Java bindings of the DOM and see how to use
the DOM in a Java program.












Resources

There are so many XML resources on the Web, I've had to categorize. The first section here is the most useful, since the documents are either high-level summaries or excellent link sites. Apologies to anyone who was omitted.

XML and Java: General XML resources
XML and Java
Tutorials and training
Cascading Style Sheets
Extensible Style Language (XSL)
Upcoming XSL contest

  • Though the details aren't yet worked out, Sun Microsystems will soon announce a call for proposals for a $30,000 grant to develop a client-side processor for full XSL implementation in Mozilla. It will also announce, in conjunction with Adobe, a contest (first prize $40,000, second prize $20,000) to develop a pure-Java, server-side processor of the entire XSL language, to format XML to PDF (Adobe's document format). Keep watching the Java Developer Connection (requires free registration) and Mozilla sites for the eventual announcements.

    Simple API for XML (SAX)
    Document Object Model (DOM)
    Dynamic HTML
    Software
    History
    Miscellaneous links

    • Bluestone Software has recently made a splash with pure-Java XML application servers, and a freely downloadable Swing package called XwingML

      http://www.bluestone.com
    • Everyone (except Microsoft) is pretty freaked out about the US Patent Office awarding Microsoft a patent for certain kinds of functionality in style sheets. What happens with this patent, and its impact on developing technology, remains to be seen. Judge for yourself by reading the patent at

      http://www.patents.ibm.com/patlist?icnt=US&patent_number=5860073
    • The title of the sample recipe is actually the title of a very funny song by William Bolcom. Similar recipes may be found at

      http://www.b4uby.com/granny/gsoup.htm
    • The song appears on a compact disc (with other odd songs) available from the Public Radio Music Source at

      http://75music.org/best/docs/keepers.htm