Mark Johnson
Programs that process parts of XML documents, rather than entire
documents, often require that users specify which parts to process.
For example, let's say we have a telephone directory program that can
select and print entries from an XML phone directory that looks
something like this:
<directory>
<entry>
<name>
<first>Jennifer</first>
<last>Jones</last>
</name>
<phone AREACODE="711">555-2000</phone>
</entry>
<entry>
<name>
<first>Jim</first>
<last>Jones</last>
</name>
<phone AREACODE="202">555-1000</phone>
</entry>
<!-- and so on -->
</directory>
How would the user specify, "What are the phone numbers of everyone
named Jones in area code 711?" or maybe, "What are the area codes of
everyone named Jennifer?" XPath, that's how! XPath is a language for
expressing questions like these about an XML document.
An XPath expression locates a set of nodes in the XML document tree by
the nodes' tags, their positions relative to one another, and their
contents. This limited space doesn't allow for a complete exposition
of XPath, but I can give you a feel for how XPath works. I also
encourage you to explore XPath on your own.
What is a document "tree"?
Think of an XML document as a hierarchical tree, with the top-level tag
(or "node") at the root of the tree. (Never mind that the tree has only
one root, and that the root is at the top -- programmers aren't usually
botanists.) The top-level node has "descendants" -- the tags contained
within the top tag -- as does each node below that level. Every node in
the tree has a unique parent -- its enclosing tag -- except the
top-level node. In the example above, <directory> has two <entry>
children; each <entry> has two children, <name> and <phone>; and so on
down the tree.
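To make the tree idea concrete, here is a minimal sketch in Python using
the standard library's xml.etree.ElementTree module. The choice of
Python, and the names DIRECTORY_XML, parent_of, and children_tags, are my
own assumptions for illustration; nothing about XPath requires them.

```python
import xml.etree.ElementTree as ET

# The sample phone directory from the article.
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# ElementTree elements do not store a parent pointer, so build a
# child -> parent lookup by visiting every node once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def children_tags(node):
    """Return the tag names of a node's direct children, in document order."""
    return [child.tag for child in node]


first_entry = root[0]
print(root.tag, children_tags(root))                # top-level node and its children
print(first_entry.tag, children_tags(first_entry))  # one level down
print(root in parent_of)                            # the top-level node has no parent
```

Every node except the top-level one shows up as a key in parent_of,
which is the "unique parent" rule in code form.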
XPath can specify the information you want to retrieve from the XML
document by specifying the nodes that contain the desired information.
A basic XPath expression indicates tag names to be matched by their
literal names, separated by slashes. For example, the following XPath
expression refers to all nodes in the document, at any depth (hence
the "//"), with the tag name <entry>:
//entry
Applying this XPath expression to the small XML document above would
return the document's two <entry> nodes (including everything those
nodes imply). Basically, the expression "points" at every <entry> node
in the document.
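If you want to try this yourself, the following is a small sketch using
Python's xml.etree.ElementTree, which implements only a subset of XPath.
One assumption to flag: ElementTree evaluates paths relative to an
element, so "//entry" has to be written ".//entry" here. The variable
names are mine.

```python
import xml.etree.ElementTree as ET

# The sample phone directory from the article.
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# ".//entry" is ElementTree's spelling of "//entry":
# every <entry> node at any depth below the starting node.
entries = root.findall(".//entry")

for entry in entries:
    # Each result is the whole subtree (name, phone, and so on),
    # not just the <entry> tag itself.
    print(ET.tostring(entry, encoding="unicode").strip())
```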
You can give the expression some context by using a single "/"
character. The following expression, applied against the sample above,
would return the same set of two <entry> nodes:
/directory/entry
The leading slash indicates the top of the tree. This expression
literally means, "any <entry> node that is a child of a <directory>
node, which itself is the document's top-level node". Likewise, the
following expression matches all first names in the above document
("all <first> nodes at any depth below a top-level <directory> node"):
/directory//first
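Here is a rough Python sketch of those two rooted expressions. Python's
xml.etree.ElementTree cannot evaluate an absolute path on an element, so
this sketch checks the top-level tag by hand and runs the rest of the
path relative to it. The helper select_under_directory is hypothetical,
written only for this illustration; a full XPath engine would take the
expressions as written.

```python
import xml.etree.ElementTree as ET

# The sample phone directory from the article.
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)


def select_under_directory(top, relative_path):
    """Mimic a path that starts with /directory.

    The leading "/directory" step is checked by hand against the
    top-level node; the remainder is evaluated relative to that node.
    """
    if top.tag != "directory":
        return []  # the top-level node is not <directory>: nothing matches
    return top.findall(relative_path)


# /directory/entry : <entry> children of the top-level <directory>
entries = select_under_directory(root, "./entry")

# /directory//first : <first> nodes at any depth below <directory>
first_names = select_under_directory(root, ".//first")

print(len(entries))
print([node.text for node in first_names])
```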
The * wildcard matches any element in a particular position. Applied to
the document above, the following XPath expression matches all <first>
and <last> nodes:
//name/*
The expression means, "any node (indicated by '*') that is a child of a
<name> node". You can also match on attribute values using "@" and
square brackets. For example:
//phone[@AREACODE="711"]
The above expression would match the <phone> node in the first
<entry>, the one for Jennifer Jones. The expression @AREACODE="711"
inside the brackets is a condition: only <phone> nodes that satisfy it
are selected.
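Both the wildcard and the attribute test happen to fall inside the XPath
subset that Python's xml.etree.ElementTree supports, so they are easy to
try. This is only a sketch: the leading "." is an ElementTree
requirement (paths are relative to the starting element), and the
variable names are my own.

```python
import xml.etree.ElementTree as ET

# The sample phone directory from the article.
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# //name/* : every child, whatever its tag, of every <name> node.
name_parts = root.findall(".//name/*")

# //phone[@AREACODE="711"] : only <phone> nodes whose AREACODE
# attribute has the value "711".
phones_711 = root.findall('.//phone[@AREACODE="711"]')

print([(node.tag, node.text) for node in name_parts])
print([node.text for node in phones_711])
```

In this document the attribute test selects a single <phone> node, the
one belonging to Jennifer Jones.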
XPath can even specify some pretty complex, arbitrary relationships.
For example:
//phone[../name/first = ../name/last]
The above expression means, "give me the phone number of everyone whose
first name is the same as their last name" (".." represents "the
parent of the node").
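Here is a hedged Python sketch of two such relationships, using
xml.etree.ElementTree. The first answers the question from the start of
this article (phone numbers of everyone named Jones in area code 711)
with a path that uses ".." to climb from <name> back to its <entry>.
The second, comparing <first> with <last>, is beyond ElementTree's XPath
subset, so the sketch spells the same selection out by hand and assumes
one <first> and one <last> per <name>. The third entry, "Morgan
Morgan", is hypothetical data I added so the comparison has something
to find.

```python
import xml.etree.ElementTree as ET

# The article's sample directory plus one HYPOTHETICAL entry
# ("Morgan Morgan") added only so the second query has a match.
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
  <entry>
    <name><first>Morgan</first><last>Morgan</last></name>
    <phone AREACODE="303">555-3000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# Find every <name> whose <last> is "Jones", step up to its parent
# <entry> with "..", then take that entry's <phone> in area code 711.
jones_711 = root.findall(".//name[last='Jones']/../phone[@AREACODE='711']")


def phones_with_same_first_and_last(top):
    """Hand-written stand-in for comparing <first> and <last> per entry."""
    matches = []
    for entry in top.findall(".//entry"):
        first = entry.findtext("name/first")
        last = entry.findtext("name/last")
        if first is not None and first == last:
            matches.extend(entry.findall("phone"))
    return matches


same_name = phones_with_same_first_and_last(root)

print([phone.text for phone in jones_711])
print([phone.text for phone in same_name])
```

A full XPath engine would evaluate the comparison in a single
expression; the loop is only there because of the library I assumed.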
XSL, XML's stylesheet language, selects document pieces for formatting
using XPath. As XPath's use becomes more common, you're bound to see
more XPath expressions. If you're creating an application that manipulates XML
documents, then you should consider how XPath can make your application
more powerful.