ITworld.com
Selecting Document Nodes With XPath

XML IN PRACTICE --- 03/08/2001



Mark Johnson

Programs that process parts of XML documents, rather than entire documents, often require that users specify which parts to process. For example, suppose we have a telephone directory program that can select and print entries from an XML phone directory that looks something like this:

<directory>
  <entry>
    <name>
      <first>Jennifer</first>
      <last>Jones</last>
    </name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name>
      <first>Jim</first>
      <last>Jones</last>
    </name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
  <!-- and so on -->
</directory>

How would the user ask, "What are the phone numbers of everyone named Jones in area code 711?" or maybe, "What are the area codes of everyone named Jennifer?" XPath, that's how! An XPath expression poses questions like these about an XML document.

An XPath expression locates a set of nodes in the XML document tree by the nodes' tag names, their positions relative to one another, and their contents. A column this short doesn't allow for a complete exposition of XPath, but I can give you a feel for how XPath works. I also encourage you to explore XPath on your own.

What is a document "tree"?
Think of an XML document as a hierarchical tree, with the top-level tag (or "node") at the root of the tree. (Never mind that the tree only has one root, and that root is at the top -- programmers aren't usually botanists.) The top-level node has "descendants" -- those tags contained within the top tag -- as does each node below that level. A node's direct descendants are its "children". Every node in the tree has a unique parent -- its enclosing tag -- except the top-level node. In the example above, <directory> has two <entry> children; each <entry> has two children, <name> and <phone>; and so on down the tree.
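
To make the tree picture concrete, here is a minimal sketch that parses the sample directory and prints one indented line per node. It assumes Python's standard xml.etree.ElementTree module; the names DIRECTORY_XML and outline are illustrative and not part of any XPath tool.

```python
import xml.etree.ElementTree as ET

# The sample directory from the article (two entries only).
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

def outline(node, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + node.tag]
    for child in node:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DIRECTORY_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The indentation in the printed outline mirrors the parent/child nesting described above: <directory> sits alone at the left margin, and every other node appears under its unique parent.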

XPath can specify the information you want to retrieve from the XML document by specifying the nodes that contain the desired information. A basic XPath expression indicates tag names to be matched by their literal names, separated by slashes. For example, the following XPath expression refers to all nodes in the document, at any depth (hence the "//"), with the tag name <entry>:

//entry

Applying this XPath expression to the small XML document above would return the document's two <entry> nodes (including everything those nodes imply). Basically, the expression "points" at every <entry> node in the document.
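
As a rough illustration, the same query can be tried from Python. This sketch assumes the standard xml.etree.ElementTree module, which supports only a subset of XPath and evaluates paths relative to an element, so the expression is written ".//entry" rather than "//entry". The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# ".//entry" selects every <entry> element at any depth below the root.
entries = root.findall(".//entry")

# Each result is a whole subtree, so its contents remain reachable.
entry_names = [
    (entry.findtext("name/first"), entry.findtext("name/last"))
    for entry in entries
]
print(len(entries), entry_names)
```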

You can give the expression some context by using a single "/" character. The following expression, applied against the sample above, would return the same set of two <entry> nodes:

/directory/entry

The leading slash indicates the top of the tree. This expression literally means, "any <entry> node that is a child of a <directory> node, which itself is the document's top-level node". Likewise, the following expression matches all first names in the above document ("all <first> nodes at any depth below a top-level <directory> node"):

/directory//first
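
The two anchored expressions can be approximated in Python as follows. This is a sketch under the assumption that the standard xml.etree.ElementTree module is used; because that module evaluates paths relative to an element and does not accept absolute paths on elements, the code first checks that the root is <directory> and then uses relative paths from it.

```python
import xml.etree.ElementTree as ET

DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# The leading "/directory" step: the top-level node must be <directory>.
root_is_directory = root.tag == "directory"

# Equivalent of /directory/entry: <entry> children of the root only.
direct_entries = root.findall("entry") if root_is_directory else []

# Equivalent of /directory//first: <first> nodes at any depth below the root.
first_names = (
    [node.text for node in root.findall(".//first")] if root_is_directory else []
)
print(len(direct_entries), first_names)
```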

The * wildcard matches every element node at a particular position. Applied to the document above, the following XPath expression matches all <first> and <last> nodes:

//name/*

The expression means, "any node (indicated by '*') that is a child of a <name> node". You can also match on attribute values using "@" and square brackets. For example:

//phone[@AREACODE="711"]

The above expression would match the <phone> node in the first <entry> (Jennifer Jones, whose AREACODE is "711"). The expression @AREACODE="711" inside the brackets is a condition: only <phone> nodes that satisfy it are selected.
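
Both the wildcard and the attribute condition fall within the XPath subset supported by Python's standard xml.etree.ElementTree module, so they can be sketched as below. The leading "." is an ElementTree requirement, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# //name/* : every child element of every <name> node.
name_parts = [(node.tag, node.text) for node in root.findall(".//name/*")]

# //phone[@AREACODE="711"] : <phone> nodes whose AREACODE attribute is "711".
phones_711 = [node.text for node in root.findall(".//phone[@AREACODE='711']")]

print(name_parts)
print(phones_711)
```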

XPath can even specify some pretty complex, arbitrary relationships. For example:

//phone[../name/first = ../name/last]

The above expression means, "give me the phone number of everyone whose first name is the same as their last name" (".." represents "the parent of the node").
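
The sketch below revisits the two questions posed at the start of this column, then the same-name query. It assumes Python's standard xml.etree.ElementTree module, whose XPath subset cannot compare two node values inside a predicate, so that last comparison is done in plain Python as a workaround rather than in XPath itself. The third entry ("Jones Jones", area code 303) is hypothetical data added only so the same-name query has something to find.

```python
import xml.etree.ElementTree as ET

# Sample directory plus one HYPOTHETICAL entry whose first and last names match.
DIRECTORY_XML = """
<directory>
  <entry>
    <name><first>Jennifer</first><last>Jones</last></name>
    <phone AREACODE="711">555-2000</phone>
  </entry>
  <entry>
    <name><first>Jim</first><last>Jones</last></name>
    <phone AREACODE="202">555-1000</phone>
  </entry>
  <entry>
    <name><first>Jones</first><last>Jones</last></name>
    <phone AREACODE="303">555-3000</phone>
  </entry>
</directory>
"""

root = ET.fromstring(DIRECTORY_XML)

# "Phone numbers of everyone named Jones in area code 711":
# find <name> nodes with a <last> child of "Jones", step up to the parent
# <entry> with "..", then down to <phone> nodes with AREACODE "711".
jones_711 = [
    phone.text
    for phone in root.findall(".//name[last='Jones']/../phone[@AREACODE='711']")
]

# "Area codes of everyone named Jennifer".
jennifer_codes = [
    phone.get("AREACODE")
    for phone in root.findall(".//name[first='Jennifer']/../phone")
]

# "Phone numbers of everyone whose first name equals their last name":
# the node-to-node comparison is performed in Python, not in XPath.
same_name_phones = [
    entry.findtext("phone")
    for entry in root.iter("entry")
    if entry.findtext("name/first") == entry.findtext("name/last")
]

print(jones_711, jennifer_codes, same_name_phones)
```

A full XPath 1.0 engine would evaluate the comparison expression directly; the Python filter here is only a stand-in for that capability.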

XSL, XML's style language, selects document pieces for formatting using XPath. As XPath's use becomes more common, you're bound to see XPath expressions. If you're creating an application that manipulates XML documents, then you should consider how XPath can make your application more powerful.

 

Mark Johnson is president of Elucify Technical Communications, a Colorado-based training and consulting company dedicated to clarifying novel or complex ideas through clear explanation and examples.


Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.