ITworld.com
Elementary Entity Terminology

XML IN PRACTICE --- 04/12/2001



Mark Johnson

Reading SGML-related specifications such as the W3C's XML, XSL, or XML Schema recommendations can be frustrating and confusing. Over the past thirty years, the SGML community has developed its own jargon for the concepts that XML inherited. Terms like "notation" or "external unparsed entity" are perfectly clear to an SGML expert, but confusing to the uninitiated. Since XML, XSL, and related technologies descend from SGML, their specifications are written in this jargon. In this week's newsletter, I'll cover some terminology basics so you can read the specifications for yourself.




Keep in mind that when I say "XML" in this letter, I actually mean "XML and similar technologies". I'll set off each term below with *asterisks*, so you'll know when you're seeing XML terminology. First, we'll address some basic concepts.

Structure
XML documents have both logical and physical structure. The *logical structure* is simply the elements (and attributes) in the document and their order. The *physical structure* is the arrangement of physical data sources (such as files or resources named by URLs) that together produce the logical structure.

For example, you've probably seen something like this in an XML document:

<!DOCTYPE PurchaseOrder SYSTEM "PurchaseOrder.dtd">

This XML document's DTD is in an external file, so the document's physical structure involves both the original XML document and this external file. The logical structure is simply the element contents after the physical dependencies have been resolved.
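
To see how a processor records that physical dependency, here is a minimal sketch in Python. The language and its standard xml.dom.minidom module are my choice for illustration, not something the XML specification prescribes, and the empty PurchaseOrder element is a made-up stand-in for real content. The sketch parses a document carrying the DOCTYPE line above and reports which external file the document points to; it does not fetch or read that file.

```python
from xml.dom import minidom

# Hypothetical document: only the DOCTYPE line comes from the example above.
DOC = '<!DOCTYPE PurchaseOrder SYSTEM "PurchaseOrder.dtd"><PurchaseOrder/>'


def external_dtd(xml_text):
    """Return (document type name, system identifier), or None without a DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None  # no DOCTYPE, so no external DTD dependency is declared
    return doctype.name, doctype.systemId


print(external_dtd(DOC))
```

The system identifier is the physical half of the picture; the element tree that the parser builds is the logical half.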

Entities
XML documents use storage units called *entities* to arrange physical structures to produce a logical structure. Entities define blocks of text for reuse in documents or in DTDs, and include data from other storage units (such as files). Several characteristics determine an entity's type. Every entity is either *internal* or *external*; *parsed* or *unparsed*; and a *general entity* or a *parameter entity*.

An *internal entity* is defined in a document's prolog (along with or within the DTD), and is not associated with any external file or data source. An *external entity* is also defined in the prolog, but depends on some external file or data source. For example:

<!ENTITY Alpha "Á"> <!-- Internal -->
<!ENTITY Chars SYSTEM "chars.dtd"> <!-- External -->

A *parsed entity* is parsed by the XML processor, and its contents are part of the document's logical structure. An *unparsed entity* is a reference to data that may or may not be XML. Each unparsed entity is associated with a *notation*, which indicates what sort of processor can access the unparsed entity. All internal entities are parsed, whereas external entities may be parsed or unparsed.
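
As a rough illustration of these distinctions, the sketch below asks a parser how it classifies each entity declaration. It uses Python's standard xml.parsers.expat module (my choice of tool, not anything the specification requires). The Logo entity, the logo.gif file, and the gif notation are invented for the example; Alpha and Chars are the declarations shown earlier. An entity declared with a literal value is internal, one declared with a system identifier is external, and one that names a notation is unparsed.

```python
from xml.parsers import expat

# Hypothetical declarations: Logo, logo.gif and the gif notation are invented here.
DOC = """<!DOCTYPE Doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY Alpha "Á">
  <!ENTITY Chars SYSTEM "chars.dtd">
  <!ENTITY Logo SYSTEM "logo.gif" NDATA gif>
]>
<Doc/>"""


def classify_entities(xml_text):
    """Map each declared entity name to (storage, parsing, notation name)."""
    entities = {}

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        # A literal value means the entity is internal; otherwise it is external.
        storage = "internal" if value is not None else "external"
        # Only unparsed entities carry a notation name (declared with NDATA).
        parsing = "unparsed" if notation is not None else "parsed"
        entities[name] = (storage, parsing, notation)

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    return entities


for name, kinds in classify_entities(DOC).items():
    print(name, kinds)
```

Note that the parser only reports the declarations. It never opens chars.dtd or logo.gif, because nothing in the document body refers to them.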

A *general entity* represents text in the body of a document, and a *parameter entity* represents text in a DTD. To use a general entity, reference it by placing an ampersand (&) before its name and a semicolon (;) after it. For example:

<!-- Define internal (general, parsed) entities -->
<!ENTITY Copyright "Copyright (c) 2001">
<!ENTITY Author "Mark Johnson">

<!-- Use the entities -->
<TITLE> &Copyright; by &Author;. All rights reserved. </TITLE>
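
To watch the references being resolved, here is a small sketch using Python's standard xml.etree.ElementTree module (again my choice of tool for illustration). It assumes the two entity declarations are placed in an internal DTD subset so that the document is self-contained, then parses the document and returns the text of TITLE as the application would see it.

```python
import xml.etree.ElementTree as ET

# Assumption: the declarations travel with the document in an internal DTD subset.
DOC = """<!DOCTYPE TITLE [
  <!ENTITY Copyright "Copyright (c) 2001">
  <!ENTITY Author "Mark Johnson">
]>
<TITLE> &Copyright; by &Author;. All rights reserved. </TITLE>"""


def title_text(xml_text):
    """Parse the document and return the TITLE text with entities expanded."""
    return ET.fromstring(xml_text).text.strip()


print(title_text(DOC))
```

The application never sees the references themselves, only the replacement text, which is what it means for a parsed entity to become part of the logical structure. A reference to an entity that was never declared makes this parser raise ET.ParseError instead.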

A *parameter entity* is defined and referenced with a percent sign (%), and can be used only within a DTD, like this:

<!ENTITY % Colors "Red|Green|Blue|Black|Brown">
<!ELEMENT SHOE EMPTY>
<!ATTLIST SHOE SIZE CDATA #IMPLIED COLOR (%Colors;) #IMPLIED>
<!ATTLIST TIE TYPE (Bola|Cravat|Ascot) #IMPLIED COLOR (%Colors;) #IMPLIED>

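In the sample above, the parameter entity Colors (which is also an internal, parsed entity) is defined with the ENTITY keyword and then referenced to supply the allowed COLOR values for both SHOE and TIE. A reference inside a declaration, as shown here, is allowed only in an external DTD file; in a document's internal DTD subset, parameter entity references may appear only between declarations. Outside a DTD, parameter entity references are not recognized at all, so %Colors; in the body of a document is not expanded.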
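
A quick way to see what happens to a parameter entity reference outside the DTD is to parse one. The sketch below is only an illustration, using Python's standard xml.etree.ElementTree module and a made-up SHOE document. It declares Colors in the DTD and then writes %Colors; in element content, where the percent sign has no special meaning to the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Colors is declared in the DTD, then "referenced" in content.
DOC = """<!DOCTYPE SHOE [
  <!ENTITY % Colors "Red|Green|Blue|Black|Brown">
]>
<SHOE>%Colors;</SHOE>"""


def content_text(xml_text):
    """Return the root element's text exactly as the parser delivers it."""
    return ET.fromstring(xml_text).text


print(content_text(DOC))
```

The parser hands back the characters %Colors; unchanged, rather than the list of colors, so the entity simply has no effect in the document body.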

The resources below will provide you with enough additional detail to come "up to speed" on entity terminology. Meanwhile, I'll cover more XML terminology in upcoming newsletters.

 

Mark Johnson is president of Elucify Technical Communications, a Colorado-based training and consulting company dedicated to clarifying novel or complex ideas through clear explanation and examples.

