ITworld.com
How to cheat with XML schemas

XML IN PRACTICE --- 08/15/2002

No one said creating a schema would be easy, but expending the extra effort to design it right the first time will pay off in the long term.



Before I say anything else, please notice that the word "schemas" in the title of this column has a lowercase "s". The schema cheat techniques that follow are equally applicable to all the schema languages out there, including DTDs, RELAX NG, and W3C XML Schema.




Also, by way of important preamble, I should point out that creating a schema for a corpus of information is a "No Pain, No Gain" process -- it has to hurt to be of any use. Cheating will only cause you more pain in the end. Having said all that, if you really want to avoid (actually, defer) the pain, here are two excellent, time-honored techniques for doing so:

1. Model the hard bits as attributes
2. Make all elements optional

Model the Hard Bits as Attributes
Schema languages, by and large, focus on providing mechanisms for expressing constraints on the order of XML elements. For example, expressing the constraint, "If there is an element X there must be an element Y immediately after it", is possible in all three common schema languages. It is also possible to express the constraint, "An element A must be followed by either one or more B elements, or one or more C elements", and so on.

With attributes, the expressible constraints are much simpler, often no more than mandatory/optional occurrence constraints. This limitation on attribute constraints is the basis for this first cheat.

Let's say you are faced with modeling a collection of customers and a collection of partners that have complex inter-relationships known as "deals". You could create an element "customer" and an element "partner". Then you can express the order in which mixtures of the two may occur in valid deal elements. You might end up with something like this in your schema (using pseudo DTD syntax):

deal = (partner,customer) | (partner,partner) |
(partner,customer,customer?) | (customer,partner,customer)*

This says that a deal involves either a partner/customer pair, two partners, a partner/customer pair optionally followed by one more customer, or a series of triples each containing two customers with a partner between them.
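
To see what this model accepts and rejects, here is a minimal Python sketch that emulates the content model with a regular expression over the child element names. In practice a validating parser would enforce the DTD for you; the function name `deal_is_valid` and the sample documents are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# The pseudo-DTD content model, transcribed as a regex over child tag names.
# Each child tag is followed by a space so that names cannot run together.
DEAL_MODEL = re.compile(
    r"(?:partner customer "
    r"|partner partner "
    r"|partner customer (?:customer )?"
    r"|(?:customer partner customer )*)"
)


def deal_is_valid(xml_text):
    """Return True if the children of <deal> match the strict content model."""
    deal = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in deal)
    return DEAL_MODEL.fullmatch(sequence) is not None


print(deal_is_valid("<deal><partner/><customer/></deal>"))             # True
print(deal_is_valid("<deal><partner/><customer/><customer/></deal>"))  # True
print(deal_is_valid("<deal><customer/><customer/></deal>"))            # False
```

Note that, as written, the trailing `*` in the model also lets an empty deal through, which is one of the details a schema designer has to decide on purpose.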

Now, using the magic of attributes, you can make the complexity of this model go away. We can think of partners and customers as "actors" in a deal. In this way of thinking, partners and customers become special cases of a more general purpose thing we are calling an "actor". We can model an actor as a thing that has a "type" attribute that can be one of "partner" or "customer", i.e.

<actor type = "partner"> / <actor type = "customer">

And now the schema model for a deal looks like this:

deal = actor+

There! Isn't that a lot simpler?

Make All Elements Optional
To illustrate this cheat, we will use the same example as before, i.e. modeling potentially complex combinations of partners and customers making up deals.

This time, we keep partners and customers in separate tags rather than using one element with a type attribute to distinguish them. So, unlike in the last example where we used:

<actor type = "partner"> / <actor type = "customer">

we will use:

<partner> / <customer>

Now, using the trivial observation that all deals consist of either partners or customers, we can model any complex deal like this:

deal = (partner|customer)+

This says that a deal consists of one or more partners or customers.
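
As a rough illustration of how much this model admits, the Python sketch below transcribes it as a regular expression over child element names. The helper `loose_accepts` and the sample sequences are hypothetical, and a real system would rely on a validating parser rather than a regex.

```python
import re

# deal = (partner|customer)+ as a regex over space-terminated child names.
LOOSE_MODEL = re.compile(r"(?:partner |customer )+")


def loose_accepts(children):
    """Return True if the child names satisfy deal = (partner|customer)+."""
    sequence = "".join(name + " " for name in children)
    return LOOSE_MODEL.fullmatch(sequence) is not None


print(loose_accepts(["partner", "customer"]))               # True
print(loose_accepts(["customer", "customer", "customer"]))  # True
print(loose_accepts([]))                                    # False
print(loose_accepts(["supplier"]))                          # False
```

Any non-empty mixture of the two elements passes, in any order and in any quantity; only an empty deal or an unknown element is turned away.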

There! Isn't that a lot simpler?

Yes and no, for both cheats. Yes, in both cases the cheat models work fine. Your boss (or you) can be thrilled with the ease with which all conceivable combinations of partners and customers can be catered for in the model of a deal in the XML schema. Any scenario you dream up can be captured in XML form and validated to be 100% XML and schema language compliant. Great!

In reality, what has happened is that the burden of providing useful, meaningful validation of the structure of your information has simply moved. It has moved further along the workflow. It has moved over to the programmers creating Java programs, .NET web services, and XSLT stylesheets to process the data.

Such a shift of data validation into code is almost always a bad idea. In both cheats presented here, you end up with schemas that do not tell you very much about the real structure of your data. Consequently, over and over again, the programs that process the data must check the constraints that the schemas do not check. The result is that validation ends up buried inside programs. Worse, it ends up being duplicated and buried inside programs. Over and over again.
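
To make that concrete, here is a hedged Python sketch of the kind of check a consumer of the "deal = actor+" schema would have to carry around. It assumes the business rule is still the strict model from the first example; `check_deal` and the `ALLOWED` table are hypothetical names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumed business rule: the fixed combinations from the strict model.
# (The repeating customer/partner/customer triples are left out for brevity.)
ALLOWED = {
    ("partner", "customer"),
    ("partner", "partner"),
    ("partner", "customer", "customer"),
}


def check_deal(xml_text):
    """Re-check in code what the 'deal = actor+' schema no longer checks."""
    deal = ET.fromstring(xml_text)
    types = tuple(actor.get("type") for actor in deal.findall("actor"))
    if types not in ALLOWED:
        raise ValueError(f"invalid deal, actor types were: {types}")
    return types


# Schema-valid under 'deal = actor+', and acceptable to the business rule.
check_deal('<deal><actor type="partner"/><actor type="customer"/></deal>')

# Equally schema-valid, but only this hand-written check rejects it.
try:
    check_deal('<deal><actor type="customer"/><actor type="customer"/></deal>')
except ValueError as err:
    print(err)
```

Every program, web service, or stylesheet that reads the data needs its own copy of this logic, written in its own language and kept in step with all the other copies.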

If this happens, the real pain will occur in your wallet. You will end up paying over and over again to enforce the same constraints in multiple places in your systems. Then you will pay more to have them changed when the business requires the addition/modification of the constraints. All as a direct result of making the schema designs simpler.

No pain, no gain.

 




Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.