Attributes Versus Elements: The Never-ending Choice

Few topics recur more frequently, wherever XML developers
congregate, than the attributes versus elements debate. The more
experience you have of developing XML systems, the murkier the waters
surrounding this question become. The innocent-sounding question can,
and does, spark off debates that touch everything from pragmatism to
epistemology to mereology and back again.

Most developers start out by thinking that having both elements and
attributes is useful and, furthermore, that situations best suited to
one or the other are sort of, well, obvious. This is about the time
that "rules of thumb" force their way into your head, such as "if it
appears on the printed page, then it should be an element, otherwise an
attribute"; or, "if it has a fixed number of atomic values, then use an
attribute, otherwise use an element" and so on.

As you get more familiar with XML, the distinction between an element
and an attribute becomes more slippery. Attributes cannot contain
markup and are thus guaranteed to be atomic; whether this is good or
bad depends on your point of view. Elements are flexible and
hierarchical and can have zero or more textual elements in them. Again,
either good or bad depending on how you look at it.
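
To make the "atomic versus hierarchical" point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The note and priority names are invented for illustration and do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <priority> is an element that carries inline markup.
as_element = ET.fromstring("<note><priority>very <em>high</em></priority></note>")
priority = as_element.find("priority")
print(repr(priority.text))                  # text that precedes the first child
print([child.tag for child in priority])    # child elements nested inside <priority>

# The same information as an attribute is always one flat string.
as_attribute = ET.fromstring('<note priority="very high"/>')
print(repr(as_attribute.get("priority")))

def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A literal "<" inside an attribute value is not well-formed, so markup
# cannot be smuggled into an attribute.
print(parses('<note priority="very <em>high</em>"/>'))
```

Whether the parser's refusal in the last line counts as a safety net or a straitjacket is, of course, exactly the point of view the debate turns on.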

Somewhere along the line, it occurs to you that attributes and elements
are often interchangeable:

...

Can be written as:

1234 ...
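
The interchangeability can be shown mechanically. The sketch below, again in Python with xml.etree.ElementTree, hoists every attribute down into a child element. The invoice, number and total names and the helper attributes_to_elements are assumptions made up for this illustration; they are not the column's original example.

```python
import xml.etree.ElementTree as ET

def attributes_to_elements(element):
    """Return a copy of element with each attribute turned into a leading child element."""
    converted = ET.Element(element.tag)
    converted.text = element.text
    converted.tail = element.tail
    # Each attribute becomes a child element holding the value as text.
    for position, (name, value) in enumerate(element.attrib.items()):
        child = ET.Element(name)
        child.text = value
        converted.insert(position, child)
    # Existing children are converted recursively and kept in order.
    for child in element:
        converted.append(attributes_to_elements(child))
    return converted

# Hypothetical input: an invoice whose number is held in an attribute.
invoice = ET.fromstring('<invoice number="1234"><total>99</total></invoice>')
as_elements = attributes_to_elements(invoice)
print(ET.tostring(as_elements, encoding="unicode"))
```

The trip in the other direction is not always possible: an attribute name may appear only once per element and its value cannot hold markup, which is why "often interchangeable" is as strong as the claim gets.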

For a while perhaps, you start using elements exclusively and only
hoisting content up into attributes if it is required by some specific
program or process. You develop a taste for modern schema languages
that blur the distinction between elements and attributes almost to the
point of disappearance. For example, the RelaxNG schema language allows
elements and attributes to be used practically interchangeably because
the syntax of constraint expressions stays largely the same for both.

For example:

Is trivially changed to:

Then you start thinking about all the cool stuff you can express in
RelaxNG that cannot be expressed in, say, DTDs and conclude,
conclusively, that attributes are more trouble than they are worth....

Then one day an epistemologist rains on your parade by
pronouncing "attributes are the essence of markup". You see this:

...

Is really syntactic sugar for this:

...

Arrggghhh!

In this model, there is only one tag! -- an UBER-tag with an attribute
called "type" that is used to hold tag names. Most annoyingly, the
scheme works too.
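
For the sceptical, here is a rough sketch of the UBER-tag scheme in Python with xml.etree.ElementTree. It rewrites every element as a "tag" element whose "type" attribute holds the original name, then reverses the process. The helper names to_uber and from_uber and the sample invoice document are assumptions for illustration only; the sketch also assumes no source element already uses an attribute called "type".

```python
import xml.etree.ElementTree as ET

UBER_TAG = "tag"     # the single element name used for everything
TYPE_ATTR = "type"   # attribute that carries the original element name

def to_uber(element):
    """Rewrite element (recursively) as <tag type="original-name" ...>."""
    if TYPE_ATTR in element.attrib:
        raise ValueError("source element already uses the reserved 'type' attribute")
    uber = ET.Element(UBER_TAG, {TYPE_ATTR: element.tag, **element.attrib})
    uber.text, uber.tail = element.text, element.tail
    for child in element:
        uber.append(to_uber(child))
    return uber

def from_uber(uber):
    """Invert to_uber: restore each element name from its 'type' attribute."""
    attributes = dict(uber.attrib)
    element = ET.Element(attributes.pop(TYPE_ATTR), attributes)
    element.text, element.tail = uber.text, uber.tail
    for child in uber:
        element.append(from_uber(child))
    return element

# Hypothetical sample document.
original = ET.fromstring('<invoice number="1234"><total currency="EUR">99</total></invoice>')
uber = to_uber(original)
restored = from_uber(uber)

print(ET.tostring(uber, encoding="unicode"))
print(ET.tostring(restored, encoding="unicode") == ET.tostring(original, encoding="unicode"))
```

The round trip loses nothing in this sketch, which is the annoying part: the element name really can be treated as just one more attribute.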

So much for the simple relationship between attributes and elements.

Just think, in some parallel universe the world has been conquered by
HTML. In their HTML version 4.0, a new tag was added in 1998
called "tag". It has an attribute "type" that can hold any name you
like: "invoice", "pullquote", etc.... User groups have hailed this
breakthrough in HTML that allows industry vocabularies to be cleanly
added to the HTML tag set. HTML supersets such as ebHTML and NewsHTML
are taking that world by storm....

I wonder, do they use entities?
