Mark Johnson
Few topics recur more frequently, wherever XML developers congregate,
than the attributes-versus-elements debate. The more experience you
have of developing XML systems, the murkier the waters surrounding
this question become. The innocent-sounding question can, and does,
spark debates that touch everything from pragmatism to epistemology
to mereology and back again.
Most developers start out by thinking that having both elements and
attributes is useful and, furthermore, that situations best suited to
one or the other are sort of, well, obvious. This is about the time
that "rules of thumb" force their way into your head, such as "if it
appears on the printed page, then it should be an element, otherwise an
attribute" or "if it has a fixed number of atomic values, then use an
attribute, otherwise use an element", and so on.
As you get more familiar with XML, the distinction between an element
and an attribute becomes more slippery. Attributes cannot contain
markup and are thus guaranteed to be atomic; whether this is good or
bad depends on your point of view. Elements are flexible and
hierarchical and can contain text and zero or more child elements.
Again, good or bad depending on how you look at it.
Somewhere along the line, it occurs to you that attributes and elements
are often interchangeable:
<invoice id = "1234">
...
</invoice>
Can be written as:
<invoice>
<id>1234</id>
...
</invoice>
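The swap is mechanical enough to automate. The following is a minimal
sketch in Python, using only the standard library's
xml.etree.ElementTree; the function name attributes_to_elements and the
"total" child are made up for illustration, and the sketch ignores
mixed content and name clashes between attributes and existing
children.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Return a copy of elem whose attributes have become leading child elements.

    Hypothetical helper for illustration: it handles only the top-level
    element and assumes no attribute shares a name with an existing child.
    """
    result = ET.Element(elem.tag)
    result.text = elem.text
    # Each attribute becomes a child element whose text is the attribute value.
    for name, value in elem.attrib.items():
        child = ET.SubElement(result, name)
        child.text = value
    # The original children follow, unchanged.
    result.extend(list(elem))
    return result


invoice = ET.fromstring('<invoice id="1234"><total>10.00</total></invoice>')
converted = attributes_to_elements(invoice)
print(ET.tostring(converted, encoding="unicode"))
```

Going the other way is less forgiving: a child element can only be
hoisted into an attribute if it holds plain text and appears at most
once, which is exactly the "atomic" restriction described earlier.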
For a while perhaps, you start using elements exclusively, hoisting
content up into attributes only if it is required by some specific
program or process. You develop a taste for modern schema languages
that blur the distinction between elements and attributes almost to the
point of disappearance. The RelaxNG schema language, for instance,
allows elements and attributes to be used practically interchangeably
because the syntax of constraint expressions stays largely the same.
For example:
<element name = "invoice">
<element name = "id">
<text/>
</element>
</element>
Is trivially changed to:
<element name = "invoice">
<attribute name = "id">
<text/>
</attribute>
</element>
Then you start thinking about all the cool stuff you can express in
RelaxNG that cannot be expressed in, say, DTDs and conclude,
conclusively, that attributes are more trouble than they are worth....
Then one day an epistemologist rains on your parade by
pronouncing "attributes are the essence of markup". You see, this:
<invoice id = "1234">
...
</invoice>
Is really syntactic sugar for this:
<tag type = "invoice" id = "1234">
...
</tag>
Arrggghhh!
In this model, there is only one tag: an UBER-tag with an attribute
called "type" that is used to hold tag names. Most annoyingly, the
scheme works too.
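If you want to convince yourself that nothing is lost, a round trip
will do it. Here is a rough sketch in Python, again with only
xml.etree.ElementTree; the helper names to_uber and from_uber and the
sample document are invented for illustration, and the sketch simply
refuses documents that already use an attribute called "type".

```python
import xml.etree.ElementTree as ET


def to_uber(elem):
    """Rewrite elem, recursively, so every element is <tag type="original-name">."""
    if "type" in elem.attrib:
        # Assumption of this sketch: "type" is reserved for the tag name.
        raise ValueError("an existing 'type' attribute would collide with the tag name")
    uber = ET.Element("tag", {"type": elem.tag, **elem.attrib})
    uber.text, uber.tail = elem.text, elem.tail
    uber.extend([to_uber(child) for child in elem])
    return uber


def from_uber(elem):
    """Invert to_uber: pop the 'type' attribute and use it as the tag name again."""
    attrs = dict(elem.attrib)
    name = attrs.pop("type")
    plain = ET.Element(name, attrs)
    plain.text, plain.tail = elem.text, elem.tail
    plain.extend([from_uber(child) for child in elem])
    return plain


doc = ET.fromstring('<invoice id="1234"><total currency="EUR">10.00</total></invoice>')
uber = to_uber(doc)
round_trip = from_uber(uber)
print(ET.tostring(uber, encoding="unicode"))
print(ET.tostring(round_trip, encoding="unicode"))
```

The second line printed should match the original document, which is
the annoying part: the element name really can be treated as just one
more attribute.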
So much for the simple relationship between attributes and elements.
Just think, in some parallel universe the world has been conquered by
HTML. In their HTML version 4.0, a new tag called "tag" was added in
1998. It has an attribute "type" that can hold any name you like:
"invoice", "pullquote", etc. User groups have hailed this breakthrough
in HTML that allows industry vocabularies to be cleanly added to the
HTML tag set. HTML supersets such as ebHTML and NewsHTML are taking
that world by storm....
I wonder, do they use entities?