Mixed content myopia
Have you ever had a debate with someone, only to find that they are
missing an important piece of knowledge? I humbly submit that "mixed
content" in XML is one such piece of knowledge.
I have been involved in debates about XML processing techniques that
seemed to be going around in circles. More often than not, the
disagreement stemmed from differing conceptual models of XML processing,
and the difference usually came down to mixed content. If one party to a
debate has it in their mind-map of XML and the other does not,
communication problems are likely to ensue.
What follows is a debug trace of a conversation bug attributable to
mixed content myopia:
A: You know what XML is?
B: Sure. XML is a way of adding tags to text to denote the text's
meaning. Tags come in pairs -- one at the start and one at the
end.
A: Okay, but what about the text that is not tagged?
B: What do you mean, "not tagged"? If you add tags, you tag up all
the text; there is no text outside of the tagging. Adding tags
allows you to retrieve the values of those tags (i.e., the text
between the start-tag and the end-tag) using APIs.
A: So you think of tag names as field names then?
B: Yes. Furthermore, XML APIs should allow me to read the value of
any particular field.
A: Ah, okay. I see where you are coming from. Oh dear! Is that the
time? Must dash!
The fact is, not all text in XML is guaranteed to be tightly tagged.
Text occurring in so-called "mixed content models" can sit outside of
any direct tagging. Further, such freestanding text can be freely
intermixed with tags.
Such mixed content models have both advantages and disadvantages. Their
big advantage is that they are a natural model for narrative text. For
example, the p element in XHTML can contain mixed content, as shown in
the following fragment:
<p>This is a para with a <a>link</a> in it</p>
The value of the a element is the text "link", but what is the value of
the p element? Is it the text before the a start-tag? The text before
and after the a element? Does it include the word "link", which is in
the a sub-element?
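To make these questions concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the fragment is written with bare p and a tags (no attributes), and the variable names are purely illustrative. Each candidate "value" of the p element turns out to be a different string, which is the ambiguity in a nutshell.

```python
import xml.etree.ElementTree as ET

# The paragraph fragment, assumed here to use bare <p> and <a> tags.
p = ET.fromstring("<p>This is a para with a <a>link</a> in it</p>")
a = p.find("a")

# The a element has one obvious value: the text between its tags.
link_value = a.text

# The p element has several plausible "values".
before_a = p.text                    # text before the a start-tag
after_a = a.tail                     # text after the a end-tag
around_a = before_a + after_a        # both runs, skipping the a element
everything = "".join(p.itertext())   # all text, sub-elements included

for label, value in [
    ("a", link_value),
    ("p, before a", before_a),
    ("p, after a", after_a),
    ("p, around a", around_a),
    ("p, everything", everything),
]:
    print(f"{label}: {value!r}")
```

Note that ElementTree itself does not pick a winner: it exposes the free-standing runs separately as `text` and `tail`, and leaves it to the caller to decide what "the value of p" means.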
We could remove this ambiguity by fully tagging the text like this:
in it
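As a rough sketch of what full tagging buys, the Python below wraps each free-standing text run in an element of its own. The wrapper name `span` and the helper `has_mixed_content` are assumptions made for this illustration, not names taken from the fragment above. Once every run of text has its own tags, each piece of text is the value of exactly one element and the "tags as fields" view works again.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(elem):
    """Return True if elem has child elements and non-whitespace text of its own."""
    children = list(elem)
    if not children:
        return False
    # Text owned directly by elem: before the first child and after each child.
    own_text = [elem.text] + [child.tail for child in children]
    return any(t and t.strip() for t in own_text)

# Mixed content: text sits directly inside p, alongside the a element.
mixed = ET.fromstring("<p>This is a para with a <a>link</a> in it</p>")

# Fully tagged: "span" is a hypothetical wrapper name chosen for this sketch.
tagged = ET.fromstring(
    "<p><span>This is a para with a </span><a>link</a><span> in it</span></p>"
)

# With full tagging, each child element can be read like a field.
fields = [(child.tag, child.text) for child in tagged]

print(has_mixed_content(mixed))
print(has_mixed_content(tagged))
print(fields)
```

The trade-off is visible in the markup itself: the fully tagged form is unambiguous to query, but it is noticeably more verbose than the narrative-style original.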