Separating Content from Presentation: Easier Said Than Done

We say it so often we don't even listen to the words any more. "Separate

content from presentation" with XML. Information should not be

inter-twingled with any one presentation as the presentations may

(indeed will) change over time. New browsers, new devices of all sorts,

oodles of document formats -- all can be targeted from a single source

XML document. So sayeth the fundamental lore of XML.

Although this works fine for the most part, there are some situations

where it is not really possible in practice. There are some cases where

the content and the presentation are so deeply intertwined that

separating one from the other is far from straightforward.

Four cases spring to mind: graphics, tables, mathematics, and advanced

typography.

Let's deal with the graphics problem first. Graphics come in two main

flavors -- vector and bitmap. Vector graphics are quite in keeping with

the ethos of separating content from presentation espoused by XML.

Vectored formats concentrate on capturing the semantics of an image --

in terms of fundamental units such as lines, circles, fills and so on.

The objects are laid out on a virtual space. At the point of rendering,

a mapping between the virtual space and the physical space is made to

achieve the best possible rendering given the limitations of the

rendering device, e.g. area in pixels, color depth and so on.
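
As a rough sketch of that mapping, assume a drawing made of lines on a virtual canvas and a device described only by its pixel area. The `Line` class, the 1000 x 500 virtual canvas and the helper names are all hypothetical, chosen purely for illustration; color depth is ignored.

```python
from dataclasses import dataclass


@dataclass
class Line:
    # Endpoints in virtual units (a hypothetical 1000 x 500 canvas).
    x1: float
    y1: float
    x2: float
    y2: float


def fit_scale(virtual_size, device_size):
    """Uniform scale that fits the virtual canvas inside the device area."""
    vw, vh = virtual_size
    dw, dh = device_size
    return min(dw / vw, dh / vh)


def to_device(line, scale):
    """Map one line from virtual units to whole device pixels."""
    return tuple(round(v * scale) for v in (line.x1, line.y1, line.x2, line.y2))


diagonal = Line(0, 0, 1000, 500)
virtual = (1000, 500)

# The same source line rendered for two very different devices.
small = to_device(diagonal, fit_scale(virtual, (200, 100)))
large = to_device(diagonal, fit_scale(virtual, (1600, 1200)))
```

The line is stored once, in virtual units, and only the scale changes per device. That is the sense in which vector formats keep the content apart from any one rendering.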

Bitmapped graphics, on the other hand, are intimately tied to a

particular rendering in terms of pixel area and color depth. Bad things

typically happen if you try to resize bitmapped images, as the pixels in

the image do not encode any semantics about what the image represents.

In short, they cannot be repurposed to different shapes, sizes or color

schemes without significant loss in quality.

So, surprise, surprise, when it comes to graphics and XML, vector

formats are much more in tune with the ethos of separating content from

presentation. However, there are times when a bitmapped approach is the

only one available. The images may represent facsimiles of documents

that have a legal status and therefore must be reproducible *exactly*.

No amount of philosophical twaddle about the benefits of generating

renderings from logical representations of the information will cut it

in a court of law.

We turn now to tables. If I had a penny for every second my cerebral

cortex has pondered the mysteries of tables, I would be writing this

from my hideaway island in the Pacific via a dedicated 1GB satellite

link.

Tables have a quantum feel to them. The more you look, the more the act

of looking seems to affect their very nature. At first glance, it would

seem possible to dissect most tables into semantic structures but as you

look closer, this possibility typically sails into the sunset leaving

you with a model of rows and columns and alignments and tab stops and

vertical offsets and spans and ... So much for separating the content

from the presentation! With tables, the best you can do most of the time

is intertwine the content with the presentation in reasonably

well-delineated structures.

Mathematics? Ha! A real beauty this one. The physical form of

mathematical equations is probably the most condensed blend of content

and presentation ever invented. My advice? Embed TeX in your XML. Until

such time as producers of mathematics use XML for markup, all attempts

at reproducing the presentation of mathematics from an XML source are

labors of love -- most likely unrequited.
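
One way to picture the "embed TeX" advice is the minimal sketch below. The `para` and `math` element names and the `notation` attribute are assumptions made for illustration, not part of any particular schema. The TeX travels through the XML as opaque text and is handed to a TeX-aware formatter at rendering time.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; no specific schema is implied.
para = ET.Element("para")
para.text = "The roots are given by "
math = ET.SubElement(para, "math", {"notation": "tex"})
math.text = r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}"
math.tail = "."

# Serialize the paragraph; the TeX source is carried as ordinary character data.
xml_text = ET.tostring(para, encoding="unicode")

# A downstream formatter can pull the TeX back out unchanged.
recovered = ET.fromstring(xml_text).find("math").text
```

Nothing in the XML layer tries to understand the equation, which is rather the point: the markup records where the mathematics is, and TeX keeps responsibility for what it looks like.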

And lastly we come to typography. It never ceases to amaze me how

quickly users of Web browsers have become accustomed to what is,

frankly, a 20-year retrograde step in the presentation of information in

typographic form. I suspect it is quite exasperating for lovers of fine

typography to see what passes for high quality presentation on the Web.

Kerning, ligatures? Um, the Web doesn't even do tab stops!

The mismatch between the typographic effects used for paper production

and those used for on-line production can hit hard when faithful

replication of paper typography is required on the Web. Perhaps the

classic case of this is legal material such as Bills and Acts produced

by Governments. To date, these have always existed in paper form before

they exist in electronic form and the paper form is "normative" relative

to the electronic. That is, if in doubt, consult the paper. It is the paper

sheets that are read and approved by legislators. It is the paper sheets

that are installed by a suitably authorized person to become law. The

paper is king.

In order to replicate the look and feel of legislation, some significant

trickery is required in HTML markup. Take the concept of a section of

legislation for example. Sections have numbers. Section numbers are

rendered to the left of the first paragraph of the section. In Word,

FrameMaker, etc., this is achieved by creating a negative first-line

indent in the paragraph. How about on the Web? Well, the concept of

negative first-line indents is quite new to the Web, so the standard way

of making it work across all browsers is to nest tables within tables.

One table as the outer "shell" to hold the section number and the body

of the section. The other table to contain the body of the section,

nested inside the outer table.
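
The nesting can be sketched as a small generator. This is an illustration under stated assumptions, not a production layout: the function name `section_html` is hypothetical, and real legislation markup would also need column widths, spacing and fonts tuned to match the paper.

```python
from html import escape


def section_html(number, paragraphs):
    """Render one section as an outer two-cell table with an inner body table."""
    # Inner table: one row per paragraph of the section body.
    inner_rows = "".join(
        f"<tr><td>{escape(p)}</td></tr>" for p in paragraphs
    )
    inner = f"<table>{inner_rows}</table>"
    # Outer shell: section number on the left, body table on the right.
    return (
        "<table><tr>"
        f'<td valign="top">{escape(number)}</td>'
        f"<td>{inner}</td>"
        "</tr></table>"
    )


markup = section_html(
    "12.", ["First paragraph of the section.", "Second paragraph."]
)
```

The section number and the section body are content; the two tables exist only to push the number into the left margin, which is exactly the intertwining being described.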

It is extraordinarily tricky to make the electronic version exactly the

same as the paper version as a result of this fundamental difference in

approach to paragraph layout. As for footnotes, side-headings,

watermarks, 3-inch high integral signs spanning multiple cells in a

table column... don't get me started on those.

Yes, it makes sense, for all sorts of reasons, to separate content from

presentation. Yes, XML is a great technology for helping you achieve

this.

However, sometimes, the medium is an inextricable part of the message.

The next time someone tries to sell you a line like "just separate the

content from the presentation with XML" be warned -- it is not

necessarily that simple.
