From: www.itworld.com

Separating Content from Presentation: Easier Said Than Done

by Sean McGrath

July 24, 2002

We say it so often we don't even listen to the words any more. "Separate
content from presentation" with XML. Information should not be
inter-twingled with any one presentation as the presentations may
(indeed will) change over time. New browsers, new devices of all sorts,
oodles of document formats -- all can be targeted from a single source
XML document. So sayeth the fundamental lore of XML.

Although this works fine for the most part, there are some situations
where it is not really possible in practice. There are some cases where
the content and the presentation are so deeply intertwined that
separating one from the other is far from straightforward.

Four cases spring to mind: graphics, tables, mathematics, and advanced
typography.

Let's deal with the graphics problem first. Graphics come in two main
flavors -- vector and bitmap. Vector graphics are quite in keeping with
the ethos of separating content from presentation espoused by XML.
Vectored formats concentrate on capturing the semantics of an image --
in terms of fundamental units such as lines, circles, fills and so on.
The objects are laid out in a virtual space. At the point of rendering,
a mapping between the virtual space and the physical space is made to
achieve the best possible rendering given the limitations of the
rendering device, e.g. area in pixels, color depth and so on.
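
To make that mapping concrete, here is a minimal Python sketch. The
function name fit_to_device, the 100 x 100 virtual canvas and the two
device sizes are all made up for illustration; real vector formats do
a good deal more than this.

```python
def fit_to_device(points, virtual_size, device_size):
    """Map points from a virtual canvas onto a device's pixel grid.

    A single uniform scale factor keeps shapes in proportion, and the
    result is centered in whatever device space is left over.
    """
    vw, vh = virtual_size
    dw, dh = device_size
    scale = min(dw / vw, dh / vh)      # largest scale that still fits
    x_off = (dw - vw * scale) / 2      # center horizontally
    y_off = (dh - vh * scale) / 2      # center vertically
    return [(round(x * scale + x_off), round(y * scale + y_off))
            for x, y in points]


# A diagonal line across a hypothetical 100 x 100 virtual canvas.
line = [(0, 0), (100, 100)]

print(fit_to_device(line, (100, 100), (640, 480)))  # larger display area
print(fit_to_device(line, (100, 100), (160, 160)))  # small screen
```

The same two logical points yield different pixel coordinates on each
device, and nothing about the source description of the line has to
change.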

Bitmapped graphics, on the other hand, are intimately tied to a
particular rendering in terms of pixel area and color depth. Bad things
typically happen if you try to resize bitmapped images, as the pixels in
the image do not encode any semantics about what the image represents.
In short, they cannot be repurposed to different shapes, sizes or color
schemes without significant loss in quality.

So, surprise, surprise, when it comes to graphics and XML, vector
formats are much more in tune with the ethos of separating content from
presentation. However, there are times when a bitmapped approach is the
only one available. The images may represent facsimiles of documents
that have a legal status and therefore must be reproducible *exactly*.
No amount of philosophical twaddle about the benefits of generating
renderings from logical representations of the information will cut it
in a court of law.

We turn now to tables. If I had a penny for every second my cerebral
cortex has pondered the mysteries of tables, I would be writing this
from my hideaway island in the Pacific via a dedicated 1GB satellite
link.

Tables have a quantum feel to them. The more you look, the more the act
of looking seems to affect their very nature. At first glance, it would
seem possible to dissect most tables into semantic structures, but as you
look closer, this possibility typically sails into the sunset, leaving
you with a model of rows and columns and alignments and tab stops and
vertical offsets and spans and ... So much for separating the content
from the presentation! With tables, the best you can do most of the time
is intertwine the content with the presentation in reasonably
well-delineated structures.

Mathematics? Ha! A real beauty this one. The physical form of
mathematical equations is probably the most condensed blend of content
and presentation ever invented. My advice? Embed TeX in your XML. Until
such time as producers of mathematics use XML for markup, all attempts
at reproducing the presentation of mathematics from an XML source are
labors of love -- most likely unrequited.
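
For what it is worth, embedding TeX in XML can be as unglamorous as
the following Python sketch. The math element and its notation="tex"
attribute are invented for this example and are not taken from any
standard schema; the point is only that the TeX travels through the
XML untouched and can be pulled back out for a TeX engine to typeset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <math notation="tex"> is an invented element,
# used here only to mark where raw TeX has been embedded.
doc = ET.fromstring(
    '<para>The roots are '
    '<math notation="tex">x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}</math>'
    '.</para>'
)


def tex_fragments(root):
    """Collect the raw TeX strings found in TeX-flagged math elements."""
    return [m.text for m in root.iter("math") if m.get("notation") == "tex"]


print(tex_fragments(doc))
```

The XML carries the equation but makes no attempt to model it, which
is rather the point.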

And lastly we come to typography. It never ceases to amaze me how
quickly users of Web browsers have become accustomed to what is,
frankly, a 20-year retrograde step in the presentation of information in
typographic form. I suspect it is quite exasperating for lovers of fine
typography to see what passes for high quality presentation on the Web.
Kerning, ligatures? Um, the Web doesn't even do tab stops!

The mismatch between the typographic effects used for paper production
and those used for on-line production can hit hard when faithful
replication of paper typography is required on the Web. Perhaps the
classic case of this is legal material such as Bills and Acts produced
by Governments. To date, these have always existed in paper form before
they exist in electronic form and the paper form is "normative" relative
to the electronic. That is, if in doubt, consult the paper. It is the paper
sheets that are read and approved by legislators. It is the paper sheets
that are installed by a suitably authorized person to become law. The
paper is king.

In order to replicate the look and feel of legislation, some significant
trickery is required in HTML markup. Take the concept of a section of
legislation for example. Sections have numbers. Section numbers are
rendered to the left of the first paragraph of the section. In Word,
FrameMaker, etc. this is achieved by creating a negative first-line
indent in the paragraph. How about on the Web? Well, the concept of
negative first-line indents is quite new to the Web so the standard way
of making it work across all browsers is to nest tables within tables.
One table as the outer "shell" to hold the section number and the body
of the section. The other table to contain the body of the section,
nested inside the outer table.
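
The nesting just described might be generated along the following
lines. This is a rough Python sketch: the function name section_html
and the sample section text are made up, and production markup for
real legislation would need many more attributes to get the spacing
right.

```python
from html import escape


def section_html(number, body):
    """Render one section as an outer table holding the section number
    and, beside it, a nested table that contains the body text."""
    # Inner table: holds only the body of the section.
    inner = "<table><tr><td>" + escape(body) + "</td></tr></table>"
    # Outer "shell": section number in the left cell, body on the right.
    return (
        "<table><tr>"
        '<td valign="top">' + escape(number) + "</td>"
        "<td>" + inner + "</td>"
        "</tr></table>"
    )


# Sample text invented for illustration only.
print(section_html("12.", "A person who contravenes this section "
                          "is guilty of an offence."))
```

Two tables and four cells to do the work of one paragraph setting,
which gives some idea of how far the markup has drifted from the
content.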

It is extraordinarily tricky to make the electronic version exactly the
same as the paper version as a result of this fundamental difference in
approach to paragraph layout. As for footnotes, side-headings,
watermarks, 3-inch high integral signs spanning multiple cells in a
table column... don't get me started on those.

Yes, it makes sense, for all sorts of reasons, to separate content from
presentation. Yes, XML is a great technology for helping you achieve
that.

However, sometimes, the medium is an inextricable part of the message.
The next time someone tries to sell you a line like "just separate the
content from the presentation with XML" be warned -- it is not
necessarily that simple.