The only way to stay sane in this business is to be selective about
which three- and four-letter acronyms you choose to develop expertise in.
Over the past decade, the four-letter acronym SGML and the three-letter
acronym XML have been high on my acronym shortlist. Guess I got lucky.
Mind you, I have backed some duds in my time as well. I was into JSP
back when it stood for Jackson Structured Programming, and I was
convinced Awk would take over the scripting world. As for the Z80
processor... let's not go there. Too many painful memories...
Anyway, in recent years I monitored the movements of the UML acronym
purely through my peripheral vision. I guess I was hoping against hope
that it would disappear in a poof of modeling logic and spare me having
to shortlist it for my 20/20 attention. Recently, I concluded that UML
wasn't going to go away and started studying it in earnest. What
finally pushed me across the Rubicon was the centrality of UML modeling
in the ebXML initiative.
UML stands for Unified Modeling Language, and it is the clear leader in
an age-old field: object-oriented application modeling. UML, like so
many three-letter acronyms, is actually a basket of different ideas
joined together. Seven basic model (diagram) abstractions range
from "class" to "state transition diagram". Using the standardized
diagramming notation UML provides, analysts, programmers, and users can
share a common conceptual understanding of a system through diagrams
and narrative text.
XML is an increasingly important part of many applications. XML people
argue that XML provides its own modeling paradigm -- originally DTDs
and, more recently, W3C XML Schema. UML people argue that DTDs and W3C
Schema are just notations and that organizations can and should keep
their data models independent of any one syntax by using UML diagrams
as the normative reference. From the UML diagrams, the story goes, you
can *generate* DTDs, W3C Schemas, Relax NG Schemas, whatever you want.
Let's pause for breath and think about what is going on here. Syntax
(XML) versus semantics (model diagrams). Generating one from the other,
keeping your options open by becoming more abstract in your models.
There is a word for this phenomenon, actually an acronym. A four-letter
acronym: MMTT -- More Meta Than Thou.
The MMTT aspect of this worries me, because we have heard it all
before. Just one more level and all will be revealed: the ISO Seven Layer
Model, Ada, Z Notation, Architectural Forms. Will UML go the same way?
Another worrying thing is the mismatch between the things UML is happy
modeling and the things XML is happy modeling. In object-oriented
modeling, we think of classes as having attributes associated with them.
Typically, we just say something like "A Person has a Name and Age and
a Phone Number". This can be directly translated into the syntax of
umpteen object-oriented programming languages.
With an XML hat on, the process of modeling is subtly, but
significantly, different. We say something like "A Person has a Name,
followed by an Age, followed by a Phone Number". In other words, XML
modeling presupposes that we are concerned with modeling the order in
which things occur, as well as the things that can occur.
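To make the difference concrete, here is a minimal Python sketch. The element names (Name, Age, PhoneNumber) and the helper functions are hypothetical, chosen only for illustration. The object view treats a Person as a bag of named attributes, so two XML instances that differ only in child order build equal objects; the XML view checks a sequence content model (roughly the DTD declaration `<!ELEMENT Person (Name, Age, PhoneNumber)>`), for which order is part of the model.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Object-oriented view: a Person simply *has* these attributes;
# the order in which they are listed carries no meaning in the model.
@dataclass
class Person:
    name: str
    age: int
    phone_number: str

# XML view (hypothetical names): a sequence content model, roughly
# <!ELEMENT Person (Name, Age, PhoneNumber)>. Order is part of the model.
PERSON_SEQUENCE = ("Name", "Age", "PhoneNumber")

def matches_sequence(elem: ET.Element, sequence: tuple) -> bool:
    """True if elem's child element names are exactly `sequence`, in order."""
    return tuple(child.tag for child in elem) == sequence

def to_person(elem: ET.Element) -> Person:
    """Build a Person by looking children up by name, ignoring their order."""
    return Person(
        name=elem.findtext("Name"),
        age=int(elem.findtext("Age")),
        phone_number=elem.findtext("PhoneNumber"),
    )

in_order = ET.fromstring(
    "<Person><Name>Ann</Name><Age>42</Age>"
    "<PhoneNumber>555-0100</PhoneNumber></Person>")
reordered = ET.fromstring(
    "<Person><Age>42</Age><PhoneNumber>555-0100</PhoneNumber>"
    "<Name>Ann</Name></Person>")

print(to_person(in_order) == to_person(reordered))   # True: same object
print(matches_sequence(in_order, PERSON_SEQUENCE))   # True
print(matches_sequence(reordered, PERSON_SEQUENCE))  # False: wrong order
```

The point of the sketch is that the object comparison cannot even see the property the XML model cares about: ordering information is discarded as soon as the data is mapped onto class attributes.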
The order used here is SEQUENCE order, but XML has numerous more complex
orders you can specify. The fact is, these ordering concepts do not fit
well into the UML paradigm and arguably owe more to document-oriented
applications of XML than to data-oriented ones. I have seen a
number of attempts (ironically using the UML term "tag"!) to get around
this problem, but the results are not very satisfactory. I suspect that
this mismatch will further strengthen the growing schism between the
data-oriented and document-oriented XML worlds. It may well be that
UML-friendly XML models will dictate the boundaries of XML usage in
object-oriented application design.
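As an illustration of those "more complex orders", consider a hypothetical content model that mixes sequence, an optional element, and a repeated choice, written in DTD terms as `(Name, Age?, (Phone | Email)+)`. DTD content models behave like regular expressions over the names of child elements, so the following sketch (an illustration only, not how any particular validator is implemented) checks that model with Python's `re` module. The element names and the `conforms` function are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, in DTD terms: (Name, Age?, (Phone | Email)+)
#   ","  sequence   "?"  optional   "|"  choice   "+"  one or more
# Child tags are joined with ";" so each name is matched as a whole token.
CONTENT_MODEL = re.compile(r"Name;(Age;)?((Phone|Email);)+")

def conforms(elem: ET.Element) -> bool:
    """True if the child element names of elem satisfy CONTENT_MODEL."""
    tags = "".join(child.tag + ";" for child in elem)
    return CONTENT_MODEL.fullmatch(tags) is not None

ok = ET.fromstring("<Person><Name/><Email/><Phone/><Email/></Person>")
no_contact = ET.fromstring("<Person><Name/><Age/></Person>")
wrong_order = ET.fromstring("<Person><Age/><Name/><Phone/></Person>")

print(conforms(ok))           # True
print(conforms(no_contact))   # False: needs at least one Phone or Email
print(conforms(wrong_order))  # False: Name must come first
```

A class with a list of contact details can hold the same data, but "Name first, Age optionally next, then any mix of Phone and Email" is exactly the kind of constraint that has no natural home among class attributes.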
The final worrying thing is the uncomfortable (to me) relationship
between what a UML model actually says and how it says it. UML is
unashamedly visual in its approach. It is the visualization of system
models that gives UML its power. But here is the rub -- when you save a
UML model in its interchange format (known as XMI -- an XML-based
format) you lose a lot of the visualization. The visualization is a
part of the tool that did the rendering -- not part of the model
itself. If your UML models can only properly be "understood" by your
UML tool -- even reading the XML interchange notation for your model --
who owns the model? You or the tool that created it?
The beauty of DTDs, and more recently Relax NG, as modeling paradigms
is that the notation *is* the visualization. Tools to draw pretty
pictures can be used but at the end of the day, they are just cognitive
lubricant. The notation itself -- in pure text form -- is the normative
reference as to what the model really means. Not so, I fear, with UML, and
certainly not so of the W3C XML Schemas I have seen generated from UML
tools.
I am not against pretty pictures and GUI tools. I am, however, against
any notion that a GUI tool owns my data models. I fear that UML-based
approaches to XML modeling may be leading us in that direction. I am
especially concerned that the rush to UML is the result of an
understandable fear of the complexity of W3C XML Schema. Relax NG and
Schematron both show that powerful models can be created using
non-scary notations: notations that can be supplemented by GUI tools but
that keep the notation -- not the visualization -- as the normative
reference of what it all means.