From: www.itworld.com

The web and the word processor

by Sean McGrath

April 13, 2007

 

In one form or another, I do words for a living. I spend part of my time writing words (as I am doing right now). I also spend part of my time writing complex sets of words (known as 'computer programs') to manage other complex sets of words (known as 'enterprise content').



On an almost daily basis I must address the question of how best to organize content so that it can be managed effectively. On an almost daily basis I find myself oscillating between two distinct-yet-closely related worlds...



On one hand there is the classic word processor world. These things generally provide cozy, self-contained mechanisms for managing words, tables and graphics, up to about, say, 500 pages of stuff. Once the document size goes beyond that, things become more complicated. The good news is that most enterprise content can sensibly be managed in units of 500 pages or less. There seem to be two main reasons for this rule-of-thumb limit. First, if you are going to print something to paper you must worry about how to bind it. The thicker the book, the more complex that gets. Second, word processors tend to work by loading the entire document into memory, and therefore the memory size of your computer dictates what constitutes a comfortable document size.



On the other hand, there is the less cozy but utterly compelling world of the web and web-oriented editing tools. Web editing tools have a completely different feel to them. A completely different comfort zone. Web pages get unwieldy well south of 500 pages, and there is no self-contained environment for managing images along with words. Typographically speaking, they are rather limited compared to modern word processors...



Yet few would dispute that the fundamentals of the care and feeding of content are present in both worlds. Why oh why is there such a dichotomy in the tools used? Why - in this day and age - do we have to 'convert' content to Web format? Why - in this day and age - do we have so many problems producing decent printed pages from web browsers?



In more technical language, why do we have XML-based markup languages like ODF and OOXML for managing words when we also have XML/SGML-based markup languages like XHTML/HTML for managing words? Does it make any sense to have two competing approaches? Are these approaches converging technologically or diverging?



Opinions differ of course. My take is that convergence is inevitable. I also have a suggestion for a key piece of the jigsaw puzzle that is currently missing. If it existed, convergence would - in my opinion - proceed faster than it currently is.



The missing piece is that the Web has no concept of a 'collection of small documents that can be edited/browsed/searched as a unit'. What is a 450-page user manual really? It is a collection of smaller documents that have been hooked together into a hierarchical and sequential ordering to create a full work. How do we know this? Because the table of contents makes the internal boundaries/structure explicit. What do we do when we publish this 450-page manual on the Web? We explode the content into 'chunks' corresponding to the internal boundaries/structure. Why do we use word processors to create these things? Because they give us a cozy, self-contained world in which to organize content into sequential, hierarchical chunks. We can move stuff around, change levels, insert graphics etc. all in one tidy file called MyMagnumOpus.xyz.
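
To make the idea of a collection concrete, here is a minimal sketch of a manual held as a tree of small chunks, with the table of contents and a search derived from the one structure. The names Chunk, walk, table_of_contents and search, and the sample manual, are hypothetical illustrations; they do not come from any of the tools or formats discussed in this article.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Chunk:
    """One small document in the collection (hypothetical structure)."""
    title: str
    body: str = ""
    children: List["Chunk"] = field(default_factory=list)


def walk(chunk: Chunk, depth: int = 0) -> Iterator[Tuple[int, Chunk]]:
    """Yield (depth, chunk) pairs in reading order: parent first, then children."""
    yield depth, chunk
    for child in chunk.children:
        yield from walk(child, depth + 1)


def table_of_contents(root: Chunk) -> List[str]:
    """Render the hierarchy as indented titles, one line per chunk."""
    return ["  " * depth + chunk.title for depth, chunk in walk(root)]


def search(root: Chunk, term: str) -> List[str]:
    """Return the titles of every chunk whose body mentions the term."""
    return [c.title for _, c in walk(root) if term.lower() in c.body.lower()]


# A tiny stand-in for the 450-page manual (invented sample content).
manual = Chunk("User Manual", children=[
    Chunk("Installation", "Run the installer.", [
        Chunk("Requirements", "You need 64 MB of memory."),
    ]),
    Chunk("Usage", "Open a file, then print it."),
])

print("\n".join(table_of_contents(manual)))
print(search(manual, "memory"))
```

The point of the sketch is that the hierarchy, the sequence and the searchable unit all fall out of a single container, which is what the word processor gives us for free and the Web does not.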



The convenience of having it all in one file cannot be overestimated. In one fell swoop, it takes a horrible problem off the table: namely, how to name the individual chunks of content. Contrast this with a purely web-oriented environment. Each chunk of stuff has a URL (a fantastically useful thing!) but each URL must be cared for by the author. When content is re-arranged, the URLs all change ... If you have ever tried to write something non-trivial as a set of HTML pages directly using an HTML word processor you know what I'm talking about.
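
The naming problem can be shown with a small sketch. It assumes, purely for illustration, that each chunk's file name is derived from its position in the sequence (page1.html, page2.html and so on); the function names and the sample chapter titles are hypothetical.

```python
from typing import Dict, List


def positional_names(titles: List[str]) -> Dict[str, str]:
    """Name each chunk after its position, as a hand-built page set often does."""
    return {title: f"page{i}.html" for i, title in enumerate(titles, start=1)}


def changed_names(before: List[str], after: List[str]) -> List[str]:
    """List the chunks whose positional name differs between two orderings."""
    old, new = positional_names(before), positional_names(after)
    return [title for title in after if old[title] != new[title]]


original = ["Introduction", "Installation", "Usage", "Troubleshooting"]
rearranged = ["Introduction", "Usage", "Installation", "Troubleshooting"]

# Swapping two chapters renames both of them, so every link to either breaks.
print(changed_names(original, rearranged))
```

Inside a single word processor file the same swap is a drag-and-drop, and nothing needs renaming.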



This, I think, is the essential difference between current word processing tools and current Web tools. The Web is about smallish chunks of stuff managed as single units known as 'pages'. Word processors manage many smallish chunks of stuff in cohesive collections known as 'documents'.



It is a race. I do not think that is too strong a term. The word processor tools need to grow better and better features for chopping stuff up for Web publication. Or, more radically, word processors need to be inverted to take cognizance of the fact that these days, web publication tends to come first, with paper publication coming later (if at all).



The web-native tools need to grow better and better features for managing, say, 150 distinct web pages as parts of a single, cozy, cohesive unit that can be manipulated with the same ease that, say, a 300-page word processor document can be re-arranged, search/replaced, re-formatted, change-barred etc. The web-native tools need to take cognizance of the fact that there are times when printing 300 pages' worth of web content and binding it makes sense.



The missing piece, as I have said, is the concept of a 'collection of small documents that can be edited/browsed/searched as a unit'. Currently, this space is occupied by so-called web content management systems. The lack of a simple way to manage collections is what created the need for such management systems in the first place.



Something will come from left field to fill this need. It could sprout from the HHP format used in Microsoft CHM[1], or from Atom collections[2] or from WebDAV collections[3] or from OPML[4] or from JavaHelp HelpSets[5]. There is no shortage of contenders.



From the word-processor side come two other possibilities: the layout structure used in the Zip file formats of both ODF[6] and OOXML[7]...




It will be an interesting race but one that, for my own selfish reasons, I hope results in a winner sooner rather than later.




[1] http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help


[2] http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-03.html


[3] http://www.ics.uci.edu/~ejw/authoring/props/draft-hopmann-collection-props-00.txt


[4] http://www.opml.org/


[5] http://java.sun.com/products/javahelp/


[6] http://www.sutor.com/newsite/blog-open/?p=995


[7] http://en.wikipedia.org/wiki/Office_Open_XML