Books/chapters and directories/files - dichotomies considered harmful

May 9, 2005, 01:00 PM - ITworld.com, Ebusiness in the Enterprise

The distinction between a full book and a mere chapter of a book is a source of endless fascination for incurable information modellers like me.

Obviously, at the logical level, the distinction is driven by the content itself. A book is a complete unit of stuff. A chapter is a sub-division within the complete book. At the physical level, however, technology starts to influence the book/chapter distinction. A chapter boundary, for Microsoft Word or OpenOffice.org users, is likely to be influenced by how big the underlying file gets. Large files take longer to load and become increasingly slow to work with in typical word processing environments. Our decisions about where to draw the chapter boundaries are influenced to some extent by technology limitations.

If the physical constraints are not allowed to dictate the boundaries for chapters, then we can end up resorting to file naming conventions to split the content into manageable chunks, e.g. chapter1_a, chapter1_b and so on. We might then decide to keep things clean by introducing a subdirectory for each chapter, putting the sub-chapters tidily away in their own little compartments.

All is well with the world. Or is it? This is where things get interesting from an information management perspective. A full unit of work - a book - has now been split into bits that are navigable through a directory structure and bits that are navigable through an application. The result? You can use off-the-shelf tools to navigate your way through the directories. You can see the overall structure of the book by simply looking at the directory structure as a hierarchy. You can see that chapter 1 has a number of sub-chapters. However, that is as far as you can go. To dig any further into the structure of chapter 1, section A, you need to launch the editing application.
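To make the stopping point concrete, here is a minimal Python sketch of what a generic directory-walking tool can see. The function name `outline` and the example layout are illustrative assumptions, not anything taken from a particular explorer utility.

```python
import os

def outline(root, depth=0):
    """Return indented outline lines for everything under root.

    Hypothetical helper: directories are expanded recursively,
    while files are treated as opaque leaves.
    """
    lines = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        lines.append("  " * depth + name)
        if os.path.isdir(path):
            # Directories are structure the tool understands, so descend.
            lines.extend(outline(path, depth + 1))
        # A file is where the walk stops: whatever sections it contains
        # stay invisible without the application that owns the format.
    return lines
```

Run against a book directory holding a `chapter1` subdirectory with `chapter1_a` and `chapter1_b` files, the outline lists the chapter and its two sub-chapter files and nothing below them.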

What a pity.

Why is it that we have this hard and fast dichotomy between directory structure and file structure? Why is it that file system exploring utilities need to stop in their tracks when they hit things called 'files'?

As you have probably noticed, this artificial split can be breached in certain circumstances, at least to some extent. Graphics file formats are a good example. Many file system exploring tools know about, say, JPEG files and can display thumbnails of their contents.

That is a start in the right direction but I think it needs to go a lot further if the artificial directory/file distinction is to be eradicated.

Let us go back to the book example. Let us use Microsoft's OLE technology as an analogy. With OLE you can embed one thing in another. So, for example, you can embed an Excel spreadsheet into a Word document file. Now, in your head, take that further. Imagine a world in which the file system explorer is the top level application. It manages a single, humungous file on the disk into which you embed documents, spreadsheets, databases etc. Each thing you embed into the explorer can itself embed other things to any depth required.

In such a world, directories/files have merged into one abstraction. The book author does not have to introduce artificial segmentation of the book into separate entities. In such a world, filenames become something of an oddity. What do you need filenames for? You would only really need a filename at the point where you decided to exchange information between systems A and B.

Moreover, once the package of data is pasted into System B's file system explorer at some suitable point, the filename would be thrown away.
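As a thought experiment only, the merged abstraction could be modelled with a single node type that can embed other nodes to any depth. Everything in this Python sketch (`Node`, `embed`, `walk`, the sample labels) is a hypothetical illustration, not a description of OLE or of any real file system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Node:
    """One embeddable thing: a book, a chapter, a spreadsheet, a section.

    Hypothetical model: there is no separate 'directory' or 'file' type,
    and no filename, only a label and whatever is embedded inside.
    """
    label: str
    kind: str = "container"
    children: List["Node"] = field(default_factory=list)

    def embed(self, child: "Node") -> "Node":
        """Embed child inside this node and return the child."""
        self.children.append(child)
        return child

def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs for the whole tree, top down."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)

# The explorer's view of a book: one tree, navigable all the way down.
book = Node("My Book", kind="book")
chapter = book.embed(Node("Chapter 1", kind="chapter"))
section = chapter.embed(Node("Section A", kind="section"))
section.embed(Node("Sales figures", kind="spreadsheet"))

for depth, node in walk(book):
    print("  " * depth + f"{node.label} ({node.kind})")
```

The same `walk` that lists chapters also reaches the spreadsheet embedded in Section A, which is the point: one navigation tool covers every level.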

Sounds interesting, wouldn't you say? So why don't we have systems that work like that? There are, as ever, many reasons. One reason, which was an issue some years ago, is ceasing to be an issue very quickly now. Obviously, in order to show the structure of a "file", a file system explorer needs to look inside the file format. If the file format is proprietary, then we can do nothing.

Enter XML-based file formats like the OASIS Open Document Format[1]. The day is coming when file system explorers will be able to do for office documents what they currently do for JPEGs. That is a start in the right direction. Eventually, I hope we will see the directory/file distinction begin to melt away.
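A rough Python sketch of what "looking inside" might involve: an OpenDocument text file is a ZIP package whose `content.xml` marks headings with `text:h` elements carrying a `text:outline-level` attribute. The function name `odf_headings` is an assumption made for illustration, and the sketch ignores styles, metadata, and anything else a real tool would need to handle.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument for text elements such as headings.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def odf_headings(path):
    """Return (outline_level, heading_text) pairs from an OpenDocument text file.

    Hypothetical helper: `path` may be a filename or a file-like object.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("content.xml"))
    headings = []
    for heading in root.iter(f"{{{TEXT_NS}}}h"):
        # Treat a missing outline level as a top-level heading.
        level = int(heading.get(f"{{{TEXT_NS}}}outline-level", "1"))
        text = "".join(heading.itertext()).strip()
        headings.append((level, text))
    return headings
```

An explorer could indent each heading by its level and show the result beneath the file's entry, extending the directory outline into the document itself.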

Technologies/applications that never quite made it to the mainstream, such as OpenDoc[2] and FrameMaker[3] with its powerful Book/Chapter model, may yet have a second coming.


[1] http://www.oasis-open.org/committees/office/charter.php


[2] http://www.webopedia.com/TERM/O/OpenDoc.html


[3] http://www.adobe.com/products/framemaker/main.html


