From: www.itworld.com

Hysteresis, History and empty metadata fields

by Sean McGrath

March 6, 2006


When I was a teenager, some well-meaning teachers decided to put us boys through some training in the fine arts of metalwork. We had classes twice a week for a couple of years. It is all a blur to me now. All I remember is the heat, the sweat, the smell of oiled metal filings and the noise. I cannot find words to express how much I hated those classes.




Actually, there is one memory that rises above the blur and stays with me to this day. A single, gorgeous-sounding word: hysteresis [1]. That I should remember this obscure word rather than anything else from my metalwork classes is telling. You will not be surprised to hear that I do not have a workshop out back filled with hacksaws and drill bits.




Hysteresis occurs whenever the effect that accompanies some cause is delayed for some reason. The term is most often associated with processes in the physical world: the movement of interest rates, the growth of insect populations, the rise and fall of magnetic fields, that sort of thing. It is also relevant in studying the strength of soldered joints, which is where I came across it in my metalwork classes.




Recently it occurred to me that the concept of hysteresis is equally applicable to the more abstract concept of 'information' or 'knowledge'. The thought was prompted by a project I am involved in, in which content is created and then tagged as to its purpose and contents by a team of authors working with a content management system.




One of the perennial problems with content management systems is that they are generally designed with an existing corpus of information in mind. For this existing corpus, the users/owners tend to have a pretty good mental model of what the content is about, how it should be organized and so on. This is used to drive the design process for the new content management system. This results, almost invariably, in (a) some concept of a "document" and (b) some concept of the metadata to be associated with each document. The engineers then take the metadata information and craft a beautiful document metadata screen which users are invited to fill in when they create new content. One year later, most of the metadata fields for new content are found to be either blank or wrong, consisting of convenient dummy values to get the content management system to stop beeping. Much scratching of heads and nursing of sore wallets ensues.




And now for my theory. There is a hysteresis-based relationship between content and non-trivial metadata about the content. By non-trivial here I mean metadata that tells you what a piece of content is about, how it relates to other content and so on. Trivial metadata are things like author, date created and so on. Look at history: everywhere you look, you will find classification systems that tidy everything into categories for us. The pre-Raphaelites, the stone age, the romantic poets, the continental philosophers. What do they have in common? The classification systems we use today to speak about these things came into existence afterwards. To take a flippant example, pre-Raphaelite artists did not have that term written on their business cards.




The 'aboutness' of the content we create and use in our endeavors is only obvious after the fact. This, I think, is the fundamental reason why so many metadata-based content management systems have trouble getting good metadata out of content creators. The 'aboutness' of the stuff that was used to design the content management system was obvious because that 'aboutness' had emerged after the content itself was created. However, for new content, the 'aboutness' has yet to be cooked, so to speak.




My advice, if you find yourself in this situation, is to take a completely different tack. Writers write and categorizers categorize. There is an unavoidable delay between the two activities. The writers and the categorizers can be the same people, but the activities are very different and cannot be done at the same time. Build this hysteresis into your workflows rather than fight against it. The alternative is blank or dummy metadata fields.
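
For the technically minded, here is a rough sketch in Python of what building the delay into a workflow might look like. Everything in it is my own illustration rather than a feature of any particular content management system: the ContentItem fields, the two function names and the 90-day settle time are all assumptions. The idea is simply that trivial metadata is captured when the content is written, while the non-trivial metadata is left empty and filled in by a later categorization pass.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ContentItem:
    # Trivial metadata: known at creation time, so it is captured immediately.
    title: str
    author: str
    created: datetime
    body: str
    # Non-trivial metadata: deliberately optional when the content is written.
    topics: List[str] = field(default_factory=list)
    categorized_on: Optional[datetime] = None


def due_for_categorization(items, now, settle_time=timedelta(days=90)):
    """Return uncategorized items that are at least settle_time old.

    The 90-day default is an arbitrary, hypothetical value.
    """
    return [
        item for item in items
        if item.categorized_on is None and now - item.created >= settle_time
    ]


def categorize(item, topics, now):
    """Record the 'aboutness' decided during a later review pass."""
    item.topics = list(topics)
    item.categorized_on = now
```

A periodic job, or a person wearing a categorizer's hat, would call due_for_categorization and then categorize for each item returned. The creation screen never demands the non-trivial fields, so nobody is tempted to type in dummy values just to save the document.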





[1] http://www.lassp.cornell.edu/sethna/hysteresis/WhatIsHysteresis.html