November 07, 2006, 8:47 AM —
What seems to be the problem? The book The Long Tail by Chris Anderson examines the economics of abundance versus scarcity. It argues that the high-volume "hits" at the head of a traditional demand curve are not the only part that matters; the "misses" (or niches) in the seemingly endless long tail of that curve matter as well. We are conditioned to a world where "hits" are dominant and "misses" or "niches" are rendered unprofitable, impotent, or non-existent. The bestseller list for books, the top forty for songs, and the top movies/videos lists are examples of "hits." Yet Amazon, Netflix, and many others have done very well by selling to the long tail of the curve, where it is now profitable to sell even low-volume products -- the items that retail stores would not stock because they were unprofitable "niches" or "misses."
That is all very nice, you say, but how does that apply to storage? If your organization is typical, you are continuing to accumulate data in unprecedented quantities. From a storage perspective you must worry about how to house all this data, but from a business perspective you want to make sense of this data so that greater business value can be extracted from it. You are now trying to manage the economics of abundance of both data and storage rather than the scarcity of data and storage. That is not easy: while the cost (and therefore relative scarcity) of storage is now relatively low, the cost of managing it is relatively high. After all, would it be worth your while to spend an hour saving 10 GB of storage?
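To make that last question concrete, here is a minimal back-of-the-envelope sketch. The prices are purely hypothetical placeholders (a per-gigabyte storage cost and an hourly labor rate that do not come from this column); substitute your own figures. The function name `reclaim_is_worthwhile` is likewise just an illustration.

```python
def reclaim_is_worthwhile(gb_saved: float, cost_per_gb: float,
                          hours_spent: float, hourly_rate: float) -> bool:
    """Return True if the storage reclaimed is worth more than the labor spent.

    All inputs are assumptions supplied by the caller, not measured values.
    """
    storage_value = gb_saved * cost_per_gb   # value of the space freed
    labor_cost = hours_spent * hourly_rate   # cost of the time spent freeing it
    return storage_value > labor_cost


# Hypothetical figures: $2 per GB of managed storage, $50 per hour of staff time.
print(reclaim_is_worthwhile(gb_saved=10, cost_per_gb=2.0,
                            hours_spent=1, hourly_rate=50.0))
```

Under these assumed prices the hour costs more than the storage it frees, which is the point of the question: when storage is abundant, management time becomes the scarce resource.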
Although the fit may not be a perfect one, we can apply the principles of the Long Tail to storage. Doing so is one way to understand how the world of data is changing, and that in turn can lead to a better understanding of what you must do.
What do you need to know? We have become so mesmerized by the data "hits" (current open orders, the unread e-mail in the inbox, work on a new word processing document or presentation, or a collaborative project) that we ignore the data "niches": the fixed content data that actually makes up the bulk of the useful data in the enterprise. Niches in this case simply mean data that is accessed less frequently or used for a purpose different from the one for which it was originally created or acquired. For example, an open order is part of the revenue-producing order fulfillment process, so it is a "hit." A closed order is a "niche." It does not represent additional revenue, but it may represent a product under warranty that a support organization must support, a record that has to be kept for financial reconciliation purposes, and a data "atom" to be sent to a data warehousing or business intelligence application for supply chain planning or customer up-selling or cross-selling.
The first era of computer-based information was the era of transactional data. That information tended to be structured data managed by a relational database management system. Examples include order fulfillment systems, such as reservation systems, and enterprise resource planning (ERP) systems. The second era has been the proliferation of semi-structured and unstructured data. That includes searchable semi-structured data, such as the seemingly ubiquitous e-mail; personal productivity tools, such as word processing, spreadsheets, and presentations; and collaboration tools, such as Lotus Notes and Domino. It also includes unstructured data that can be sensed (seen or heard), including video, audio, and medical images. Both eras have focused on the "hits," that is, the active, changeable information.
The third era will not be about the applications that create the data, but rather about how organizations can better manage and harvest value from an important competitive differentiator -- the unique information, such as customer history, that only they have. Remember that although this fixed content data is "niche" data, it represents the bulk of the data in any enterprise.
And there are a number of challenges from a storage viewpoint. Growth, accumulation, freezing, and ageing are four basic principles that apply to storage today. Growth means that the volume that comes through the figurative door each year continues to grow. Accumulation means that more data is acquired than destroyed. Freezing means that data changes character at some point in its lifecycle and becomes fixed content that is read-only -- writes have stopped for whatever reason. Ageing means that the patterns of usage of data change as it ages.
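The first two principles, growth and accumulation, compound each other, and a small projection shows how quickly. The sketch below is only an illustration under stated assumptions: the starting intake, the annual growth rate, and the fraction of each year's intake that is eventually destroyed are all hypothetical inputs, and the function name `project_storage` is invented for this example.

```python
def project_storage(initial_intake_tb: float, growth_rate: float,
                    deletion_fraction: float, years: int) -> list[float]:
    """Project total stored data (TB) at the end of each year.

    Growth: each year's intake is (1 + growth_rate) times the previous year's.
    Accumulation: only deletion_fraction of each year's intake is ever
    destroyed, so the stored total keeps rising while that fraction is below 1.
    """
    stored = 0.0
    intake = initial_intake_tb
    totals = []
    for _ in range(years):
        stored += intake * (1 - deletion_fraction)  # what is kept from this year
        totals.append(stored)
        intake *= 1 + growth_rate                   # next year's larger intake
    return totals


# Hypothetical inputs: 10 TB arrives in year one, intake grows 50% per year,
# and 20% of each year's intake is eventually destroyed.
for year, total in enumerate(project_storage(10, 0.5, 0.2, 5), start=1):
    print(f"year {year}: {total:.1f} TB stored")
```

Freezing and ageing are not modeled here; they describe how the stored data is used rather than how much of it there is.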













