Storage Tip: The long tail of data storage
What seems to be the problem? The book The Long Tail by Chris Anderson examines the economics of abundance versus the economics of scarcity. It shows that the "misses" (or niches) in the seemingly endless long tail of a traditional demand curve are important, not just the high-volume "hits" at its head. We are conditioned to a world in which "hits" dominate and "misses" or "niches" are treated as unprofitable, impotent or nonexistent. The bestseller list for books, the top forty for songs, and the top movies/videos lists are examples of "hits." Yet Amazon, Netflix, and many others have done very well by selling to the long tail of the curve, where it is now profitable to sell even low-volume products: the "niches" or "misses" that retail stores would not stock because they would be unprofitable.
That is all very nice, you say, but how does it apply to storage? If your organization is typical, you are continuing to accumulate data in unprecedented quantities. From a storage perspective, you must worry about how to house all this data; from a business perspective, you want to make sense of it so that greater business value can be extracted from it. You are now trying to manage the economics of abundance of both data and storage, rather than the economics of scarcity. That is not easy: although the cost of storage (and therefore its scarcity) is now relatively low, the cost of managing it is relatively high. After all, would it be worth an hour of your time to save 10 GB of storage?
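The closing question is really a break-even calculation: compare the cost of the labor spent against the value of the capacity freed. The sketch below is a minimal illustration under loudly stated assumptions. The function names, the $50-per-hour labor rate, and the $1-per-GB storage cost are hypothetical figures chosen only to show the arithmetic, and a flat per-GB price ignores real-world overheads such as power, backup and replication.

```python
# Hypothetical break-even check: is an hour of cleanup worth the storage it frees?
# All cost figures used below are illustrative assumptions, not data from the article.

def cleanup_pays_off(hours_spent: float, hourly_labor_cost: float,
                     gb_freed: float, cost_per_gb: float) -> bool:
    """Return True if the storage freed is worth more than the labor spent."""
    labor_cost = hours_spent * hourly_labor_cost
    storage_value = gb_freed * cost_per_gb
    return storage_value > labor_cost


def break_even_gb(hourly_labor_cost: float, cost_per_gb: float) -> float:
    """GB that one hour of work must free before the hour pays for itself."""
    return hourly_labor_cost / cost_per_gb


# Assumed: $50 per hour of administrator time, $1 per GB of storage.
print(cleanup_pays_off(1, 50.0, 10, 1.0))  # False: $10 of storage vs. $50 of labor
print(break_even_gb(50.0, 1.0))            # 50.0 GB per hour of effort
```

Under these assumed prices, an hour spent saving 10 GB costs five times what it saves, which is why managing abundant storage by hand rarely pays.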
Although the fit may not be perfect, we can apply the principles of the Long Tail to storage. Doing so can help you better understand how the world of data is changing and, in turn, what you must do about it.
What do you need to know? We have become so mesmerized by data "hits," such as current open orders, unread e-mail in the inbox, a new word processing document or presentation in progress, or a collaborative project, that we ignore the data "niches": the fixed-content data that actually makes up the bulk of the useful data in the enterprise. Niches in this case are simply data that is accessed less frequently, or used for a purpose other than the one for which it was created or acquired. For example, an open order is part of the revenue-producing order fulfillment process, so it is a "hit." A closed order is a "niche": it does not represent additional revenue, but it may represent a product under warranty that a support organization must support, a record that has to be kept for financial reconciliation, and a data "atom" to be sent to a data warehousing or business intelligence application for supply chain planning or for customer up-selling and cross-selling.
The first age of computer-based information was the age of transactional data. That information tended to be structured data managed by database management systems in relational databases. Examples include order fulfillment systems, such as reservation systems, and enterprise resource planning (ERP) systems. The second age has been the proliferation of semi-structured and unstructured data. That includes searchable semi-structured data, such as ubiquitous e-mail; personal productivity tools, such as word processors, spreadsheets, and presentations; and collaboration tools, such as Lotus Notes and Domino. It also includes unstructured data that is sensed (seen or heard), such as video, audio, and medical images. Both eras have focused on the "hits," that is, on active, changeable information.
The third era will not be about the applications that create data, but about how organizations better manage and harvest value from an important, competitively differentiating asset: the unique information, such as customer history, that only they have. Remember that although fixed-content data is "niche" data, it represents the bulk of the data in any enterprise.
This creates a number of challenges from a storage viewpoint. Growth, accumulation, freezing, and ageing are four basic principles that apply to storage today. Growth means that the volume of data coming through the figurative door each year continues to increase. Accumulation means that more data is acquired than destroyed. Freezing means that data changes character at some point in its lifecycle and becomes read-only fixed content: writes have stopped, for whatever reason. Ageing means that the patterns in which data is used change as it ages.
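To see why growth and accumulation together push the bulk of stored data into older, frozen content, consider a toy model. This is a minimal sketch under simplifying assumptions that are not from the article: yearly intake grows at a fixed hypothetical rate, a fixed fraction of each year's intake is destroyed during the year it arrives, and nothing older is ever deleted. The function names and the numbers in the usage example are illustrative only.

```python
from typing import List


def simulate_holdings(years: int, first_year_intake: float,
                      intake_growth: float, fraction_destroyed: float) -> List[float]:
    """Return the total volume held at the end of each year (toy model).

    Growth: each year's intake is (1 + intake_growth) times the previous year's.
    Accumulation: only fraction_destroyed of each year's intake is deleted,
    and only in its arrival year, so holdings rise whenever it is below 1.0.
    """
    holdings = []
    total = 0.0
    intake = first_year_intake
    for _ in range(years):
        total += intake * (1.0 - fraction_destroyed)
        holdings.append(total)
        intake *= 1.0 + intake_growth
    return holdings


def share_older_than_one_year(holdings: List[float]) -> float:
    """Fraction of the final holdings that arrived before the final year.

    Valid only under this model's assumption that older data is never deleted.
    """
    if len(holdings) < 2:
        return 0.0
    return holdings[-2] / holdings[-1]


# Hypothetical inputs: 100 TB intake in year 1, 50% annual intake growth, 20% destroyed.
holdings = simulate_holdings(3, 100.0, 0.5, 0.2)
print([round(h) for h in holdings])
print(round(share_older_than_one_year(holdings), 2))
```

Even with intake growing 50% a year, more than half of the holdings after three years (about 200 of 380 TB) arrived in earlier years, and that older data is the likeliest to have frozen and aged.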
Note that a new age does not replace or supplant the prior one. Each age does, however, require a new way of thinking about how to manage the processes associated with its data. In the new era, you will have to think about the Long Tail of storage: the fixed content.
What are your choices? You can't live with the data, but you can't live without it.
If you are a storage administrator, you might feel that storage is like a set of grain silos: torrents of data grain come in every day, and all you can do is buy more silos. That is a reactive approach. You might instead try a more proactive approach, working both with the rest of the IT department (such as database administrators) and with business units. You can think about how to use data classification as part of information lifecycle management, and about how to create tiered storage and active archives to take better advantage of the data "niches" in the Long Tail of storage. Getting everyone to think more about how to use fixed-content data will not be easy, but you are likely not only to make your job more interesting and productive but also to show management that you are an up-and-comer.
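Data classification for tiered storage can start with the two signals the article describes: whether writes have stopped (freezing) and how often the data is still read (ageing). The sketch below is one hypothetical way to express that policy. The tier names, the `DataItem` type, and the 90-day and 365-day thresholds are assumptions for illustration, not a standard or any product's behavior; a real classification scheme would also weigh business rules such as retention requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataItem:
    name: str
    last_write: datetime
    last_read: datetime


def classify(item: DataItem, now: datetime,
             frozen_after: timedelta = timedelta(days=90),
             cold_after: timedelta = timedelta(days=365)) -> str:
    """Assign a hypothetical storage tier.

    - "primary": still being written, i.e. a "hit"
    - "active_archive": frozen (no recent writes) but still read recently
    - "deep_archive": frozen and not read within cold_after
    """
    if now - item.last_write < frozen_after:
        return "primary"
    if now - item.last_read < cold_after:
        return "active_archive"
    return "deep_archive"


now = datetime(2024, 1, 1)
open_order = DataItem("open order", now - timedelta(days=2), now - timedelta(days=1))
closed_order = DataItem("closed order", now - timedelta(days=200), now - timedelta(days=30))
old_record = DataItem("old record", now - timedelta(days=1000), now - timedelta(days=800))
for item in (open_order, closed_order, old_record):
    print(item.name, "->", classify(item, now))
```

Here the open order stays on primary storage, while the closed order, frozen but still consulted for warranty or reconciliation work, lands in an active archive rather than being buried.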
Deciding how to handle the abundance of fixed-content data is not easy, and much more remains to be done. But thinking about it as the Long Tail of storage is a start.