Really big data

By Anne Skamarock, Network World | Storage

If you think you have storage issues, consider this: IBM is working with Seitel, a leading provider of seismic data for the oil and gas industry, to put together a storage-area network with an initial size of one petabyte. Wow, one petabyte of data!

To give you an idea just how big a petabyte is, it equals 1 billion (1,000,000,000) megabytes, or roughly 1 quadrillion (1,000,000,000,000,000) bytes. Another way to look at this is that it would take about 15,000 of the hard drives found in most home PCs these days to store that much data.
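As a rough check on those figures, here is a minimal sketch using decimal units (1 megabyte = 1,000,000 bytes). The home-PC drive size is not stated in the column; the value computed below is only what the 15,000-drive figure implies, so treat it as an assumption.

```python
# Decimal (SI) units, matching the round numbers used in the column.
MEGABYTE = 10**6
GIGABYTE = 10**9
PETABYTE = 10**15

megabytes_per_petabyte = PETABYTE // MEGABYTE

# Assumption: take the 15,000-drive figure at face value and back out
# the drive capacity it implies. The column does not state this size.
home_drives = 15_000
implied_drive_gb = PETABYTE / home_drives / GIGABYTE

print(f"{megabytes_per_petabyte:,} megabytes per petabyte")
print(f"implied home drive size: {implied_drive_gb:.1f} GB")
```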

Roy Williams of California Institute of Technology maintains a Web page that tries to give examples of some of these large amounts of information:

http://webpages.shepherd.edu/TMCGIL01/one.htm

Taking one of his examples, the entire printed collection of the U.S. Library of Congress contains about 10 terabytes of information. A petabyte is 100 times larger than that!

While most companies are not currently trying to access petabytes of information, many industries are getting very close. With so much information moving to digital formats, it won’t be long before companies of any size will have to wrestle with these problems. So what does it mean to manage and access a petabyte of data?

First of all, let’s look at the physical space it would take to house a petabyte. If we use Seagate’s highest-capacity Fibre Channel Cheetah drives, soon to be available with formatted capacities of 180G bytes per drive, it would take approximately 5,600 of those drives to store a petabyte. If you stacked them all on top of each other, you would have a tower about 467 feet high. Or if you stacked them 10 high, you would need about 1,120 square feet of floor space. That is just to house the disk drives. Most storage systems also come with power and cooling, which take up additional space.
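The drive-count arithmetic can be sketched as follows. The 1-inch drive height is an assumption: the column does not give it, but it is the height that reproduces the 467-foot tower. The floor space per stack is likewise only what the 1,120-square-foot figure implies.

```python
import math

PETABYTE = 10**15              # bytes, decimal units
DRIVE_CAPACITY = 180 * 10**9   # 180G bytes per drive, from the column

# Assumption: each drive is about 1 inch tall (not stated in the column).
DRIVE_HEIGHT_INCHES = 1.0

# Exact number of drives needed, before rounding to "approximately 5,600".
drives_needed = math.ceil(PETABYTE / DRIVE_CAPACITY)

# The column's rounded figure is used for the physical estimates.
rounded_drives = 5_600
tower_feet = rounded_drives * DRIVE_HEIGHT_INCHES / 12
stacks_of_ten = rounded_drives // 10

# Floor space per 10-drive stack implied by the 1,120 sq ft figure.
sq_ft_per_stack = 1_120 / stacks_of_ten

print(f"drives needed: {drives_needed:,}")
print(f"single tower: {tower_feet:.0f} feet")
print(f"stacks of ten: {stacks_of_ten}, about {sq_ft_per_stack:.1f} sq ft each")
```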

The next question is: How much time would it take to move a petabyte of data on a SAN? I have decided to take a leap of faith and say, for simplicity’s sake, we can get transfer rates on and off the disk and through a Fibre Channel SAN at 100 megabytes/second. I will note, however, that most applications running on Unix or Windows NT systems today cannot drive data that fast, so your results certainly will differ. To move a petabyte of data at a rate of 100M byte/sec will take nearly 116 days of constant, uninterrupted streaming.
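The transfer-time estimate is a single division. A minimal sketch, assuming decimal units and a perfectly sustained rate (the function name `transfer_days` is just for illustration):

```python
PETABYTE = 10**15                 # bytes, decimal units
RATE_BYTES_PER_SEC = 100 * 10**6  # 100 megabytes/second, from the column
SECONDS_PER_DAY = 24 * 60 * 60

def transfer_days(total_bytes, bytes_per_second):
    """Days of uninterrupted streaming needed to move total_bytes."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY

days = transfer_days(PETABYTE, RATE_BYTES_PER_SEC)
print(f"{days:.1f} days at 100 megabytes/second")

# An application that sustains only half that rate takes twice as long.
half_rate_days = transfer_days(PETABYTE, RATE_BYTES_PER_SEC / 2)
print(f"{half_rate_days:.1f} days at 50 megabytes/second")
```

Because the time scales inversely with throughput, any shortfall from the assumed 100 megabytes/second stretches the 116 days proportionally.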

Finally, I thought about how much it would cost to buy a petabyte of simple capacity (not fancy RAID or management software). If we say we can get capacity at $0.04 a megabyte, it would cost $40 million for one petabyte. Most "managed" storage costs between $0.25 and $0.35 per megabyte. At $0.30 per megabyte, that petabyte of storage would cost $300 million -- again, just for the storage -- in, perhaps, a RAID configuration.
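The cost figures follow from multiplying the per-megabyte price by the 1 billion megabytes in a petabyte. A short sketch using the prices quoted above (the labels and the helper name `petabyte_cost` are illustrative only):

```python
MEGABYTES_PER_PETABYTE = 10**9  # decimal units

def petabyte_cost(dollars_per_megabyte):
    """Purchase cost in dollars of one petabyte at the given price."""
    return dollars_per_megabyte * MEGABYTES_PER_PETABYTE

# Prices per megabyte quoted in the column.
prices = [
    ("simple capacity", 0.04),
    ("managed, low end", 0.25),
    ("managed, midpoint", 0.30),
    ("managed, high end", 0.35),
]

for label, price in prices:
    millions = petabyte_cost(price) / 1e6
    print(f"{label}: ${millions:,.0f} million")
```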

As disk capacities continue to increase and the price of storage continues to fall, the cost of purchasing petabytes of storage is not out of the realm of possibility. As this simple analysis shows, performance of the storage infrastructure will quickly become the bottleneck in managing and accessing those large amounts of data. While there are some who may read this and be amazed at the amounts of data being stored and accessed today and in the near future, others will chuckle and say, "so tell me something I didn’t already know!" For those of you who work with large data sets on a daily basis, the good news is that some of the problems you have been struggling with for years are becoming mainstream.
