Managing terabytes of data

By Barry Nance, Computerworld

How much data do large corporations manage? Tons of it. Referring to "tons" of data may be intuitive for paper records, but it's an unusual way to describe computer-stored information, which is usually measured by character counts and file sizes. Still, using tons may give an added sense of how much data a terabyte is. To be sure, measuring data by the ton isn't definitive, because a disk drive's weight doesn't vary significantly over a wide range of storage capacities, but it's a handy starting point. A common 8GB hard drive weighs a little more than 1 lb. Figure that a shared enclosure, power supply and electronics roughly double that weight, to about 2 lb per drive, and 8TB of data (1,000 such drives, or about 2,000 lb) is approximately equivalent to 1 ton. That much storage is cumbersome and ungainly.
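The ton arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the article's rough rules of thumb (an 8GB drive weighing about 1 lb, a factor of two for enclosure, power supply and electronics, and decimal units); treating "ton" as the 2,000-lb U.S. short ton is an assumption, though it is the one that makes 8TB come out to 1 ton.

```python
# Rough "tons of data" estimate using the article's rule-of-thumb assumptions.
GB_PER_DRIVE = 8        # a common 8GB hard drive
LB_PER_DRIVE = 1.0      # "a little more than 1 lb", rounded down
OVERHEAD_FACTOR = 2     # shared enclosure, power supply and electronics double the weight
LB_PER_TON = 2000       # assumption: U.S. short ton


def tons_of_storage(terabytes: float) -> float:
    """Approximate weight, in tons, of the hardware holding `terabytes` of data."""
    drives = terabytes * 1000 / GB_PER_DRIVE      # decimal units: 1TB = 1,000GB
    pounds = drives * LB_PER_DRIVE * OVERHEAD_FACTOR
    return pounds / LB_PER_TON


print(f"8TB     -> {tons_of_storage(8):.1f} tons")
print(f"174.6TB -> {tons_of_storage(174.6):.1f} tons")
```

Plugging in Aetna's 174.6TB gives roughly 21.8 tons, the figure used in the Aetna section.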

How does an enterprise deal gracefully and effectively with such unwieldy mountains of information? We asked four data-intensive companies -- Aetna Inc., The Boeing Co., Atos Origin and AT&T Corp. -- to tell us about the problems they faced in managing massive data stores, and how they solved them. For each company, the data is a significant corporate asset resulting from huge investments of time and effort. The data is also the source of many trials and tribulations for the employees who keep vigilant watch over it.

While these companies say that good tools are important for managing terabytes of information, their IT and database administrators also agree that having a clear and comprehensive perspective on the data, via both logical and physical views, is even more critical. Security, data integrity and data availability aren't trivial concerns, they point out, and giving users easy access to the data is a never-ending job.

Insuring a Healthy 21.8 Tons

On a daily basis, Renee Zaugg, operations manager in the operational services central support area at Aetna, is responsible for 21.8 tons of data (174.6TB). She says 119.2TB reside on mainframe-connected disk drives, while the remaining 55.4TB sit on disks attached to midrange computers running IBM's AIX or Sun Microsystems Inc.'s Solaris. Almost all of this data is located in the company's headquarters in Hartford, Conn. Most of the information is in relational databases, handled by IBM's DB2 Universal Database (Versions 6 and 7 for OS/390), DB2 for AIX, Oracle8 on Solaris and Sybase Inc.'s Adaptive Server 12 on Solaris. To make matters even more interesting, Zaugg adds, outside customers have access to about 20TB of the information. Four interconnected data centers containing 14 mainframes and more than 1,000 midrange servers process the data. It takes more than 4,100 direct-access storage devices to hold Aetna's key databases.

Tips for Managing Large Data Stores

Be selective in how you implement hierarchical storage management (HSM). Instead of blindly handing all your data to a robotic HSM process, analyze and classify your company's data usage to learn how often the data is reused and thus when HSM might be appropriate.
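The usage analysis this tip recommends can start as simply as bucketing each data set by how often it is reused. The Python sketch below is hypothetical throughout: the `DatasetUsage` record, the 90-day access count and both thresholds are illustrative assumptions, not Aetna's rules or any HSM product's.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetUsage:
    name: str
    last_access: date
    accesses_90d: int  # hypothetical metric: reads in the last 90 days


# Hypothetical thresholds; real ones should come from your own usage analysis.
HOT_MIN_ACCESSES = 30
ARCHIVE_IDLE_DAYS = 180


def classify(usage: DatasetUsage, today: date) -> str:
    """Suggest a storage tier instead of handing everything to HSM blindly."""
    idle_days = (today - usage.last_access).days
    if usage.accesses_90d >= HOT_MIN_ACCESSES:
        return "keep on primary disk"              # reused often: migrating it would thrash
    if idle_days >= ARCHIVE_IDLE_DAYS:
        return "HSM candidate (migrate to tape)"   # long idle: safe to migrate
    return "review"                                # occasional reuse: decide case by case


today = date(2024, 6, 1)
for ds in (DatasetUsage("claims_current", date(2024, 5, 31), 400),
           DatasetUsage("claims_archive", date(2023, 1, 1), 0)):
    print(ds.name, "->", classify(ds, today))
```

The middle "review" bucket is deliberate: data that is touched occasionally is exactly where a blind HSM policy tends to migrate files that users then recall.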

The logical view of your data is just as important as the physical view. Knowing which data elements are duplicated in your database and why tells you not only the degree of normalization but also what fraction of the database is involved in purely redundant I/O.
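A rough first pass at finding duplicated data elements is to look for the same non-key column name in more than one table. This is a minimal sketch against SQLite, chosen only because Python ships with it; matching on column names is an assumption that will miss renamed copies and flag unrelated columns that happen to share a name, which is why the tip stresses learning why each duplicate exists. The schema is hypothetical.

```python
import sqlite3
from collections import defaultdict


def duplicated_elements(conn):
    """Map each non-primary-key column name that appears in more than one
    table to the tables carrying it (a name-matching heuristic only)."""
    owners = defaultdict(list)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, _type, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            if not pk:
                owners[column].append(table)
    return {col: ts for col, ts in owners.items() if len(ts) > 1}


# Hypothetical schema: claims carries its own copy of the member's name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, member_name TEXT, address TEXT);
    CREATE TABLE claims  (claim_id INTEGER PRIMARY KEY, member_id INTEGER,
                          member_name TEXT, amount REAL);
""")
print(duplicated_elements(conn))
```

The foreign key `member_id` is not flagged because it is the primary key of its home table; `member_name` is, because every write to a claim also writes a value that already lives in `members`.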

Perform data backup/restore fire drills periodically and religiously to make sure you don't lose lots of data to human error or natural disaster.
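A fire drill only counts if someone checks that the restored copy actually matches the original. The sketch below uses SQLite's built-in `Connection.backup()` as a stand-in for whatever backup tooling you actually run: it takes a backup, restores it into a scratch database and compares a per-table SHA-256 digest. The `restore_drill` and `table_digests` helpers and the file names are hypothetical, and ordering rows by `rowid` assumes ordinary (rowid) tables.

```python
import hashlib
import os
import sqlite3
import tempfile
from contextlib import closing


def table_digests(conn):
    """Return {table name: SHA-256 of its rows in rowid order}."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    digests = {}
    for name in names:
        h = hashlib.sha256()
        for row in conn.execute(f'SELECT * FROM "{name}" ORDER BY rowid'):
            h.update(repr(row).encode())
        digests[name] = h.hexdigest()
    return digests


def restore_drill(live, workdir):
    """Back up `live`, restore the copy to a scratch database, and verify it."""
    backup_path = os.path.join(workdir, "backup.db")
    restore_path = os.path.join(workdir, "restored.db")
    with closing(sqlite3.connect(backup_path)) as bak:
        live.backup(bak)                                       # 1. take the backup
    with closing(sqlite3.connect(backup_path)) as bak, \
            closing(sqlite3.connect(restore_path)) as restored:
        bak.backup(restored)                                   # 2. restore somewhere harmless
        return table_digests(restored) == table_digests(live)  # 3. verify contents match


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, amount REAL)")
live.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 125.0), (2, 80.5)])
live.commit()
print("drill passed:", restore_drill(live, tempfile.mkdtemp()))
```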

Recognize that you may have to develop your own transaction-aware backup software -- especially if you have a growing database and your relational database engine doesn't support hot backups (copies made while the database stays online). It's not funny when you run out of time to make off-line backup copies.
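What makes a home-grown backup transaction-aware is that every table is copied from the same committed point in time, even while users keep writing. The hypothetical `snapshot_backup` helper below sketches that idea in SQLite by doing all of its reads inside one read transaction; it assumes the database runs in WAL (write-ahead logging) mode, where a reader sees a fixed snapshot and does not block writers. The demo tables and the `after_first_table` hook, which simulates a user committing mid-backup, are illustrative only.

```python
import os
import sqlite3
import tempfile


def snapshot_backup(src_path, dst, after_first_table=None):
    """Copy every user table from src_path into dst inside ONE read transaction,
    so all tables reflect the same committed point in time (hypothetical helper)."""
    src = sqlite3.connect(src_path, isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    try:
        src.execute("BEGIN")  # deferred: the snapshot is fixed by the first read below
        tables = src.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
        for i, (name, ddl) in enumerate(tables):
            dst.execute(ddl)
            rows = src.execute(f'SELECT * FROM "{name}"').fetchall()
            if rows:
                marks = ", ".join("?" * len(rows[0]))
                dst.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
            if i == 0 and after_first_table is not None:
                after_first_table()  # demo hook: a writer commits mid-backup
        src.execute("COMMIT")  # end the read transaction, releasing the snapshot
        dst.commit()
    finally:
        src.close()


def make_demo_db(path):
    """Hypothetical two-table database, switched to WAL mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block writers
    conn.execute("CREATE TABLE a_members (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE b_claims (id INTEGER PRIMARY KEY, member_id INTEGER)")
    conn.execute("INSERT INTO a_members VALUES (1, 'early')")
    conn.execute("INSERT INTO b_claims VALUES (1, 1)")
    conn.commit()
    conn.close()


def late_writer(path):
    """Simulated user who adds a member and that member's claim, then commits."""
    w = sqlite3.connect(path)
    w.execute("INSERT INTO a_members VALUES (2, 'late')")
    w.execute("INSERT INTO b_claims VALUES (2, 2)")
    w.commit()
    w.close()


path = os.path.join(tempfile.mkdtemp(), "live.db")
make_demo_db(path)
backup = sqlite3.connect(":memory:")
snapshot_backup(path, backup, after_first_table=lambda: late_writer(path))
for table in ("a_members", "b_claims"):
    print(table, backup.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```

Without the enclosing transaction, the second table would pick up the late claim while the first table missed the late member, leaving a backup whose tables disagree with each other.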

Carefully segregate externally visible data from your internal data, for security purposes. An ounce of prevention is worth a ton of cure.
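One way to act on this tip is to publish only approved columns into a separate database that external users reach through a read-only connection. The sketch below is hypothetical: the `EXTERNAL_COLUMNS` whitelist and the table and column names are invented for illustration, and SQLite's `mode=ro` URI parameter stands in for the access controls a production DBMS would provide.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical whitelist: the only columns outside customers may ever see.
EXTERNAL_COLUMNS = {"claims": ["claim_id", "status"]}


def publish_external_copy(internal, external_path):
    """Copy only whitelisted columns from the internal database into a separate file."""
    with closing(sqlite3.connect(external_path)) as ext:
        for table, cols in EXTERNAL_COLUMNS.items():
            col_list = ", ".join(f'"{c}"' for c in cols)
            marks = ", ".join("?" * len(cols))
            rows = internal.execute(f'SELECT {col_list} FROM "{table}"').fetchall()
            ext.execute(f'DROP TABLE IF EXISTS "{table}"')
            ext.execute(f'CREATE TABLE "{table}" ({col_list})')
            ext.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
        ext.commit()


def open_for_customers(external_path):
    """Customer-facing code gets a read-only handle to the published copy only."""
    return sqlite3.connect(f"file:{external_path}?mode=ro", uri=True)


internal = sqlite3.connect(":memory:")
internal.execute("CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, "
                 "status TEXT, internal_notes TEXT)")
internal.execute("INSERT INTO claims VALUES (1, 'paid', 'reviewed by adjuster')")
internal.commit()

path = os.path.join(tempfile.mkdtemp(), "external.db")
publish_external_copy(internal, path)
portal = open_for_customers(path)
print(portal.execute("SELECT * FROM claims").fetchall())
```

Because the external copy never contains `internal_notes`, a bug in customer-facing code cannot leak it, and the read-only handle keeps outside users from writing back into the published data.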

Most of Aetna's ever-growing mountain of data is health care information. The insurance company maintains records for both health maintenance organization participants and customers covered by insurance policies. Aetna has detailed records of providers, such as doctors, hospitals, dentists and pharmacies, and it keeps track of all the claims it has processed. Some of Aetna's larger customers send tapes containing insured employee data, but Nancy Tillberg, head of strategic planning, says the firm is moving toward using the Internet to collect such data.

"Data integrity, backup, security and availability are our biggest concerns," Zaugg says. Her data handling tools, procedures and operations schedules have to stay ahead of not only the normal growth that results from the activities of the sales, underwriting and claims departments but also growth from corporate acquisitions and mergers.
