One of the most talked-about Big Data technologies is Hadoop, an open-source distributed data processing platform originally created for tasks such as compiling web search indexes. It's one of several so-called "NoSQL" technologies (others include CouchDB and MongoDB) that have emerged to organize web-scale data in novel ways.
Hadoop is capable of processing petabytes of data by assigning subsets of that data to hundreds or thousands of servers, each of which reports its results back to a master job scheduler that collates them. Hadoop can be used either to prepare data for analysis or as an analytic tool in its own right. Organizations that don't have thousands of spare servers to play with can also purchase on-demand access to Hadoop instances from cloud vendors such as Amazon.
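The split-process-collate pattern described above can be illustrated with a small Python sketch. It is a single-machine illustration under stated assumptions, not Hadoop's actual API: the function names (`split_into_chunks`, `map_chunk`, `collate`, `run_job`) are hypothetical, threads stand in for separate servers, and word counting is an assumed example task.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_workers):
    """Divide the input into roughly equal subsets, one per worker (hypothetical helper)."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_chunk(chunk):
    """Worker step: count words in this worker's subset only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def collate(partial_results):
    """Master step: merge each worker's partial counts into one total."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)  # Counter.update adds counts from another Counter
    return total


def run_job(records, n_workers=4):
    """Split the data, process each subset in parallel, then collate the results."""
    chunks = split_into_chunks(records, n_workers)
    # Threads stand in for separate servers in this single-machine sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return collate(partials)
```

For example, `run_job(["price drop on cameras", "cameras in stock", "price check"], n_workers=2)` returns a count of 2 for both "price" and "cameras". The key property is that each worker sees only its own subset, and only the small partial results travel back to the coordinator, which is what lets the real system scale across many machines.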
Nustad says HMS is exploring the use of NoSQL technologies, although not for its massive Medicare and Medicaid claims databases. These contain structured data that can be handled with traditional data warehousing techniques, and it makes little sense to depart from relational database management for problems where relational technology is the tried-and-true solution, she says. However, Nustad can see Hadoop playing a role in fraud and waste analytics, perhaps analyzing records of patient visits that might be reported in a variety of formats.
Among the CIOs interviewed for this story, those with practical experience with Hadoop, including Rotella and Shopzilla CIO Jody Mulkey, work at companies that provide data services as part of their business.
"We're using Hadoop for what we used to use the data warehouse for," Mulkey says, and, more importantly, to pursue "really interesting analytics that we could never do before." For example, as a comparison shopping site, Shopzilla accumulates terabytes of data every day. "Before, we would have to sample data and partition data; it was so much work just to deal with the volume of data," he says. With Hadoop, Shopzilla is able to analyze the raw data and skip the in-between steps.
Good Samaritan Hospital, a community hospital in southwestern Indiana, is at the other end of the spectrum. "We don't have what I would classify as Big Data," says CIO Chuck Christian. Nevertheless, regulatory requirements are forcing him to store large quantities of entirely new categories of data, such as electronic medical records. There is doubtless great potential to glean healthcare quality information from that data, he says, but that will probably happen through regional or national healthcare associations rather than at his individual hospital. He is unlikely to invest in exotic new technologies himself.