Last May, the consulting firm McKinsey & Company issued a report anticipating that organizations would be deluged with data in the years to come. It also predicted that a number of industries -- including health care, the public sector, retail, and manufacturing -- would benefit from analyzing their rapidly growing mounds of data.
Collecting and analyzing transactional data will give organizations more insight into their customers' preferences. It can be used to better inform the creation of products and services, and allow organizations to remedy emerging problems more quickly.
"The use of big data will become a key basis of competition and growth for individual firms," the report concluded. "The use of big data will underpin new waves of productivity growth and consumer surplus."
Of course, Teradata, IBM and Oracle, among many others, have been offering terabyte-scale data warehouses for more than a decade. These days, however, data tends to be collected and stored in a wider variety of formats, and it can be processed in parallel across multiple servers, which is a necessity given the amounts of information being analyzed. In addition to exhaustively maintained transactional data from databases and carefully culled data residing in data warehouses, organizations are also collecting vast amounts of server log data and other machine-generated data, customer comments from internal and external social networks, and other sources of loosely structured or unstructured data.
"Traditional data systems simply don't handle big data very well, either because they can't handle the variety of data -- today's data is much less structured because it evolves very quickly, and because [such systems] just cannot scale at the rate at which they must ingest data," said Eric Baldeschwieler, chief technology officer of Hortonworks, a Yahoo spinoff company that offers a Hadoop distribution.
Such data is growing at an exponential rate, thanks to Moore's Law, pointed out Curt Monash, of Monash Research. Moore's Law holds that the number of transistors that can be placed on an integrated circuit doubles roughly every two years (a period often quoted as 18 months). Computing power has historically grown at a comparable pace, so each new generation of processors, and of the servers built around them, is substantially more powerful than its predecessor. Not surprisingly, those more powerful servers generate correspondingly larger datasets as well.
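To make the doubling arithmetic concrete, here is a minimal sketch that assumes a constant doubling period, which real hardware trends only approximate. The function name growth_factor is hypothetical; it simply computes 2 raised to (elapsed months / doubling period), so the 18-month figure and a two-year figure can be compared side by side.

```python
def growth_factor(months, doubling_period_months=18.0):
    """Multiplier after `months` if a quantity doubles every `doubling_period_months`.

    Assumes a constant doubling period (an idealization of Moore's Law).
    """
    return 2 ** (months / doubling_period_months)


# Compare an 18-month and a 24-month doubling period over several horizons.
for years in (1.5, 3, 6, 10):
    fast = growth_factor(years * 12, 18)
    slow = growth_factor(years * 12, 24)
    print(f"{years:>4} years: x{fast:.1f} (18 mo) vs x{slow:.1f} (24 mo)")
```

The gap compounds quickly: over six years an 18-month doubling period yields a 16-fold increase, while a two-year period yields 8-fold, which is why the exact period matters when projecting data growth.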
The big data approach represents a major shift in how data is handled, said Jack Norris, vice president of marketing for MapR. Previously, carefully culled data was piped through the network to a data warehouse, where it could be examined further. With increasing amounts of data, however, "the network becomes the bottleneck," he said. Distributed systems such as Hadoop instead allow the analysis to occur where the data resides, sending the computation to the servers that store the data rather than moving the data to the computation.
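The contrast Norris draws can be illustrated with a plain-Python sketch. This is not Hadoop's API: the partitions, the log values and the function names (local_summary, ship_raw, ship_summaries) are all hypothetical, and each list merely stands in for data stored on a separate server. The centralized approach moves every record across the "network"; the data-local approach reduces each partition to a small count table first and ships only those summaries.

```python
from collections import Counter

# Hypothetical log records spread across three storage nodes.
# In a real cluster each partition would live on a different machine.
partitions = [
    ["error", "ok", "ok", "timeout"],
    ["ok", "ok", "error"],
    ["timeout", "ok", "error", "error"],
]


def local_summary(records):
    """Runs 'on the node': reduce raw records to a small count table."""
    return Counter(records)


def ship_raw(parts):
    """Centralized approach: every record crosses the network."""
    shipped = [r for part in parts for r in part]
    return Counter(shipped), len(shipped)


def ship_summaries(parts):
    """Data-local approach: only per-node summaries cross the network."""
    summaries = [local_summary(p) for p in parts]
    total = Counter()
    for s in summaries:
        total.update(s)  # Counter.update adds counts together
    items_shipped = sum(len(s) for s in summaries)
    return total, items_shipped


raw_counts, raw_items = ship_raw(partitions)
local_counts, summary_items = ship_summaries(partitions)
print(raw_counts == local_counts, raw_items, summary_items)
```

Both approaches produce the same totals, but the amount shipped differs: the raw approach moves one item per record, while the summary approach moves one entry per distinct value per node. On toy data the savings are small; with billions of log lines and a handful of distinct values, the summaries stay tiny while the raw transfer would saturate the network.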