April 26, 2012, 8:00 AM —
Source: REUTERS/Denis Balibouse
Physicists studying test results from the Large Hadron Collider (LHC) at CERN, the European nuclear research laboratory outside Geneva, have a lot of data to ponder. In fact, they have nearly 50% more than originally estimated. Initially, the LHC was expected to generate 15 petabytes of usable data each year. Recent reports have raised that number to more than 22 petabytes annually.
However, CERN says that its tests produce vastly more data than ever gets studied. Some experiments can create up to one petabyte of data per second. Luckily for the data storage managers, not to mention the tired-eyed physicists, on average all but 200 Mbytes per second of that petabyte is deemed "uninteresting data" and discarded by the system.
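To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 petabyte = 10^15 bytes, 1 Mbyte = 10^6 bytes) and treats the quoted numbers as exact; the function names are illustrative only and do not come from CERN.

```python
# Rough arithmetic on the figures quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
PB = 10**15
MB = 10**6


def percent_increase(old, new):
    """Percentage growth from the old value to the new one."""
    return (new - old) / old * 100


def kept_fraction(kept_rate, raw_rate):
    """Share of the raw data stream that survives filtering."""
    return kept_rate / raw_rate


# Annual usable data: 15 PB expected, more than 22 PB reported.
growth = percent_increase(15, 22)

# Peak raw rate of 1 PB per second versus 200 MB per second retained.
fraction = kept_fraction(200 * MB, 1 * PB)

print(f"Growth in annual data: {growth:.1f}%")            # 46.7%
print(f"Fraction of raw data kept: {fraction:.1e}")       # 2.0e-07
print(f"Roughly 1 byte kept per {1 / fraction:,.0f} produced")
```

Under these assumptions, the jump from 15 to 22 petabytes works out to about 47%, consistent with "nearly 50%," and at the peak rate only about one byte in five million is retained.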
Big Data quantities like those seen at the LHC helped prompt the U.S. government late last month to announce $200 million in research and development funds specifically for scientists confronting the data deluge. According to a statement from the White House, the funds are necessary to develop technologies capable of "managing, analyzing, visualizing, and extracting useful information from large and diverse data sets." The hope is that this investment in Big Data management "will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible."