Wave of Big Data could swamp corporate IT

Even aside from legal issues, the mass of Big Data may kill you

If cloud computing is becoming a commodity (it is; that was the whole point), what will be the next big thing to hit IT?

Big Data. It's driving mergers, acquisitions and investment on Wall Street. IBM just launched a Boot Camp-style training class in what Big Data is and what to do with it.

And it's so difficult to manage (requiring expensive IT products and services) that it's unlikely to be commoditized any time soon.

Big Data came out of academia and high-performance computing as a way to describe mondo data sets: originally things like the multi-terabyte piles of stuff Ph.D.s used to analyze weather patterns, sky maps and the possibility that the next time they fired two invisible particles in opposite directions around a giant loop, the universe would end when they met.

For corporate IT it's not nearly as exciting. Mostly what people mean is giant piles of largely irrelevant data that people who often claim to be concerned about their privacy force on the universe by posting it on social networks with no limits on who can see it.

Some of that is useful from a marketing point of view. Other useful Big Data comes from engineering, mining/geological or logistics databases that track vast numbers of events or data instances and match patterns among them.

Much of the rest, the parts growing most quickly and most wastefully, is marketing-related information scraped, bought or sifted from social networking sites, discussion forums and often-false, usually faulty customer-comment and customer-information forms sent in by people who might or might not be customers, but would like their questions answered, please.

IBM's Big Data business manager Rod Smith told InfoWorld it was another big step up in volume to "help people sift through data where maybe 90 percent of it is not very useful."

Data mining, visualization, parsing, detailed queries, pattern matching and more arcane ways to sift wheat from chaff can add a lot of value, especially when what's left is combined with higher-quality, de-duped, verified, qualified and quantified customer data and parsed for where the profiles match. That gives the marketing department another angle from which to manipulate customers.

Much of the data will be brought in on purpose by people looking for that kind of insight. Much will come in because your end users want to carry iPads with very little data storage capacity and still have instant access to terabytes of data and apps through a wireless network to the SANs in your basement.

That might make you feel more secure, since it moves data already in the company, stored on PC and laptop hard drives, onto the SAN where you can manage and protect it.

It will also, guaranteed, cost IT a lot of money: partly for all the storage, partly for the processing power to crunch it, and partly for the development time to fix the reliable old applications you will completely shatter by trying to force huge logs of data through slots designed for pencils.

That's according to the computer-science journal article with my favorite title ever: "The Pathologies of Big Data: Scale up your data sets enough and all your apps will come undone."

Business units wanting to use Big Data tend not to want to pay for all the adaptations required to use it without breaking everything else, though.

You already support terabytes of data, the cloud has plenty of power, so except for more disk space, what could you possibly need to support this tiny little initiative?

Rather than do any real planning or restructuring, they'll likely just push you to sign up for more cloud capacity -- an operational cost rather than a capital one, so it can suck wind out of your budget this year rather than next -- and tell you it's more important to worry about issues like who owns the data and how safely they can manipulate it.

There are issues there.

There are also issues in how the manipulation needs to be done and why manipulating an exabyte is different from manipulating a terabyte.

That's why IBM -- which is not short of Big Data-esque hardware or software -- bought Netezza and its high-performance data-warehousing apps for $1.7 billion last September. It's also why Teradata bought data-warehousing startup Aster Data last week.

It's why the oddly named, little-known, high-capacity, multiple-format, open-source data-management tool Hadoop is suddenly hot.

True, newish Big Data does sound a lot like data mining or database management did a few years ago, in the same way cloud computing sounds like outsourcing and SaaS sounds like dumb-terminal shared-services computing.

But Big Data is new. Completely new, and revolutionary, if for no other reason than that the pure volume of data is like nothing you've ever had (or been dumb enough) to deal with before.

Josef Stalin defended the decision to send huge waves of shoddy tanks against the top-quality versions the Nazis were using to invade the Soviet Union by saying "quantity has a quality all its own."

Very true. That quality, it turned out, consisted largely of costing more than you can afford, failing to do the job you need done, and getting itself, and very nearly you, destroyed in the process.

But that's a lesson from history, not from IT. So just forget about it.

Kevin Fogarty writes about enterprise IT for ITworld. Follow him on Twitter @KevinFogarty.