Big-data benefits with small data

Smaller companies are turning to data, too

Even as big data skyrockets to the top of everyone's attention in the enterprise and government, another movement is rising to counter, or even complement, the idea of big data.

Predictably, it's a concept called small data.

There are a lot of critics of the big data phenomenon--from the knee-jerk if-it's-popular-it-must-be-hype critics to the relational database vendors who are scrambling to catch up and are throwing all sorts of fear, uncertainty, and doubt (FUD) at big data.

But some legitimate criticisms of big data have many people genuinely wondering whether big data is the only way to achieve the value of knowing your customers.

First, there's no getting around this: big data is big. It takes a lot of hardware to store and analyze peta-, exa-, or (eventually) zettabytes of data. Even using commodity hardware solutions brings a hefty price tag. Public clouds like Amazon Web Services, or clouds built on OpenStack, can help mitigate these costs, but there's still a lot of operational support to deal with. These kinds of costs can easily put big data out of the reach of many companies.

Second--and this is a bit less obvious, I promise--big data is a direct reaction to try to deal with a massive influx of data that was rolling in from one particular source: the Web. Enterprises that were running web sites suddenly had to contend with millions of facts per minute (or more), and needed a way to properly catch, store, and analyze this data in the hopes of figuring out what they could.

(It could be argued that large-scale retailers were generating nearly the same scale of data from point-of-sale transactions, but not at the sheer speeds that Web data can come in. Without that speed impetus, even retailers with massive data collections could easily use "traditional" data solutions to manage their own data.)

This combination of speed and volume is what makes big data tools and technology really necessary. But for a vast majority of businesses that do not have giga-scale ecommerce sites, is big data even worth the attempt?

Adding another barrier to small businesses and organizations capitalizing on big data is the fact that big data can be hard. Hadoop and MapReduce, which are right now topping the chatter charts, are very good tools to use for commodity storage and analysis. But they can also be very difficult to manage--just ask anyone who has had to create a MapReduce batch analysis job.

Of course, there's really no such thing as "easy" data, because the concepts of data storage are not the most intuitive in the world, as I often discover when first introducing them to my own students. But, on a relative scale, right now the world of Hadoop is not very mature and there is a very strong sense of flying-by-the-seat-of-your-pants going on in the Hadoop community. To be fair, this will settle out as things get more mature, but it still represents a real barrier to entry to Hadoop-oriented big data.

This is where the small data movement (if that's really the best label) gets traction: knowing full well that most non-enterprise or non-government organizations are unlikely to actually need big data, let alone figure out how to afford and implement it, how does IT deliver the awesome benefits of big data?

Small data is a concept that tries to address this conundrum. One of the easier ways it's taking on the issue is by using the hype and hoopla around big data to remind businesses of the general value of data analytics, no matter how big the data set is. That message is practically a no-brainer: business owners are scanning the headlines every day and getting excited about applying data to their decision-making process.

Another interesting and more technical approach is the notion of personal data stores. Personal data stores are, in essence, collections of data that remain attached to an individual customer. If and when that data is shared, it provides much richer information about the customer and their likes and dislikes. Personal data stores, if broadly disseminated, would alleviate the need for the massive analytics that have to sift through multi-signal data in order to find that one golden nugget of information that says "Brian likes this," because it would be right there.

Of course, "broadly disseminated" is a tricky thing: such data stores would be understandably private, and many customers could simply choose to opt out of (or never opt in to) revealing that information to a merchant. If personal data stores get more traction, we could see the re-introduction of loyalty programs that will encourage customers to share their personal information, perhaps for special deals and discounts.
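To make the contrast concrete, here is a minimal sketch in Python of the two approaches: guessing what a customer likes by counting signals in an event log, versus reading it straight from a personal data store the customer has chosen to share. Everything in it is hypothetical and for illustration only: the PersonalDataStore class, the infer_likes function, the merchant names, and the sample events are assumptions, not part of any real product or standard.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PersonalDataStore:
    """Hypothetical customer-owned record of stated preferences."""
    owner: str
    likes: set = field(default_factory=set)
    # Merchants the owner has explicitly opted in to sharing with.
    shared_with: set = field(default_factory=set)

    def likes_for(self, merchant):
        # Reveal preferences only to opted-in merchants; otherwise nothing.
        return set(self.likes) if merchant in self.shared_with else None


def infer_likes(events, customer, min_signals=2):
    """Analytics-style guess: count how often a customer touched each item."""
    counts = Counter(item for who, item in events if who == customer)
    return {item for item, n in counts.items() if n >= min_signals}


# Made-up sample data (assumption): (customer, item) interaction events.
events = [("brian", "coffee"), ("ann", "tea"),
          ("brian", "coffee"), ("brian", "tea")]
store = PersonalDataStore("brian", likes={"coffee"},
                          shared_with={"corner_cafe"})

inferred = infer_likes(events, "brian")      # sift through the signals
shared = store.likes_for("corner_cafe")      # opted-in merchant: direct answer
withheld = store.likes_for("other_shop")     # no opt-in: no data
```

The inference path has to scan every event and still only produces a guess, while the data store answers directly but only for merchants the customer has opted in to, which is exactly the trade-off described above.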

Thanks to big data, many businesses recognize the value of data analysis. But there may be several new paths that will open up to help them achieve the benefits of data-driven decision making.

Read more of Brian Proffitt's Zettatag and Open for Discussion blogs and follow the latest IT news at ITworld. Drop Brian a line or follow Brian on Twitter at @TheTechScribe. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
