Bigger than big data
After wading through these products, it became clear that "big data" was much bigger than any single buzzword. It's not really fair to lump together products that largely build tables with those that attempt complicated mathematical operations. Nor is it fair to compare simpler tools that work with generic databases with those that attempt to manage larger stacks spread out over multiple machines in frameworks like Hadoop.
To make matters worse, the targets are moving. Some of the more tantalizing new companies aren't sharing their software yet. Mysterious Platfora has a button you can click to stay informed, while another enigmatic startup, Continuity, just says, "We're still in stealth, heads down and coding hard." They surely won't be the last new entrants in this area.
Despite the speed and sophistication of the new algorithms, I found myself liking the old classic reports the best. The Pentaho and Jaspersoft tools simply produce nice lists of the top entries, but this was all I needed. Knowing the top domains in my log file was enough.
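For readers who want a feel for how little machinery a "top entries" report needs, here is a minimal Python sketch. It is not how Pentaho or Jaspersoft build their reports; it is an illustration only. The log layout, the sample lines, and the name top_domains are all assumptions made up for this example, and the function simply counts the host of the first URL it finds on each line.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(lines, n=10):
    """Return the n most common hosts among the first URL on each line.

    Hypothetical helper: assumes each log line may contain a quoted
    http:// or https:// URL somewhere among its whitespace-separated fields.
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            token = token.strip('"')
            if token.startswith(("http://", "https://")):
                host = urlsplit(token).hostname
                if host:
                    counts[host] += 1
                break  # count only the first URL on each line
    return counts.most_common(n)


# Made-up sample data, loosely shaped like a web server access log.
sample_log = [
    '10.0.0.1 - - "GET /a HTTP/1.1" 200 "http://example.com/a"',
    '10.0.0.2 - - "GET /b HTTP/1.1" 200 "https://example.org/b"',
    '10.0.0.3 - - "GET /c HTTP/1.1" 200 "http://Example.com/c"',
    '10.0.0.4 - - "GET /d HTTP/1.1" 404 "http://example.net/d"',
    '10.0.0.5 - - "GET /e HTTP/1.1" 200 "http://example.com/e"',
    '10.0.0.6 - - "GET /f HTTP/1.1" 200 "https://example.org/f"',
    '10.0.0.7 - - "GET /g HTTP/1.1" 200 -',
]

for host, hits in top_domains(sample_log, n=3):
    print(f"{hits:5d}  {host}")
```

A sorted count like this is essentially what a classic report delivers. The commercial tools add the connectors, scheduling, and presentation around it.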
The other algorithms are intellectually interesting, but they're harder to apply with any consistency. They can flag clusters or do fuzzy matching, but my data set didn't seem to lend itself to these analyses. Try as I might, I couldn't figure out any applications for my data that didn't seem contrived.
Others will probably feel differently. The clustering algorithms are used heavily in diverse applications such as helping people find similar products in online stores. Others use outlier detection algorithms to identify potential security threats. These all bear investigation, but the software is the least of the challenges.
Perhaps it is my lack of vision that left me clinging to the old sortable reports. In time, I may come to understand just how I might use the advanced algorithms to do more. This may be why most of these companies list consulting among their products. They will rent you one of their engineers, who is familiar with the software and the math, so you have a guide when you're digging around the data. This is a good option for many businesses because their needs and demands are often rather abstract and filled with wishful hand-waving.
At a recent O'Reilly Strata conference on big data, one of the best panels debated whether it was better to hire an expert on the subject being measured or an expert on using algorithms to find outliers. I'm not sure I can choose, but I think it's important to hire a person with a mandate to think deeply about the data. It's not enough to just buy some software and push a button.