Tableau Software started embracing Hadoop several versions ago, and now you can treat Hadoop "just like you would with any data connection." Tableau relies on Hive to structure the queries, then caches as much information in memory as it can to keep the tool interactive. While many of the other reporting tools are built on a tradition of generating reports offline, Tableau wants to offer an interactive mechanism so that you can slice and dice your data again and again. Caching helps mask some of the latency of a Hadoop cluster.
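To make the caching idea concrete, here is a minimal Python sketch of keeping query results in memory so that a repeated slice does not go back to the cluster. It is only an illustration of the general technique, not a description of how Tableau implements it; `run_hive_query`, the sample rows, and the SQL text are all hypothetical stand-ins.

```python
from functools import lru_cache

CALLS = {"count": 0}  # how many times the (simulated) cluster was actually queried


def run_hive_query(sql):
    """Hypothetical stand-in for a slow Hive query; returns made-up rows."""
    CALLS["count"] += 1
    return (("region", "sales"), ("east", 120), ("west", 95))


@lru_cache(maxsize=128)
def cached_query(sql):
    # Results are keyed by the exact query text, so only an identical
    # repeat of the same query is answered from memory.
    return run_hive_query(sql)


QUERY = "SELECT region, SUM(sales) FROM orders GROUP BY region"
first = cached_query(QUERY)   # goes to the simulated cluster
second = cached_query(QUERY)  # answered from the in-memory cache
```

In this sketch the second call never reaches `run_hive_query`, which is the effect that makes repeated reslicing feel interactive even when the underlying cluster is slow.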
The software is well-polished and aesthetically pleasing. I often found myself reslicing the data just to see it in yet another graph, even though there wasn't much new to be learned by switching from a pie chart to a bar graph and beyond. The software team clearly includes a number of people with some artistic talent.
Big data tools: Splunk
Splunk is a bit different from the other options. It's not exactly a report-generating tool or a collection of AI routines, although it accomplishes much of that along the way. It creates an index of your data as if your data were a book or a block of text. Yes, databases also build indices, but Splunk's approach is much closer to a text search process.
This indexing is surprisingly flexible. Splunk comes already tuned to my particular application, making sense of log files, and it sucked them right up. It's also sold in a number of different solution packages, including one for monitoring a Microsoft Exchange server and another for detecting Web attacks. The index helps correlate the data in these and several other common server-side scenarios.
Splunk takes text strings and searches for them in the index. You might type in the URLs of important articles or an IP address. Splunk finds the matches and packages them into a timeline built around the time stamps it discovers in the data. All other fields are correlated, and you can click around to drill deeper and deeper into the data set. While this is a simple process, it's quite powerful if you're looking for the right kind of needle in your data feed. If you know the right text string, Splunk will help you track it. Log files are a great application for it.
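The index-then-timeline idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Splunk's actual implementation or API: the log lines are made up, the log format and the one-hour bucket size are assumptions, and `build_index`, `search`, and `timeline` are hypothetical helper names.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample log lines in an assumed "date time ip method path status" format.
LOG_LINES = [
    "2012-10-01 09:15:02 192.168.0.7 GET /articles/hadoop-review 200",
    "2012-10-01 09:47:31 10.0.0.3 GET /index.html 200",
    "2012-10-01 10:05:10 192.168.0.7 GET /articles/splunk-review 404",
    "2012-10-01 10:20:44 192.168.0.7 GET /articles/hadoop-review 200",
]

TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def build_index(lines):
    """Map each whitespace-separated token to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in line.split():
            index[token].add(line_no)
    return index


def search(index, lines, term):
    """Return the lines containing the exact token, in original order."""
    return [lines[i] for i in sorted(index.get(term, ()))]


def timeline(matches):
    """Count matching lines per hour, using the time stamp at the start of each line."""
    buckets = Counter()
    for line in matches:
        found = TIMESTAMP.match(line)
        if found:
            stamp = datetime.strptime(found.group(), "%Y-%m-%d %H:%M:%S")
            buckets[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return dict(sorted(buckets.items()))


index = build_index(LOG_LINES)
hits = search(index, LOG_LINES, "192.168.0.7")  # search by IP address
per_hour = timeline(hits)                       # hourly counts of the matches
```

Searching for an article path instead, such as `search(index, LOG_LINES, "/articles/hadoop-review")`, works the same way, since the index treats every token as plain text rather than as a typed database column.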
A new Splunk tool called Shep, currently in private beta, promises bidirectional integration between Hadoop and Splunk, allowing you to exchange data between the systems and query Splunk data from Hadoop.