Big data tools: Jaspersoft BI Suite
The Jaspersoft package is one of the open source leaders for producing reports from database columns. The software is well polished and already installed in many businesses, turning SQL tables into PDFs that everyone can scrutinize at meetings.
The company is jumping on the big data train, and this means adding a software layer to connect its report-generating software to the places where big data gets stored. The JasperReports Server now offers software to suck up data from many of the major storage platforms, including MongoDB, Cassandra, Redis, Riak, CouchDB, and Neo4j. Hadoop is also well represented, with JasperReports providing a Hive connector to reach inside HBase.
This effort still feels like it is starting up: many pages of the documentation wiki are blank, and the tools are not fully integrated. The visual query designer, for instance, doesn't yet work with Cassandra's CQL, so you have to type those queries out by hand.
Once you pull data from these sources, Jaspersoft's server boils it down into interactive tables and graphs. The reports can be quite sophisticated, letting you drill down into various corners of the data and ask for more detail as you need it.
Reporting is a well-developed corner of the software world, and Jaspersoft is expanding by making these sophisticated reports easier to use with newer sources of data. Jaspersoft isn't offering particularly new ways to look at the data, just more sophisticated ways to reach data stored in new locations. I found this surprisingly useful: simply aggregating my data was enough to make basic sense of who was visiting the website and when.
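The aggregation described here needs nothing exotic. The following is a plain-Python illustration, not Jaspersoft's API. It assumes web server logs in the Apache Common Log Format, and the function names `parse_line` and `visits_by_hour` and the sample lines are hypothetical. It counts requests and distinct visitor IPs for each hour of the day, which gives a rough answer to "who" and "when."

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical helpers for illustration; assumes Apache Common Log Format lines:
# host ident authuser [timestamp] "request line" status bytes
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+')


def parse_line(line):
    """Return (ip, timestamp) for a well-formed log line, or None otherwise."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    ip, stamp = match.groups()
    # Timestamps look like 10/Oct/2012:13:55:36 -0700 (%b assumes English month names).
    return ip, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def visits_by_hour(lines):
    """Map each hour of the day (in the log's own UTC offset) to (requests, distinct IPs)."""
    requests = Counter()
    visitors = defaultdict(set)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed or truncated lines
        ip, when = parsed
        requests[when.hour] += 1
        visitors[when.hour].add(ip)
    return {hour: (requests[hour], len(visitors[hour])) for hour in sorted(requests)}


# Hypothetical sample lines using documentation-reserved IP addresses.
sample = [
    '192.0.2.1 - - [10/Oct/2012:13:55:36 -0700] "GET / HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2012:13:58:02 -0700] "GET /about HTTP/1.1" 200 1024',
    '198.51.100.7 - - [10/Oct/2012:14:01:11 -0700] "GET / HTTP/1.1" 200 2326',
]
print(visits_by_hour(sample))
```

A reporting server performs the same group-and-count step at scale and adds the interactive drill-down on top. The sketch only shows the underlying arithmetic.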
Big data tools: Pentaho Business Analytics
Pentaho is another software platform that began as a report-generating engine; like Jaspersoft, it is branching into big data by making it easier to absorb information from the new sources. You can hook Pentaho's tool up to many of the most popular NoSQL databases, such as MongoDB and Cassandra. Once the databases are connected, you can drag and drop the columns into views and reports as if the information came from SQL databases.
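Treating NoSQL records as if they came from SQL tables requires flattening nested, schemaless documents into fixed columns. The text does not describe how Pentaho performs this mapping, so the following is only a minimal Python sketch of the general idea. It assumes MongoDB-style documents represented as Python dicts, and `flatten`, `to_table`, and the sample documents are hypothetical.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into one level with dotted keys (hypothetical helper).

    Lists and other non-dict values are kept as-is rather than expanded.
    """
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=name + "."))
        else:
            row[name] = value
    return row


def to_table(docs):
    """Turn schemaless documents into (columns, rows); missing fields become None."""
    flat = [flatten(d) for d in docs]
    # The column set is the union of every field seen in any document.
    columns = sorted({col for row in flat for col in row})
    rows = [[row.get(col) for col in columns] for row in flat]
    return columns, rows


# Hypothetical documents: the second one lacks geo.city entirely.
docs = [
    {"ip": "192.0.2.1", "page": "/", "geo": {"country": "US", "city": "Boston"}},
    {"ip": "198.51.100.7", "page": "/about", "geo": {"country": "DE"}},
]
columns, rows = to_table(docs)
print(columns)
for row in rows:
    print(row)
```

Taking the union of fields and filling the gaps with None is one simple design choice. A real connector might instead sample documents to infer a schema or let the user pick the fields.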
I found the classic sorting and sifting tables extremely useful for understanding just who was spending the most time at my website. Simply sorting the log files by IP address revealed what the heavy users were doing.
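Sorting by IP address works because it places each visitor's requests side by side. The following is a minimal Python sketch of that idea, assuming the log has already been parsed into (IP, timestamp) pairs. The `heavy_users` function and the sample records are hypothetical. "Time on site" is approximated as the span between an address's first and last request, which ignores idle gaps and users who share an address behind a proxy or NAT.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter


def heavy_users(records, top=10):
    """Rank IPs by approximate time on site (first-to-last request span).

    `records` is an iterable of (ip, datetime) pairs. Hypothetical helper:
    the span ignores idle gaps and shared addresses (NAT, proxies).
    """
    # Sorting by IP puts each visitor's requests next to each other,
    # which groupby needs in order to see every request for one IP at once.
    ordered = sorted(records, key=itemgetter(0))
    summary = []
    for ip, group in groupby(ordered, key=itemgetter(0)):
        times = [when for _, when in group]
        summary.append((ip, max(times) - min(times), len(times)))
    # Longest span first; request count breaks ties.
    summary.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return summary[:top]


# Hypothetical sample records using documentation-reserved IP addresses.
t0 = datetime(2012, 10, 10, 13, 0)
records = [
    ("192.0.2.1", t0),
    ("198.51.100.7", t0 + timedelta(minutes=2)),
    ("192.0.2.1", t0 + timedelta(minutes=45)),
    ("203.0.113.9", t0 + timedelta(minutes=5)),
    ("198.51.100.7", t0 + timedelta(minutes=12)),
]
for ip, span, hits in heavy_users(records):
    print(ip, span, hits)
```

Ranking by request count instead of span would favor crawlers that fire many requests in a short burst. Which measure better identifies "heavy users" depends on the site.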