While big data tools like Hadoop are usually used in conjunction with existing relational database systems, 10gen is working towards a loftier goal: getting its NoSQL database MongoDB to replace instances of relational databases.
The latest release of MongoDB hit the proverbial shelves today, and version 2.2 brings a number of new under-the-hood features aimed at speeding up 10gen's database, which has already made a big mark on the big data scene.
Comparing MongoDB and Hadoop is not really apt: although both are used to handle large-scale datasets, their jobs are very different. Hadoop's distributed file system and MapReduce processing framework can indeed store a lot of data and extract information in batch jobs, but no one would call Hadoop a database.
MongoDB belongs to the document-oriented class of databases, which hold entire documents' worth of data. Rather than the XML documents used by some other document stores, MongoDB stores information as schemaless, JSON-style objects (kept internally in a binary format called BSON).
This object-oriented approach is one of the major benefits MongoDB offers users. Because data in MongoDB is already stored as objects, developers building applications don't have to deal with the added complexity of the object-relational mapping layer they would need when coding against a standard relational database.
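To make the "no mapping layer" point concrete, here is a minimal sketch using only Python's standard library, not a MongoDB driver. The blog-post document, its field names, and the `post_from_doc` helper are all hypothetical. The idea it illustrates is that a nested, JSON-style document already has the same shape as the application's objects, so turning one into the other is a direct field-for-field copy rather than a reassembly of rows from several joined tables.

```python
import json
from dataclasses import dataclass, field

# Hypothetical blog-post document: tags and comments are nested inside the
# same record instead of living in separate tables joined by foreign keys.
post_doc = {
    "_id": "post-1",
    "title": "MongoDB 2.2 released",
    "tags": ["mongodb", "nosql"],
    "comments": [
        {"author": "alice", "text": "Nice!"},
        {"author": "bob", "text": "Upgrading now."},
    ],
}

@dataclass
class Comment:
    author: str
    text: str

@dataclass
class Post:
    _id: str
    title: str
    tags: list[str] = field(default_factory=list)
    comments: list[Comment] = field(default_factory=list)

def post_from_doc(doc: dict) -> Post:
    # The document mirrors the object graph, so no join or ORM is needed:
    # each nested comment maps straight onto a Comment object.
    return Post(
        _id=doc["_id"],
        title=doc["title"],
        tags=list(doc.get("tags", [])),
        comments=[Comment(**c) for c in doc.get("comments", [])],
    )

# JSON text round-trips to the same structure (MongoDB itself stores BSON).
serialized = json.dumps(post_doc)
post = post_from_doc(json.loads(serialized))
print(post.title, len(post.comments))  # prints: MongoDB 2.2 released 2
```

In a relational schema the same post would typically span a posts table, a comments table, and a tags join table, and an ORM would stitch those rows back into objects.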
One of the new features in MongoDB 2.2 should improve developers' access to data even further, according to Jared Rosoff, Director of Product Marketing and Technical Alliances at 10gen. As usage expands, users want more ways to work with and access their data. The improved aggregation framework in 2.2 enables real-time queries on data, simplifies reporting, and provides the foundation for real-time analytics within MongoDB itself. According to 10gen, this can speed up analytics and reporting by as much as 80 percent compared with using MapReduce in Hadoop.
New operators in the framework also add to MongoDB's analytics power, Rosoff explained. "It's very much like Unix's pipes and filters. A developer can piece together these operators and pull out data that much faster."
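Rosoff's "pipes and filters" comparison can be illustrated with a small, self-contained Python sketch. This is not MongoDB's implementation. The stage functions `match`, `group_sum`, and `sort_by`, the `run_pipeline` helper, and the sample orders are hypothetical, named loosely after the framework's `$match`, `$group`, and `$sort` operators. Each stage consumes a stream of documents and hands its output to the next stage, the way `cmd1 | cmd2 | cmd3` chains programs in a Unix shell.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

Doc = dict
Stage = Callable[[Iterable[Doc]], Iterator[Doc]]

def match(predicate: Callable[[Doc], bool]) -> Stage:
    # Filter stage: pass through only documents that satisfy the predicate.
    def stage(docs):
        return (d for d in docs if predicate(d))
    return stage

def group_sum(key: str, value: str) -> Stage:
    # Group stage: sum `value` for each distinct `key`, one output doc per group.
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[value]
        return ({"_id": k, "total": v} for k, v in totals.items())
    return stage

def sort_by(name: str, descending: bool = False) -> Stage:
    # Sort stage: order the incoming documents by one field.
    def stage(docs):
        return iter(sorted(docs, key=lambda d: d[name], reverse=descending))
    return stage

def run_pipeline(docs: Iterable[Doc], stages: list[Stage]) -> list[Doc]:
    # Like a shell pipeline: each stage's output becomes the next stage's input.
    for stage in stages:
        docs = stage(docs)
    return list(docs)

orders = [
    {"customer": "acme", "status": "shipped", "amount": 30},
    {"customer": "acme", "status": "pending", "amount": 50},
    {"customer": "globex", "status": "shipped", "amount": 20},
    {"customer": "acme", "status": "shipped", "amount": 15},
]

report = run_pipeline(orders, [
    match(lambda d: d["status"] == "shipped"),
    group_sum("customer", "amount"),
    sort_by("total", descending=True),
])
print(report)  # [{'_id': 'acme', 'total': 45}, {'_id': 'globex', 'total': 20}]
```

In MongoDB's own aggregation framework, roughly the same report would be written declaratively as a list of stage documents, along the lines of `{"$match": {"status": "shipped"}}`, `{"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}}`, `{"$sort": {"total": -1}}`, and evaluated inside the database rather than in application code.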
The upshot of this new framework is that it will be much easier and faster for reporting and aggregation tools to hook into MongoDB and use its data for business intelligence.
The new release also adds the ability to tag data in a MongoDB cluster to improve data replication and availability across multi-datacenter deployments of MongoDB.
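In MongoDB 2.2 this shows up as tag-aware sharding: shards are labeled with tags (for example, by datacenter), and ranges of the shard key are associated with a tag, via shell helpers such as `sh.addShardTag` and `sh.addTagRange`. The Python sketch below only models that idea and is not MongoDB's balancer. The shard names, tags, zip-code-style key ranges, and the `tag_for_key` and `eligible_shards` helpers are all hypothetical. It shows how a key maps to a tagged range and then to the set of shards allowed to hold it.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical cluster: each shard carries one or more datacenter tags.
SHARD_TAGS = {"shard0": {"NYC"}, "shard1": {"NYC"}, "shard2": {"SF"}}

# Hypothetical tag ranges on a numeric shard key, as [min, max) -> tag,
# kept sorted by their lower bound.
TAG_RANGES = [(0, 50000, "NYC"), (50000, 100000, "SF")]

def tag_for_key(key: int) -> Optional[str]:
    # Binary-search the range whose lower bound is <= key, then check its upper bound.
    mins = [lo for lo, _, _ in TAG_RANGES]
    i = bisect_right(mins, key) - 1
    if i >= 0:
        lo, hi, tag = TAG_RANGES[i]
        if lo <= key < hi:
            return tag
    return None

def eligible_shards(key: int) -> list[str]:
    # Tagged keys may only live on shards carrying that tag;
    # untagged keys may live on any shard.
    tag = tag_for_key(key)
    if tag is None:
        return sorted(SHARD_TAGS)
    return sorted(s for s, tags in SHARD_TAGS.items() if tag in tags)

print(eligible_shards(10001))   # ['shard0', 'shard1']
print(eligible_shards(94105))   # ['shard2']
print(eligible_shards(150000))  # ['shard0', 'shard1', 'shard2']
```

Pinning key ranges to shards in a particular datacenter keeps data close to the users who read and write it, which is the availability and locality benefit the release targets for multi-datacenter deployments.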
All of these features, aimed at reducing MongoDB's latency, should help advance 10gen's goal of increasing MongoDB adoption in the enterprise. But it's not just a game of speed. As dataset complexity increases, more and more applications are simply not well suited to relational databases like Oracle and SQL Server, at least not without breaking the bank. These kinds of workloads, such as metadata management, are better suited to MongoDB in terms of both speed and cost.
"Scale is what created the NoSQL movement," Rosoff said. "It's the new data models that will make NoSQL databases more attractive to the enterprise."
Read more of Brian Proffitt's Open for Discussion blog and follow the latest IT news at ITworld. Drop Brian a line or follow Brian on Twitter at @TheTechScribe.