March 15, 2012, 2:01 PM — The success of big data in the IT ecosystem gets a lot of help from Hadoop, but with each passing day, customers are realizing that there's another big player in town: 10gen's MongoDB.
If there's one thing I keep hearing about Hadoop, it's this: working with MapReduce is a pain in the tuchus.
Powerful as MapReduce jobs are, building them to run batch queries and analytics on the massive amounts of data sitting in Hadoop clusters is not something a lot of people really enjoy. I base this on the informal polls I have conducted, where the answer is typically accompanied by a glassy stare and a reflexive reach for the nearest bottle of scotch.
Kidding aside, while Hadoop and its distributions are recognized as some of the most powerful drivers in the big data sector today, the plain truth is that they're just not suited to everyone's needs. There are many instances where a lot of data has to be stored and queried in a relatively simple manner.
This is the space where 10gen's MongoDB lives, and it's a space that's growing pretty darned fast. According to the latest release of Jaspersoft's Big Data Index, the database experienced nearly 200 percent growth from Jan. 2011 to Jan. 2012. (Jaspersoft, it should be noted, measures downloads of specific connectors to its service to compile its Index, so take this number as you will.)
It's not the only indicator showing a steep growth curve for MongoDB. A visit to job site Indeed.com's Trends page shows MongoDB as the number two most searched job skill, right behind HTML5. (Hadoop is currently ranked at number seven.)
Each is a small indicator, to be sure, but together they reflect a lot of the buzz going around the big-data sector these days: MongoDB may have gotten off to a later start than Hadoop, but it's coming up fast.
In the big data ecosystem of non-relational databases, MongoDB is part of the document-oriented class of databases, where entire documents' worth of data are stored together. MongoDB and CouchDB both belong to this class, using schemaless JSON-style objects as documents to store information, as opposed to the more commonly used XML documents.
This object-oriented approach is one of the three major benefits MongoDB provides to users, according to 10gen President Max Schireson. Because data within MongoDB is already object-oriented, developers building apps don't have to deal with the added complexity of the object-relational mapping layer they would need when coding against a standard relational database.
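To make the contrast concrete, here is a minimal sketch in plain Python (no database involved). The check-in object, the three "tables," and the reassemble helper are all hypothetical, invented purely for illustration; the point is only that a nested object maps directly onto one JSON-style document, while the relational layout needs code to put it back together.

```python
import json

# Hypothetical application object with nested and repeated fields.
checkin = {
    "user": "alice",
    "venue": {"name": "Cafe Uno", "city": "Chicago"},
    "tags": ["coffee", "wifi"],
}

# Document-store view: the whole object becomes one JSON-style document.
document = json.dumps(checkin, sort_keys=True)
restored = json.loads(document)

# Relational view, for contrast: the same data split into flat rows
# across three hypothetical tables.
checkin_rows = [(1, "alice", 10)]           # (checkin_id, user, venue_id)
venue_rows = [(10, "Cafe Uno", "Chicago")]  # (venue_id, name, city)
tag_rows = [(1, "coffee"), (1, "wifi")]     # (checkin_id, tag)


def reassemble(checkin_id):
    """Rebuild the object from flat rows, the job a mapping layer does."""
    _, user, venue_id = next(r for r in checkin_rows if r[0] == checkin_id)
    _, name, city = next(r for r in venue_rows if r[0] == venue_id)
    tags = [tag for cid, tag in tag_rows if cid == checkin_id]
    return {"user": user, "venue": {"name": name, "city": city}, "tags": tags}
```

Both routes end with the same object, but only the relational one needs the extra reassembly step, which is roughly the work an object-relational mapping layer automates.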
"This enables enormous developer productivity," Schireson said.
MongoDB's popularity can also be attributed to the ease with which data can be pulled out of the system. Because information is stored within flexible-schema documents, and not split apart between potentially hundreds of tables or machines, queries are a lot easier to create and a lot more powerful. Queries in the Mongo Query Language are themselves expressed as JSON (specifically BSON) objects, which makes data a lot easier to retrieve without the limitations of batched jobs.
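For a feel of what a query-as-object looks like, here is a toy matcher in plain Python. It is a sketch only, not MongoDB's engine: the sample data and the get_path, matches, and find helpers are hypothetical, and the matcher handles just equality, dotted field paths, and the $gt and $lt comparison operators.

```python
# Toy in-memory "collection": a list of JSON-style documents (made-up data).
people = [
    {"name": "Ann", "age": 34, "address": {"city": "Austin"}},
    {"name": "Bob", "age": 27, "address": {"city": "Boston"}},
    {"name": "Cy", "age": 41, "address": {"city": "Austin"}},
]


def get_path(doc, dotted):
    """Follow a dotted field path such as 'address.city' into nested dicts."""
    for key in dotted.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def matches(doc, query):
    """Return True if doc satisfies every condition in the query object."""
    for field, cond in query.items():
        value = get_path(doc, field)
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 30}
            for op, operand in cond.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                elif op == "$lt":
                    if value is None or not value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:  # plain equality
            return False
    return True


def find(collection, query):
    """Return every document in the collection that matches the query."""
    return [doc for doc in collection if matches(doc, query)]


# The query is itself a JSON-style object: city equals Austin AND age > 35.
query = {"address.city": "Austin", "age": {"$gt": 35}}
result = find(people, query)
```

The query is ordinary data, so an application can build or modify it the same way it builds any other object, with no separate query string to assemble.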
Finally, the storage architecture of MongoDB, where data is stored as single objects, reduces the sometimes huge numbers of tables found in equivalent relational database systems.
"Enormous table counts mean enormous transactions," Schireson said. In MongoDB, all transactions are greatly simplified, thus delivering a much higher performance.
For Schireson, even though it is easy to compare Hadoop and MongoDB, the two technologies live in vastly different spaces. Hadoop, he asserts, is a very powerful tool for data analytics. MongoDB is disruptive in a completely different arena: the data management space.
"Our focus is on the operational data store, like Oracle, versus the analytics done by a company like Teradata," Schireson said.
It's hard not to cringe a bit when Schireson says this, since the last open source company that disrupted the heck out of Oracle's business got themselves acquired by Oracle. But that doesn't seem to be a concern for now.
Right now, 10gen is focusing on maintaining the growth of MongoDB and working with clients like Foursquare, which uses MongoDB to manage its core data and MongoDB's Hadoop connectors to let Foursquare perform analytical work with Hadoop.
"The revolution is only in the first phase," Schireson said.
Read more of Brian Proffitt's Zettatag and Open for Discussion blogs and follow the latest IT news at ITworld. Follow Brian on Twitter at @TheTechScribe.