Perhaps the best-known example of a MapReduce-based system is Hadoop, which pairs MapReduce with the Hadoop Distributed File System to store and process data effectively. It is important to note that Hadoop, along with the Cloudera commercial implementation and the Apache Cassandra system, is also a member of the column-oriented store class of databases. This is no contradiction: the method of data storage is separate from the distributed method of data processing. In fact, even though CouchDB lies in the document-storage class of data storage tools, it also uses MapReduce for data processing, just as Aster uses MapReduce atop the relational PostgreSQL database.
While MapReduce-related tools are enjoying their moment in the sun, there are some known drawbacks to using MapReduce. Writing the algorithms for multi-core data processing can be very complicated, and because the technology is relatively new, many of these query routines have to be hand-coded.
As a result, these processing algorithms can be extremely inefficient, requiring lots of processing hardware and storage space to complete a task. The good news is that, since most MapReduce systems are based on open source technology, they can be run on commodity systems that are cheap and easy to add to your environment.
And the algorithm sets are getting better; Pig's Pig Latin query language can dive into Hadoop storage systems in small steps to create data flows. Hive's HiveQL does much the same thing, though it's similar enough to SQL to be more familiar to data analysts.
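To make the hand-coding burden concrete, here is a minimal sketch of a MapReduce-style word count written in plain Python. It is only an illustration of the map, shuffle, and reduce steps running in a single process; it does not use Hadoop, Pig, or Hive, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key.

    In a real cluster the framework does this between the map and
    reduce phases; here it is a simple in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, invented for illustration.
    docs = ["big data is big", "data stores vary"]
    print(word_count(docs))
```

Even this trivial job needs three hand-written functions, which hints at why higher-level languages such as Pig Latin and HiveQL are attractive: they let an analyst express the same kind of aggregation without spelling out each phase.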
Enterprise search products, such as ElasticSearch, Apache Lucene, and Apache Solr, use a concept called facets that enables you to treat data within documents as you would fields within a relational database. Facets are essentially inverted indexes that let you find specific pieces of information in a document, like an address or other customer information.
Enterprise search is ideal if you have a large set of these types of documents to cull through, and need to do some straightforward data mining or business intelligence analysis. The more structured the data, the better: enterprise search does particularly well with documents like weblogs, which are structured uniformly enough to enable deeper data mining.
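The idea of a facet as an inverted index can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how ElasticSearch, Lucene, or Solr implement facets internally: documents are assumed to be already parsed into dictionaries, and the function names (build_facet_index, lookup, facet_counts) and the sample customer records are hypothetical.

```python
from collections import defaultdict


def build_facet_index(documents, facet_fields):
    """Map each (field, value) pair to the set of document ids containing it.

    `documents` is assumed to be a dict of {doc_id: {field: value}}.
    """
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field in facet_fields:
            if field in doc:
                index[(field, doc[field])].add(doc_id)
    return index


def lookup(index, field, value):
    """Return the sorted ids of documents whose `field` equals `value`."""
    return sorted(index.get((field, value), set()))


def facet_counts(index, field):
    """Count how many documents fall under each value of one facet field."""
    return {
        value: len(doc_ids)
        for (indexed_field, value), doc_ids in index.items()
        if indexed_field == field
    }


if __name__ == "__main__":
    # Hypothetical customer documents, invented for illustration.
    customers = {
        1: {"name": "Ann", "city": "Austin"},
        2: {"name": "Bo", "city": "Boston"},
        3: {"name": "Cy", "city": "Austin"},
    }
    index = build_facet_index(customers, ["city"])
    print(lookup(index, "city", "Austin"))
    print(facet_counts(index, "city"))
```

Because the index is keyed by value rather than by document, finding every customer in a given city is a single dictionary lookup, much like querying an indexed column in a relational table.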
The Big Data vendors
Now that you have an idea of the various technologies that are currently part of the Big Data sector, you will have better context for how the players fit within this sector.
The list below is by no means comprehensive, but it is meant to be a jumping-off point for identifying the key players in Big Data and the products and services they offer.