Much of this source data comes from data feeds, which draw on Web server logs and outside sources. The Hadoop Flume component is used to ingest the data. The Hadoop cluster also executes a series of MapReduce jobs to parse the raw data into summaries.
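As a rough illustration of what parsing raw logs into summaries means in MapReduce terms, the following single-process Python sketch counts requests per URL path. It is not AOL's code: the Common Log Format input, the sample lines, and the names `map_line`, `reduce_counts` and `summarize` are all assumptions made for the example, and a real Hadoop job would distribute the map and reduce steps across the cluster.

```python
from collections import defaultdict

# Hypothetical sample input, assumed to be in Common Log Format.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2011:00:00:01 +0000] "GET /news HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2011:00:00:02 +0000] "GET /mail HTTP/1.1" 200 128',
    '10.0.0.1 - - [01/Jan/2011:00:00:03 +0000] "GET /news HTTP/1.1" 200 512',
    'garbage line',
]


def map_line(line):
    """Map step: emit (path, 1) for one log line, or nothing if it is malformed."""
    parts = line.split('"')
    if len(parts) < 3:
        return []
    request = parts[1].split()  # e.g. ['GET', '/news', 'HTTP/1.1']
    if len(request) < 2:
        return []
    return [(request[1], 1)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def summarize(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

The malformed line is silently dropped in the map step, which is one common way to keep a summary job running when a feed contains bad records.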

AOL also uses Couchbase's CouchDB as a switching station of sorts for data arriving from the feeds. Because CouchDB can work with data without writing it to disk, it can be used to parse data quickly before sending it to the next step.

"We didn't anticipate ad targeting to be a primary [market] for us. But Couchbase ended up filling a need for AOL and other ad companies," Ingenthron said. The work is "technically complex and has a lot challenges in processing data very quickly."

Scientific and medical publishing house Elsevier was looking for greater flexibility when it procured an XML-based, non-relational database system from MarkLogic, said Elsevier Labs Vice President Bradley Allen.

The scientific publishing world is moving from a static model to a more dynamic one, Allen explained. For the past few centuries, the printed scientific paper, collected in journals, has served as the basic unit of knowledge. It contains a description of the work, the authors and contributors, references and other core components of information. While the scientific publishing world is moving to digital, paper remains the dominant medium for communicating data. "We're still in the horse-and-carriage era," Allen quipped.

Over time, the scientific paper will be decomposed into individual elements, which can be used in multiple products. Individual paragraphs or even individual assertions can be annotated and indexed, Allen predicted. They can then be reassembled into new works and embedded in applications, such as programs that doctors can consult. They can also be mined for new information through the use of analytics.

With this in mind, Elsevier is in the process of annotating the papers in its journals so they can be deployed in other applications and services. An XML database was a natural fit for this work, Allen explained. New content types can easily be added into a database, and the format allows individual components to be easily reused in new composite applications and services.
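To make the idea of reusable components concrete, here is a minimal Python sketch that pulls annotated paragraphs out of an XML article so they could be reassembled elsewhere. It uses the standard library's `xml.etree.ElementTree` rather than an XML database, and the element names, the `topic` attribute, the sample text and the function `paragraphs_by_topic` are all hypothetical; Elsevier's actual schema is not described in the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated article; the schema is an assumption for illustration.
SAMPLE_ARTICLE = """
<article id="a1">
  <title>Example study</title>
  <paragraph id="p1" topic="dosage">Drug X was given at 5 mg daily.</paragraph>
  <paragraph id="p2" topic="methods">Patients were followed for 12 weeks.</paragraph>
  <paragraph id="p3" topic="dosage">Doses above 10 mg showed no added benefit.</paragraph>
</article>
"""


def paragraphs_by_topic(xml_text, topic):
    """Return the paragraphs annotated with the given topic, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "article": root.get("id"),
            "id": p.get("id"),
            "text": (p.text or "").strip(),
        }
        for p in root.iter("paragraph")
        if p.get("topic") == topic
    ]


if __name__ == "__main__":
    for item in paragraphs_by_topic(SAMPLE_ARTICLE, "dosage"):
        print(item["article"], item["id"], item["text"])
```

Because each paragraph carries its own identifier and annotation, a downstream application can select only the pieces it needs without parsing the paper as a whole.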

Elsevier has introduced a number of new products with this approach. One is SciVal, a service for academic administrators that summarizes the publishing activity within their institutions, giving them a quantitative idea of each organization's academic strengths and weaknesses. Another is ScienceDirect, a full-text search engine for Elsevier's journals.

