November 01, 2011, 6:04 AM — You can add MarkLogic to the growing list of database vendors rushing to embrace the open-source Hadoop programming framework for large-scale data processing.
MarkLogic 5, which became generally available on Tuesday, includes a Hadoop connector that will allow customers to "aggregate data inside MarkLogic for richer analytics, while maintaining the advantages of MarkLogic indexes for performance and accuracy," the company said.
MarkLogic is a "real, enterprise-class database, but it uses XML and XQuery instead of SQL, so it's well-suited for certain classes of applications," said analyst Curt Monash of Monash Research. "They have a nice scale-out story and they're dotting some i's and crossing some t's on industrial-strength performance."
The database's calling card has been its ability to manage, index and serve up large amounts of unstructured data, from text documents to media files.
It makes sense for MarkLogic to support Hadoop, Monash said.
"There are some multi-structured data use cases that are an obvious fit for MarkLogic over Hadoop and vice versa," he said. "Any integration lets you straddle them and get broader reach."
For example, an insurance company may have billions of documents that it wants to pull up one by one and analyze individually, he said. "That would be a great use case for the combination," with MarkLogic handling the retrieval and Hadoop the analytics.
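The division of labor Monash describes can be pictured as a small map-and-reduce pass over retrieved documents. The sketch below is plain Python, not the Hadoop API or MarkLogic's connector; the function names and the `claim_type` and `amount` fields are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_claim(doc):
    """Map step: emit one (key, value) pair per document.

    The 'claim_type' and 'amount' fields are hypothetical.
    """
    yield doc["claim_type"], doc["amount"]


def reduce_totals(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(documents):
    """Apply the map step to each document, then reduce all emitted pairs."""
    pairs = (pair for doc in documents for pair in map_claim(doc))
    return reduce_totals(pairs)
```

In a real deployment the documents would come from the database and the map and reduce steps would run in parallel across a cluster; here they run in a single process to show the shape of the computation.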
The Hadoop tie-in reflects the broader trend around "Big Data," an industry buzzword that refers to the ever-increasing amount of unstructured information coming from sources outside traditional enterprise applications, such as social networking sites and sensors.
Meanwhile, another new feature in MarkLogic 5 tries to make the most of the mix of storage customers might have, said CTO Ron Avnur. "We realized people have rotational drives and network-attached storage, and are starting to play more seriously with solid-state. These have different performance profiles."
System administrators tell MarkLogic which storage options are available and where they are, and the system will "do all the optimization." In this way, frequently used data can be kept in flash, while older or less frequently accessed information is held elsewhere.
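One simple way to picture this kind of tiering is a policy that ranks items by how often they are accessed and fills the fastest storage first. The sketch below is an assumption for illustration only: the article does not say how MarkLogic makes its placement decisions, and the `Tier` class, tier names and capacities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A hypothetical storage tier; names and capacities are illustrative."""
    name: str
    capacity: int  # maximum number of items the tier can hold


def place_by_access_frequency(access_counts, tiers):
    """Assign the most frequently accessed items to the fastest tiers.

    access_counts: dict mapping item id -> number of recent accesses
    tiers: non-empty list of Tier objects, fastest first
    Returns a dict mapping item id -> tier name.
    """
    # Hottest items first; ties are broken by item id so output is deterministic.
    ranked = sorted(access_counts, key=lambda k: (-access_counts[k], k))
    placement = {}
    index = 0
    for tier in tiers:
        for item in ranked[index:index + tier.capacity]:
            placement[item] = tier.name
        index += tier.capacity
    # Items that did not fit anywhere fall back to the slowest tier.
    for item in ranked[index:]:
        placement[item] = tiers[-1].name
    return placement
```

Given tiers for flash, rotational disk and network-attached storage, the most heavily read items land on flash and the rest spill down to the slower tiers in order.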
The new release also adds dashboards for overseeing multiple MarkLogic clusters. Customers may have development, test and production systems, and "they want to understand what's going on across those," Avnur said.
Also new are tie-ins to the Nagios open-source monitoring framework and Hewlett-Packard's Operations Manager software, as well as an API (application programming interface) that can be used to integrate with other management systems.