May 01, 2013, 4:44 PM — Users of the Hadoop data processing platform now have two more tools to help them sort through their mountains of information.
Hadoop distributor MapR has integrated LucidWorks Search into its own distribution. Cloudera, meanwhile, has launched the first full release of its open source Impala SQL query engine for Hadoop.
"Using search as the user interface for big data is very interesting. Search is well suited to leveraging a lot of different types of information, especially unstructured information," said Jack Norris, chief marketing officer for MapR. "We're seeing some really interesting applications with search engines at their core, even if a typical user would not think of them as search engine driven."
LucidWorks Search is the commercial version of the open source Apache Lucene/Solr full-text search engine. With the new MapR integration, LucidWorks Search can search data stored either in the Hadoop Distributed File System (HDFS) or in files on other file systems.
LucidWorks Search offers snapshots and mirrors for high availability, and eliminates much of the work required to install Lucene/Solr from scratch. It also offers native support for more data sources, a graphical user interface and a security framework.
The search engine could be used in a dynamic Web application to quickly retrieve photos, advertising, product recommendations, and other information that can be used to populate Web sites on the fly. "This isn't a lower cost substitute for data warehouses. This is about leveraging new data sources and doing some things that have a dramatic impact on the business," Norris said.
MapR and LucidWorks have been working together on pairing their technologies since 2011, when they formed a joint marketing agreement. Earlier this year, they released a connector that makes it easy to use Lucene/Solr with the MapR Hadoop distribution.
LucidWorks Search works with MapR's newly released M7 distribution, which is currently in beta. In addition to supporting LucidWorks Search, the M7 edition has been re-architected to eliminate compactions and background consistency checks, which speeds up performance.
Also this week, Cloudera released version 1.0 of Cloudera Impala, an open source SQL-compliant query engine for Hadoop. SQL is the database interface language used in relational database management systems (RDBMS) and is well known to database administrators.