Enterprise search is a valuable and growing component of big data

Lucid Imagination offers enterprise search for all

A new product from Lucid Imagination promises the functionality of enterprise search delivered from the cloud.

Enterprise search seems to be one of those concepts that's simple to understand -- so simple, in fact, that its value can be underestimated in the world of big data and data warehousing.

A big reason for this is that enterprise search is something most people can intuit far more easily than the somewhat arcane concepts associated with big data storage tools like Hadoop and Cassandra, and with that sector of data warehousing in general. After all, it's search: you type some search terms in an only-slightly-more-complicated-than-Google format, and your results are gathered. That's not too hard to understand, and because enterprise search tools can be pointed at data wherever it sits, there's not a lot of conceptual understanding involved, either. That's pretty much how people perceive the way Google, Bing, and Yahoo find things on the Internet.

Of course, executing enterprise search isn't simple, and it has a lot more going for it than being easy to grok. Using facets, enterprise search enables users to treat data within documents as they would fields within a relational database. Facets are essentially inverted indexes that let users find specific pieces of information in a document, like an address or other customer information.
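To make the idea concrete, here is a minimal sketch of an inverted index with facet-style counts layered on top. It is an illustration only, not how Lucene or Solr implement faceting; the sample documents, the field names (city, type, text), and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; the field names are invented for illustration.
DOCS = [
    {"id": 1, "city": "Austin", "type": "invoice", "text": "late payment notice"},
    {"id": 2, "city": "Boston", "type": "invoice", "text": "payment received"},
    {"id": 3, "city": "Austin", "type": "contract", "text": "renewal notice"},
]


def build_inverted_index(docs, field):
    """Map each lowercase term in `field` to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc[field].lower().split():
            index[term].add(doc["id"])
    return index


def facet_counts(docs, matching_ids, field):
    """Count how many of the matching documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if doc["id"] in matching_ids)


text_index = build_inverted_index(DOCS, "text")
hits = text_index["notice"]                  # ids of documents mentioning "notice"
by_city = facet_counts(DOCS, hits, "city")   # break the hits down by city
by_type = facet_counts(DOCS, hits, "type")   # break the hits down by document type
```

The search itself is a single dictionary lookup, and the facet step then treats city and type much as a database would treat columns, which is the field-like behavior described above.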

This is why enterprise search is ideal for examining large sets of these types of documents, whether for straightforward data mining or business intelligence analysis. The more structured the data, the better: enterprise search does particularly well with documents like weblogs, which are structured uniformly enough to enable deeper data mining.

One major family of enterprise search products uses the open source Apache Lucene and Apache Solr projects as the basis of its technology, and one of the more prominent commercial vendors in this family is Lucid Imagination.

Lucid is to Lucene and Solr what companies like Red Hat, SUSE, and Canonical are to Linux. Like a Linux distribution, Lucid Imagination's LucidWorks Enterprise product pulls together the best features of Apache Lucene/Solr, adding a few more features along the way, such as search connectors to SharePoint, Web, and Active Directory data. Lucid is not an open core company: as with Red Hat's offerings, versions of LucidWorks are provided free of charge, with a support subscription required for production use.

As you may recall, the evolution of the Lucene search platform occurred at the beginning of the journey to the Hadoop distributed storage engine. Solr would, in turn, be built on the Lucene Java library. This historical positioning may lead some to regard Lucene/Solr-based products as outmoded when compared to Hadoop and related software.

"Complementary" might be a better word. When I chatted with Grant Ingersoll, Chief Scientist at Lucid Imagination, to learn more about this aspect of the data warehousing sector, he outlined several examples of how LucidWorks could be combined with Hadoop tech to deliver useful results.

Because of the fuzzy matching capabilities of enterprise search, Ingersoll explained, it's perfect for performing functions like searching logs generated by web sites, then sending those results into analytic engines driven by Hadoop-related tools, whose output can in turn be fed back into the web site's control systems. Such a process would be ideal for monitoring an ecommerce site and providing near-real-time feedback on certain products' popularity.
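As a rough sketch of the first stage of such a pipeline, the snippet below fuzzy-matches product names in web log lines and tallies the results for a downstream step. It uses Python's standard difflib similarity ratio as a stand-in for a search engine's fuzzy matching and does not reflect how Lucene/Solr or LucidWorks do it; the log format, product names, and 0.8 cutoff are all assumptions made for illustration.

```python
import difflib
from collections import Counter

# Hypothetical web log lines; the format and product names are invented.
LOG_LINES = [
    "GET /product/espresso-machine 200",
    "GET /product/expresso-machine 404",
    "GET /product/tea-kettle 200",
    "GET /product/espresso-machine 200",
]


def fuzzy_hits(lines, query, cutoff=0.8):
    """Return the lines whose product slug is approximately equal to `query`."""
    hits = []
    for line in lines:
        # Take the last path segment of the requested URL as the product slug.
        slug = line.split()[1].rsplit("/", 1)[-1]
        # ratio() is 1.0 for identical strings and drops toward 0.0 as they diverge.
        if difflib.SequenceMatcher(None, slug, query).ratio() >= cutoff:
            hits.append(line)
    return hits


matches = fuzzy_hits(LOG_LINES, "espresso-machine")
# Tally HTTP status codes across the matches, the kind of summary an analytics job might consume.
status_counts = Counter(line.split()[2] for line in matches)
```

The misspelled "expresso-machine" request is caught along with the correctly spelled ones, while the unrelated tea-kettle request is left out; an exact-match search would have missed the misspelling.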

"We're actually everywhere within big data," Ingersoll added. "Lucene has made search a commodity."

And there's little question about Lucene/Solr's ability to scale: Twitter, for instance, uses Lucene for its search functionality. Searches can be run as forensic-type searches, or as queries that home in on very finely grained data.

Last week, Lucid stepped up its game even further, with the general availability release of LucidWorks Cloud, a hosted service that Ingersoll dubbed "search-as-a-service." (Because clearly we're all lacking *aaS terms these days.) Still, the concept is intriguing: instead of hosting a local instance of LucidWorks, businesses can now plug their data into this hosted edition and run their enterprise searches from the cloud.

If the service takes off, it should greatly increase the visibility of enterprise search in general, since a business's data can now be examined without the overhead of deploying and managing a local Lucene/Solr instance.

Enterprise search may be about to get even bigger.

Read more of Brian Proffitt's Zettatag and Open for Discussion blogs and follow the latest IT news at ITworld. Drop Brian a line or follow Brian on Twitter at @TheTechScribe. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
