Lucene and Solr are two heavily intertwined Apache projects: Lucene is the Java library upon which Solr is based, and both projects are deep in the DNA of Apache Hadoop. This ancestry might lead some to regard Lucene/Solr-based products as outmoded when compared to Hadoop and related software, but that would be a mistake. Enterprise search is big business, and a viable force within the larger big data movement.
According to the release notes, Solr 4.0 will add the new SolrCloud capabilities, which are meant to make the Solr enterprise search platform more scalable. A number of features are also on the way for those who use Solr as a primary NoSQL data store.
The changes for Lucene, as you might expect for a Java search-engine library, are less flashy, but no less important. According to developer Mike McCandless, FuzzyQuery, which matches terms "close" to a specified base term, will be 100 times faster in the final release of Lucene 4.0.
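To make "close" concrete, here is a minimal sketch that assumes closeness means a small edit distance (the number of single-character insertions, deletions, or substitutions needed to turn one term into another). The function names, the sample terms, and the brute-force scan are illustrative assumptions, not Lucene's API or its implementation; the point is only to show what a fuzzy match returns.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_match(base_term, indexed_terms, max_edits=2):
    """Return the indexed terms within max_edits of base_term (hypothetical helper)."""
    return [t for t in indexed_terms if edit_distance(base_term, t) <= max_edits]


# Hypothetical set of terms from an index.
terms = ["lucene", "lucent", "license", "solr", "hadoop"]
print(fuzzy_match("lucene", terms))  # ['lucene', 'lucent', 'license']
```

Comparing the base term against every indexed term like this gets slow as the index grows, which is why the speed of a fuzzy query matters in practice.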
Enterprise search is typically more accessible to those new to big data because it's search: you type in some search terms in an only-slightly-more-complicated-than-Google format, and your results are gathered. Not too hard to understand, since that's pretty much how people perceive the way Google, Bing, and Yahoo find things on the Internet.
Of course, executing enterprise search isn't simple, and it has a lot more going for it than concepts that are easy to grasp. Using facets, enterprise search enables users to treat data within documents as they would fields within a relational database. Facets are essentially inverted indexes that let users find specific pieces of information in a document, like an address or other customer information.
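The idea of treating document data like database fields can be sketched in a few lines. The example below is a simplified illustration, not Solr's faceting API: the sample documents, the field names, and both helper functions are assumptions made up for the sketch. It builds an inverted index from (field, value) pairs to document IDs, then counts values of one field across a set of matching documents.

```python
from collections import defaultdict

# Hypothetical customer documents, already broken into fields.
documents = {
    1: {"city": "Boston", "status": "active"},
    2: {"city": "Austin", "status": "active"},
    3: {"city": "Boston", "status": "closed"},
}


def build_field_index(docs):
    """Map each (field, value) pair to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            index[(field, value)].add(doc_id)
    return index


def facet_counts(index, field, matching_ids):
    """Count how many matching documents carry each value of the given field."""
    counts = {}
    for (indexed_field, value), doc_ids in index.items():
        if indexed_field == field:
            hits = len(doc_ids & matching_ids)
            if hits:
                counts[value] = hits
    return counts


index = build_field_index(documents)
active = index[("status", "active")]        # like WHERE status = 'active'
print(sorted(active))                       # [1, 2]
print(facet_counts(index, "city", active))  # {'Boston': 1, 'Austin': 1}
```

The lookup by ("status", "active") plays the role of a filter on a database column, and the counts by city resemble a GROUP BY over the filtered rows.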
This is why enterprise search is ideal for examining large sets of these types of documents, straightforward data mining, or business intelligence analysis. The more structured the data, the better: enterprise search does particularly well with documents like weblogs, which are structured uniformly enough to enable deeper data mining.
Though these are most definitely alpha releases, the progress will be appreciated by the vendors and users who have placed their big data bets on enterprise search.
Read more of Brian Proffitt's Open for Discussion blog at ITworld.