LucidWorks: open source enterprise search

Finding exactly what you’re looking for as quickly as possible makes perfect sense, doesn’t it? Companies everywhere are seeing their data grow and spread across different sources and locations, in different formats and with specific security settings. Finding the right piece of information isn’t getting any easier, and searching for it can waste a lot of time. A good enterprise search system can therefore save a significant amount of money over the long term.

Yet is ‘enterprise search’ just a catchphrase? Over the years, various commercial systems have tried to offer a robust and complete answer to enterprise search needs. But that may also be part of the problem: commercial systems are often overkill compared to users’ real needs. They can also be far too expensive for most users, and they sometimes lack the flexibility users want.

LucidWorks to the rescue?

Most companies (still) believe in the importance of enterprise search, but they’re on the lookout for flexible, effective, maintainable and cost-effective systems. LucidWorks aims to address those needs.

LucidWorks is the enterprise search platform from Lucid Imagination, and it is built entirely on Apache Lucene and Apache Solr. All the flexibility and strengths of those two projects are incorporated transparently and openly, so let’s zoom in on what they offer.

Apache Lucene

Apache Lucene is a powerful and flexible, yet easy-to-use, indexing and search library written in Java. It was adopted by the Apache community in 2001 and has since evolved from version 1.0 to 3.4.0. The library has been popular from the start and is used by many developers across a wide range of systems; as a result, it has become a mature, widely supported and attractive product. The list of systems that use Lucene is impressive. It is, of course, used by other open source platforms such as Alfresco and Liferay, but commercial vendors such as IBM, EMC and SDL Tridion have also started to replace their OEM search engines with Lucene. Even large-scale social networks such as LinkedIn and Twitter power their search with Lucene.

What Lucene offers:

  • High-performance indexing: Lucene indexes different data types quickly and incrementally, and it does a great job of compressing its index. During indexing, analysis techniques such as stemming (reducing a word to its root form) and stopword filtering can be applied. Because Lucene is open source, custom analysers are easy to write and plug in.
  • Very flexible and powerful searching: Lucene offers features for almost every kind of search need: searching in specific fields, with wildcards, by the proximity of two terms, within a date range, and so on. Lucene ranks the results and supports sorting on any field. Moreover, because the code is open, users can tweak every function to their specific needs.
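To make the indexing and ranking ideas above more concrete, here is a minimal Python sketch of what an analysis chain and an inverted index do. It is not Lucene’s API: the stopword list and the crude suffix-stripping `stem` function are illustrative assumptions standing in for Lucene’s real analysers, and the ranking is a simplified version of Lucene’s classic TF-IDF scoring (square-root term frequency times squared idf, with length norms and boosts left out).

```python
import math
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not Lucene's built-in list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyse(text: str) -> list[str]:
    """Analysis chain: tokenise, lowercase, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        """Incremental indexing: analyse one document and merge it into the postings."""
        self.docs[doc_id] = text
        for term, freq in Counter(analyse(text)).items():
            self.postings[term][doc_id] = freq

    def search(self, query: str) -> list[tuple[int, float]]:
        """Rank matching documents with a simplified classic TF-IDF score."""
        n_docs = len(self.docs)
        scores: dict[int, float] = defaultdict(float)
        for term in analyse(query):
            postings = self.postings.get(term, {})
            if not postings:
                continue
            idf = 1.0 + math.log(n_docs / (len(postings) + 1))
            for doc_id, freq in postings.items():
                scores[doc_id] += math.sqrt(freq) * idf * idf
        # Highest score first; ties broken by document id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Lucene indexes documents quickly")
    idx.add(2, "Solr adds faceted searching on top of Lucene")
    idx.add(3, "Indexing and searching the index")
    print(idx.search("indexing"))  # doc 3 ranks first: "index" occurs twice there
```

Because the query goes through the same `analyse` chain as the documents, a search for “indexing” also matches “indexes” and “index”; this symmetry between index-time and query-time analysis is the point the sketch is meant to illustrate.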

Apache Solr

Lucene is sometimes described as ‘just a toolbox’, because it can’t be used ‘as is’ without development. Solr gives Lucene users more power and tools by, in effect, turning Lucene into a server. Accepted by Apache in 2006, Solr is younger than Lucene, and its version numbers are now synchronised with Lucene’s. Companies and systems already using Solr include eBay, CNET and Acquia Drupal, and Alfresco 4, the open platform for social content management, will also use Solr.

These are just some of the great features Solr offers:

  • Configuration for indexing and searching
  • Facets to filter results
  • A REST API, which makes it very easy to use Solr from any programming language
  • Scoring details that explain why Lucene gives a particular result its score
  • ‘More like this’ links in the results
  • Rich document parsing
  • Better highlighting possibilities
  • A (basic) administration interface with logging and analysis tools
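Because Solr is driven over HTTP, a search is essentially just a URL. The sketch below builds a select request from standard Solr query parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`, `hl`, `hl.fl`, `debugQuery`). The server address and core name in `SOLR_SELECT` are assumptions, and `build_solr_query` is a hypothetical helper, not part of any Solr client library. Setting `debugQuery=true` is what asks Solr to return the per-result score explanations mentioned in the list above.

```python
from urllib.parse import urlencode

# Assumed server address and core name; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_solr_query(text, filters=(), facet_fields=(), highlight_fields=(),
                     explain=False, rows=10):
    """Build a Solr select URL (hypothetical helper, standard Solr parameters)."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each filter query narrows the result set without affecting scoring.
    params.extend(("fq", f) for f in filters)
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    if explain:
        # Asks Solr to add a debug section explaining how each score was computed.
        params.append(("debugQuery", "true"))
    return SOLR_SELECT + "?" + urlencode(params)
```

For example, `build_solr_query("enterprise search", filters=["type:pdf"], facet_fields=["author"], highlight_fields=["title", "body"], explain=True)` yields a single GET URL that any HTTP client, in any language, can send; the field names `type`, `author`, `title` and `body` are placeholders for whatever your schema defines.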

LucidWorks

With all that power already built in via Lucene and Solr, why add yet another layer? LucidWorks is a commercial-grade distribution of Solr and Lucene that comes with enterprise-grade support. If Lucene provides the parts and Solr is the engine built from those parts, LucidWorks is the vehicle that puts it all together.

Lucid Imagination was founded in 2007. The company specialises in Lucene and Solr, and some of its team members are key contributors to both projects, so support is provided by specialists.

On top of that support, LucidWorks adds some great features to those already found in Solr and Lucene:

  • An easy-to-use administration interface for extensive configuration that would otherwise have to be done in configuration files. It also adds several logging features.
  • Several crawlers for indexing different types of documents, including a SharePoint crawler.
  • A security layer that integrates SharePoint security or LDAP.
  • A click scoring mechanism that boosts documents according to how often they are clicked in search results.
  • Alerts on specific searches: messages can be sent when new results match a certain search.
  • A great query parser with field boosting, configuration for highlighting, good handling of synonyms, etc.
  • Very easy installation.
  • Excellent handling of different data sources in one collection.
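The click scoring idea can be illustrated with a small re-ranking function. This is a hypothetical sketch, not LucidWorks’ actual click scoring algorithm: it assumes click counts are already collected per document id, and it multiplies each relevance score by `1 + weight * ln(1 + clicks)`, where `weight` is an arbitrary tuning knob chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # relevance score returned by the search engine


def click_boost(hits, clicks, weight=0.5):
    """Re-rank hits by multiplying each score by 1 + weight * ln(1 + clicks).

    Hypothetical formula: the logarithm damps very popular documents so that
    click history nudges the ranking without drowning out text relevance.
    Documents with no recorded clicks keep their original score.
    """
    boosted = [
        Hit(h.doc_id, h.score * (1.0 + weight * math.log1p(clicks.get(h.doc_id, 0))))
        for h in hits
    ]
    # sorted() is stable, so equally scored hits keep their original order.
    return sorted(boosted, key=lambda h: h.score, reverse=True)
```

With hits `a` (score 2.0) and `b` (score 1.8), twenty recorded clicks on `b` are enough to lift it above `a`, while an empty click history leaves the original ranking untouched. A logarithmic boost is only one of several reasonable choices; a linear boost would let a few heavily clicked documents dominate every query.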

But is LucidWorks open source?

Yes, indeed it is. And that is just one of its strengths.

It goes without saying that open source is cost-effective. Some development and configuration must be done, and the enterprise edition is not free of charge, but the core is free.

Open source also means flexibility: developers have complete access to every bit and byte of the code, everything can be tweaked as needed, there is no black box, and all mechanisms are completely transparent. Flexibility also means ‘pluggability’, in two ways. First, LucidWorks can be plugged into almost any software through the REST API and configured to suit. Second, it can use other open source libraries; for example, Apache Tika and Apache POI are used to extract content from PDF and Office documents when crawling.

And what about support and documentation? Because Lucene and Solr are very popular, there is plenty of documentation and support available. Thanks to continuing input from various users and contributors over the last decade, the libraries are today mature, robust, future-proof and stable. It could even be argued that they are more stable than some commercial systems. And besides the support available from the open source community, LucidWorks offers professional support from specialists.

Conclusion

As it is based on a mainstream open source platform, LucidWorks comes with all the usual advantages, among them flexibility, openness and a large user and support community.

Compared to commercial systems, LucidWorks may lack certain built-in connectors, but the most important ones (including SharePoint) are available.

One thing that seems to be missing in LucidWorks is a built-in search user interface, or at least some components for building one. However, since everything is transparent and accessible via REST, there are numerous ways to build custom user interfaces or to plug LucidWorks into existing systems. The JavaScript framework ajax-solr also offers many ready-to-use, configurable widgets for custom user interfaces.
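Building a custom interface on top of the REST API mostly means turning Solr’s JSON response into something a template can render. The sketch below relies on Solr’s standard JSON response layout (`response.numFound`, `response.docs`, `facet_counts.facet_fields` as flat value/count lists, and `highlighting` keyed by each document’s unique key). The field names `id` and `title` are assumptions about the schema, and `to_view_model` is a hypothetical helper.

```python
import json


def to_view_model(raw_json):
    """Turn a Solr JSON response into plain data a UI template can render.

    Assumes a schema whose unique key is "id" and which has a "title" field.
    """
    data = json.loads(raw_json)
    response = data["response"]

    # Solr's default JSON layout for facet fields is a flat list:
    # [value1, count1, value2, count2, ...]; pair it up for display.
    facets = {}
    for field, flat in data.get("facet_counts", {}).get("facet_fields", {}).items():
        facets[field] = list(zip(flat[0::2], flat[1::2]))

    # Highlighting snippets are keyed by each document's unique key.
    highlighting = data.get("highlighting", {})
    results = []
    for doc in response["docs"]:
        per_field = highlighting.get(str(doc.get("id")), {})
        results.append({
            "id": doc.get("id"),
            "title": doc.get("title"),
            "snippets": [s for snippets in per_field.values() for s in snippets],
        })

    return {"total": response["numFound"], "results": results, "facets": facets}
```

Keeping this translation step separate from the HTTP call means the same view model can feed a server-side template, a JSON endpoint for a JavaScript front end, or a test fixture.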

LucidWorks is definitely an enterprise search system that meets most of the needs of companies that want a robust, stable, fast, efficient and flexible search system.

Want to learn more about search platforms? Join us at the FREE Amplexor Enterprise Search Seminar on 8 December 2011.

Subscribe here: http://www.amplexor.com/enterprise-search-2011