A History of Lucene and Solr


I’ve often seen Solr mistakenly described as just “an HTTP wrapper around Lucene”. Unfortunately, that mischaracterization was never nipped in the bud and continues to be repeated in many places, such as press articles (where it is picked up and repeated yet again). Of course, people who have been involved with Lucene and Solr from the beginning know better!

In fact, Solr contained so much core functionality that Lucene users wanted that the two projects merged in 2010.

Here’s a partial history of some Solr milestones that include core search functionality (i.e. not related to just exposing Lucene via HTTP):

| Functionality | Implemented in Solr | Available in Lucene |
| --- | --- | --- |
| Numerics + range queries | Jan 2006 | Sept 2009, Lucene 2.9 |
| Index replication | Jan 2006 | July 2013, Lucene 4.4, Replication Module |
| Unique keys (overwriting) | Jan 2006 | ? 2007, IndexWriter.updateDocument |
| Many analysis filters: WordDelimiterFilter, Soundex, Regex/Pattern, HTML, kstem, trim, reverse wildcard, multi-word synonym, etc. | Jan 2006 – various | Oct 2012, Lucene 4.0, all analysis filters moved from Solr to Lucene |
| Searcher concurrency control | Jan 2006 | Nov 2011, Lucene 3.5, SearcherManager |
| Faceted search | Sep 2006 | Sep 2011, Lucene 3.4, LUCENE-3079 |
| Function queries | Jan 2006 | Jun 2007, Solr’s FunctionQuery was copied (not moved) into Lucene 2.2 but it stagnated; function queries were later moved from Solr to Lucene for version 4.0 (Oct 2012) |
| Distributed search | Feb 2008 | Jul 2011, Lucene 3.3, partial support via TopDocs.merge |
| Query-time join | April 2011 | Jan 2012, LUCENE-3602 |
| Grouping / field collapsing | Aug 2010 (dev patches used by many in production much earlier, however) | May 2011 – Oct 2011, grouping moved from Solr to Lucene, LUCENE-1421, LUCENE-3483, etc. |
| Constant score queries, including prefix/range queries that don’t explode when too many terms are matched | Jan 2006 | May 2006, moved from Solr to Lucene, LUCENE-383, etc. |
| Multi-valued field cache (UnInvertedField) | Nov 2008, SOLR-475 | Mar 2011, moved from Solr to Lucene, LUCENE-3003 |
| Distributed faceting | Feb 2008 | Jul 2013, Lucene 4.4, partial support via FacetResult.mergeHierarchies? |
| Auto-suggest | Aug 2010, SOLR-1316 | May 2011, moved from Solr to Lucene, LUCENE-2995 |
| Field types | Jan 2006 | Oct 2012, Lucene 4.0, FieldType class |
| Configurable analysis component factories | Jan 2006 | July 2012, all analysis factories moved from Solr to Lucene, LUCENE-2510 |
| User-oriented query parsers (dismax, edismax) | Jan 2006; Nov 2009, SOLR-1553 | Nov 2013, LUCENE-5336 |
| Real-time Get | Nov 2011, SOLR-2700 | Jan 2013, LUCENE-4695 |
| Filter cache | Jan 2006 | Nov 2014, LUCENE-6077 |

Of course, I’ve only touched on some of the features that were in Solr first and later became available in Lucene. I’ve left out all of the features that Lucene still does not have (like optimistic locking and numeric statistics), as well as more server-ish features (many query parser types, input/output support for JSON, XML, CSV, etc.).

The reality is that both Lucene and Solr have long been innovating in the open source search space.