Unleashing Solr’s Custom Analytics Engine

Solr 4.9 unleashes Solr’s uniquely powerful Custom Analytics Engine.

In recent blogs I wrote about the nuts and bolts of the new AnalyticsQuery API and MergeStrategy API. This blog is a high-level look at the power that these APIs provide.

In A Nutshell:

The AnalyticsQuery API allows you to plug custom analytics algorithms into Solr. The MergeStrategy API allows you to control how the analytic output from the shards is merged during a distributed search.

What Are The Use Cases For These New APIs?

Very simply put, the use case is: analyzing data in a custom way.

This is where you can add your secret sauce. The special algorithm that sets your business apart. The place for sifting scientific data. The place to run complex mathematical algorithms on vast amounts of data.

The AnalyticsQuery API is designed for quants, scientists, data scientists and software engineers building custom analytic applications.

Components of Solr’s Custom Analytics Engine

To fully appreciate the power of Solr’s Custom Analytics Engine, we need to review the truly remarkable technologies that it leverages.

SolrCloud

SolrCloud provides Solr with huge scale. SolrCloud is a modern, full-featured distributed indexing and search framework. It automates the process of managing large clusters of Solr servers. It can scale out to support billions of documents and thousands of queries per second.

Solr Search

Solr’s powerful search features allow you to perform custom analytics on very precise sets of data. And it’s fast, designed from the ground up to support sub-second responses.

Lucene

Lucene provides the low-level index structures and caches. It is accessed by Solr to perform the search. From an analytics standpoint, the AnalyticsQuery API gives you full access to Lucene’s column-oriented memory and disk caches. These low-level caches allow you to efficiently access the data that you need to build custom analytics on the search results.

AnalyticsQuery API

The AnalyticsQuery API allows you to plug in code at the exact place where Search and Analytics intersect. It does this by allowing you to plug in your own Lucene Collector. A Lucene Collector sees every search result and provides low-level access to all of Lucene’s column-oriented structures.
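
To make the collector idea concrete, here is a minimal sketch in plain Java. It is a simplified, Solr-free model and not the actual AnalyticsQuery or Lucene Collector API: the class name SumCollectorSketch, the priceColumn array, and the search helper are all hypothetical stand-ins. It only illustrates the shape of the pattern, in which a callback sees every matching document and reads a per-document value from a column-style structure.

```java
import java.util.function.IntPredicate;

// Simplified, Solr-free model of the collector pattern.
// Every name here is hypothetical; this is not the Lucene or Solr API.
final class SumCollectorSketch {
    // Stand-in for a column-oriented structure: one value per document id.
    private final long[] priceColumn;
    private long sum;
    private int count;

    SumCollectorSketch(long[] priceColumn) {
        this.priceColumn = priceColumn;
    }

    // Called once for every document that matches the search.
    void collect(int docId) {
        sum += priceColumn[docId];
        count++;
    }

    long sum() {
        return sum;
    }

    int count() {
        return count;
    }

    // Stand-in for the search itself: feeds each matching doc id to the collector.
    static SumCollectorSketch search(long[] column, IntPredicate matches) {
        SumCollectorSketch collector = new SumCollectorSketch(column);
        for (int docId = 0; docId < column.length; docId++) {
            if (matches.test(docId)) {
                collector.collect(docId);
            }
        }
        return collector;
    }
}
```

In this sketch the analytic is a simple sum, but the collect method is where your own algorithm would go, since it is the one place that sees every hit.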

MergeStrategy API

The MergeStrategy API allows you to perform custom analytics across large clusters of servers (SolrCloud). It does this by providing a plugin point for code that merges the analytic output from the shards.
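
As a rough illustration of what merging analytic output from the shards can look like, here is a second plain-Java sketch. Again, this is not the real MergeStrategy interface: ShardMergeSketch, ShardStats, merge, and mean are hypothetical names, and the assumption is that each shard returns a partial sum and a count.

```java
import java.util.List;

// Simplified, Solr-free model of merging per-shard analytic output.
// Every name here is hypothetical; this is not the Solr MergeStrategy API.
final class ShardMergeSketch {
    // Partial output from one shard: enough to compute a global mean later.
    static final class ShardStats {
        final long sum;
        final long count;

        ShardStats(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Combines the partial outputs of all shards into one global result.
    static ShardStats merge(List<ShardStats> shards) {
        long sum = 0;
        long count = 0;
        for (ShardStats shard : shards) {
            sum += shard.sum;
            count += shard.count;
        }
        return new ShardStats(sum, count);
    }

    // Derives the final statistic only after the merge.
    static double mean(ShardStats stats) {
        return stats.count == 0 ? 0.0 : (double) stats.sum / stats.count;
    }
}
```

The design point is that each shard ships mergeable partials (a sum and a count) rather than a finished average. Averaging the per-shard averages would give the wrong answer whenever shards hold different numbers of matching documents.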

This powerful stack is available now with Solr 4.9. All it’s waiting for is your algorithms.