Understanding Solr’s New MergeStrategy API

posted in: analytics, search

In the last couple of posts, I introduced the RankQuery API and the AnalyticsQuery API. The RankQuery API lets developers plug custom ranking algorithms into Solr, and the AnalyticsQuery API lets developers plug in custom analytic logic.

Both the RankQuery and AnalyticsQuery APIs are designed to operate on a single Solr instance.

The MergeStrategy API is designed to give developers full control over how data is merged from the shards during a distributed search. Both the RankQuery and AnalyticsQuery APIs provide a hook into the MergeStrategy API.

Before we look at the hooks into the MergeStrategy API, let’s review the MergeStrategy interface. It has five methods, shown below:


  public void merge(ResponseBuilder rb, ShardRequest sreq);

  public boolean mergesIds();

  public boolean handlesMergeFields();

  public void handleMergeFields(ResponseBuilder rb, SolrIndexSearcher searcher) throws IOException;

  public int getCost();

Let’s review the methods one at a time to see what they do.

public void merge(ResponseBuilder rb, ShardRequest sreq);

The merge method is where you actually perform the merge. You are passed the ShardRequest, from which you can access all the responses from the shards. You are also passed the ResponseBuilder so you can output the merged results.

public boolean mergesIds();

The mergesIds method tells Solr whether the MergeStrategy merges document IDs from the shards. A MergeStrategy for a RankQuery would return true. A MergeStrategy for an AnalyticsQuery would return false.

public boolean handlesMergeFields();

Often when you’re merging document IDs from the shards, you need accompanying data from the shards. For example, if the search has sort values, they need to be forwarded to the aggregator so it can apply the sort during the merge.
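To make the need for forwarded sort values concrete, here is a minimal plain-Java sketch of an aggregator-style merge. It is not Solr code: ShardHit, SortedMergeSketch and mergeIds are hypothetical names, and the sketch assumes each shard returns its hits already sorted ascending by a single long sort value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for one hit returned by a shard:
// the document ID plus the sort value forwarded with it.
final class ShardHit {
  final String id;
  final long sortValue;

  ShardHit(String id, long sortValue) {
    this.id = id;
    this.sortValue = sortValue;
  }
}

final class SortedMergeSketch {

  // Merges per-shard hit lists (each assumed sorted ascending by sortValue)
  // into one globally ordered list of at most `rows` document IDs.
  static List<String> mergeIds(List<List<ShardHit>> shards, int rows) {
    // Each queue entry is {shardIndex, positionWithinShard}, ordered by
    // the sort value of the hit it points at.
    PriorityQueue<int[]> queue = new PriorityQueue<>(
        Comparator.comparingLong((int[] e) -> shards.get(e[0]).get(e[1]).sortValue));

    for (int s = 0; s < shards.size(); s++) {
      if (!shards.get(s).isEmpty()) {
        queue.add(new int[] {s, 0});
      }
    }

    List<String> merged = new ArrayList<>();
    while (!queue.isEmpty() && merged.size() < rows) {
      int[] head = queue.poll();
      List<ShardHit> hits = shards.get(head[0]);
      merged.add(hits.get(head[1]).id);
      // Advance to the next hit from the same shard, if there is one.
      if (head[1] + 1 < hits.size()) {
        queue.add(new int[] {head[0], head[1] + 1});
      }
    }
    return merged;
  }
}
```

Without the sort value travelling alongside each ID, the aggregator would have no way to decide which shard's next hit comes first.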

There are two approaches for forwarding merge-field data to the aggregator node. The first approach is to simply specify the field in the sort criteria. Solr automatically forwards the fields in the sort criteria to the aggregator node.

The second approach is to marshal the merge fields yourself: the MergeStrategy interface provides a method for developers to handle the marshaling of merge fields to the aggregator node. The handlesMergeFields() method signals to Solr that you plan to marshal the merge fields. If this method returns true, Solr will not use the sort criteria to send the merge fields and will rely on the MergeStrategy’s implementation instead.

public void handleMergeFields(ResponseBuilder rb, SolrIndexSearcher searcher) throws IOException;

The handleMergeFields method defines the logic for marshaling the merge fields to the aggregator node. This method is run on each shard. You are passed the current searcher so you can retrieve the merge fields, and the ResponseBuilder so you can place the merge fields onto the output.

public int getCost();

You can define a different MergeStrategy for each AnalyticsQuery. The getCost method defines the order in which the MergeStrategies are run on the aggregator node.
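As a rough illustration of cost-based ordering, the plain-Java sketch below sorts a list of strategies by their cost before running them. Costed and CostOrderSketch are hypothetical stand-ins rather than Solr classes, and the sketch assumes the lowest cost runs first, which is worth confirming against the Solr version you are using.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in exposing only the method that matters here.
interface Costed {
  int getCost();
}

final class CostOrderSketch {

  // Returns a copy of the strategies in the order they would run,
  // assuming ascending cost (an assumption of this sketch).
  static List<Costed> runOrder(List<Costed> strategies) {
    List<Costed> ordered = new ArrayList<>(strategies);
    // List.sort is stable, so strategies with equal cost keep their
    // original relative order.
    ordered.sort(Comparator.comparingInt((Costed s) -> s.getCost()));
    return ordered;
  }
}
```

Under that assumption, a strategy returning 50 from getCost() would run before one returning 100, so leaving gaps between cost values makes it easy to slot another strategy in later.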

Now that we’ve reviewed the MergeStrategy interface, let’s take a look at how to hook into the MergeStrategy API from a RankQuery and an AnalyticsQuery.

To provide a custom MergeStrategy for your RankQuery, you simply return your MergeStrategy implementation from the RankQuery.getMergeStrategy() method. Solr takes care of the rest and ensures that the MergeStrategy is used to merge the IDs from the shards.

To provide a MergeStrategy for your AnalyticsQuery, you extend AnalyticsQuery and pass the MergeStrategy to the superclass constructor. Again, Solr takes care of the rest and ensures that the MergeStrategy is used to merge your analytic results from the shards.

Below is a simple AnalyticsQuery and accompanying MergeStrategy:

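import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.MergeStrategy;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.ShardRequest;
import org.apache.solr.handler.component.ShardResponse;
import org.apache.solr.search.AnalyticsQuery;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.SolrIndexSearcher;

class TestAnalyticsQuery extends AnalyticsQuery {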

  public TestAnalyticsQuery() {
    super(new TestAnalyticsMergeStrategy());
  }

  public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb, IndexSearcher searcher) {
    return new TestAnalyticsCollector(rb);
  }
}

class TestAnalyticsCollector extends DelegatingCollector {
  ResponseBuilder rb;
  int count;

  public TestAnalyticsCollector(ResponseBuilder rb) {
    this.rb = rb;
  }

  public void collect(int doc) throws IOException {
    ++count;
    delegate.collect(doc);
  }

  public void finish() throws IOException {
    NamedList analytics = new NamedList();
    rb.rsp.add("analytics", analytics);
    analytics.add("mycount", count);
    if(this.delegate instanceof DelegatingCollector) {
      ((DelegatingCollector)this.delegate).finish();
    }
  }
}

class TestAnalyticsMergeStrategy implements MergeStrategy {

  public boolean mergesIds() {
    return false;
  }

  public boolean handlesMergeFields() {
    return false;
  }

  public int getCost() {
    return 100;
  }

  public void handleMergeFields(ResponseBuilder rb, SolrIndexSearcher searcher) {
    // Nothing to marshal: handlesMergeFields() returns false.
  }

  public void merge(ResponseBuilder rb, ShardRequest shardRequest) {
    int count = 0;
    NamedList merged = new NamedList();

    for(ShardResponse shardResponse : shardRequest.responses) {
      NamedList response = shardResponse.getSolrResponse().getResponse();
      NamedList analytics = (NamedList)response.get("analytics");
      Integer c = (Integer)analytics.get("mycount");
      count += c.intValue();
    }

    merged.add("mycount", count);
    rb.rsp.add("analytics", merged);
  }
}

Notice in the code above how the MergeStrategy is set in the AnalyticsQuery constructor. The DelegatingCollector returned by the AnalyticsQuery simply counts the documents in the collect() method and places the count onto the response in the finish() method.

The MergeStrategy implementation totals up all the counts from the shards.
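The totalling step can also be tried out without a Solr cluster. The plain-Java sketch below models each shard response as a nested Map, a hypothetical stand-in for Solr's NamedList, and sums the "mycount" values. As a variation on the example above, it skips any shard whose response has no "analytics" section instead of failing on it.

```java
import java.util.List;
import java.util.Map;

final class CountMergeSketch {

  // Sums "mycount" across shard responses. Each Map is a hypothetical
  // stand-in for one shard's response; this is not the Solr API.
  static long mergeCounts(List<Map<String, Object>> shardResponses) {
    long total = 0;
    for (Map<String, Object> response : shardResponses) {
      Object analytics = response.get("analytics");
      if (!(analytics instanceof Map)) {
        continue; // this shard returned no analytics section
      }
      Object count = ((Map<?, ?>) analytics).get("mycount");
      if (count instanceof Number) {
        total += ((Number) count).longValue();
      }
    }
    return total;
  }
}
```

Whether a missing section should be skipped or treated as an error depends on the analytic; skipping is just the choice made for this sketch.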