nCache: Heliosearch/Solr Off-Heap FieldCache

nCache

Heliosearch has a new replacement for the Lucene FieldCache currently used by Solr for sorting, faceting, and function queries.

Introducing nCache (n is for “native”):

  • nCache uses off-heap data structures, just like the off-heap filters, to reduce garbage collection pauses and GC overhead.
  • nCache is a managed cache, meaning you can do anything with it that you can do with other Solr caches, including configuring size and warming policies, and viewing cache statistics through the admin page.
  • nCache is NRT friendly. Field values are cached on a per-segment basis, enabling rapid turnaround time for new index snapshots.
  • nCache is designed for maximum performance, even when the system is not experiencing garbage collection issues.
  • Unlike the Lucene FieldCache, nCache uses no weak references.

UPDATE SINCE THIS POST: nCache now supports all numeric and string fields.

Integer Performance Results

At the time of these tests, only integer fields had been implemented for nCache, so that is what we tested.

The first test involved sorting by integer fields with different numbers of unique values. Queries were of the following form:

q={!cache=false}*:*
&sort=my_int_field1 desc
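
As a rough illustration of how such a request could be assembled, here is a minimal Python sketch. The host, port, and core name in `SOLR_SELECT` are assumptions made for the example, as is the helper name `sort_query_url`; only the `q` and `sort` parameters come from the query shown above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: host, port, and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def sort_query_url(field, order="desc", base=SOLR_SELECT):
    """Build a match-all query with cache=false that sorts on a single field."""
    params = {
        "q": "{!cache=false}*:*",      # query string as used in the test
        "sort": f"{field} {order}",    # e.g. "my_int_field1 desc"
    }
    # urlencode percent-encodes the braces, '!' and '*' so the URL is safe to send.
    return f"{base}?{urlencode(params)}"


url = sort_query_url("my_int_field1")

# Decode the query string again to confirm the parameters survive encoding.
decoded = parse_qs(urlsplit(url).query)
```

Changing the `field` argument gives the same query for each of the integer fields under test.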

The test index contained 50M documents. The query for each field was executed 10 times consecutively, and the fastest time was retained.
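
The "run it 10 times and keep the fastest" measurement can be sketched as follows. This is not the harness used for the results in this post; `fastest_of` and the stand-in workload are hypothetical, and a real test would issue the HTTP request inside `run_query`.

```python
import time


def fastest_of(run_query, repeats=10):
    """Call run_query `repeats` times back to back; return the fastest time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Stand-in workload so the sketch runs without a server.
calls = []
best = fastest_of(lambda: calls.append(sum(range(1000))))
```

Keeping the minimum rather than the mean filters out one-off costs such as the initial cache build and JIT warm-up, which is presumably why the fastest run was retained.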

[Figure: int_sort_latency]

Next, we tested concurrent query throughput on the same 50M document index. The first set of queries sorted by a random integer field (the same set of fields used in the first test). The second set used a function query to add two of the integer fields together and sort by the resulting score.

The function queries were of the following form:

q={!func cache=false}add(my_int_field1, my_int_field2)
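
Conceptually, this query scores each document by the sum of its two field values and returns the highest-scoring documents. The sketch below illustrates only that idea over plain Python lists indexed by document id; it is not how Solr or nCache implements it, the function name is hypothetical, and the ordering of tied scores is not modeled.

```python
import heapq


def top_docs_by_sum(field1, field2, k):
    """Score each doc as field1[doc] + field2[doc]; return the k highest (score, doc_id) pairs."""
    scores = ((a + b, doc_id) for doc_id, (a, b) in enumerate(zip(field1, field2)))
    return heapq.nlargest(k, scores)


# Per-document values, indexed by document id (toy data).
my_int_field1 = [5, 1, 9, 3]
my_int_field2 = [2, 7, 0, 3]

top2 = top_docs_by_sum(my_int_field1, my_int_field2, 2)
```

Both arrays must be readable by document id for every matching document, which is exactly the access pattern a field cache exists to serve.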

[Figure: int_query_throughput]

The first time one sorts on an indexed field, the FieldCache (or nCache) entry is built by un-inverting the field. With per-segment caches, only new segments will need un-inverting when the index changes (although a major merge can cause all segments to change).
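
The per-segment behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not nCache's implementation: the class, its methods, and the `uninvert` callback are all hypothetical, and values are held in ordinary Python lists rather than off-heap memory.

```python
class PerSegmentFieldCache:
    """Toy per-segment cache: one un-inverted value array per (segment, field)."""

    def __init__(self, uninvert):
        self._uninvert = uninvert   # hypothetical callback: (segment_id, field) -> values
        self._entries = {}          # (segment_id, field) -> cached values
        self.uninvert_count = 0     # how many segments have been un-inverted so far

    def get(self, segment_id, field):
        key = (segment_id, field)
        if key not in self._entries:
            # Cache miss: pay the un-inverting cost for this one segment only.
            self._entries[key] = self._uninvert(segment_id, field)
            self.uninvert_count += 1
        return self._entries[key]

    def open_searcher(self, segment_ids, field):
        """Populate entries for a new snapshot; return how many segments were reused."""
        live = set(segment_ids)
        # Drop entries for segments that no longer exist (for example, merged away).
        self._entries = {k: v for k, v in self._entries.items() if k[0] in live}
        carried = sum(1 for s in segment_ids if (s, field) in self._entries)
        for s in segment_ids:
            self.get(s, field)
        return carried


cache = PerSegmentFieldCache(lambda seg, field: [0, 0, 0])
first = cache.open_searcher(["seg_a", "seg_b"], "my_int_field1")            # cold start
second = cache.open_searcher(["seg_a", "seg_b", "seg_c"], "my_int_field1")  # one new segment
merged = cache.open_searcher(["seg_m"], "my_int_field1")                    # after a major merge
```

In this model, the second snapshot reuses both existing segments and un-inverts only `seg_c`, while the merge replaces every segment and so nothing is reused.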

The un-invert time for all of the integer fields across all 22 segments in the 50M document index was measured by running the test 3 times (stopping the server each time) and taking the lowest (fastest) result.

There were no significant garbage collection pauses during these tests. Different query loads that produce more garbage should show an even greater throughput advantage for Heliosearch’s off-heap nCache.

Cache Management and Statistics

nCache is like any other Solr cache, so you can configure and manage it and get statistics via the admin page or via JMX.

Some of the statistics available include:

  • size – the memory used by the entry (in bytes) for the field (most of it will be off-heap memory)
  • segments – the number of segments populated for the field
  • carriedOver – the number of segments shared with the previous searcher / index snapshot
 
Here is an example of the admin statistics after running through some of the tutorial.

Getting Started with nCache

Heliosearch uses nCache by default, just as it uses off-heap filters by default.
Simply download the latest release and start using it!
If you’re new to Heliosearch/Solr, you may want to start here.

Only integer fields have been implemented so far, but other field types will quickly follow.