Noggit, the JSON Streaming Parser

posted in: java, json | 0

Noggit is the world’s fastest streaming JSON parser for Java. Noggit is the streaming JSON parser used in Solr. It lives here on GitHub. JSON features and extensions: Noggit supports a number of extensions to the JSON grammar. All of … Continued

Heliosearch/Solr JSON Request API

posted in: heliosearch, json, query, solr | 0

Although query parameters are often an easy way to create Heliosearch/Solr requests by hand, they have a number of drawbacks: they are inherently unstructured, requiring unsightly parameters like f.facet_name.facet.range.start=5; they are inherently untyped (everything is a string); and large requests are more difficult to decipher. … Continued
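
The "everything is a string" drawback can be illustrated with a short Python sketch. This is only an illustration: the parameter names and the JSON shape below are made up for the example and are not taken from the Heliosearch/Solr API.

```python
import json
from urllib.parse import parse_qs

# Hypothetical flat request: every value arrives as a string.
query_string = "q=*:*&rows=10&f.price.facet.range.start=5"
flat = {key: values[0] for key, values in parse_qs(query_string).items()}

# Hypothetical structured equivalent (illustrative shape only, not the real API).
body = json.loads('{"query": "*:*", "limit": 10, "facet": {"price": {"start": 5}}}')

print(type(flat["rows"]).__name__)   # str
print(type(body["limit"]).__name__)  # int
```

In the flat form the receiver has to know which parameters to convert to numbers; in the structured form the types and the nesting travel with the request.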

JSON Facet API

posted in: analytics, facets, heliosearch, solr | 0

Contents: Introduction; Facet Types; Testing Using Curl; Terms Facet; Query Facet; Range Facet; Related page: Facet Functions; Related page: Sub-Facets. Introduction: Heliosearch has a completely rewritten faceted search module with a structured JSON API to control the faceting commands. The structured … Continued

Heliosearch/Solr Facet Functions and Analytics

Traditional faceted search (also called guided navigation) involves counting search results that belong to categories (also called facet constraints). The new facet functions in Heliosearch/Solr extend normal faceting by allowing additional aggregations on document fields themselves. Combined with the new … Continued

Heliosearch/Solr Subfacets

posted in: analytics, facets, grouping, solr | 0

Subfacets (also called Nested Facets) are a more generalized form of Solr’s current pivot faceting that allows adding facets for every bucket produced by a parent facet. Subfacet advantages over pivot faceting: Subfacets work with facet functions (statistics), enabling … Continued

Streaming Aggregation For SolrCloud

Contents: Introduction; Tuples, TupleStreams and Sorted Streams; SolrStream; Stream Decorators; Streaming Set Operations; Streaming Aggregation; Distributed Searching, Sorting and Streaming: CloudSolrStream; A Bottleneck Rears Its Ugly Head!; Distributed Aggregation: ParallelStream; Scaling-up With SolrCloud Replicas. Introduction: This blog describes the new … Continued

Solr Terms Query for matching many terms

Solr 4.10 and Heliosearch .07 have added a terms query (or terms filter) to more efficiently match many terms in a single field. Matching a large number of terms is often useful for things like access control lists or security filters. … Continued

Native Code Faceting

Native code faceting for Solr has just been added to Heliosearch, and benchmarks show an impressive 2x performance increase! This is faceting code written in C++ and statically compiled for maximum performance, and loaded into the JVM via JNI (Java … Continued

Solr Cross Data Center Replication

posted in: solr, solr cloud | 0

Solr needs a flexible cross-datacenter architecture that can handle both a variety of application needs and a variety of infrastructure resources. Design Goals: accommodate 2 or more data centers; accommodate active/active uses; accommodate limited-bandwidth cross-datacenter connections; minimize … Continued

Solr Filter Caching

posted in: filters, heliosearch, lucene, search, solr | 0

The filter caching features in Solr allow for precise control over how filter queries are handled in order to maximize performance. Solr has the ability to specify if a filter is cached, specify the order filters are evaluated, and specify … Continued

Off-Heap FieldCache Faceting and Sorting

Lucene/Solr background: Lucene has a segmented architecture. When a small number of documents is added to an existing index, this will often just add an additional small segment to the index. Caching data structures at the segment level (e.g. … Continued

A History of Lucene and Solr

posted in: lucene, solr | 0

I’ve often seen mistaken descriptions of Solr as just “a http wrapper around Lucene”. Unfortunately that mischaracterization was never nipped in the bud early enough and has continued to be repeated in many places such as press articles (where it … Continued

Solr’s New AnalyticsQuery API

posted in: Uncategorized | 0

In Solr 4.9 there is a new AnalyticsQuery API that allows developers to plug custom analytic logic into Solr. The AnalyticsQuery class provides a clean and simple API that gives developers access to all the rich functionality in Lucene and … Continued

New in Solr 4.9: Query Re-Ranking

posted in: Uncategorized | 0

In my last blog, I discussed Solr’s new RankQuery capability, which allows developers to take full control of the ranking process. The new ReRankingQParserPlugin leverages the RankQuery framework to hook in Lucene’s new QueryRescorer. Query Re-Ranking Explained: Solr offers a … Continued

Solr’s New RankQuery API

posted in: Uncategorized | 0

Coming in Solr 4.9 is a new RankQuery API. Before diving into how the RankQuery API works, I’ll give a little background into how ranking works in Lucene/Solr. A Lucene search can have three parts to it: 1) A Query: … Continued

Parameter Substitution / Macro Expansion

posted in: heliosearch, solr | 0

Macro Expansion is a new Heliosearch feature that does parameter substitution across all request parameters. The macro expansion is done at the same point in time that default parameters are applied (i.e. when the request reaches the correct Solr request … Continued

Solr 4.8 Features

posted in: search, solr | 0

Solr 4.8 has been released. Here’s an overview of how to use some of the new features. Complex Phrase Queries: The complexphrase query parser can produce phrase queries with embedded wildcards and boolean queries. It works via multiple passes, parsing … Continued

Getting started with Tomcat and Solr

posted in: solr, tomcat | 0

Step 1: Download HDS. Apache Solr ships by default with a Jetty-based Solr server in the “example” directory, but many people prefer Tomcat. Configuring your own Tomcat server can be daunting, as can be seen by the large list … Continued

Solr’s New Expand Component

posted in: grouping | 0

Coming in Solr 4.8 is a new search component called the ExpandComponent. The ExpandComponent can be used to expand parent/child relationships in Solr. This blog describes how to use the ExpandComponent to expand the groups that were collapsed by the … Continued

Heliosearch/Solr Off-Heap FieldCache Performance

Heliosearch’s off-heap FieldCache was previously introduced and benchmarked for integer fields. Support for all numeric field types as well as string fields has now been completed, and this post will focus on the performance of string fields. A review of … Continued

Solr 4.7 Features

posted in: Uncategorized | 0

Solr 4.7 has been released! Here’s a slightly more in-depth overview of some selected features. Deep Paging: Both single-node and distributed deep paging have been added to Solr! I previously created an example of how to use Solr’s deep … Continued

nCache: Heliosearch/Solr Off-Heap FieldCache

nCache: Heliosearch has a new replacement for the Lucene FieldCache currently used by Solr for sorting, faceting, and function queries. Introducing nCache (n is for “native”): nCache has off-heap data structures, just like the Off-Heap Filters, to lower garbage collection … Continued

Solr Cloud Client Side Document Routing

posted in: search, solr, solr cloud | 0

In Solr 4.5, client side document routing was added to CloudSolrServer. This feature routes document updates to the correct shard leader. This is the default behavior for CloudSolrServer so you don’t have to do anything to turn it on. Just … Continued

MurmurHash3 for Java

posted in: java, solr | 0

Background: I needed a really good hash function for the distributed indexing we’re implementing for Solr. Because it will be used for partitioning documents, it needed to be really high quality (well distributed), since we don’t want uneven shards. It … Continued
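
To make the partitioning idea concrete, here is a minimal Python sketch of the public 32-bit MurmurHash3 algorithm (x86_32 variant) used to pick a shard. It is not the Java code from the post: the `shard_for` helper, the UTF-8 encoding of the document id, and the simple modulo mapping are all assumptions made for this example.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86_32 variant) of data."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    n_blocks = len(data) // 4

    def rotl(x: int, r: int) -> int:
        # 32-bit left rotation.
        return ((x << r) | (x >> (32 - r))) & mask

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix in the remaining 0-3 bytes.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical helper: map a document id to a shard index by modulo."""
    return murmur3_32(doc_id.encode("utf-8")) % num_shards


print(shard_for("doc-42", 4))
```

A well-distributed hash matters here because any skew in the hash values translates directly into shards of uneven size.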

Solr’s Realtime Get

posted in: lucene, search, solr | 0

Solr took another step toward increasing its NoSQL datastore capabilities with the addition of realtime get. Background: As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a … Continued

Indexing JSON in Solr 3.1

posted in: Uncategorized | 0

Solr has been able to produce JSON results for a long time, by adding wt=json to any query. A new capability has recently been added to allow indexing in JSON, as well as issuing other update commands such as deletes … Continued

Solr Result Grouping / Field Collapsing

posted in: search, solr | 0

Result Grouping, also called Field Collapsing, has been committed to Solr! This functionality limits the number of documents for each “group”, usually defined by the unique values in a field (just like field faceting). You can think of it like … Continued

CSV output for Solr

posted in: search, solr | 0

Solr has been able to slurp in CSV for quite some time, and now I’ve finally got around to adding the ability to output query results in CSV also. The output format matches what the CSV loader can slurp. Adding … Continued

Ranges over Functions in Solr 1.4

posted in: lucene, search, solr | 0

Solr 1.4 contains a new feature that allows range queries or range filters over arbitrary functions.  It’s implemented as a standard Solr QParser plugin, and thus easily available for use any place that accepts the standard Solr Query Syntax by … Continued

Filtered query performance increases for Solr 1.4

posted in: lucene, search, solr | 0

One of the many performance improvements in the upcoming Solr 1.4 release involves improved filtering performance. Solr 1.4 filters are faster (anywhere from 30% to 80% faster to calculate intersections, depending on configuration), take less memory (40% smaller), and … Continued

Solr scalability improvements

posted in: java, lucene, search, solr | 0

With CPU core counts constantly increasing, there has been some major work done in Lucene/Solr to increase scalability under multi-threaded load. Read-only IndexReaders: One bottleneck was synchronization around the checking of deleted docs in a Lucene IndexReader. Since another thread … Continued

Solr Faceted Search Performance Improvements

posted in: java, lucene, search, solr | 0

Having performance issues with Solr’s faceted search and certain types of fields?  Help has arrived in the form of a new Solr faceting algorithm!  This new faceting implementation dramatically improves the performance of faceted search, making it suitable for a … Continued

lookup3ycs : a standard high performance string hash

posted in: java, search | 0

I was surprised to discover that there isn’t a good cross-platform hash function defined for strings. MD5, SHA, FNV, etc. all define hash functions over bytes, meaning that hashing is under-specified for strings. So I set out to create a standard … Continued

Distributed Search for Solr

posted in: java, lucene, search, solr | 0

A new chapter in Solr scalability has been opened with the addition of distributed search (http://wiki.apache.org/solr/DistributedSearch)! Distributed Search splits an index into multiple shards and queries across all the shards, combining the results and presenting a single merged response that … Continued
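
The merge step can be sketched in a few lines of Python. This is a simplified illustration and not Solr's implementation: it assumes each shard returns its hits as (id, score) pairs already sorted by descending score, and the `merge_shard_results` name is made up for the example.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists (each sorted by descending score) into the global top `rows`."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, rows))


# Two hypothetical shards, each returning its own top hits.
shard_a = [("doc3", 0.92), ("doc1", 0.40)]
shard_b = [("doc7", 0.85), ("doc9", 0.10)]

top = merge_shard_results([shard_a, shard_b], rows=3)
print(top)  # [('doc3', 0.92), ('doc7', 0.85), ('doc1', 0.4)]
```

Because each shard list is already sorted, the merge only has to compare the current head of each list, so the combined top results are produced without re-sorting every hit.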

Solr at Web 2.0 Expo Berlin

posted in: search | 0

I’ll be giving a Solr presentation Nov 8th in Berlin, titled “Add Powerful Full Text Search to Your Web App with Solr”. Should be fun, just wish I had more free time while in Berlin…