Getting Started with Solr: a Simple Solr Tutorial
1. Download Solr
Download your Solr distribution of choice:
- Apache Solr, the original open source search server.
- Heliosearch/Solr, which has additional enhancements.
- HDS, which is Apache Solr with Tomcat
You only need to download the single .ZIP or .TGZ file and extract it anywhere you like. No installation is required!
2. Start the Server
$ cd example
$ java -jar start.jar
3. Go!
You’re now ready to start using Heliosearch/Solr!
To verify it’s up and running, you can point your browser at the admin page.
If something didn’t work, check if you have the proper prerequisites.
Basic Commands
Now that Solr is running, we can add a document (also known as “indexing” a document):
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
 {"id"     : "book1",
  "title"  : "American Gods",
  "author" : "Neil Gaiman"
 }
]'
And then we can ask for it back:
$ curl 'http://localhost:8983/solr/get?id=book1'
{
  "doc": {
    "id" : "book1",
    "author": "Neil Gaiman",
    "title" : "American Gods",
    "_version_": 1410390803582287872
  }
}
Of course, for queries you can always just use your browser: click on a query link, or copy and paste the URL into the address bar and modify the query directly to try out different requests.
The author and title fields are pre-defined in the schema, but Solr can use convention over configuration for new fields if you do not wish to edit the schema. In this manner, Solr keeps the essential benefit of a schemaless design: the ability to add new fields on the fly without having to pre-define them.
Let’s update book1 with cat, a category field, and with two new fields that haven’t been defined in the schema: a publication year and an ISBN. Via dynamic fields, a field name ending with _i tells Solr to treat the value as an integer, while a field name ending with _s is treated as a string.
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
 {"id"        : "book1",
  "cat"       : { "add" : "fantasy" },
  "pubyear_i" : { "add" : 2001 },
  "ISBN_s"    : { "add" : "0-380-97365-0"}
 }
]'
By using convention via dynamicFields, Solr avoids the pitfalls of trying to guess at the types of new fields while retaining the benefits of dynamically adding new fields as needed.
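To make the naming convention concrete, here is a minimal Python sketch of suffix-based type selection. It only illustrates the idea: the rule table and the function name guess_type are hypothetical, it covers just the two suffixes mentioned above, and Solr's actual dynamicField matching is configured in the schema rather than implemented like this.

```python
# Illustrative only: mimics the idea of suffix-based dynamic fields.
# SUFFIX_RULES and guess_type are hypothetical names, not part of Solr.
SUFFIX_RULES = {
    "_i": int,  # names ending in _i hold integers
    "_s": str,  # names ending in _s hold strings
}


def guess_type(field_name):
    """Return the Python type implied by the field name's suffix, or None."""
    for suffix, field_type in SUFFIX_RULES.items():
        if field_name.endswith(suffix):
            return field_type
    return None  # no suffix rule matched this name


for name in ("pubyear_i", "ISBN_s", "title"):
    print(name, guess_type(name))
```

The point of the convention is that the type is decided by the name alone, so nothing has to be inferred from the value being indexed.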
Now let’s add a few more documents, this time in CSV (comma separated values) format:
$ curl 'http://localhost:8983/solr/update?commitWithin=5000' -H 'Content-type:text/csv' -d '
id,cat,pubyear_i,title,author,series_s,sequence_i
book2,fantasy,1996,A Game of Thrones,George R.R. Martin,A Song of Ice and Fire,1
book3,fantasy,1999,A Clash of Kings,George R.R. Martin,A Song of Ice and Fire,2
book4,sci-fi,1951,Foundation,Isaac Asimov,Foundation Series,1
book5,sci-fi,1952,Foundation and Empire,Isaac Asimov,Foundation Series,2
book6,sci-fi,1992,Snow Crash,Neal Stephenson,Snow Crash,
book7,sci-fi,1984,Neuromancer,William Gibson,Sprawl trilogy,1
book8,fantasy,1985,The Black Company,Glen Cook,The Black Company,1
book9,fantasy,1965,The Black Cauldron,Lloyd Alexander,The Chronicles of Prydain,2
'
We added the commitWithin=5000 parameter to indicate that we would like our updates to be visible within 5000 milliseconds (5 seconds). The Lucene library that Solr uses for full-text search works off of point-in-time snapshots that must be periodically updated in order for queries to see new changes.
Note that although we often use JSON in our examples, Solr is actually data format agnostic – you’re not artificially tied to any particular transfer-syntax or serialization format such as JSON or XML.
Now let’s query our book collection! For example, we can find all books with “black” in the title field:
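The request for this search is not reproduced here, so the following Python sketch shows how it could be assembled. The handler path /solr/query is an assumption (use whichever search handler your configuration exposes), and the parameter values q=title:black and fl=author,title are inferred from the description and from the fields in the sample response, not taken from the original request.

```python
from urllib.parse import urlencode

# Assumed handler path; adjust to your own Solr setup.
BASE_URL = "http://localhost:8983/solr/query"

params = {
    "q": "title:black",    # search for "black" in the title field
    "fl": "author,title",  # field list: return only these stored fields
}

# safe=":," keeps the colon and comma readable instead of percent-encoding them
url = BASE_URL + "?" + urlencode(params, safe=":,")
print(url)
```

The printed URL can be pasted into a browser, or passed to curl inside single quotes.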
The fl parameter stands for “field list” and specifies what stored fields should be returned from documents matching the query. We should see a result like the following:
{"response":{"numFound":2,"start":0,"docs":[
  {
    "title":["The Black Company"],
    "author":"Glen Cook"},
  {
    "title":["The Black Cauldron"],
    "author":"Lloyd Alexander"}]
}}
Advanced Query Example
Let’s try a more advanced query that combines many elements – limiting the number of books shown for any given series to 1 by grouping documents by series_s, sorting by publication year descending, and requesting facet counts for the book category:
We can see how easy it is to construct and understand even a complex request by stepping through the parameters:
- q=*:* is the main query; *:* matches all documents.
- fl=id,title,series_s,pubyear_i is the field list: the list of fields we want to return for matching documents.
- sort=pubyear_i desc sorts the list of matching documents by pubyear_i in descending order.
- group=true turns on the grouping / field-collapsing feature.
- group.main=true puts the grouped documents where the main query results normally appear instead of in the grouped section of the response.
- group.field=series_s groups together matching documents by the series_s field.
- facet=true turns on the faceting feature.
- facet.field=cat gets facet counts for each value of the cat field. In this example, we have 5 “fantasy” books and 4 “sci-fi” books that match the query.
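Putting the parameters above together, the request URL can be assembled as in the following Python sketch. The /solr/query handler path is an assumption; the parameters themselves are exactly the ones listed.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/query"  # assumed handler path

# A list of pairs keeps the parameters in the order discussed above.
params = [
    ("q", "*:*"),
    ("fl", "id,title,series_s,pubyear_i"),
    ("sort", "pubyear_i desc"),
    ("group", "true"),
    ("group.main", "true"),
    ("group.field", "series_s"),
    ("facet", "true"),
    ("facet.field", "cat"),
]

# urlencode turns the space in the sort value into "+"; safe="*:," leaves
# the asterisk, colon, and comma unescaped so the URL stays readable.
advanced_url = BASE_URL + "?" + urlencode(params, safe="*:,")
print(advanced_url)
```

Building the URL from a list of pairs, rather than by hand, makes it easy to comment out or reorder individual parameters while experimenting.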
Notice that by using simple parameters, as opposed to a complicated hierarchical DSL, it’s very easy to add additional request parameters without worrying about matching up braces or how they nest within a request. For example, if you wanted to get facet counts by publication year, you could simply add facet.field=pubyear_i anywhere in the list of request parameters.
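As a small illustration of that order independence, the Python sketch below inserts the extra facet.field at two different positions in a shortened parameter list and checks that both query strings decode to the same set of parameters. It only inspects the strings locally and does not contact Solr; the shortened list and the helper name normalize are chosen here for brevity.

```python
from urllib.parse import urlencode, parse_qs

base_params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "cat"),
]
extra = ("facet.field", "pubyear_i")  # the additional facet from the text

# Place the extra parameter at the front and at the end.
front = urlencode([extra] + base_params, safe="*:")
back = urlencode(base_params + [extra], safe="*:")


def normalize(query_string):
    """Decode a query string; sort repeated values so position is ignored."""
    return {key: sorted(values) for key, values in parse_qs(query_string).items()}


same_request = normalize(front) == normalize(back)
print(front)
print(back)
print(same_request)
```

With a nested request format, the same change would mean finding the right enclosing object first; here it is a plain append.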
Simple parameter-based requests are especially valuable during ad-hoc testing where it’s easy to add, remove, and edit request parameters right in the address bar of your browser! They also play nicer with HTML forms which can directly create Solr requests from the request parameters.
Next Steps
Welcome to the community! Now that you’ve discovered just how easy it is to get up and running, you should check out all of the other powerful features that Heliosearch/Solr has to offer.
Remember to join the community discussion, where you’ll meet a ton of helpful users and developers!