Randomizing top-n results in Solr

After shuffling the top-n search results returned by Solr a bit [1], you may want to actually randomize them, in a non-repeatable way.

The goal: run a query and (pseudo) randomly reorder the top results. I will use the query re-ranking feature again, but this time I need a re-ranking query that produces different results each time it is invoked.

I created a simple function [2] (i.e. a ValueSourceParser subclass plus a ValueSource subclass) based on a thread-local java.util.Random instance; it simply returns a (pseudo) random number each time it is invoked.
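To get a feel for the idea without the Solr plumbing, here is a minimal plain-Java sketch. It is not the code from the gist [2]: the class name ThreadLocalRandomSource and the method next() are made up for illustration, the [0, 1) value range is an assumption, and the ValueSourceParser / ValueSource wiring is left out entirely.

```java
import java.util.Random;

// Hypothetical stand-in for the core of the function: NOT the gist's code.
public class ThreadLocalRandomSource {

    // One Random per thread, so concurrent requests never share generator state.
    private static final ThreadLocal<Random> RANDOM =
            ThreadLocal.withInitial(Random::new);

    // Returns a fresh pseudo-random float on every call.
    // Assumption: values are uniform in [0, 1).
    public static float next() {
        return RANDOM.get().nextFloat();
    }

    public static void main(String[] args) {
        // Each invocation prints a different value.
        for (int i = 0; i < 3; i++) {
            System.out.println(next());
        }
    }
}
```

In the real function, something like next() would be called once per scored document, which is what makes each execution of the re-rank query non-repeatable.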

The two classes must be packaged in a jar, placed under the lib folder and registered in solrconfig.xml with the name rnd:

<valueSourceParser 
     name="rnd"   
     class="com.faearch.search.function.RandomValueSourceParser"/>

Then I only need to use it in a re-rank query, via the boost parser:

<requestHandler ...>

  <str name="rqq">{!boost b=rnd() v=$q}</str>
  <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1.2}</str>
...

You can now start Solr, index some documents, run the same query several times (ordered by score, the default) and see what happens. Don’t forget to include the score in the field list (fl) parameter, so that you can see the concrete effect of the multiplicative random boost:

http://…?q=shoes&fl=score,*

<result name="response" numFound="2" start="0" maxScore="0.32487732">
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.32487732</float>
  </doc>
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.22645184</float>
  </doc>
</result>

And after running a second time:

<result name="response" numFound="2" start="0" maxScore="0.61873287">
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.61873287</float>
  </doc>
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.3067757</float>
  </doc>
</result>

Oops, that’s the same order. Don’t worry, that’s randomness at work, and I indexed only 2 documents. Look at the score values, though: they differ from the previous response. Let’s try again:

<result name="response" numFound="2" start="0" maxScore="0.24988756">
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.24988756</float>
  </doc>
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.22548665</float>
  </doc>
</result>
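Why does the order sometimes stay the same? A rough plain-Java simulation can give an intuition. It rests on assumptions that do not come from the responses above: rnd() is taken to be uniform in [0, 1), the boost parser multiplies the original score by that value, and the re-ranker adds reRankWeight times the re-rank score to the original score, i.e. final = s + w * s * r. The first-pass scores 0.30 and 0.25 and every name in the code are invented for illustration.

```java
import java.util.Random;

// Hypothetical simulation of a two-document re-rank; not Solr code.
public class RerankFlipSimulation {

    // Assumed combination: original score plus weight times (original * r).
    static double rerankedScore(double original, double weight, double r) {
        return original + weight * original * r;
    }

    // Fraction of trials in which the lower-scored document ends up on top.
    static double flipRate(double high, double low, double weight,
                           int trials, long seed) {
        Random random = new Random(seed);
        int flips = 0;
        for (int i = 0; i < trials; i++) {
            // Each document gets its own independent random boost.
            double a = rerankedScore(high, weight, random.nextDouble());
            double b = rerankedScore(low, weight, random.nextDouble());
            if (b > a) {
                flips++;
            }
        }
        return (double) flips / trials;
    }

    public static void main(String[] args) {
        // Made-up first-pass scores, reRankWeight=1.2 as in the configuration.
        System.out.println(flipRate(0.30, 0.25, 1.2, 100_000, 42L));
    }
}
```

Under this model a document with first-pass score s ends up somewhere in [s, 2.2 * s) when reRankWeight=1.2, so two documents can swap places only if their original scores are within a factor of 2.2 of each other. The closer the scores, the closer the flip rate gets to one half.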


[1] http://www.spaziocodice.com/2015/11/08/shuffling-top-results-in-solr-with-query-re-ranking

[2] https://gist.github.com/agazzarini/a802eff3b50c03fae2364458719be94e

Andrea Gazzarini

Andrea Gazzarini is a curious software engineer, mainly focused on the Java technology. He strongly loves coding and definitely likes to be considered a developer. Andrea has more than 15 years of experience in various software engineering areas, from telecommunications to banking. He has worked for several medium- and large-scale companies, such as IBM and Orga Systems. Andrea has several certifications in the Java programming language (programmer, developer, web component developer, business component developer, and JEE architect), BEA products (build and portal solutions), and Apache Solr (Lucid Apache Solr/Lucene Certified Developer).
