Randomizing top-n results in Solr
After shuffling the top-n search results returned by Solr a bit [1], you may want to go further and randomize them in a non-repeatable way.
What we want to do is run a query and (pseudo) randomly reorder the top results. I will again use the query re-ranking feature, but this time I need a re-ranking query that produces different results each time it is invoked.
I created a simple function [2] (i.e. a ValueSourceParser and a ValueSource subclass) based on a thread-local java.util.Random instance, which simply returns a (pseudo) random number each time it is invoked.
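The thread-local idea can be sketched in plain Java, leaving out the Solr ValueSourceParser/ValueSource plumbing (that part lives in the gist [2]). This is a minimal sketch under assumptions: the class name is hypothetical, and returning nextFloat() in [0, 1) is a guess at what rnd() yields, not something the post states.

```java
import java.util.Random;

/**
 * Hypothetical stand-in for the value source behind "rnd" (no Solr classes).
 * Each thread gets its own java.util.Random, so concurrent searches never
 * contend on a shared instance.
 */
public class ThreadLocalRandomSource {

    // One Random per thread, created lazily on first access from that thread.
    private static final ThreadLocal<Random> RANDOM = ThreadLocal.withInitial(Random::new);

    /** Returns a new pseudo random float in [0, 1) on every call (assumed range). */
    public static float nextValue() {
        return RANDOM.get().nextFloat();
    }

    public static void main(String[] args) {
        // Three calls, three (most likely) different values.
        for (int i = 0; i < 3; i++) {
            System.out.println(nextValue());
        }
    }
}
```

Since Java 7, java.util.concurrent.ThreadLocalRandom.current().nextFloat() gives the same per-thread behaviour without a hand-rolled ThreadLocal.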
Once the two classes have been packaged in a jar, placed under the lib folder, and registered in solrconfig.xml with the name rnd:
<valueSourceParser name="rnd" class="com.faearch.search.function.RandomValueSourceParser"/>
I only need to use it in the re-rank query, wrapping the main query with the boost query parser:
<requestHandler ...>
  <str name="rqq">{!boost b=rnd() v=$q}</str>
  <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1.2}</str>
  ...
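To get a feel for what this configuration does to the scores, here is a plain-Java simulation with no Solr involved. It rests on two assumptions: Solr combines scores additively (first-pass score + reRankWeight × re-rank query score, the default behaviour as far as I know), and rnd() returns a float in [0, 1), so the re-rank query score of {!boost b=rnd() v=$q} is the first-pass score times rnd(). Only the first reRankDocs hits are rescored; the rest follow in their original order. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical simulation (no Solr classes) of
 *   rq  = {!rerank reRankQuery=$rqq reRankDocs=N reRankWeight=W}
 *   rqq = {!boost b=rnd() v=$q}
 * Assumptions: additive combination (score + W * reRankScore), rnd() in [0, 1).
 */
public class RerankSimulation {

    /** A search hit: a document name and its score. */
    public static final class Hit {
        public final String name;
        public final float score;

        public Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }

        @Override
        public String toString() {
            return name + "=" + score;
        }
    }

    /**
     * Rescores the first reRankDocs hits (expected sorted by descending score),
     * re-sorts them by the new score and appends the remaining hits unchanged.
     */
    public static List<Hit> rerank(List<Hit> hits, int reRankDocs, float reRankWeight, Random rnd) {
        int n = Math.min(reRankDocs, hits.size());
        List<Hit> top = new ArrayList<>(n);
        for (Hit hit : hits.subList(0, n)) {
            // Re-rank query score: the first-pass score multiplied by rnd().
            float reRankScore = hit.score * rnd.nextFloat();
            // Assumed additive combination with the first-pass score.
            top.add(new Hit(hit.name, hit.score + reRankWeight * reRankScore));
        }
        // List.sort is stable, so equal scores keep their original relative order.
        top.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> result = new ArrayList<>(top);
        result.addAll(hits.subList(n, hits.size()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up first-pass scores, not taken from the responses in this post.
        List<Hit> hits = List.of(new Hit("shoes B", 0.2f), new Hit("shoes A", 0.15f));
        Random rnd = new Random();
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run + ": " + rerank(hits, 100, 1.2f, rnd));
        }
    }
}
```

Under these assumptions, with reRankWeight=1.2 every re-ranked score lands in [s, 2.2·s) of its first-pass score s, so two documents can swap places only when their first-pass scores are less than a factor of 2.2 apart.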
You can now start Solr, index some documents, run the same query several times (results are ordered by score by default) and see what happens. Don't forget to include the score in the field list (fl) parameter, so you can see the concrete effect of the multiplicative random boost:
http://…?q=shoes&fl=score,*
<result name="response" numFound="2" start="0" maxScore="0.32487732">
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.32487732</float>
  </doc>
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.22645184</float>
  </doc>
</result>
And after running it a second time:
<result name="response" numFound="2" start="0" maxScore="0.61873287">
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.61873287</float>
  </doc>
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.3067757</float>
  </doc>
</result>
Oops, that's the same order… don't worry, that's randomness at work, and I indexed only two documents. Look at the scores: they differ from the previous run. Let's try again:
<result name="response" numFound="2" start="0" maxScore="0.24988756">
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.24988756</float>
  </doc>
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.22548665</float>
  </doc>
</result>
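Why does the same order come back so often with two documents? There are only two possible orders, and because the random factor multiplies the first-pass score, the higher-scoring document keeps its lead more often than not. A quick Monte Carlo estimate makes this concrete. It assumes an additive combination (final = s + w × s × r) with r uniform in [0, 1); the class name and the input scores are made up.

```java
import java.util.Random;

/**
 * Hypothetical Monte Carlo estimate: how often does the higher-scoring of two
 * documents stay on top after the random re-rank? Assumes
 *   final = s + w * (s * r), with r uniform in [0, 1).
 */
public class TwoDocOrderOdds {

    public static double probabilityOrderKept(float high, float low, float w, int trials, Random rnd) {
        int kept = 0;
        for (int i = 0; i < trials; i++) {
            // Each document draws its own random factor, as rnd() is evaluated per document.
            float finalHigh = high + w * high * rnd.nextFloat();
            float finalLow = low + w * low * rnd.nextFloat();
            if (finalHigh > finalLow) {
                kept++;
            }
        }
        return (double) kept / trials;
    }

    public static void main(String[] args) {
        // Made-up first-pass scores, not taken from the responses above.
        System.out.println(probabilityOrderKept(0.2f, 0.15f, 1.2f, 100_000, new Random()));
    }
}
```

For these made-up scores the estimate should land around 0.80; with equal first-pass scores it drops to about 0.5, i.e. a plain coin flip.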
[1] http://www.spaziocodice.com/2015/11/08/shuffling-top-results-in-solr-with-query-re-ranking
[2] https://gist.github.com/agazzarini/a802eff3b50c03fae2364458719be94e