Apache Solr: How To Implement a Search Workflow

In this post we describe an approach for orchestrating a search workflow in Apache Solr using Invisible Queries.

What are “Invisible Queries”?

The following is an extract of an article [1] on Lucidworks.com, by Grant Ingersoll, talking about invisible queries:

“It is often necessary in many applications to execute more than one query for any given user query. For instance, in applications that require very high precision (only good results, forgoing marginal results), the app. may have several fields, one for exact matches, one for case-insensitive matches and yet another with stemming. Given a user query, the app may try the query against the exact match field first and if there is a result, return only that set. If there are no results, then the app would proceed to search the next field, and so on.”

(source: https://lucidworks.com/blog/2009/08/12/fake-and-invisible-queries)

The extract above assumes a scenario where the (client) application issues several consecutive requests to Solr for a single user query (i.e. one user query => many search engine queries).
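As a rough illustration of that client-side approach, here is a minimal sketch in Java. It assumes a hypothetical searchField callback standing in for whatever call the client makes to Solr; the class name, the method and the field names used in the test data are all made up for this example and are not part of any Solr client API.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical client-side helper: one user query => many search engine queries.
final class ClientSideFallback {

    // Tries the user query against each field in order (for example exact,
    // case-insensitive, stemmed) and returns the first non-empty result set.
    // searchField is an assumed callback: (fieldName, userQuery) -> matching document ids.
    static List<String> search(
            String userQuery,
            List<String> fieldsInPriorityOrder,
            BiFunction<String, String, List<String>> searchField) {
        for (String field : fieldsInPriorityOrder) {
            List<String> hits = searchField.apply(field, userQuery);
            if (!hits.isEmpty()) {
                // First field with results wins; the remaining fields are never queried.
                return hits;
            }
        }
        // No field produced a match.
        return Collections.emptyList();
    }
}
```

The point of the sketch is only that the fallback loop lives in the client, which is exactly the control we will not have in the scenario described next.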

What if you don’t have such control? Imagine you’re the search engineer of an e-commerce portal that has been built using Magento; someone installed and configured the Solr connector and everything is working: when the user submits a search, the connector forwards the request to Solr, which in turn executes a query.

The Context

Now, imagine the query above returns no results. The whole request / response interaction is over, and the user sees something like “Sorry, no results for your search”.

Although this could sound perfectly reasonable, in this post we will focus on a different approach based on “invisible queries”. The main point here is a precondition: I cannot change the client code. This could be because, for example:

  • I don’t want to introduce custom code in my Magento / Drupal instance
  • I don’t know PHP
  • it’s not possible to implement that workflow on the client side
  • I want to move as much of the search logic as possible into Solr

What I’d like to have is a single entry point (i.e. one single request handler) exposed to my clients. That endpoint should be able to execute a workflow like this:

(Figure: An Example of Search Workflow)

The CompositeRequestHandler

Apache Solr implements the concept of an endpoint by means of a component called RequestHandler. You can declare multiple RequestHandler instances in the configuration, thereby providing multiple endpoints.

The idea is to provide a Facade Request Handler which is able to chain several other handlers; the configuration would look like this:

<requestHandler name="/search" class="...CompositeRequestHandler">      
    <str name="chain">/rh1,/rh2,/rh3</str> 
</requestHandler> 

/rh1, /rh2 and /rh3 are standard SearchHandler instances already declared that are chained sequentially as depicted in the workflow above.

The CompositeRequestHandler implementation is actually simple: its handleRequestBody method executes the configured handler references sequentially, and it breaks the execution chain after receiving the first positive query response (usually a query response with numFound > 0, although the latest version of the component also allows other predicates to be configured). The logic would be something like this:

chain.stream()
    // Get the request handler associated with a given name
    .map(refName -> requestHandler(request, refName))
    // Only SearchHandler instances are allowed in the chain
    .filter(SearchHandler.class::isInstance)
    // Execute the handler logic
    .map(handler -> executeQuery(request, response, params, handler))
    // Keep only positive responses, i.e. those with numFound > 0
    .filter(qresponse -> howManyFound(qresponse) > 0)
    // Stop the iteration as soon as the condition above has been satisfied
    .findFirst()
    // or, if we don't have any positive execution, just return an empty response
    .orElse(emptyResponse(request, response));
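To make the short-circuit behaviour easier to try outside Solr, here is a self-contained sketch of the same idea. It does not use any Solr class: ChainSketch, its nested Response type, the map of handlers and the isPositive predicate are all hypothetical stand-ins introduced for illustration, and the real component may well be organised differently. The sketch relies on Java streams being lazy, so with findFirst() the handlers after the first positive one are never invoked.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, Solr-free model of a handler chain with a configurable predicate.
final class ChainSketch {

    // Stand-in for a query response: only the handler name and hit count are modelled.
    static final class Response {
        final String handlerName;
        final long numFound;

        Response(String handlerName, long numFound) {
            this.handlerName = handlerName;
            this.numFound = numFound;
        }
    }

    // Runs the named handlers in order and returns the first response accepted
    // by isPositive, or an empty response if none qualifies.
    static Response run(
            List<String> chain,
            Map<String, Function<String, Response>> handlers,
            Predicate<Response> isPositive,
            String query) {
        return chain.stream()
                // Resolve each configured name; unknown names are skipped
                .map(handlers::get)
                .filter(Objects::nonNull)
                // Execute the handler (evaluated lazily, one element at a time)
                .map(handler -> handler.apply(query))
                // Keep only responses the configured predicate accepts
                .filter(isPositive)
                // Short-circuit: later handlers are not executed
                .findFirst()
                // Build the empty response only when it is actually needed
                .orElseGet(() -> new Response("none", 0));
    }
}
```

Passing a predicate such as resp -> resp.numFound > 0 reproduces the default behaviour described above; a different predicate would model the other stop conditions the component can be configured with. The sketch uses orElseGet rather than orElse purely as a design choice, so that the fallback is not constructed when a handler has already produced a positive response.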

You can find the source code of CompositeRequestHandler here. As usual, any feedback is warmly welcome.

Andrea Gazzarini

Andrea Gazzarini is a curious software engineer, mainly focused on the Java technology. He strongly loves coding and definitely likes to be considered a developer. Andrea has more than 15 years of experience in various software engineering areas, from telecommunications to banking. He has worked for several medium- and large-scale companies, such as IBM and Orga Systems. Andrea has several certifications in the Java programming language (programmer, developer, web component developer, business component developer, and JEE architect), BEA products (build and portal solutions), and Apache Solr (Lucid Apache Solr/Lucene Certified Developer).
