Apache Solr: Loading Data at Startup

SolrEventListener is an interface that defines a set of callbacks on several lifecycle events:

  • void postCommit()
  • void postSoftCommit()
  • void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher)

For this example, I’m not interested in the first two callbacks because, as their names suggest, they are invoked after hard and soft commit events.
The interesting method is instead newSearcher(…), which lets me register a custom event listener for either of two events:

  • firstSearcher
  • newSearcher

In Solr, the Index Searcher which serves requests at a given time is called the current searcher. At startup time there’s no current searcher yet, because the first one is still being created; hence we are in the “firstSearcher” event, which is exactly what I was looking for 😉

When another (i.e., new) searcher is opened, it is prepared (i.e., auto-warmed) while the current one still serves the incoming requests. When the new searcher is ready, it becomes the current searcher and handles any new search requests; the old searcher is closed as soon as all the requests it was servicing have finished. This scenario is where the “newSearcher” callback is invoked.

As you can see, the callback method for those two events is the same; there are no separate methods for “firstSearcher” and “newSearcher”. The difference lies in the input arguments: for “firstSearcher” events there’s no current searcher, so the second argument is null; for “newSearcher” callbacks, both the first and second arguments hold a valid searcher reference.
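To make the null check concrete, here is a minimal sketch of how a single callback could tell the two events apart. It is only an illustration: FakeSearcher is a hypothetical stand-in for SolrIndexSearcher (so the snippet compiles without Solr on the classpath), and SearcherEventSketch merely mirrors the shape of SolrEventListener.newSearcher(…).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for SolrIndexSearcher; not a Solr class. */
final class FakeSearcher {
    final String name;

    FakeSearcher(String name) {
        this.name = name;
    }
}

/** Mirrors the two-argument newSearcher callback described above. */
final class SearcherEventSketch {
    /** Records which kind of event each invocation corresponded to. */
    final List<String> log = new ArrayList<>();

    void newSearcher(FakeSearcher newSearcher, FakeSearcher currentSearcher) {
        if (currentSearcher == null) {
            // No searcher is serving requests yet: this is the "firstSearcher" case.
            log.add("firstSearcher:" + newSearcher.name);
        } else {
            // A searcher is already in service and is about to be replaced.
            log.add("newSearcher:" + currentSearcher.name + "->" + newSearcher.name);
        }
    }
}
```

Calling newSearcher(first, null) and then newSearcher(second, first) on the same instance would record one entry of each kind, in that order.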

Returning to my scenario, all I need is:

  • a declaration of that listener in solrconfig.xml
  • a concrete implementation of SolrEventListener

In solrconfig.xml, within the <updateHandler> section, I can declare my listener:

<listener event="firstSearcher" class="a.b.c.SolrStartupListener">
    <str name="datafile">${solr.solr.home}/sample/data.xml</str>
</listener>

The listener will be initialized with just one parameter: the file that contains the sample data. With the “event” attribute I tell Solr which kind of event I’m interested in (i.e., firstSearcher).
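As a rough sketch of how the “datafile” parameter could be checked before it is used, the snippet below reads it from the listener arguments and fails fast when it is missing or does not point to a readable file. This check is not part of the original listener: a plain Map stands in for Solr’s NamedList so the code runs without Solr on the classpath, and the names DatafileParam and resolveDatafile are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper; a Map stands in for the NamedList passed to init(...). */
final class DatafileParam {

    private DatafileParam() {
    }

    /** Returns the configured data file, or throws if it is missing or unreadable. */
    static Path resolveDatafile(Map<String, ?> initArgs) {
        Object value = initArgs.get("datafile");
        if (!(value instanceof String) || ((String) value).trim().isEmpty()) {
            throw new IllegalArgumentException("Missing 'datafile' listener parameter");
        }
        Path path = Paths.get(((String) value).trim());
        // Reject directories and files the process cannot read.
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new IllegalArgumentException("Datafile is not a readable file: " + path);
        }
        return path;
    }
}
```

Failing at initialization time, rather than when the first searcher opens, would make a wrong path easier to spot in the startup log.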

The implementation class is quite simple: it implements SolrEventListener:

public class SolrStartupListener implements SolrEventListener {
...

    @Override
    public void init(final NamedList args) {
        this.datafile = (String) args.get("datafile");
    }
    ...
    
    LocalSolrQueryRequest request = null;
    try {
           // 1. Create the arguments map for the update request
           final NamedList args = new SimpleOrderedMap();
            args.add(
                    UpdateParams.ASSUME_CONTENT_TYPE,  
                    "text/xml");
            addEventParms(currentSearcher, args);

            // 2. Create a new Solr (update) request
            request = new LocalSolrQueryRequest(
                     newSearcher.getCore(), 
                     args);
           
            // 3. Fill the request with the (datafile) input stream
            final List streams = new ArrayList();
            streams.add(new ContentStreamBase() {
                @Override
                public InputStream getStream() throws IOException {
                    return new FileInputStream(datafile);
                }
            });
           
            request.setContentStreams(streams);
           
            // 4. Create a new Solr response
            final SolrQueryResponse response = 
                new SolrQueryResponse();
           
            // 5. And finally invoke the update handler
            SolrRequestInfo.setRequestInfo(
                new SolrRequestInfo(request, response));

            newSearcher
                 .getCore()
                 .getRequestHandler("/update")
                 .handleRequest(request, response);    
  
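        } finally {
            // Clear the thread-local request info and release the request
            SolrRequestInfo.clearRequestInfo();
            if (request != null) {
                request.close();
            }
        }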
    }
}

Voilà: if you start Solr, you will see the sample data loaded. Besides saving a lot of repetitive work, this can be useful when you’re using a SolrCore as NoSQL storage, for example to store SKOS vocabularies for synonyms, translations, and broader / narrower searches.
