If you’ve ever integrated search into your website, chances are you’ve stumbled across Solr and its close relative, Lucene. Solr is a popular, blazing-fast, open-source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.
Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites, including ZoomInfo. Here at Zoom, we use Solr pervasively – it’s integrated as a service layer and powers everything from our website, to our API, to our CRM integrations, to our data services products.
Zoom’s Solr index is bigger than most – ours hosts roughly 100 million person and company profiles and serves millions of requests per day. Because Solr is such a pivotal part of our infrastructure, you can be darn sure that we thoroughly test our indexing and search service layers.
Simplified a bit, one way that we test Solr is by feeding it documents and asking it queries as part of a ~100k document regression suite. The hardest part was figuring out how to set up the testing framework properly to ensure both correctness and isolation. You don’t want tests stepping on each other’s toes, now do you?
It turns out that there is a supported (though woefully under-documented) way to run a Solr Server right in-process. No need for an application container, not even Jetty. This ends up being not only a great building block for a testing framework, but a fantastic way to get around one of Solr’s biggest limitations – that it’s insanely slow if you need to return more than a few pages of results. By stacking things the way we did, you can get all of the benefits of both Solr and Lucene, with none of the drawbacks. More on that later.
Now, I’ll preface this by warning you that this approach won’t do for testing Solr’s performance, sharding, or replication features. But in our experience, it’s a darn good way to do functional tests. Let’s take a look.
We start off with a generic class meant to encapsulate access to our Solr server. This is a dumbed-down version of Zoom’s InProcessSolrServer class. In our version, we do some additional specialization, like deploying our custom solrconfig.xml, schema, stop words list, plugins, and the like. That would needlessly complicate the example, though. What the class amounts to is a lot of boring boilerplate to create a delegate EmbeddedSolrServer that we’ll pass each request to. Importantly, in Solr, a “request” can be any command – it could be a query, an instruction to index a document, or anything in between. Tying this back to “normal” Solr, the usual HTTP-based servlet embedded in Jetty does something just like this. We add a few small utility methods at the end, because they’re useful for stacking our embedded Solr instance with Lucene.
So now that we can embed Solr in-process, how should we leverage that in our testing frameworks? Zoom has two general approaches:
- For tests that explicitly involve our “indexing” service, create an empty InProcessSolrServer and feed it documents. To test that indexing worked as expected, we either:
  - Run a comprehensive set of queries against the Solr server, save the results to XML, and “diff” the results against some baseline.
  - Use the Lucene integration to dump the index’s stored fields to some more human-readable form, like XML or CSV, and then “diff” the results against some baseline.
- For tests that are more consumers of the index, we pre-can indexes inside of ZIP archives. Our test’s setup method extracts the index, wraps it in an InProcessSolrServer, and then runs a set of queries against that.
The InProcessSolrServerTestBase class below roughly illustrates the second approach (pre-canned indexes).
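The “diff against some baseline” step in the first approach is easy to picture with a small helper. What follows is only a minimal sketch, not Zoom’s actual code: the class name BaselineDiff and its diff method are hypothetical, and it assumes the query results (or dumped stored fields) have already been serialized to lines of text, one record per line. It uses nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: compares serialized results against a saved baseline, line by line. */
final class BaselineDiff {
    private BaselineDiff() {}

    /**
     * Returns a human-readable list of differences. An empty list means the
     * actual output matches the baseline exactly.
     */
    static List<String> diff(List<String> baseline, List<String> actual) {
        List<String> differences = new ArrayList<>();
        int common = Math.min(baseline.size(), actual.size());
        // Lines present on both sides: report the ones whose content changed.
        for (int i = 0; i < common; i++) {
            if (!baseline.get(i).equals(actual.get(i))) {
                differences.add("line " + (i + 1) + ": expected <" + baseline.get(i)
                        + "> but was <" + actual.get(i) + ">");
            }
        }
        // Lines only in the baseline: records that went missing.
        for (int i = common; i < baseline.size(); i++) {
            differences.add("line " + (i + 1) + ": missing <" + baseline.get(i) + ">");
        }
        // Lines only in the actual output: records that should not be there.
        for (int i = common; i < actual.size(); i++) {
            differences.add("line " + (i + 1) + ": unexpected <" + actual.get(i) + ">");
        }
        return differences;
    }
}
```

A test would then simply fail whenever the returned list is non-empty, printing the list as the failure message. A positional comparison like this assumes the results come back in a deterministic order, so sort on a unique key if your queries do not guarantee one.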
Finally, let’s look at how to get the best of both the Solr and Lucene worlds. The final code snippet is a simple method that executes a Solr query and walks the matches with a DocIterator, which yields the IDs of the matching Lucene documents. This lets you use your Solr plugins, sorting algorithms, query syntax, and – most importantly – your Solr-based code, while being able to efficiently enumerate all of the documents in your result set. For instance, a “*:*” query at Zoom will match 100 million documents, and this approach can export them all to a massive CSV file in roughly an hour. For each matching Lucene document, a simple callback is invoked. Some utility code for converting between Solr and Lucene documents is also included, so that your existing Solr-based code can continue to work without modification.
/**
* This code is provided under the MIT License (http://www.opensource.org/licenses/mit-license.php).
* It depends on the following 3rd-party packages:
*
* * Apache Solr: http://lucene.apache.org/solr/
* * Spring Framework: http://www.springsource.org/
*/
package com.zoominfo.util.solrembed;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;
import org.springframework.beans.factory.annotation.Required;
import org.xml.sax.SAXException;
/**
* A class that manages the life-cycle of an in-process Solr server.
*/
public class InProcessSolrServer extends SolrServer implements Closeable {
private File solrdir = null;
private File datadir = null;
private SolrServer delegate = null;
private transient SolrCore core = null;
/**
* <p>Sets your Solr root directory. In Solr’s documentation,
* this is generally referred to as "/solr-root". Your "conf"
* directory (containing your schema, stopwords, synonyms, …)
* will be a subdirectory of this.</p>
*
* @param solrdir
*/
@Required
public final void setSolrdir(final File solrdir) {
this.solrdir = solrdir;
System.setProperty("solr.home", solrdir.getPath());
if (this.datadir == null) {
setDatadir(new File(solrdir, "data"));
}
}
/**
* <p>Sets your Solr data directory. This is the parent directory
* of your "index" and "spellchecker" directories.</p>
*
* @param datadir
*/
public final void setDatadir(final File datadir) {
this.datadir = datadir;
System.setProperty("solr.data.dir", datadir.getPath());
}
/**
* <p>The only {@link SolrServer} method that you need to override. This method
* passes all queries and indexing events on to an in-process delegate.</p>
*
* @param req
* @return
* @throws SolrServerException
* @throws IOException
*/
@Override
public NamedList<Object> request(final SolrRequest req)
throws SolrServerException,
IOException {
try {
return getDelegate().request(req);
} catch (final SolrServerException e) {
throw e;
} catch (final IOException e) {
throw e;
} catch (final Exception e) {
throw new SolrServerException(e);
}
}
@Override
public synchronized void close() {
if (core != null) {
core.close();
core = null;
}
}
@Override
@SuppressWarnings("FinalizeDeclaration")
protected void finalize() throws Throwable {
close();
super.finalize();
}
/**
* This method creates an in-process Solr server that otherwise behaves just
* as you’d expect.
*/
private synchronized SolrServer getDelegate() throws SolrServerException {
if (delegate != null) {
return delegate;
}
try {
File solrconfigXml = new File(new File(solrdir, "conf"), "solrconfig.xml");
CoreContainer container = new CoreContainer(solrdir.getPath(), solrconfigXml);
CoreDescriptor descriptor = new CoreDescriptor(container, "core1", solrdir.getCanonicalPath());
core = container.create(descriptor);
container.register("core1", core, false);
delegate = new EmbeddedSolrServer(container, "core1");
return delegate;
} catch (ParserConfigurationException ex) {
throw new SolrServerException(ex);
} catch (SAXException ex) {
throw new SolrServerException(ex);
} catch (IOException ex) {
throw new SolrServerException(ex);
}
}
/**
* SolrIndexSearcher adds schema awareness and caching functionality over the Lucene IndexSearcher.
*
* @return
* @throws SolrServerException
*/
public RefCounted<SolrIndexSearcher> getIndexSearcher() throws SolrServerException {
// http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
getDelegate(); // force the delegate to be created
return core.getSearcher();
}
/**
* Returns the index schema used by this Solr server
*
* @return
* @throws SolrServerException
*/
public IndexSchema getIndexSchema() throws SolrServerException {
getDelegate(); // force the delegate to be created
return core.getSchema();
}
}
package com.zoominfo.util.solrembed;
import org.apache.solr.client.solrj.SolrServer;
import java.io.IOException;
import com.zoominfo.util.ZipUtil;
import com.zoominfo.util.solrembed.InProcessSolrServer;
import java.io.File;
import static org.apache.commons.io.FileUtils.deleteDirectory;
public abstract class InProcessSolrServerTestBase {
private InProcessSolrServer solrServer = null;
private File solrOutFolder = null;
public SolrServer getSolrServer() {
return solrServer;
}
/**
* <p>Deploys a pre-built Solr index, and creates an in-process Solr server.
* This is meant to be called from an @Before or @BeforeClass type of method.</p>
*
* @param solrIndexZip a pre-built Solr index, bundled into a ZIP
* @throws IOException
*/
protected void deploySolrIndex(final File solrIndexZip) throws IOException {
assert solrOutFolder == null;
solrOutFolder = File.createTempFile("solr", "index");
solrOutFolder.delete();
solrOutFolder.mkdirs();
// unzip a canned solr index. code not provided.
ZipUtil.unzipFolder(solrIndexZip, solrOutFolder);
// create an in-process solr server
// InProcessSolrServer has no File constructor, so use the setter instead
solrServer = new InProcessSolrServer();
solrServer.setSolrdir(solrOutFolder);
}
/**
* <p>Un-deploys the in-process Solr server.
* Meant to be called from an @After or @AfterClass method.</p>
* @throws IOException
*/
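protected void tearDown() throws IOException {
if (solrServer != null) {
solrServer.close(); // turn off the solr server
solrServer = null;
}
if (solrOutFolder != null) {
deleteDirectory(solrOutFolder); // delete the solr directory, recursively
solrOutFolder = null; // allow deploySolrIndex to be called again
}
}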
}
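The test base class above calls ZipUtil.unzipFolder without showing it. As a rough idea of what such a helper might look like, here is a sketch built only on java.util.zip. The class name ZipUtilSketch is hypothetical, and the real ZipUtil may well behave differently; the only thing taken from the listing above is the (zip file, target folder) argument order.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical stand-in for the ZipUtil class that the test base class refers to. */
final class ZipUtilSketch {
    private ZipUtilSketch() {}

    /** Extracts every entry of zipFile beneath destDir, creating directories as needed. */
    static void unzipFolder(File zipFile, File destDir) throws IOException {
        String destRoot = destDir.getCanonicalPath() + File.separator;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new FileInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                File target = new File(destDir, entry.getName());
                // Refuse entries such as "../evil" that would land outside destDir.
                if (!(target.getCanonicalPath() + File.separator).startsWith(destRoot)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    if (!target.isDirectory() && !target.mkdirs()) {
                        throw new IOException("Could not create " + target);
                    }
                    continue;
                }
                // Some archives omit explicit directory entries, so create parents here too.
                File parent = target.getParentFile();
                if (!parent.isDirectory() && !parent.mkdirs()) {
                    throw new IOException("Could not create " + parent);
                }
                try (OutputStream out = new FileOutputStream(target)) {
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

The path check matters even in test code, since a canned index archive that was built carelessly could otherwise write outside the temporary folder that tearDown later deletes.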
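// Imports needed by the methods below. Callback is our own interface (not shown);
// it exposes setIndexSchema, begin, collect, and end.
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.DocumentBuilder;
import org.apache.solr.util.RefCounted;
/**
* Runs the specified query, passing every result to the specified Callback argument
*
* @see SolrIndexSearcher#getDocList
*
* @param solrServer the Solr Server you want to search
* @param query the Solr query
* @param filterList filter queries that restrict the results (a document must match all of them). may be null
* @param sort criteria by which to sort (if null, query relevance is used)
* @param start offset into the list of documents to return
* @param length maximum number of documents to return. -1 returns all documents
* @param callback action performed for each matching document
* @param closeIndexSearcher pass true to close the searcher once the query completes, false to keep it open for more queries
* @throws Exception
*/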
public static void run(
final InProcessSolrServer solrServer,
final Query query,
final List<Query> filterList,
final Sort sort,
final int start,
int length,
final Callback callback,
final boolean closeIndexSearcher
) throws Exception {
RefCounted<SolrIndexSearcher> indexSearcherRef = null;
try {
indexSearcherRef = solrServer.getIndexSearcher();
SolrIndexSearcher indexSearcher = indexSearcherRef.get();
if (length < 0) {
length = indexSearcher.getIndexReader().maxDoc();
}
DocList docList = indexSearcher.getDocList(query, filterList, sort, start, length, 0);
callback.setIndexSchema(solrServer.getIndexSchema());
callback.begin();
DocIterator iter = docList.iterator();
while (iter.hasNext()) {
Document doc = indexSearcher.doc(iter.nextDoc());
callback.collect(doc);
}
callback.end();
if(closeIndexSearcher) {
indexSearcher.close();
}
} finally {
if (indexSearcherRef != null) {
indexSearcherRef.decref();
}
}
}
/**
* A utility method that converts a Lucene document to a Solr document
*
* @param doc a Lucene document
* @return an equivalent Solr document
* @throws Exception
*/
public SolrDocument convertToSolrDocument(final Document doc) throws Exception {
SolrDocument solrDoc = new SolrDocument();
// load the solr doc from the lucene doc
new org.apache.solr.update.DocumentBuilder(schema).loadStoredFields(solrDoc, doc);
return solrDoc;
}
/**
* A utility method that converts a Solr document to a Lucene document
*
* @param doc a Solr document
* @return an equivalent Lucene document
*/
public Document convertToLuceneDocument(final SolrDocument doc) {
// load the lucene doc from the solr doc
return DocumentBuilder.toDocument(org.apache.solr.client.solrj.util.ClientUtils.toSolrInputDocument(doc), schema);
}
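The Callback type passed to run is not shown above, so here is a rough sketch of what a CSV-exporting callback could look like. Everything in it is an assumption made for illustration: the class name CsvCallbackSketch is hypothetical, a plain Map of stored field values stands in for the Lucene Document so that the sketch compiles without Solr or Lucene on the classpath, and the schema hook is left out. Only the begin, collect, and end life-cycle mirrors the calls made in run.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical CSV-writing callback. A Map of stored field values stands in
 * for the Lucene Document; a real callback would read the fields from the
 * Document it is handed instead.
 */
final class CsvCallbackSketch {
    private final Writer out;
    private final List<String> columns;

    CsvCallbackSketch(Writer out, List<String> columns) {
        this.out = out;
        this.columns = columns;
    }

    /** Called once before the first document: writes the header row. */
    void begin() throws IOException {
        writeRow(columns);
    }

    /** Called once per matching document: writes one row, leaving absent fields blank. */
    void collect(Map<String, String> storedFields) throws IOException {
        List<String> row = new ArrayList<>();
        for (String column : columns) {
            String value = storedFields.get(column);
            row.add(value == null ? "" : value);
        }
        writeRow(row);
    }

    /** Called once after the last document: flushes buffered output. */
    void end() throws IOException {
        out.flush();
    }

    private void writeRow(List<String> values) throws IOException {
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(values.get(i)));
        }
        out.write('\n');
    }

    /** Quotes a value if it contains a comma, quote, or line break; doubles embedded quotes. */
    static String escape(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```

Writing each row as it arrives, rather than collecting documents in memory first, is what keeps an export of this kind workable for very large result sets.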