The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Hadoop framework allows for distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.

Since its initial release in late 2007, Hadoop has become the leading way to do Data Mining and Distributed Computing. The project enjoys support from major backers such as Yahoo! and Cloudera and a very broad adoption rate by both large and small companies. Right now, there are over 4100 Hadoop-related jobs posted on Indeed. That’s 3x the number of Django jobs listed, and 5x more than the Node.js framework.

Here at Zoom, we employ Hadoop for a wide variety of data processing and data mining tasks. As one example, we’ve got 12 years of crawler data archived, comprising some 50 TB of information. That’s a fantastically rich corpus primed for data mining. This is what I’d refer to as a “traditional” use of Hadoop. That is, we store a massive amount of data in HDFS, and then run MapReduce jobs against it, looking for interesting information. This use case is right in Hadoop’s sweet spot – if you bring the computation to where the data lives, you can achieve massive parallelism without worrying about things like network latency and network throughput.
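To make the MapReduce model concrete, here is a minimal in-memory sketch of its three phases (map, shuffle, reduce) using a word count. It is plain Java and deliberately does not use the Hadoop API; the class and method names (WordCountSketch, map, reduce, run) are made up for illustration. In a real job, Hadoop runs the map calls in parallel on the machines that hold the data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of map -> shuffle -> reduce. Not the Hadoop API. */
class WordCountSketch {

    /** Map phase: emit one (word, 1) pair per token in a single input record. */
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : record.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** Reduce phase: combine every value that was emitted for one key. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Driver: map every record, group the pairs by key (the "shuffle"), then reduce. */
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }

        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }
}
```

Because each map call only looks at its own record, the map phase can be spread across as many machines as hold the data, which is where the parallelism comes from.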

But not all of our uses of Hadoop are so traditional. Given the variety of data collection & data processing tasks Zoom performs, not all of them lend themselves to a MapReduce model. For example, some of them query databases or Solr servers. Some make RESTful API requests to Google. Some run IMAP commands. Some crawl websites. But, in our opinion anyway, many of these use cases still lend themselves well to the Hadoop framework. What we generally end up doing is defining a work queue (e.g., a crawl schedule) in HDFS and storing the results back into HDFS for use by other jobs.

Zoom isn’t in the business of building platforms, and you probably shouldn’t be either. It’s usually a much better use of resources to focus on your core competencies and do the things that make your company the best widget maker on the planet. Ready-made platforms generally reduce development costs and shrink time to market. And with today’s robust Open Source ecosystem, there are few reasons not to use off the shelf platforms like Hadoop. If you need a platform that’s:

  • Horizontally and vertically scalable (ideally, with process isolation)
  • Fault-tolerant
  • Highly available
  • Complete with a simple reporting framework
  • Complete with a simple management/administration framework

And you need it done quickly & cheaply, Hadoop is definitely worth checking out.

Posted in ZoomInfo News | Leave a comment

If you’ve ever integrated search into your website, chances are you’ve stumbled across Solr and its close relative, Lucene. Solr is a popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites, including ZoomInfo. Here at Zoom, we use Solr pervasively – it’s integrated as a service layer and powers everything from our website, to our API, to our CRM integrations, to our data services products.

Zoom’s Solr index is bigger than most – ours hosts roughly 100 million person and company profiles and serves millions of requests per day. Because Solr is such a pivotal part of our infrastructure, you can be darn sure that we thoroughly test our indexing and search service layers.

Simplified a bit, one way that we test Solr is by feeding it documents and asking it queries as part of a ~100k document regression suite. The hardest part was figuring out how to set up the testing framework properly to ensure both correctness and isolation. You don’t want tests stepping on each other’s toes, now do you? :)

It turns out that there is a supported (though woefully under-documented) way to run a Solr Server right in-process. No need for an application container, not even Jetty. This ends up being not only a great building block for a testing framework, but a fantastic way to get around one of Solr’s biggest limitations – that it’s insanely slow if you need to return more than a few pages of results. By stacking things the way we did, you can get all of the benefits of both Solr and Lucene, with none of the drawbacks. More on that later.

Now, I’ll preface this by warning you that this approach won’t do for testing Solr’s performance, sharding, or replication features. But in our experience, it’s a darn good way to do functional tests. Let’s take a look.

We start off with a generic class meant to encapsulate access to our Solr server. This is a dumbed-down version of Zoom’s InProcessSolrServer class. In our version, we do some additional specialization, like deploying our custom solrconfig.xml, schema, stop words list, plugins, and the like. That would needlessly complicate the example, though. What the class amounts to is a lot of boring boilerplate to create a delegate EmbeddedSolrServer that we’ll pass each request to. Importantly, in Solr, a “request” can be any command – it could be a query, an instruction to index a document, or anything in between. Tying this back to “normal” Solr, the usual HTTP-based servlet embedded in Jetty does something just like this. We add a few small utility methods at the end, because they’re useful for stacking our embedded Solr instance with Lucene.

So now that we can embed Solr in-process, how should we leverage that in our testing frameworks? Zoom has two general approaches:

  1. For tests that explicitly involve our “indexing” service, create an empty InProcessSolrServer and feed it documents. To test that indexing worked as-expected, we either:
    1. Run a comprehensive set of queries against the Solr Server, save the results to XML, and “diff” the results against some baseline.
    2. Use the Lucene integration to dump the index’s stored fields to some more human-readable form, like XML or CSV, and then “diff” the results against some baseline.
  2. For tests that are more consumers of the index, we pre-can indexes inside of ZIP archives. Our test’s setup method extracts the index, wraps it in an InProcessSolrServer, and then runs a set of queries against that.
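The “diff against a baseline” step in option 1 can be as simple as a line-by-line comparison of two files. The sketch below is an assumption about how such a check might look, not Zoom’s actual tooling; BaselineDiff and its diff() method are hypothetical names, and it compares lines by position rather than computing a true minimal diff.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that compares a freshly generated dump against a stored baseline. */
class BaselineDiff {

    /** Returns one message per differing line; an empty list means the files match. */
    static List<String> diff(Path baseline, Path actual) throws IOException {
        List<String> expected = Files.readAllLines(baseline, StandardCharsets.UTF_8);
        List<String> got = Files.readAllLines(actual, StandardCharsets.UTF_8);
        List<String> problems = new ArrayList<>();

        int max = Math.max(expected.size(), got.size());
        for (int i = 0; i < max; i++) {
            // a file that ends early is reported as "<missing>" on that line
            String e = i < expected.size() ? expected.get(i) : "<missing>";
            String g = i < got.size() ? got.get(i) : "<missing>";
            if (!e.equals(g)) {
                problems.add("line " + (i + 1) + ": expected [" + e + "] but was [" + g + "]");
            }
        }
        return problems;
    }
}
```

A test would then assert that the returned list is empty and print its contents on failure, which makes it easy to see which documents or fields drifted from the baseline.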

The second code snippet below roughly illustrates how we do option #2.

Finally, let’s look at how to get the best of both the Solr and Lucene worlds. The final code snippet is a simple method that lets you execute Solr queries and walk the results with a DocIterator, fetching each matching Lucene document as you go. This lets you use your Solr plugins, sorting algorithms, query syntax, sort order, and – most importantly – your Solr-based code, while being able to efficiently enumerate all of the documents in your result set. For instance, a “*:*” query at Zoom will match 100 million documents, and this approach can export them all to a massive CSV file in roughly an hour. For each matching Lucene document, a simple callback is invoked. Some utility code for converting between Solr and Lucene documents is also included, so that your existing Solr-based code can continue to work without modification.


/**
 * This code is provided under the MIT License (http://www.opensource.org/licenses/mit-license.php).
 * It depends on the following 3rd-party packages:
 *
 * * Apache Solr: http://lucene.apache.org/solr/
 * * Spring Framework: http://www.springsource.org/
 */

package com.zoominfo.util.solrembed;

import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;
import org.springframework.beans.factory.annotation.Required;
import org.xml.sax.SAXException;

/**
 * A class that manages the life-cycle of an in-process Solr server.
 */

public class InProcessSolrServer extends SolrServer implements Closeable {

    private File solrdir = null;
    private File datadir = null;
    private SolrServer delegate = null;
    private transient SolrCore core = null;

    /**
     * <p>Sets your Solr root directory. In Solr’s documentation,
     * this is generally referred to as "/solr-root". Your "conf"
     * directory (containing your schema, stopwords, synonyms, …)
     * will be a subdirectory of this.</p>
     *
     * @param solrdir
     */

    @Required
    public final void setSolrdir(final File solrdir) {
        this.solrdir = solrdir;

        System.setProperty("solr.home", solrdir.getPath());

        if (this.datadir == null) {
            setDatadir(new File(solrdir, "data"));
        }
    }

    /**
     * <p>Sets your Solr data directory. This is the parent directory
     * of your "index" and "spellchecker" directories.</p>
     *
     * @param datadir
     */

    public final void setDatadir(final File datadir) {
        this.datadir = datadir;
        System.setProperty("solr.data.dir", datadir.getPath());
    }

    /**
     * <p>The only {@link SolrServer} method that you need to override. This method
     * passes all queries and indexing events on to an in-process delegate.</p>
     *
     * @param req
     * @return
     * @throws SolrServerException
     * @throws IOException
     */

    @Override
    public NamedList<Object> request(final SolrRequest req)
            throws SolrServerException,
            IOException {
        try {
            return getDelegate().request(req);
        } catch (final SolrServerException e) {
            throw e;
        } catch (final IOException e) {
            throw e;
        } catch (final Exception e) {
            throw new SolrServerException(e);
        }
    }

    @Override
    public synchronized void close() {
        if (core != null) {
            core.close();
            core = null;
        }
    }

    @Override
    @SuppressWarnings("FinalizeDeclaration")
    protected void finalize() throws Throwable {
        close();
        super.finalize();
    }

    /**
     * This method creates an in-process Solr server that otherwise behaves just
     * as you’d expect.
     */

    private synchronized SolrServer getDelegate() throws SolrServerException {
        if (delegate != null) {
            return delegate;
        }

        try {
            File solrconfigXml = new File(new File(solrdir, "conf"), "solrconfig.xml");

            CoreContainer container = new CoreContainer(solrdir.getPath(), solrconfigXml);
            CoreDescriptor descriptor = new CoreDescriptor(container, "core1", solrdir.getCanonicalPath());

            core = container.create(descriptor);
            container.register("core1", core, false);
            delegate = new EmbeddedSolrServer(container, "core1");

            return delegate;
        } catch (ParserConfigurationException ex) {
            throw new SolrServerException(ex);
        } catch (SAXException ex) {
            throw new SolrServerException(ex);
        } catch (IOException ex) {
            throw new SolrServerException(ex);
        }
    }

    /**
     * SolrIndexSearcher adds schema awareness and caching functionality over the Lucene IndexSearcher.
     *
     * @return
     * @throws SolrServerException
     */

    public RefCounted<SolrIndexSearcher> getIndexSearcher() throws SolrServerException {
        // http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
        getDelegate(); // force the delegate to be created

        return core.getSearcher();
    }

    /**
     * Returns the index schema used by this Solr server
     *
     * @return
     * @throws SolrServerException
     */

    public IndexSchema getIndexSchema() throws SolrServerException {
        getDelegate(); // force the delegate to be created

        return core.getSchema();
    }
}
 


package com.zoominfo.util.solrembed;

import org.apache.solr.client.solrj.SolrServer;
import java.io.IOException;
import com.zoominfo.util.ZipUtil;
import com.zoominfo.util.solrembed.InProcessSolrServer;
import java.io.File;
import static org.apache.commons.io.FileUtils.deleteDirectory;

public abstract class InProcessSolrServerTestBase {

    private InProcessSolrServer solrServer = null;
    private File solrOutFolder = null;

    public SolrServer getSolrServer() {
        return solrServer;
    }

    /**
     * <p>Deploys a pre-built Solr index, and creates an in-process Solr server.
     * This is meant to be called from an @Before or @BeforeClass type of method.</p>
     *
     * @param solrIndexZip a pre-built Solr index, bundled into a ZIP
     * @throws IOException
     */

    protected void deploySolrIndex(final File solrIndexZip) throws IOException {
        assert solrOutFolder == null;

        solrOutFolder = File.createTempFile("solr", "index");
        solrOutFolder.delete();
        solrOutFolder.mkdirs();

        // unzip a canned solr index. code not provided.
        ZipUtil.unzipFolder(solrIndexZip, solrOutFolder);

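        // create an in-process solr server rooted at the extracted index.
        // InProcessSolrServer is configured through setSolrdir(), not a constructor argument.
        solrServer = new InProcessSolrServer();
        solrServer.setSolrdir(solrOutFolder);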
    }

    /**
     * <p>Un-deploys the in-process Solr server.
     * Meant to be called from an @After or @AfterClass method.</p>

     * @throws IOException
     */
    protected void tearDown() throws IOException {
        if (solrOutFolder != null) {
            solrServer.close(); // turn off the solr server
            deleteDirectory(solrOutFolder); // delete the solr directory, recursively
        }
    }
}
 


    /**
     * Runs the specified query, passing every result to the specified Callback argument
     *
     * @see SolrIndexSearcher#getDocList
     *
     * @param solrServer the Solr Server you want to search
     * @param query the Solr query
     * @param filterList filter queries that every returned document must also match. may be null
     * @param sort criteria by which to sort (if null, query relevance is used)
     * @param start offset into the list of documents to return
     * @param length maximum number of documents to return. -1 returns all documents
     * @param callback action performed for each matching document
     * @param closeIndexSearcher pass true to close the searcher when done, false to keep it open for more queries
     * @throws Exception
     */

    public static void run(final InProcessSolrServer solrServer, final Query query,
            final List<Query> filterList, final Sort sort,
            final int start, int length, final Callback callback, final boolean closeIndexSearcher) throws Exception {
        RefCounted<SolrIndexSearcher> indexSearcherRef = null;

        try {
            indexSearcherRef = solrServer.getIndexSearcher();

            SolrIndexSearcher indexSearcher = indexSearcherRef.get();
            if (length < 0) {
                length = indexSearcher.getIndexReader().maxDoc();
            }

            DocList docList = indexSearcher.getDocList(query, filterList, sort, start, length, 0);

            callback.setIndexSchema(solrServer.getIndexSchema());
            callback.begin();

            DocIterator iter = docList.iterator();
            while (iter.hasNext()) {
                Document doc = indexSearcher.doc(iter.nextDoc());
                callback.collect(doc);
            }

            callback.end();
            if(closeIndexSearcher) {
                indexSearcher.close();
            }

        } finally {
            if (indexSearcherRef != null) {
                indexSearcherRef.decref();
            }
        }
    }

    /**
     * A utility method that converts a Lucene document to a Solr document
     *
     * @param doc a Lucene document
     * @return an equivalent Solr document
     * @throws Exception
     */

    public SolrDocument convertToSolrDocument(final Document doc) throws Exception {
        SolrDocument solrDoc = new SolrDocument();

        // load the solr doc from the lucene doc
        new org.apache.solr.update.DocumentBuilder(schema).loadStoredFields(solrDoc, doc);

        return solrDoc;
    }

    /**
     * A utility method that converts a Solr document to a Lucene document
     *
     * @param doc a Solr document
     * @return an equivalent Lucene document
     */

    public Document convertToLuceneDocument(final SolrDocument doc) {
        // load the lucene doc from the solr doc
        return DocumentBuilder.toDocument(org.apache.solr.client.solrj.util.ClientUtils.toSolrInputDocument(doc), schema);
    }
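The Callback type that run() invokes is not shown in the listing. As a rough sketch of its shape, inferred only from the calls run() makes (begin(), collect(), end()), here is a CSV-writing callback. Everything in it is an assumption for illustration: ResultCallback and CsvCallback are hypothetical names, a Map<String, String> stands in for a Lucene Document, and setIndexSchema() is left out so the example compiles against the JDK alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the Callback that run() expects; D is the document type. */
interface ResultCallback<D> {
    void begin();          // called once before the first document
    void collect(D doc);   // called once per matching document
    void end();            // called once after the last document
}

/** Streams one CSV row per document to a Writer, so nothing is buffered in memory. */
class CsvCallback implements ResultCallback<Map<String, String>> {
    private final List<String> columns;
    private final Writer out;

    CsvCallback(List<String> columns, Writer out) {
        this.columns = columns;
        this.out = out;
    }

    @Override
    public void begin() {
        writeRow(columns); // header row
    }

    @Override
    public void collect(Map<String, String> doc) {
        List<String> row = new ArrayList<>();
        for (String column : columns) {
            String value = doc.get(column);
            row.add(value == null ? "" : value); // missing fields become empty cells
        }
        writeRow(row);
    }

    @Override
    public void end() {
        try {
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void writeRow(List<String> values) {
        try {
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) {
                    out.write(',');
                }
                out.write(quote(values.get(i)));
            }
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Quotes a field only when it contains a comma, quote, or newline. */
    private static String quote(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return '"' + value.replace("\"", "\"\"") + '"';
        }
        return value;
    }
}
```

Writing each row straight to the Writer matters for exports on the scale described above, since holding millions of rows in memory would defeat the point of iterating lazily.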



Few things are worse than tracking down a Heisenbug in your testing framework. As a responsible developer, you hate breaking the build. Before checking in your code, you made sure that all of the tests pass locally. But you just got that dreaded email – the build is broken. Worse still, it turns out that only every nth build fails. Most of the builds still pass. You scratch your head. What’s going on?

In my experience, these Heisenbugs are usually caused by tests that depend on external resources like a database, NoSQL server, filesystem, or Solr. Sometimes, they’re caused by a system outage like a network failure. More often, the root cause is a race condition triggered by two tests updating the same shared resource. Whatever the cause, the result is, above all else, frustrating.

Oftentimes, this happens when you’re automating some larger functional test or integration test. Organizations that encounter this sort of problem often adopt a “tiered” approach, resembling something like this:

  • Mock out external dependencies, and keep your unit tests simple. These tests can’t reference any external resources, and are run constantly (more-or-less) by your CI system.
  • Run bigger functional & system tests asynchronously – usually nightly or weekly. These tests can depend on external resources and generally take longer to run than unit tests.

I’ve always been a little leery of this sort of model. Organizations usually don’t strictly adhere to this separation. Testing budgets are always tight, and corners often get cut. But even if you’ve implemented this model perfectly, what you’ve done is introduce friction into your development process. You’ve increased the time it takes between writing a line of code and knowing its impact on your production environment. You’ve reduced the universe of meaningful tests that your developers can run on their own. Usually, this is better than what you had before – i.e. before you had functional & system tests. But you can do better still.

One partial (and popular) solution to this problem is mock objects. Mock objects are great at testing your “meat and potatoes” main code path and really excel at helping you write effective unit tests. But in my experience they really fall short on data-driven or system tests. Ideally, you’d like for each of your developers to have a smaller, isolated copy of your production environment to test their changes in, with lots & lots of data to test against.

What if I told you that you could have a miniature copy of your production environment – your database, FTP server, Solr server, and Cassandra NoSQL database running right in-process as part of your testing framework? That you could have meaningful data-driven tests with no external dependencies. That you could have meaningful data-driven tests that you could run from an airplane. Today’s Open Source ecosystem makes this easy. Over the next few posts, I’ll show you how.


An important part of Zoom’s workflow involves interchanging data over FTP. In today’s world of RESTful APIs, this process sounds a little dated to be sure. But whether it be for exchanging data with customers or with vendors, FTP is a neat little protocol that doesn’t want to go away.

To ensure that our code that interacts with these FTP servers doesn’t break, we’ve implemented a testing framework that embeds Apache Mina’s FtpServer in-process. With some small changes, you should be able to inherit this class for your own needs and just fill in the business logic.

The class starts by creating a sandboxed FTP server in its @Before method. The FTP server is configured with a single dummy user, running on any available port, and pointed at an empty temporary directory on your local filesystem. Since we used @Before and not @BeforeClass, a new sandboxed FTP server will be created for every @Test. In our setup method, an FTPClient object is created, connected to this in-process server, and ready to use in your @Tests. After each @Test is completed, the FTPClient is disconnected, the server is shut down, and the directory is deleted from your filesystem, including any files that you might have put there as part of your business logic. Now, you have one fewer thing to configure on your staging environment or development VM. Let’s take a look:


/**
 * This code is provided under the MIT License (http://www.opensource.org/licenses/mit-license.php).
 * It depends on the following 3rd-party packages:
 *
 * * Apache Mina FtpServer: http://mina.apache.org/ftpserver/
 * * Apache Commons I/O: http://commons.apache.org/io/
 * * Apache Commons Net: http://commons.apache.org/net/
 * * JUnit: http://www.junit.org/
 */

package ftpserverapplication;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;
import static org.apache.commons.io.FileUtils.deleteDirectory;

import org.apache.ftpserver.FtpServer;
import org.apache.ftpserver.FtpServerFactory;
import org.apache.ftpserver.ftplet.FtpException;
import org.apache.ftpserver.listener.ListenerFactory;
import org.apache.ftpserver.usermanager.ClearTextPasswordEncryptor;
import org.apache.ftpserver.usermanager.PropertiesUserManagerFactory;

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPReply;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

/**
 * <p>This class is intended to be a base class for any test that needs to interact
 * with a FTP server. It creates a new sandboxed FTP server for each @Test
 * that gets executed.</p>
 */

public class FtpServerTest {

    // HACK UTILITY METHOD
    private static File createTempFileName(final String name) throws IOException {
        // File.createTempFile() actually creates the file. We only want a name.
        // Don’t use this in production code, as it could be exploited or suffer
        // from race conditions, similar to C’s "tmpnam()" function.
        File f = File.createTempFile(name, ".temp");
        f.delete();
        return f;
    }
    private FtpServer server;
    private File propsFile;
    private File ftpBaseDirectory;
    private FTPClient client;

    /**
     * <p>This method sets up some basic configuration parameters for the FTP Server.
     * It creates a single user account, tied to a base directory.</p>
     *
     * @param ftpDirectory the root directory for the FTP Server
     * @return the FTP Properties file that will drive its configuration
     */

    private static File generateFtpProperties(File ftpDirectory) throws IOException {
        Properties props = new Properties();

        // create an account for the user "ftpuser" whose password is "password"
        // most of these are just "stock" values that won’t affect your functional
        // testing, so don’t worry too much about them
        props.put("ftpserver.user.ftpuser.userpassword", "password");
        props.put("ftpserver.user.ftpuser.homedirectory", ftpDirectory.getCanonicalPath());
        props.put("ftpserver.user.ftpuser.enableflag", "true");
        props.put("ftpserver.user.ftpuser.writepermission", "true");
        props.put("ftpserver.user.ftpuser.maxloginnumber", "0");
        props.put("ftpserver.user.ftpuser.maxloginperip", "0");
        props.put("ftpserver.user.ftpuser.idletime", "0");
        props.put("ftpserver.user.ftpuser.uploadrate", "0");
        props.put("ftpserver.user.ftpuser.downloadrate", "0");

        // create a sandboxed properties file
        File propsFile = File.createTempFile("ftpserver", ".properties");

        FileOutputStream os = new FileOutputStream(propsFile);
        props.store(os, "Apache Mina FtpServer configuration");
        os.close();

        // ensure that the properties file gets cleaned up when the tests are done
        propsFile.deleteOnExit();

        return propsFile;
    }

    /**
     * Here, we create our in-process FTP Server, as well as a client connected to it.
     */

    @Before
    public void setup() throws IOException, FtpException {
        FtpServerFactory serverFactory = new FtpServerFactory();

        // generate a random, sandboxed directory that will serve as the base
        // directory for our FTP Server
        ftpBaseDirectory = createTempFileName("ftpserver");
        boolean createdDirectories = ftpBaseDirectory.mkdirs(); // create the directory
        if (!createdDirectories) {
            throw new IOException("Could not create directory: " + ftpBaseDirectory.getCanonicalPath());
        }

        propsFile = generateFtpProperties(ftpBaseDirectory);

        PropertiesUserManagerFactory userManagerFactory = new PropertiesUserManagerFactory();
        userManagerFactory.setPasswordEncryptor(new ClearTextPasswordEncryptor());
        userManagerFactory.setFile(propsFile);

        serverFactory.setUserManager(userManagerFactory.createUserManager());

        ListenerFactory listenerFactory = new ListenerFactory();

        // let the ftp server bind to any available port
        listenerFactory.setPort(0);

        // replace the default listener
        serverFactory.addListener("default", listenerFactory.createListener());

        server = serverFactory.createServer();

        // start the FTP server
        server.start();

        // creates a new FTP client
        client = new FTPClient();

        // connect to the ftp server on the correct port
        client.connect("127.0.0.1", serverFactory.getListener("default").getPort());

        // check the reply code to verify success
        if (!FTPReply.isPositiveCompletion(client.getReplyCode())) {
            // capture the server's reply before disconnecting, so it isn't lost
            String reason = client.getReplyString();
            client.disconnect();
            throw new IOException("FTP server refused connection. Reason: " + reason);
        }

        // log in to the ftp server
        boolean isLoggedIn = client.login("ftpuser", "password");
        if (!isLoggedIn) {
            throw new IOException("Login failed to FTP server. Reason: " + client.getReplyString());
        }

        // transfer data in binary mode, otherwise the FTP server will
        // un-helpfully convert line endings for you
        boolean supportsBinary = client.setFileType(FTP.BINARY_FILE_TYPE);
        if (!supportsBinary) {
            throw new IOException("Could not set mode to ‘binary’. Reason: " + client.getReplyString());
        }
    }

    /**
     * This method cleans up the various sandboxes we created and stops the FTP server.
     */

    @After
    public void tearDown() throws IOException {
        // disconnect the client
        client.disconnect();

        // stop the server
        server.stop();

        // clean up the FTP server’s properties file
        propsFile.delete();

        // clean up the temporary ftp directory, recursively
        deleteDirectory(ftpBaseDirectory);
    }

    @Test
    public void test() throws IOException {
        String remoteFileName = "test.txt";
        byte[] testTextBytes;

        {
            // This block is where you would put your business logic.
            // I’ve dummied up some business logic for illustrative purposes.
            // All this does is create a simple file on the server.

            String testText = "hello world!\n"
                    + "this file has several lines in it.\n"
                    + "it is an otherwise boring file.";

            testTextBytes = testText.getBytes("UTF-8");
            ByteArrayInputStream localInput = new ByteArrayInputStream(testTextBytes);

            boolean storedOk = client.storeFile(remoteFileName, localInput);

            localInput.close();

            if (!storedOk) {
                throw new IOException("Transfer failed to FTP server. Reason: " + client.getReplyString());
            }
        }

        {
            // In this block, we assert that our business logic worked appropriately.
            // For this use case, we simply assert that the file exists on the local
            // hard drive, and that it has the same size as we’d expect it to.

            // assert that the file exists
            File storedFile = new File(ftpBaseDirectory, remoteFileName);
            assertTrue(storedFile.exists());

            // assert that the file has the correct size in bytes
            long length = storedFile.length();
            assertEquals(testTextBytes.length, length);
        }
    }
}
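The createTempFileName() helper in the listing is flagged as a hack because of the gap between deleting the temp file and re-creating it as a directory. On Java 7 and later, java.nio.file.Files.createTempDirectory creates the directory in one step and avoids that gap. The sketch below shows one way to build and tear down the sandbox with it; TempDirs and its method names are illustrative and are not part of the original test class.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper for creating and removing a per-test sandbox directory. */
class TempDirs {

    /** Creates a uniquely named, empty directory under the system temp folder. */
    static Path createSandbox(String prefix) throws IOException {
        return Files.createTempDirectory(prefix);
    }

    /** Deletes a directory and everything beneath it; does nothing if it is already gone. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // children are gone by now, so the directory is empty
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

In the test class, setup() could call createSandbox("ftpserver").toFile() to get the base directory, and tearDown() could keep using Commons I/O’s deleteDirectory or switch to deleteRecursively.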
 

-Dominic Lachowicz, Director of Core Development



If you’ve ever used more than one programming language, you’ve probably found that – as you switch between them – there’s usually some feature that you’ve left behind that you really miss. It could be a particular class, syntactic sugar, or language construct that you just can’t do without. When I joined Zoom some 5+ years ago, I went from doing development primarily in Python and C# to doing development primarily in Java. One of the little things I missed from those languages was generators.

Generators are a simple but powerful tool for creating iterators. They are written like regular functions but use a “yield” statement whenever they want to return data. Each time next() is called in your for-each loop, control passes to the generator and it resumes right where it left off – its local variables and execution state are automatically saved between calls. The generator returns control back to your loop when it “yields” the next value in the iteration.

Using generators – like iterators – can confer some nice performance benefits. This performance boost is the result of the lazy (on-demand) generation of values, which translates to lower memory usage. Furthermore, we do not need to wait until all the elements have been generated before we start to use them.

It’s important to note that anything that can be done with generators can also be done with iterators. What makes generators so nice & compact is that the iterator(), hasNext() and next() methods are all created automatically for you. This helps make generators easier to write and much clearer than an iterator-based approach. You don’t have to write a ton of boilerplate and a mini state machine just to keep track of your progress.
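For contrast, here is roughly what the hand-written iterator version of a numeric range looks like in plain Java. It is a sketch for comparison only: RangeIterable is a hypothetical name, and it handles positive steps only. Notice that the “where was I?” state has to live in a field, and the bounds check is spread across hasNext() and next().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hand-rolled equivalent of a numeric range, written without a generator. */
class RangeIterable implements Iterable<Integer> {
    private final int lo;
    private final int hi;
    private final int step;

    RangeIterable(int lo, int hi, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.lo = lo;
        this.hi = hi;
        this.step = step;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            // the iteration state that a generator would keep in a local variable
            private int current = lo;

            @Override
            public boolean hasNext() {
                return current < hi;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int value = current;
                current += step;
                return value;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

With a generator, the same logic collapses to a single loop that yields each value, and the bookkeeping above disappears.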

By now, you’ve probably picked up that Java doesn’t have generators. Worse still, Java (the language, not the JVM) doesn’t have built-in support for continuations – a useful building block for implementing generators (though there are some 3rd-party add-ons like Apache Javaflow that implement them via bytecode manipulation). Fortunately, not all is lost. A bit of Googling turned up an excellent blog post by Jim Blackler who had implemented a Yield/Return framework in Java. After ironing out a few bugs in the framework (and passing those patches back to Jim), we at Zoom adopted it for use in our production environment.

Jim’s framework is based on a traditional producer/consumer model. In it, two threads effectively do the work of one. Control passes between the “worker” thread (which computes your iteration’s values) and the managing thread (which implements the java.util.Iterator logic). Each invocation of “Iterator.next()” causes the worker thread to wake up and compute the next result, which it puts into a Java SynchronousQueue. The worker thread goes back to sleep and the manager thread pops the queue, returning the newly-computed value to your “for” loop.
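The handoff described above can be sketched roughly as follows. This is not Jim’s code or Zoom’s; it is a minimal illustration written for this post, assuming Java 8 or later. The names SketchGenerator, emit and SketchGeneratorDemo are hypothetical, and emit stands in for the framework’s “yield” method.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.SynchronousQueue;

// Hypothetical sketch of a two-thread generator; not the actual framework.
abstract class SketchGenerator<T> implements Iterable<T> {
    // Sentinel the worker hands over once run() has finished.
    private static final Object DONE = new Object();

    // Each worker thread remembers which queue it feeds.
    private final ThreadLocal<SynchronousQueue<Object>> current = new ThreadLocal<>();

    // Subclasses put their iteration logic here and call emit() per value.
    protected abstract void run() throws InterruptedException;

    // Blocks the worker until the consumer takes the value (null not supported).
    protected final void emit(T value) throws InterruptedException {
        current.get().put(value);
    }

    @Override
    public Iterator<T> iterator() {
        final SynchronousQueue<Object> queue = new SynchronousQueue<>();
        Thread worker = new Thread(() -> {
            current.set(queue);
            try {
                try {
                    run();
                } finally {
                    queue.put(DONE); // always tell the consumer we are finished
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true); // an abandoned iteration must not keep the JVM alive
        worker.start();

        return new Iterator<T>() {
            private Object pending; // null means nothing has been fetched yet

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    try {
                        pending = queue.take(); // wait for the worker's next value
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        pending = DONE;
                    }
                }
                return pending != DONE;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T value = (T) pending;
                pending = null;
                return value;
            }
        };
    }
}

class SketchGeneratorDemo {
    static Iterable<Integer> firstn(final int n) {
        return new SketchGenerator<Integer>() {
            @Override
            protected void run() throws InterruptedException {
                for (int num = 0; num < n; num++) {
                    emit(num);
                }
            }
        };
    }

    public static void main(String[] args) {
        for (int i : firstn(3)) {
            System.out.println(i); // prints 0, 1, 2 on separate lines
        }
    }
}
```

Because a SynchronousQueue has no capacity, the worker computes at most one value ahead and then blocks until the consumer asks for it. One simplification in this sketch: if the consumer stops iterating early, the worker stays parked on its put() call, which is why it runs as a daemon thread.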

By itself, Jim’s framework got us lazy computation and a more natural programming model than straight Java Iterators, which was a huge improvement. But much like iterators, it required writing a lot of boilerplate code. Writing anything in Java requires writing a lot of boilerplate, but we strive to keep that to a minimum. To solve this, we wrote a really simple “Generator” base class that is easily extensible. It implements the Iterable interface, so you can use Generators wherever you would use a Java 1.5-style “for-each” loop. All it requires is that you implement your business logic inside of a “run()” method and return values via its “yield” method, and it figures out the rest for you. I’ve included the framework (both Jim’s and Zoom’s code) for download here, and I’ve included some illustrative code samples below.

In the following examples, you’ll see how easy it is to implement Python’s numeric range() function and its string-reversing function using generators, just like the core Python library does. You can read up on the Python examples and more on generators here.

-Dominic Lachowicz, Director of Core Development


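    // Mirrors Python's range(hi): yields 0, 1, ..., hi - 1.
    static Iterable<Integer> range(int hi) {
        return range(0, hi);
    }

    // Mirrors Python's range(lo, hi): yields lo, lo + 1, ..., hi - 1.
    static Iterable<Integer> range(int lo, int hi) {
        return range(lo, hi, 1);
    }

    // Mirrors Python's range(lo, hi, step); step may be negative but not zero.
    static Iterable<Integer> range(final int lo, final int hi, final int step) {
        if (step == 0) {
            throw new IllegalArgumentException("step must not be zero");
        }
        return new Generator<Integer>() {
            @Override
            protected void run() {
                // Compare with < or > rather than != so the loop still ends
                // when the step does not land exactly on hi.
                for (int i = lo; step > 0 ? i < hi : i > hi; i += step) {
                    // On Java 14+, write this.yield(i): an unqualified call
                    // to a method named yield no longer compiles.
                    yield(i);
                }
            }
        };
    }

    // Yields the characters of the string from last to first.
    static Iterable<Character> reverse(final CharSequence string) {
        return new Generator<Character>() {
            @Override
            protected void run() {
                for (int i : range(string.length() - 1, -1, -1)) {
                    yield(string.charAt(i));
                }
            }
        };
    }

    // Yields the first n non-negative integers: 0, 1, ..., n - 1.
    static Iterable<Integer> firstn(final int n) {
        return new Generator<Integer>() {
            @Override
            protected void run() {
                int num = 0;
                while (num < n) {
                    yield(num);
                    num += 1;
                }
            }
        };
    }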

Posted in Best Practices

ZoomInfo is extremely excited to jump-start February with our first tradeshow appearance of the year at Marketing Sherpa’s E-mail Summit in Las Vegas.  And of course, it’s only normal for us to give away awesome prizes for stopping by our booth.  Fill out a survey or take a demo and you’re automatically a winner of one of five Vegas-style prizes including a Kindle, $25 AMEX Gift Cards, 1,000 Free Contacts, E-mail Address Validation up to 2,500 contacts and 2 tickets to Cirque LOVE with dinner at Kokomos.  Everyone’s a winner in Vegas baby!

Not attending next week’s show? Find us at our second stop in Hollywood, FL on Feb 22-24 for DMA’s E-mail Evolution Conference at the Westin Diplomat Resort & Spa.  We’ll be showing off our award-winning Pro solution and offering fun giveaways including $100 iTunes Gift Cards, Kindles, $25 AMEX Gift Cards and thousands of free contacts straight from the ZoomInfo database.

Want to schedule a meet up with one of our data experts?  Fill out this form

We hope to see you soon!  Viva Las Vegas!

Posted in ZoomInfo News

First and foremost, we begin by sending a bold THANK YOU to our ZoomInfo team, clients, supporters, and online followers!   In honor of both our outstanding year and recent announcement of being a CODiE Award Finalist, we decided to kick off our holiday festivities popping bottles and toasting to all our supporters.  Cheers, this one’s for you!

Inspired by this support, we decided to return the favor by donating coats, clothes, toys, and more to several charities including Coats for Kids and The Official U.S. Marine Corps Toys for Tots Foundation.  Please join us in this effort!

Continuing in high spirits, we’re currently preparing for our end-of-year events, including both our annual company potluck and New Year’s Party!   Anticipating the assortment of homemade dishes, we’ve been prepared to eat since Thanksgiving! Well, who wouldn’t be when spoiled by amazing bakers like Kelly G., who we believe to be an Account Representative during the day and a baker during the night!  Have you seen her Halloween cake? Sweet, isn’t it? Okay! Okay! Enough with our holiday celebrations and on to our industry events!

Last week our Director of Sales, Mark Ruthfield, held an engaging speaking session at AA-ISP’s Boston Conference, sharing his wealth of sales knowledge and company updates with other sales leaders.  One of the events he mentioned was our upcoming Sales Dream Team Recruiting Event, welcoming potential entry-level candidates for our sales and marketing team.  If you know anyone suited for that event, give them the gift of opportunity and send them our way!

No, the excitement doesn’t stop there! Continuing this eventful month, our marketing team has coordinated not one but TWO webinars in just one week!  Our first webinar, “Pro Day,” uncovered several tips and tricks for generating new leads with ZoomInfo Pro.  ‘Tis the season of giving all right!   All attendees left with a FREE 24-hour exclusive pass to our tool topped off with a BONUS invitation to today’s webinar featuring Recruiting Guru, Chris Murdock, who will be directly responding to questions from recruiting professionals of all industries.  No worries though, with luck and support for reading our blog, you have just been invited!

Again, we love our supporters and we appreciate the continuous belief in our services!  To wrap things up, we’ve included the most recent picture of our ZoomInfo team, taken just a couple days ago!

Don’t forget to Follow Us for live announcements, Like Us for access to event photos, and Join us on LinkedIn!  We hope you enjoy the rest of the week and be sure to revisit our blog!

-Catherine Heng

Posted in ZoomInfo News

Hands raised if you want new leads!  I repeat, hands raised if you want new leads!! I’m guessing those hands raised belong to your sales department? Oh marketing, do you feel that pressure or jeez, that stink?  We’ve all sensed that at one point in time or another but cheer up, You Are Not Alone!

Zooming to the rescue, we are here to lend you a hand, giving you our FANTASTIC FOUR secrets to generating new B2B leads!

  1. Connections: It’s not about what you know but WHO you know!   Business flourishes fastest through referrals and viral marketing, both part of a heavily cost-saving strategy for winning new business.   Manage those connections through social media (Twitter, Facebook, and LinkedIn) to increase your lead generation initiatives in a fast, free, and fun way!  Do I smell a New Year’s Resolution? Free the social butterfly in you and make connections, connections, and let’s not forget, more connections!
  2. E-mail Marketing: Saving your company’s time and money, B2B e-mail marketing is the new direct mail.  With the rising popularity of this approach, our clients are leveraging ZoomInfo’s data services (e-mail validation, marketing lists, and data append services) to successfully execute targeted e-mail marketing campaigns, filling their bucket of leads.  Has your company checked out ZoomInfo’s B2B database?  Do it, I dare you!
  3. Content: If your products are as good as you claim, then you’d be an expert in your industry.  This trust and credibility can be easily established through content marketing using blogs, newsletters, webinars, whitepapers, and e-books!  Utilizing this tactic, companies such as Zappos, Hubspot, and Forrester have been able to convert their large audience of readers (connections: tip #1) into new leads.
  4. Search Engine Optimization: True to the fact that I am extremely gullible, I believe your products are as good as you say, though that would mean I recognize your products.  Used to generate new leads and increase brand recognition, Search Engine Optimization Marketing and banner ads are both popular strategies to pocket new leads.  Now that’s a “win-win” situation!

How are these tips connected?  By leveraging landing pages, you’ll transform viewers into leads at little to no cost.  Now marketing, do you still feel that pressure or jeez, that stink? Well, getting rid of both requires taking action. Speak with us today to determine how our products can support your future lead generation initiatives.

Best of Luck!

-Catherine Heng

Posted in ZoomInfo News

While the United States unemployment rate has climbed to an eye-opening 9.1%, ZoomInfo is welcoming new talent in all departments including engineering, sales, and marketing.  Yeah folks, you got it! We’re opening our glass doors to potential Zoomers which, who knows, could be YOU!

What’s all the hype with ZoomInfo?  Luckily for you, I’ve worked my way to veteran status, being able to give you a live testimonial of what’s hot at ZoomInfo.  Ouch, it burns!  With so many reasons to work at ZoomInfo, here are my FANTASTIC FOUR!

  • People:  Why wouldn’t you want to work with a diverse and exceptionally talented crowd of individuals?  With about 65 employees, we learn together, work together, and even share our special occasions with one another, from births to weddings and plane rides to holidays.  Aww, it’s a Kodak moment!
  • Events:   With new volunteers every quarter, all Zoomers have the opportunity to plan company festivities.  Hosting over a dozen company events this year, we’ve attended two Boston sports games (Bruins and Red Sox), a golf tournament, a bowling extravaganza, and how remarkable, a couple of themed company meetings!  You know the saying, “Work hard, play hard?”  Well, we do too!
  • Product:  Who isn’t aware of our award-winning ZoomInfo Professional Solution, Data Services, and our FREE Community Edition? Thanks to our Engineering Team, our patented technology has a unique method of aggregating data that constantly keeps our database fresh and up-to-date.   Written with confidence and spoken truthfully, we love the quality of our data and we are POSITIVE you will too.
  • Culture:  As vibrant as our snazzy new clocks and our orange and blue painted office, our culture is invigorating, exciting, and hip!  From Engineering’s TechFest to Sales’ Phone Blitz, we are constantly implementing new and fun ways to work together.   You can buy data and you can buy tools. Culture though, it’s priceless!

Feel free to ZOOM-IN!

Check out Careers at ZoomInfo for all posted career positions.  Already taken?  No sweat!  Follow us for future updates on new opportunities and open houses.  Curious? E-mail our Human Resources Manager, Jennie Cohen, or our Sales and Marketing Associate, Me!

Have a happy holiday and we hope to hear from you soon!

- Catherine Heng

Posted in ZoomInfo News

Welcome our guest blogger, Justin Champion, who is a Marketing Consultant with Search Mojo, a dedicated search engine marketing agency. He works with new prospects to determine the best fitting search marketing solution to achieve their goals. He also is a regular blogging contributor to Search Mojo’s blog, Search Marketing Sage.

4 Tips for Improving Your New Business Prospecting Efforts

Prospecting for new business is not an easy task. Many get weighed down by the thought of having to call someone they do not know, who is not expecting their call. Ugh! However, for those of you who are new to the game or looking to reevaluate your approach, here are four tips to make your lead-generation process more effective.

1. Set a simple goal for the initial call

Do not just walk in expecting to wine and dine them. In order to gain confidence with a prospect, you are going to need trust. This is not something you are going to establish in the first two minutes. The goal here should be identifying the decision maker and setting an appointment. You are not calling to sell them something. You’re calling to set up a time to discuss their needs and appropriate solutions.

If they are not willing to schedule an appointment with you, then they are probably not going to buy from you either.

2. Slow and steady wins the race

HimynameisJustinandIworkwithSearchMojo… What? Don’t get your whole speech out in one breath. Make sure to slow your pace and pronounce your words in a clear and calm manner. If your approach sounds rushed, then you are not creating a comfortable setting for a conversation. The last thing you want to do is repeat yourself more than once; that’s not very professional.

3. Know your prospect

Before you call, make sure you know:

  • Name of company
  • Industry
  • Point of contact (use ZoomInfo Pro!)
  • Why you’re calling

The more you know about their industry (pain points, terminology, etc.), the better. Now, I am not saying that you have to perform in-depth research on each prospect, but it will help to work a vertical market and get comfortable with the industry. Plus, what does it hurt to take a look at their site before calling? The worst that can happen is you’ll know more about their specific business offerings.

4. Make sure to follow up

Over 90% of the time you will not reach the individual you want to talk to. This is also where knowing what to say comes in handy—get ready to make your first impression with a memorable voicemail.

However, a voicemail or leaving a message with the operator is not going to cut it. If you really want to make a difference you have to follow up. I do not recommend pestering them, calling every day, but do not expect them to call you back because they will forget about you.

All in all, prospecting can be a rewarding experience with the right preparation and attitude. The goal is to avoid getting frustrated and to stay focused for the long haul. Not everyone is going to want your services, but there are people interested. It is your job to find them.

Posted in ZoomInfo News

ZoomInfo relies on technology to create the world’s best directory of business and employee profiles, so we are always looking for new technologies and ways to exploit existing technologies to further that objective. As the Architect at ZoomInfo, I do my best to keep up with new technologies – but there’s only so many hours in a day, and I certainly don’t have a monopoly on good ideas. (Far from it!)

From time to time, the ZoomInfo Engineering Team takes a break for a couple of days from normal project work (such as A Need To Scale: The Cassandra Project) to explore new ideas in a lively brainstorming session we call IdeaFest. Recently, I sponsored TechFest with this stated theme: “Demonstrate how new tools will allow you or ZoomInfo to do new tricks for the business or be more productive.” I provided a few examples and told everyone to be creative, which was pretty much all the ground rules!

About a week before TechFest, a whiteboard in the Engineering area was commandeered so team members could publicly announce the technology they planned to explore and the new trick or productivity enhancement they thought it would enable. This got the conversation started and built excitement for the two-day TechFest.

On the morning of Day 1, all team members described their plans.  The resulting discussion enhanced the ideas and provided a mechanism for some team members to join forces.

Because we use Hadoop MapReduce and HDFS a lot, a number of people chose to investigate related technologies such as Mahout for machine learning and data mining; Cloudera’s distribution of Hadoop and related tools such as Sqoop and Flume; and Hive for a scalable, queryable data warehouse. They created a company competitor classifier and a virtual Cloudera Hadoop cluster, and demonstrated potential ways we could use Sqoop and Hive for ZoomInfo business on that cluster.

We use Cassandra for some of our large data stores, so engineers in both QA and development took the opportunity to investigate tools to query and export data in Cassandra to ease testing and debugging on those projects.  They looked at off-the-shelf tools and wrote a prototype to iterate over rows and extract key data.

Since we use Solr as our search platform and have data in Cassandra, one engineer checked out Solandra, which integrates Solr searching with Cassandra data. He learned that it has the potential to provide Solr searches updated immediately upon writes to Cassandra.

Some others investigated tools to increase productivity such as JRebel for web development debugging, Javeleon for NetBeans development, and Review Board for web-based code reviews.

Another wrote a prototype with Apache CXF to investigate its potential for a new API.

At the end of each day, all team members gave a demo or presented what they had done. To be fair, there were some technologies that did not pan out – but that’s okay! By going through this exercise, we were able to determine that those technologies were not a good fit for now. Some technologies were deemed not quite ready to proceed with at this time, so we’ll keep an eye on them as they mature. A handful of prototypes proved successful enough to be assigned to backlogs in specific projects for further investigation. Meanwhile, a couple of the productivity tools are already being worked into our development environment.

After TechFest, we returned to project work with some new tools to increase productivity, along with improved morale from exploring things that had been on our to-do lists. The Engineering Team is now highly motivated to leverage some of the identified technologies to do new tricks for the business.  Check back for future blog posts to see how we did that!

-Leo Laferriere, Web Architect

Posted in ZoomInfo News