<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
  <title>Finding Science</title>
  <link rel="alternate" type="text/html" href="http://findingscience.com/" />
  
  <id>http://findingscience.com/atom.xml</id>
  <updated>2012-02-20T19:01:40Z</updated>
  <subtitle>when I grow up I want to be a computer scientist</subtitle>

  
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/FindingScience" /><feedburner:info uri="findingscience" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
    <title>Testing Twisted Web Resources</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/SHYy70Jtv8Q/testing-twisted-web-resources.html" />
    <id>tag:findingscience.com:/python/twisted/2012/02/20/testing-twisted-web-resources</id>
    <updated>2012-02-20T21:33:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>Testing web resources in <a href="http://twistedmatrix.com">Twisted</a> can be a bit of a pain, and the Twisted docs don’t describe how best to go about writing tests for <a href="http://twistedmatrix.com/documents/8.1.0/api/twisted.web.resource.Resource.html">twisted.web.resource.Resource</a> objects.</p>

<p>Generally, usage of <code>twisted.web</code> resources looks something like this:</p>
<div class="highlight"><pre><code class="python"><span class="kn">from</span> <span class="nn">twisted.internet.defer</span> <span class="kn">import</span> <span class="n">inlineCallbacks</span>
<span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">defer</span><span class="p">,</span> <span class="n">reactor</span>
<span class="kn">from</span> <span class="nn">twisted.web</span> <span class="kn">import</span> <span class="n">resource</span>
<span class="kn">from</span> <span class="nn">twisted.web.server</span> <span class="kn">import</span> <span class="n">NOT_DONE_YET</span>

<span class="k">class</span> <span class="nc">ChildPage</span><span class="p">(</span><span class="n">resource</span><span class="o">.</span><span class="n">Resource</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">render</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
        <span class="n">d</span> <span class="o">=</span> <span class="n">defer</span><span class="o">.</span><span class="n">Deferred</span><span class="p">()</span>
        <span class="n">d</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">renderResult</span><span class="p">,</span> <span class="n">request</span><span class="p">)</span>
        <span class="n">reactor</span><span class="o">.</span><span class="n">callLater</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">d</span><span class="o">.</span><span class="n">callback</span><span class="p">,</span> <span class="s">"hello"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">NOT_DONE_YET</span>

    <span class="k">def</span> <span class="nf">renderResult</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
        <span class="n">request</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
        <span class="n">request</span><span class="o">.</span><span class="n">finish</span><span class="p">()</span>
        
<span class="k">class</span> <span class="nc">MainPage</span><span class="p">(</span><span class="n">resource</span><span class="o">.</span><span class="n">Resource</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">resource</span><span class="o">.</span><span class="n">Resource</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">putChild</span><span class="p">(</span><span class="s">'childpage'</span><span class="p">,</span> <span class="n">ChildPage</span><span class="p">())</span>
</code></pre>
</div>
<p>I created a small bit of code that wraps some of the testing library in Twisted. This code can be used to easily create tests by just using a <code>DummySite</code> instead of a <code>twisted.web.server.Site</code>. You can then call <code>get</code> and <code>post</code> on that site (and pass optional dictionaries of get/post arguments and headers). Here’s what a test looks like:</p>
<div class="highlight"><pre><code class="python"><span class="kn">from</span> <span class="nn">twisted.trial</span> <span class="kn">import</span> <span class="n">unittest</span>
<span class="kn">from</span> <span class="nn">twisted_web_test_utils</span> <span class="kn">import</span> <span class="n">DummySite</span>

<span class="k">class</span> <span class="nc">WebTest</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">web</span> <span class="o">=</span> <span class="n">DummySite</span><span class="p">(</span><span class="n">MainPage</span><span class="p">())</span>

    <span class="nd">@inlineCallbacks</span>
    <span class="k">def</span> <span class="nf">test_get</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">response</span> <span class="o">=</span> <span class="k">yield</span> <span class="bp">self</span><span class="o">.</span><span class="n">web</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"childpage"</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">value</span><span class="p">(),</span> <span class="s">"hello"</span><span class="p">)</span>

        <span class="c"># if you have params / headers:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="k">yield</span> <span class="bp">self</span><span class="o">.</span><span class="n">web</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"childpage"</span><span class="p">,</span> <span class="p">{</span><span class="s">'paramone'</span><span class="p">:</span> <span class="s">'value'</span><span class="p">},</span> <span class="p">{</span><span class="s">'referer'</span><span class="p">:</span> <span class="s">"http://somesite.com"</span><span class="p">})</span>
</code></pre>
</div>
<p>Here’s the testing code if you want to use it:</p>
<script src="https://gist.github.com/1873035.js?file=twisted_web_test_utils.py">
</script>
<p>And with that, a few hours’ worth of work will save me at least a few 10-minute segments in the future.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;Testing web resources in &lt;a href='http://twistedmatrix.com'&gt;Twisted&lt;/a&gt; can be a bit of a pain, and the Twisted docs don&amp;#8217;t describe how best to go about writing tests for &lt;a href='http://twistedmatrix.com/documents/8.1.0/api/twisted.web.resource.Resource.html'&gt;twisted.web.resource.Resource&lt;/a&gt; objects.&lt;/p&gt;

&lt;p&gt;Generally, usage of &lt;code&gt;twisted.web&lt;/code&gt; resources looks something like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.internet.defer&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;inlineCallbacks&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.internet&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;defer&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;reactor&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.web&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;resource&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.web.server&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;NOT_DONE_YET&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;ChildPage&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;resource&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Resource&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;render&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='n'&gt;d&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;defer&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Deferred&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        &lt;span class='n'&gt;d&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;addCallback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;renderResult&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='n'&gt;reactor&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;callLater&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;d&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;callback&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;hello&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;NOT_DONE_YET&lt;/span&gt;

    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;renderResult&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;write&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;finish&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;MainPage&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;resource&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Resource&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;__init__&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='n'&gt;resource&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Resource&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;__init__&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;putChild&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;childpage&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;ChildPage&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;I created a small bit of code that wraps some of the testing library in Twisted. This code can be used to easily create tests by just using a &lt;code&gt;DummySite&lt;/code&gt; instead of a &lt;code&gt;twisted.web.server.Site&lt;/code&gt;. You can then call &lt;code&gt;get&lt;/code&gt; and &lt;code&gt;post&lt;/code&gt; on that site (and pass optional dictionaries of get/post arguments and headers). Here&amp;#8217;s what a test looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.trial&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;unittest&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted_web_test_utils&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;DummySite&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;WebTest&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;unittest&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;TestCase&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;setUp&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;web&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;DummySite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;MainPage&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt;

    &lt;span class='nd'&gt;@inlineCallbacks&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;test_get&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='n'&gt;response&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;yield&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;web&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;get&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;childpage&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;assertEqual&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;response&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;value&lt;/span&gt;&lt;span class='p'&gt;(),&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;hello&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

        &lt;span class='c'&gt;# if you have params / headers:&lt;/span&gt;
        &lt;span class='n'&gt;response&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;yield&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;web&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;get&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;childpage&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;paramone&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;value&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;},&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;referer&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;http://somesite.com&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Here&amp;#8217;s the testing code if you want to use it:&lt;/p&gt;
&lt;script src='https://gist.github.com/1873035.js?file=twisted_web_test_utils.py'&gt;
&lt;/script&gt;
&lt;p&gt;And with that, a few hours&amp;#8217; worth of work will save me at least a few 10-minute segments in the future.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/SHYy70Jtv8Q" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/python/twisted/2012/02/20/testing-twisted-web-resources.html</feedburner:origLink></entry>
  
  <entry>
    <title>Stopping Time During Python Tests</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/045iWcINm9Q/stopping-time-during-python-tests.html" />
    <id>tag:findingscience.com:/python/testing/2012/02/12/stopping-time-during-python-tests</id>
    <updated>2012-02-12T20:50:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>When running unit tests in Python, it’s often the case that I need to “stop time” so that the current time remains the same during the entire execution of the test. For instance, I may expect a slow (networked) operation to return a value based on a creation time. If this creation process crosses into a new second, then the creation time of each of the objects will not be the same. This becomes a problem when there is latency associated either with the request to create the object or with the response after the object has been created (causing a potentially large difference between the time the request was made and the time of the response). To compensate, I use a decorator for the unit test methods that need it.</p>

<p>Here’s the decorator function:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">time</span>

<span class="k">def</span> <span class="nf">stopTime</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
    <span class="n">original</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span>
    <span class="k">def</span> <span class="nf">newf</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">now</span> <span class="o">=</span> <span class="n">original</span><span class="p">()</span>
        <span class="n">time</span><span class="o">.</span><span class="n">time</span> <span class="o">=</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">now</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">f</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">finally</span><span class="p">:</span>
            <span class="c"># restore the real clock even if the test raises</span>
            <span class="n">time</span><span class="o">.</span><span class="n">time</span> <span class="o">=</span> <span class="n">original</span>
    <span class="k">return</span> <span class="n">newf</span>
</code></pre>
</div>
<p>Here’s an example of usage in a unit test:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">unittest</span><span class="o">,</span> <span class="nn">time</span>

<span class="k">class</span> <span class="nc">TestSomething</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>

    <span class="nd">@stopTime</span>
    <span class="k">def</span> <span class="nf">test_something</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">a</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
        <span class="n">b</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
</code></pre>
</div>
<p>In this case, <code>a</code> and <code>b</code> will be the same, thus demonstrating your awesome ability to alter the <a href="http://en.wikipedia.org/wiki/Spacetime">space-time continuum</a>.</p>
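<p>As a point of comparison, here is a minimal sketch of the same trick using <code>unittest.mock.patch</code> from the standard library. This assumes Python 3.3 or later, which is newer than the code above was written for; the function name <code>frozen_elapsed</code> is made up for illustration. The patch is undone when the <code>with</code> block exits, even if the body raises.</p>

```python
import time
from unittest import mock


def frozen_elapsed(pause=0.05):
    """Return the apparent elapsed time across a real sleep while time.time is frozen."""
    now = time.time()
    # Replace time.time with a mock that always returns the same instant.
    with mock.patch("time.time", return_value=now):
        start = time.time()
        time.sleep(pause)  # real time still passes; only time.time() is frozen
        return time.time() - start
```

<p>Calling <code>frozen_elapsed()</code> should report no elapsed time, and <code>time.time</code> is the real clock again once the call returns.</p>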

<p>To further illustrate what’s occurring, here’s a picture of what you’re doing:</p>
<a href="http://en.wikipedia.org/wiki/File:Spacetime_curvature.png">
<img src="http://upload.wikimedia.org/wikipedia/commons/2/22/Spacetime_curvature.png" />
</a></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;When running unit tests in Python, it&amp;#8217;s often the case that I need to &amp;#8220;stop time&amp;#8221; so that the current time remains the same during the entire execution of the test. For instance, I may expect a slow (networked) operation to return a value based on a creation time. If this creation process crosses into a new second, then the creation time of each of the objects will not be the same. This becomes a problem when there is latency associated either with the request to create the object or with the response after the object has been created (causing a potentially large difference between the time the request was made and the time of the response). To compensate, I use a decorator for the unit test methods that need it.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s the decorator function:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='nn'&gt;time&lt;/span&gt;

&lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;stopTime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;f&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='n'&gt;original&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;time&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;newf&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;args&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;**&lt;/span&gt;&lt;span class='n'&gt;kwargs&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='n'&gt;now&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;original&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        &lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;time&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;lambda&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;now&lt;/span&gt;
        &lt;span class='k'&gt;try&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;f&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;args&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;**&lt;/span&gt;&lt;span class='n'&gt;kwargs&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='k'&gt;finally&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
            &lt;span class='c'&gt;# restore the real clock even if the test raises&lt;/span&gt;
            &lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;time&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;original&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;newf&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Here&amp;#8217;s an example of usage in a unit test:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='nn'&gt;unittest&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='nn'&gt;time&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;TestSomething&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;unittest&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;TestCase&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;

    &lt;span class='nd'&gt;@stopTime&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;test_something&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='n'&gt;a&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        &lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;sleep&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='n'&gt;b&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;time&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;assertEqual&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;a&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;b&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;In this case, &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; will be the same, thus demonstrating your awesome ability to alter the &lt;a href='http://en.wikipedia.org/wiki/Spacetime'&gt;space-time continuum&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To further illustrate what&amp;#8217;s occurring, here&amp;#8217;s a picture of what you&amp;#8217;re doing:&lt;/p&gt;
&lt;a href='http://en.wikipedia.org/wiki/File:Spacetime_curvature.png'&gt;
&lt;img src='http://upload.wikimedia.org/wikipedia/commons/2/22/Spacetime_curvature.png' /&gt;
&lt;/a&gt;
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/045iWcINm9Q" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/python/testing/2012/02/12/stopping-time-during-python-tests.html</feedburner:origLink></entry>
  
  <entry>
    <title>Bandit: An A/B Testing Alternative for Rails</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/1wCkCctCOYE/bandit%3A-a-b-testing-alternative-for-rails.html" />
    <id>tag:findingscience.com:/rails/vanity/statistics/testing/2011/11/12/bandit:-a-b-testing-alternative-for-rails</id>
    <updated>2011-11-12T14:26:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>In a typical A/B test, two alternatives are compared to see which produces the most “conversions” (that is, desired results). For instance, if you have a website with a big “Sign Up” button that you want visitors to click, you may wish to choose different background colors. Under typical A/B testing guidelines, you would pick a number (say, <em>N</em>) of users for a test and show half of them one color and half of them another color. After users are shown the button, you record the number of clicks that result from viewing each color. Once <em>N</em> users view one of the two alternatives, a statistical test (generally categorical, like a Chi-Square Test or a G-Test) is run to determine whether or not the number of clicks (aka, “conversions”) for one color was higher than the number of clicks for the other color. This test determines whether the difference you observed was likely due simply to chance or whether the difference you saw was more likely due to an actual difference in the rate of conversion.</p>
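<p>To make the statistical step concrete, here is a rough sketch of a Pearson Chi-Square test on the 2x2 table of conversions and non-conversions, written in Python 3 with only the standard library. The function name and arguments are hypothetical, and the sketch assumes every row and column total is nonzero. It relies on the fact that, with one degree of freedom, the chi-square tail probability equals <code>erfc(sqrt(x / 2))</code>.</p>

```python
import math


def chi_square_2x2(conv_a, views_a, conv_b, views_b):
    """Pearson chi-square test (1 degree of freedom) for two alternatives.

    Returns (statistic, p_value). Assumes all row and column totals are nonzero.
    """
    observed = [
        [conv_a, views_a - conv_a],  # alternative A: converted, did not convert
        [conv_b, views_b - conv_b],  # alternative B: converted, did not convert
    ]
    total = views_a + views_b
    row_totals = [views_a, views_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if both alternatives converted at the same rate.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Tail probability of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

<p>For example, <code>chi_square_2x2(60, 500, 40, 500)</code> compares 60 clicks out of 500 views against 40 out of 500 (made-up numbers); a small p-value suggests the difference is unlikely to be due to chance alone.</p>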

<p>This method of testing is popular, but is fraught with issues (practical and statistical). The <a href="https://github.com/bmuller/bandit">bandit gem</a> provides an implementation of an alternative method of testing for Rails that solves many of these issues.</p>

<h2 id="issues_with_ab_testing">Issues with A/B Testing</h2>

<p>There are a number of issues with A/B testing (some of which have been described in more detail <a href="http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit">here</a>):</p>

<ol>
<li>You can’t try anything too crazy without having to worry about half of your users not converting. For instance, you may want to try a horrendous color for your “Buy Now” button but are too afraid about potentially harming sales if your users hate it. In this case, the risk of a big change may outweigh the possible benefit if your users like it.</li>

<li>A/B testing only provides a way of testing two alternatives at a time. Pick two, wait, pick two more, wait - this is not the easiest workflow if you want to test 50 options.</li>

<li>With A/B Testing, you need to have a fixed sample size to make the test valid (otherwise, you run the risk of repeated significance testing errors, as described in more detail <a href="http://www.evanmiller.org/how-not-to-run-an-ab-test.html">here</a>).</li>

<li>Due to the fixed sample size requirement, you may have to wait a while before you get any results from your test (especially if the expected improvement is marginal, in which case your sample size would need to be larger). This problem can be compounded if you don’t get much traffic.</li>

<li>Designers and developers generally don’t want to (and shouldn’t have to) understand statistical concepts like <a href="http://en.wikipedia.org/wiki/Statistical_power">power</a>, <a href="http://en.wikipedia.org/wiki/P-value">p-values</a>, or <a href="http://en.wikipedia.org/wiki/Confidence_intervals">confidence</a> when creating and evaluating tests.</li>

<li>There are no good answers for what you should do when A performs just as well as B. Was the sample size just too small (implying you should try again with a larger sample)? Go with A? Go with B? Does it matter? The reality is it may matter - but you won’t know.</li>
</ol>

<h2 id="the_bandit_method">The Bandit Method</h2>

<p>The ultimate goal of A/B testing is to increase conversions. The problem can be described in terms that differ greatly from the multitude of questions A/B testing brings (i.e., “Is A better than B?” followed by “Is B better than C?” followed by “Is C better than D?” <em>ad</em> <em>infinitum</em>). Instead, imagine you have a multitude of possible alternatives and, each time a user requests a page, you want to make a decent choice between alternatives you know perform well and alternatives you haven’t tried very often. With each page load, pick the best alternative most of the time and an alternative that hasn’t been displayed much some of the time. After each display, monitor the conversions and update what you consider the “better” alternatives to be. This is the basic method of a solution to what is called the multi-armed bandit problem.</p>

<p>With a bandit solution, there is no concept of a “test”. At no point does the system announce a winner and a loser. Alternatives can be added or removed at any time. The better performing alternatives will be displayed more often, and the worst alternatives will rarely be displayed. At any point, if one of the poorly performing alternatives begins to perform better it will be shown more often. This provides solutions to all of the problems listed above:</p>

<ol>
<li>Go ahead and try something crazy. If it performs poorly, it won’t be shown very often.</li>

<li>Pick as many alternatives as you’d like and add them.</li>

<li>There’s no “test”, and no minimal sample size needed before optimization can start.</li>

<li>Information about conversions is used as soon as users convert or do not convert. There is no waiting period: results immediately inform the choice of the next alternative to display to a visitor.</li>

<li>Designers and developers can add alternatives or remove them at any time. The system will adjust immediately. If an alternative seems to be consistently performing poorly, it can be removed at any time. Alternatively, it can just be left forever. The best option will always be displayed the most often. There are no complicated decisions that have to be made up front or requirements that designers or developers know anything about proper statistical hypothesis testing.</li>

<li>If one alternative performs the same as another, they will both be displayed with the same regularity. There would be no need to choose one over the other or remove either of them.</li>
</ol>
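<p>A minimal sketch of the pick-and-update loop described above, assuming an epsilon-greedy-style rule: with a small probability <code>epsilon</code>, show the least-displayed alternative; otherwise show the one with the best observed conversion rate. The class name <code>BanditSketch</code>, its method names, and the default <code>epsilon</code> of 0.1 are assumptions made for illustration; this is not the bandit gem’s API, and not necessarily its algorithm.</p>

```ruby
# Hypothetical sketch; names are illustrative, not the bandit gem's API.
class BanditSketch
  attr_accessor :epsilon

  def initialize(alternatives, epsilon = 0.1, rng = Random.new)
    @alternatives = alternatives.dup
    @epsilon = epsilon            # probability of exploring on a page load
    @rng = rng
    @displays = Hash.new(0)       # alternative => times shown
    @conversions = Hash.new(0)    # alternative => times converted
  end

  # Alternatives can be added or removed at any time.
  def add(alternative)
    @alternatives.push(alternative) unless @alternatives.include?(alternative)
  end

  def remove(alternative)
    @alternatives.delete(alternative)
  end

  # Observed conversion rate; 0.0 for an alternative never displayed.
  def rate(alternative)
    shown = @displays[alternative]
    shown.zero? ? 0.0 : @conversions[alternative].to_f / shown
  end

  # Choose an alternative for one page load and record the display.
  def choose
    explore = @epsilon > @rng.rand
    pick = if explore
             # Explore: the alternative that has been displayed least.
             @alternatives.min_by { |a| @displays[a] }
           else
             # Exploit: the alternative with the best observed rate.
             @alternatives.max_by { |a| rate(a) }
           end
    @displays[pick] += 1
    pick
  end

  # Record that a visitor who saw this alternative converted.
  def convert!(alternative)
    @conversions[alternative] += 1
  end
end
```

<p>Under these assumptions, a page load would call <code>choose</code> and a click would call <code>convert!</code> with the alternative that was shown. A larger <code>epsilon</code> spends more page loads on rarely shown alternatives; a smaller one favors the current best.</p>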

<h2 id="bandit_gem">Bandit Gem</h2>

<p>While there are a few A/B testing libraries for Rails out there, the preeminent one (<a href="http://vanity.labnotes.org/">Vanity</a>) has <a href="/vanity/statistics/testing/2011/03/17/statistical-analysis-and-a-b-testing-correctly.html">statistical issues</a> and is <a href="https://github.com/assaf/vanity/issues/11">unreliable</a> in a production environment. <a href="https://github.com/bmuller/bandit">Bandit</a> was created to test the feasibility of a multi-armed-bandit-based alternative to A/B testing and to solve the issues with the Rails-based A/B testing gems. It is still in development, though - use at your own risk.</p>

<h2 id="resources">Resources</h2>

<ul>
<li><a href="https://github.com/bmuller/bandit">bandit gem</a></li>

<li><a href="http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit">http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit</a></li>

<li><a href="http://en.wikipedia.org/wiki/Multi-armed_bandit">http://en.wikipedia.org/wiki/Multi-armed_bandit</a></li>

<li><a href="http://www.evanmiller.org/how-not-to-run-an-ab-test.html">http://www.evanmiller.org/how-not-to-run-an-ab-test.html</a></li>
</ul></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;In a typical A/B test, two alternatives are compared to see which produces the most &amp;#8220;conversions&amp;#8221; (that is, desired results). For instance, if you have a website with a big &amp;#8220;Sign Up&amp;#8221; button that you want visitors to click, you may wish to try different background colors. Under typical A/B testing guidelines, you would pick a number (say, &lt;em&gt;N&lt;/em&gt;) of users for a test and show half of them one color and half of them another color. After users are shown the button, you record the number of clicks that result from viewing each color. Once &lt;em&gt;N&lt;/em&gt; users have viewed one of the two alternatives, a statistical test (generally categorical, like a Chi-Square Test or a G-Test) is run to determine whether or not the number of clicks (aka, &amp;#8220;conversions&amp;#8221;) for one color was higher than the number of clicks for the other color. This test determines whether the difference you observed was likely due simply to chance or whether the difference you saw was more likely due to an actual difference in the rate of conversion.&lt;/p&gt;

&lt;p&gt;This method of testing is popular, but is fraught with issues (practical and statistical). The &lt;a href='https://github.com/bmuller/bandit'&gt;bandit gem&lt;/a&gt; provides an implementation of an alternative method of testing for Rails that solves many of these issues.&lt;/p&gt;

&lt;h2 id='issues_with_ab_testing'&gt;Issues with A/B Testing&lt;/h2&gt;

&lt;p&gt;There are a number of issues with A/B testing (some of which have been described in more detail &lt;a href='http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit'&gt;here&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can&amp;#8217;t try anything too crazy without having to worry about half of your users not converting. For instance, you may want to try a horrendous color for your &amp;#8220;Buy Now&amp;#8221; button but are too afraid about potentially harming sales if your users hate it. In this case, the risk of a big change may outweigh the possible benefit if your users like it.&lt;/li&gt;

&lt;li&gt;A/B testing only lets you test two alternatives at a time. Pick two, wait, pick two more, wait - this is not the easiest workflow if you want to test 50 options.&lt;/li&gt;

&lt;li&gt;With A/B Testing, you need to have a fixed sample size to make the test valid (otherwise, you run the risk of repeated significance testing errors, as described in more detail &lt;a href='http://www.evanmiller.org/how-not-to-run-an-ab-test.html'&gt;here&lt;/a&gt;).&lt;/li&gt;

&lt;li&gt;Due to the fixed sample size requirement, you may have to wait a while before you get any results from your test (especially if the expected improvement is marginal, in which case your sample size would need to be larger). This problem can be compounded if you don&amp;#8217;t get much traffic.&lt;/li&gt;

&lt;li&gt;Designers and developers generally don&amp;#8217;t want to (and shouldn&amp;#8217;t have to) understand statistical concepts like &lt;a href='http://en.wikipedia.org/wiki/Statistical_power'&gt;power&lt;/a&gt;, &lt;a href='http://en.wikipedia.org/wiki/P-value'&gt;p-values&lt;/a&gt;, or &lt;a href='http://en.wikipedia.org/wiki/Confidence_intervals'&gt;confidence&lt;/a&gt; when creating and evaluating tests.&lt;/li&gt;

&lt;li&gt;There are no good answers for what you should do when A performs just as well as B. Was the sample size just too small (implying you should try again with a larger sample)? Go with A? Go with B? Does it matter? The reality is it may matter - but you won&amp;#8217;t know.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id='the_bandit_method'&gt;The Bandit Method&lt;/h2&gt;

&lt;p&gt;The ultimate goal of A/B testing is to increase conversions. The problem can be described in terms that differ greatly from the multitude of questions A/B testing brings (i.e., &amp;#8220;Is A better than B?&amp;#8221; followed by &amp;#8220;Is B better than C?&amp;#8221; followed by &amp;#8220;Is C better than D?&amp;#8221; &lt;em&gt;ad&lt;/em&gt; &lt;em&gt;infinitum&lt;/em&gt;). Instead, imagine you have a multitude of possible alternatives and, each time a user requests a page, you want to make a decent choice between alternatives you know perform well and alternatives you haven&amp;#8217;t tried very often. With each page load, pick the best alternative most of the time and an alternative that hasn&amp;#8217;t been displayed much some of the time. After each display, monitor the conversions and update what you consider the &amp;#8220;better&amp;#8221; alternatives to be. This is the basic method of a solution to what is called the multi-armed bandit problem.&lt;/p&gt;

&lt;p&gt;With a bandit solution, there is no concept of a &amp;#8220;test&amp;#8221;. At no point does the system announce a winner and a loser. Alternatives can be added or removed at any time. The better performing alternatives will be displayed more often, and the worst alternatives will rarely be displayed. At any point, if one of the poorly performing alternatives begins to perform better it will be shown more often. This provides solutions to all of the problems listed above:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go ahead and try something crazy. If it performs poorly, it won&amp;#8217;t be shown very often.&lt;/li&gt;

&lt;li&gt;Pick as many alternatives as you&amp;#8217;d like and add them.&lt;/li&gt;

&lt;li&gt;There&amp;#8217;s no &amp;#8220;test&amp;#8221;, and no minimal sample size needed before optimization can start.&lt;/li&gt;

&lt;li&gt;Information about conversions is used as soon as users convert or do not convert. There is no waiting period: results immediately inform the choice of the next alternative to display to a visitor.&lt;/li&gt;

&lt;li&gt;Designers and developers can add alternatives or remove them at any time. The system will adjust immediately. If an alternative seems to be consistently performing poorly, it can be removed at any time. Alternatively, it can just be left forever. The best option will always be displayed the most often. There are no complicated decisions that have to be made up front or requirements that designers or developers know anything about proper statistical hypothesis testing.&lt;/li&gt;

&lt;li&gt;If one alternative performs the same as another, they will both be displayed with the same regularity. There would be no need to choose one over the other or remove either of them.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id='bandit_gem'&gt;Bandit Gem&lt;/h2&gt;

&lt;p&gt;While there are a few A/B testing libraries for Rails out there, the preeminent one (&lt;a href='http://vanity.labnotes.org/'&gt;Vanity&lt;/a&gt;) has &lt;a href='/vanity/statistics/testing/2011/03/17/statistical-analysis-and-a-b-testing-correctly.html'&gt;statistical issues&lt;/a&gt; and is &lt;a href='https://github.com/assaf/vanity/issues/11'&gt;unreliable&lt;/a&gt; in a production environment. &lt;a href='https://github.com/bmuller/bandit'&gt;Bandit&lt;/a&gt; was created to test the feasibility of a multi-armed-bandit-based alternative to A/B testing and to solve the issues with the Rails-based A/B testing gems. It is still in development, though - use at your own risk.&lt;/p&gt;

&lt;h2 id='resources'&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://github.com/bmuller/bandit'&gt;bandit gem&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit'&gt;http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://en.wikipedia.org/wiki/Multi-armed_bandit'&gt;http://en.wikipedia.org/wiki/Multi-armed_bandit&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://www.evanmiller.org/how-not-to-run-an-ab-test.html'&gt;http://www.evanmiller.org/how-not-to-run-an-ab-test.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/rails/vanity/statistics/testing/2011/11/12/bandit%3A-a-b-testing-alternative-for-rails.html</feedburner:origLink></entry>
  
  <entry>
    <title>Campfirer.com - A Jabber to Campfirenow.com Gateway</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/kKRU3w6soPE/campfirer.com-a-jabber-to-campfirenow.com-gateway.html" />
    <id>tag:findingscience.com:/campfirer_project/campfirenow/jabber/2011/09/13/campfirer.com-a-jabber-to-campfirenow.com-gateway</id>
    <updated>2011-09-13T17:21:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p><a href="http://campfirenow.com">Campfire</a> is a web-based group chat service that is directed at businesses. Rather than using a standard protocol, the folk at <a href="http://37signals.com">37 Signals</a> decided to invent their own. This has led to the necessary creation of a number of custom clients to interact with the API using their unique, one-of-a-kind protocol (for those who don’t want to have to chat in a browser window).</p>

<p>I heart <a href="http://xmpp.org">Jabber (XMPP)</a>. There’s a good reason <a href="http://en.wikipedia.org/wiki/Google_Talk#Interoperability">Google</a> and <a href="http://developers.facebook.com/docs/chat/">Facebook</a> chose that protocol to power their chat. I have no idea why 37 Signals didn’t use Jabber too. Maybe they’re <a href="http://www.youtube.com/watch?v=sBzXVHoF-pI">mavericks</a>.</p>

<p>Naturally, I’d like to be able to use one of many <a href="http://xmpp.org/xmpp-software/clients/">Jabber clients</a> to access Campfire, along with all of my other Jabber based accounts. To do this, I wrote a <a href="http://xmpp.org/extensions/xep-0114.html">Jabber Component</a>. It provides Multi-User Chat (MUC) support for Jabber servers that utilizes Campfire’s API, so you can “join” a room, “talk”, and see other posts by other users. It’s called Campfirer (campfire + jabber = campfirer).</p>

<p>I’ve set up a running instance of the service at <a href="http://campfirer.com">campfirer.com</a>. A description of how to download / set up the code for your own Jabber server can be found there.</p>

<p>The code and more info can be found on the <a href="http://github.com/bmuller/campfirer">github project page</a>. Pull requests welcome.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;&lt;a href='http://campfirenow.com'&gt;Campfire&lt;/a&gt; is a web-based group chat service that is directed at businesses. Rather than using a standard protocol, the folk at &lt;a href='http://37signals.com'&gt;37 Signals&lt;/a&gt; decided to invent their own. This has led to the necessary creation of a number of custom clients to interact with the API using their unique, one-of-a-kind protocol (for those who don&amp;#8217;t want to have to chat in a browser window).&lt;/p&gt;

&lt;p&gt;I heart &lt;a href='http://xmpp.org'&gt;Jabber (XMPP)&lt;/a&gt;. There&amp;#8217;s a good reason &lt;a href='http://en.wikipedia.org/wiki/Google_Talk#Interoperability'&gt;Google&lt;/a&gt; and &lt;a href='http://developers.facebook.com/docs/chat/'&gt;Facebook&lt;/a&gt; chose that protocol to power their chat. I have no idea why 37 Signals didn&amp;#8217;t use Jabber too. Maybe they&amp;#8217;re &lt;a href='http://www.youtube.com/watch?v=sBzXVHoF-pI'&gt;mavericks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Naturally, I&amp;#8217;d like to be able to use one of many &lt;a href='http://xmpp.org/xmpp-software/clients/'&gt;Jabber clients&lt;/a&gt; to access Campfire, along with all of my other Jabber based accounts. To do this, I wrote a &lt;a href='http://xmpp.org/extensions/xep-0114.html'&gt;Jabber Component&lt;/a&gt;. It provides Multi-User Chat (MUC) support for Jabber servers that utilizes Campfire&amp;#8217;s API, so you can &amp;#8220;join&amp;#8221; a room, &amp;#8220;talk&amp;#8221;, and see other posts by other users. It&amp;#8217;s called Campfirer (campfire + jabber = campfirer).&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve set up a running instance of the service at &lt;a href='http://campfirer.com'&gt;campfirer.com&lt;/a&gt;. A description of how to download / set up the code for your own Jabber server can be found there.&lt;/p&gt;

&lt;p&gt;The code and more info can be found on the &lt;a href='http://github.com/bmuller/campfirer'&gt;github project page&lt;/a&gt;. Pull requests welcome.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/campfirer_project/campfirenow/jabber/2011/09/13/campfirer.com-a-jabber-to-campfirenow.com-gateway.html</feedburner:origLink></entry>
  
  <entry>
    <title>Incr/Decr Counters Using memcache-client</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/R6RajMSSqvc/counters-using-memcache-client-gem.html" />
    <id>tag:findingscience.com:/memcache/ruby/2011/08/13/counters-using-memcache-client-gem</id>
    <updated>2011-08-13T20:59:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>Based on some recent changes in the <a href="http://memcached.org/">memcached</a> library, the <a href="http://seattlerb.rubyforge.org/memcache-client/classes/MemCache.html#M000009">incr</a> method in the <a href="http://seattlerb.rubyforge.org/memcache-client/">memcache-client</a> gem no longer works as expected. For instance, the following:</p>
<div class="highlight"><pre><code class="ruby"><span class="nb">require</span> <span class="s1">'rubygems'</span>
<span class="nb">require</span> <span class="s1">'memcache-client'</span>

<span class="n">m</span> <span class="o">=</span> <span class="no">MemCache</span><span class="o">.</span><span class="n">new</span> <span class="s1">'localhost'</span>
<span class="n">m</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">'counter'</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">m</span><span class="o">.</span><span class="n">incr</span><span class="p">(</span><span class="s1">'counter'</span><span class="p">)</span>
</code></pre>
</div>
<p>will result in the following error:</p>
<pre>
MemCache::MemCacheError: cannot increment or decrement non-numeric value
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:926:in `raise_on_error_response!'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:831:in `cache_incr'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:865:in `call'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:865:in `with_socket_management'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:827:in `cache_incr'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:342:in `incr'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:886:in `with_server'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:341:in `incr'
			 from (irb):5
			 from /usr/local/lib/site_ruby/1.8/rubygems.rb:123
</pre>
<p>This is caused by the memcache-client gem <a href="http://ruby-doc.org/core/classes/Marshal.html">marshalling</a> everything before it’s stored in memcache. Memcache needs the actual, unmarshalled, integer value to be stored. The code above should be changed to:</p>
<div class="highlight"><pre><code class="ruby"><span class="nb">require</span> <span class="s1">'rubygems'</span>
<span class="nb">require</span> <span class="s1">'memcache-client'</span>

<span class="n">m</span> <span class="o">=</span> <span class="no">MemCache</span><span class="o">.</span><span class="n">new</span> <span class="s1">'localhost'</span>

<span class="c1"># set the raw value initially by passing in a fourth argument of true</span>
<span class="n">m</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">'counter'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="kp">true</span><span class="p">)</span>

<span class="c1"># increment the raw integer value</span>
<span class="n">m</span><span class="o">.</span><span class="n">incr</span><span class="p">(</span><span class="s1">'counter'</span><span class="p">)</span>

<span class="c1"># you can now decrement the raw integer value as well</span>
<span class="n">m</span><span class="o">.</span><span class="n">decr</span><span class="p">(</span><span class="s1">'counter'</span><span class="p">)</span>
</code></pre>
</div>
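<p>To see why the raw flag matters, the sketch below compares the two payloads using only Ruby’s built-in <code>Marshal</code>, with no memcached server involved. The helper names are hypothetical, and the sketch assumes that a raw write stores the value’s plain string form; memcached’s incr/decr only operate on values that are plain decimal digits.</p>

```ruby
# Illustration only: helper names are hypothetical, no server is contacted.

# What gets stored when the value is marshalled before being written.
def marshalled_payload(value)
  Marshal.dump(value)
end

# What gets stored when the value is written raw (assumed: its string form).
def raw_payload(value)
  value.to_s
end

# incr/decr need the stored bytes to be plain decimal digits.
def incrementable?(payload)
  !(payload =~ /\A\d+\z/).nil?
end

puts marshalled_payload(0).inspect           # binary Marshal bytes, not "0"
puts incrementable?(marshalled_payload(0))   # marshalled value is not numeric
puts incrementable?(raw_payload(0))          # raw value is numeric
```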
<p>The fix is simple, but not noted anywhere (that I can find) in the <a href="http://seattlerb.rubyforge.org/memcache-client/">memcache-client</a> documentation. Besides a few mentions on Google Groups <i>sans</i> solution, I couldn’t find any references to this issue elsewhere on the world wide intertubes. I find the atomic incr/decr functionality in memcache to be quite useful; I hope this helps others who run into this problem.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;Based on some recent changes in the &lt;a href='http://memcached.org/'&gt;memcached&lt;/a&gt; library, the &lt;a href='http://seattlerb.rubyforge.org/memcache-client/classes/MemCache.html#M000009'&gt;incr&lt;/a&gt; method in the &lt;a href='http://seattlerb.rubyforge.org/memcache-client/'&gt;memcache-client&lt;/a&gt; gem no longer works as expected. For instance, the following:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='nb'&gt;require&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;rubygems&amp;#39;&lt;/span&gt;
&lt;span class='nb'&gt;require&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;memcache-client&amp;#39;&lt;/span&gt;

&lt;span class='n'&gt;m&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;MemCache&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;new&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;localhost&amp;#39;&lt;/span&gt;
&lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;set&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;counter&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;incr&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;counter&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;will result in the following error:&lt;/p&gt;
&lt;pre&gt;
MemCache::MemCacheError: cannot increment or decrement non-numeric value
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:926:in `raise_on_error_response!'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:831:in `cache_incr'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:865:in `call'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:865:in `with_socket_management'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:827:in `cache_incr'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:342:in `incr'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:886:in `with_server'
			 from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:341:in `incr'
			 from (irb):5
			 from /usr/local/lib/site_ruby/1.8/rubygems.rb:123
&lt;/pre&gt;
&lt;p&gt;This is caused by the memcache-client gem &lt;a href='http://ruby-doc.org/core/classes/Marshal.html'&gt;marshalling&lt;/a&gt; everything before it&amp;#8217;s stored in memcache. Memcache needs the actual, unmarshalled, integer value to be stored. The code above should be changed to:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='nb'&gt;require&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;rubygems&amp;#39;&lt;/span&gt;
&lt;span class='nb'&gt;require&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;memcache-client&amp;#39;&lt;/span&gt;

&lt;span class='n'&gt;m&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;MemCache&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;new&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;localhost&amp;#39;&lt;/span&gt;

&lt;span class='c1'&gt;# set the raw value initially by passing in a fourth argument of true&lt;/span&gt;
&lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;set&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;counter&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kp'&gt;true&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='c1'&gt;# increment the raw integer value&lt;/span&gt;
&lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;incr&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;counter&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='c1'&gt;# you can now decrement the raw integer value as well&lt;/span&gt;
&lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;decr&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;counter&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The fix is simple, but not noted anywhere (that I can find) in the &lt;a href='http://seattlerb.rubyforge.org/memcache-client/'&gt;memcache-client&lt;/a&gt; documentation. Besides a few mentions on Google Groups &lt;i&gt;sans&lt;/i&gt; solution, I couldn&amp;#8217;t find any references to this issue elsewhere on the world wide intertubes. I find the atomic incr/decr functionality in memcache to be quite useful; I hope this helps others who run into this problem.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/memcache/ruby/2011/08/13/counters-using-memcache-client-gem.html</feedburner:origLink></entry>
  
  <entry>
    <title>HBaseRB: A Ruby HBase Library</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/SvopBQk2pUE/hbaserb%3A-a-ruby-hbase-library.html" />
    <id>tag:findingscience.com:/hbase/ruby/hadoop/2011/08/01/hbaserb:-a-ruby-hbase-library</id>
    <updated>2011-08-01T18:00:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>I recently upgraded the <a href="https://github.com/bmuller/hbaserb">HBaseRB</a> library I wrote a few months ago. HBaseRB provides a means for Ruby to interact with <a href="http://hbase.apache.org/">HBase</a> using a <a href="http://thrift.apache.org/">Thrift</a> interface. Most other libraries (like <a href="https://github.com/sishen/hbase-ruby">hbase-ruby</a>, for instance) use the REST interface provided by HBase. This may work in many situations, but for our applications at <a href="http://livingsocial.com">LivingSocial</a> we wanted the benefit of using a binary protocol without the overhead of XML parsing.</p>

<p>Some Google searching revealed that <a href="https://github.com/bmuller/hbaserb">HBaseRB</a> is a bit hard to find, so I thought I’d mention it here.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;I recently upgraded the &lt;a href='https://github.com/bmuller/hbaserb'&gt;HBaseRB&lt;/a&gt; library I wrote a few months ago. HBaseRB provides a means for Ruby to interact with &lt;a href='http://hbase.apache.org/'&gt;HBase&lt;/a&gt; using a &lt;a href='http://thrift.apache.org/'&gt;Thrift&lt;/a&gt; interface. Most other libraries (like &lt;a href='https://github.com/sishen/hbase-ruby'&gt;hbase-ruby&lt;/a&gt;, for instance) use the REST interface provided by HBase. This may work in many situations, but for our applications at &lt;a href='http://livingsocial.com'&gt;LivingSocial&lt;/a&gt; we wanted the benefit of using a binary protocol without the overhead of XML parsing.&lt;/p&gt;

&lt;p&gt;Some Google searching revealed that &lt;a href='https://github.com/bmuller/hbaserb'&gt;HBaseRB&lt;/a&gt; is a bit hard to find, so I thought I&amp;#8217;d mention it here.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/hbase/ruby/hadoop/2011/08/01/hbaserb%3A-a-ruby-hbase-library.html</feedburner:origLink></entry>
  
  <entry>
    <title>Changing Namenode Hostname Breaks Hive</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/a69QvOzkcMk/changing-namenode-breaks-hive.html" />
    <id>tag:findingscience.com:/hive/hadoop/2011/07/18/changing-namenode-breaks-hive</id>
    <updated>2011-07-18T18:50:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p><a href="http://wiki.apache.org/hadoop/Hive">Hive</a> is a great piece of software - but there are still some major issues. I ran into one recently when I changed the hostname of the Hadoop namenode. I couldn’t figure out why Hive was using the old hostname, even after changing all of the config files in <strong>$HADOOP_HOME</strong> to use the new one and testing other MapReduce jobs.</p>

<p>Apparently, Hive stores all partition information with full references to the location (for instance, <em>hdfs://host:9000/user/hive/warehouse/some/path</em>). This makes lookups faster in the metastore, but makes it impossible to easily change the hostname of your namenode.</p>

<p>The best way I could find to do this was the following:</p>

<ol>
<li><strong>mysqldump</strong> the metadata database to a local file</li>

<li>Edit the dump and do a global search and replace on any instances of the old hostname</li>

<li>Reimport the dump</li>
</ol>
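<p>Step 2 could be sketched in Ruby roughly as follows. The helper name and the sample line are made up for illustration, and the sketch deliberately rewrites only <code>hdfs://</code> URIs, which is narrower than a global search and replace on the hostname.</p>

```ruby
# Hypothetical helper for step 2: rewrite namenode hostnames in a SQL dump.
# Only hdfs:// URIs are touched; other mentions of the hostname are left alone.
def replace_namenode_host(dump_sql, old_host, new_host)
  # Require ":" or "/" after the host so "old-nn" does not match "old-nn-backup".
  pattern = %r{hdfs://#{Regexp.escape(old_host)}(?=[:/])}
  dump_sql.gsub(pattern, "hdfs://#{new_host}")
end

# Made-up sample line standing in for one row of the dump from step 1.
sample = "INSERT INTO example VALUES ('hdfs://old-nn:9000/user/hive/warehouse/some/path');"
puts replace_namenode_host(sample, "old-nn", "new-nn")
```

<p>Reading the dump with <code>File.read</code>, passing it through a helper like this, and writing the result back with <code>File.write</code> would cover step 2; it is worth reviewing the rewritten dump before reimporting it.</p>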

<p>If the location were saved in a separate table (with a one-to-many relationship between partitions and hosts / locations), this process would be quite a bit easier.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;&lt;a href='http://wiki.apache.org/hadoop/Hive'&gt;Hive&lt;/a&gt; is a great piece of software - but there are still some major issues. I ran into one recently when I changed the hostname of the Hadoop namenode. I couldn&amp;#8217;t figure out why Hive was using the old hostname, even after changing all of the config files in &lt;strong&gt;$HADOOP_HOME&lt;/strong&gt; to use the new one and testing other MapReduce jobs.&lt;/p&gt;

&lt;p&gt;Apparently, Hive stores all partition information with full references to the location (for instance, &lt;em&gt;hdfs://host:9000/user/hive/warehouse/some/path&lt;/em&gt;). This makes lookups faster in the metastore, but makes it impossible to easily change the hostname of your namenode.&lt;/p&gt;

&lt;p&gt;The best way I could find to do this was the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;mysqldump&lt;/strong&gt; the metadata database to a local file&lt;/li&gt;

&lt;li&gt;Edit the dump and do a global search and replace on any instances of the old hostname&lt;/li&gt;

&lt;li&gt;Reimport the dump&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the location were saved in a separate table (with a one-to-many relationship between partitions and hosts / locations), this process would be quite a bit easier.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/hive/hadoop/2011/07/18/changing-namenode-breaks-hive.html</feedburner:origLink></entry>
  
  <entry>
    <title>Good DC Coffee Shops</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/MXH6JUtpG7I/good-dc-coffeeshops.html" />
    <id>tag:findingscience.com:/dc/coffee/2011/06/19/good-dc-coffeeshops</id>
    <updated>2011-06-19T12:55:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>I moved to DC about four months ago, and since then, my weekends have been frequently occupied with one quest: find the best DC coffee shop. When I was in Charleston, the answer used to be easy (<a href="http://kuducharleston.com/">Kudu Coffee</a>, if you’re wondering). In Baltimore, it was even easier (<a href="http://redemmas.org">Red Emma’s Bookstore Coffeehouse</a>). In The District, however, I’ve had a much harder time. There are many meretricious options to choose from, and few are real winners. There are quite a few convenience stores, bars, and restaurants that call themselves a “cafe” and really shouldn’t.</p>

<p>What’s a winner? Admittedly, it has a lot to do with a place that I can break out a laptop, drink some coffee, do some work, and be just distracted enough by nearby conversation that I don’t mind the fact that I’m working. Here are the metrics I take into consideration:</p>

<ul>
<li>outdoor seating</li>

<li>free wifi</li>

<li>ample seating</li>

<li>power outlets</li>

<li>eavesdropping payoffs (audible interesting conversations, often philosophical in nature)</li>

<li>quality music or live performances</li>

<li>good collaborative space (big tables, etc)</li>

<li>proximity to public transportation</li>
</ul>

<p>So here are some top performers on this list, with a final entry of what I believe to be the winner.</p>

<h2 id="ebenezers_coffeehouse"><a href="http://ebenezerscoffeehouse.com/">Ebenezer’s Coffeehouse</a></h2>

<p>This is the first place I went to in DC. It’s right next to Union Station, so it’s quite accessible. I didn’t realize it at first, but this establishment is owned and operated by a Christian church. This, naturally, leads to a rather homogeneous clientele makeup, which often consists of small Bible study groups and prayer groups. Seating is generally available, the coffee is alright, and there is free timed wifi (with a purchase) - but don’t expect an interesting space, interesting characters, or any stimulating conversation.</p>

<h2 id="tryst"><a href="http://www.trystdc.com/">Tryst</a></h2>

<p>This place is more of a restaurant / cafe. It’s generally completely packed on the weekends with hungover college students looking for food and coffee. This is not a good work place, even if you decide to wait for a seat.</p>

<h2 id="big_bear_cafe"><a href="http://www.bigbearcafe-dc.com">Big Bear Cafe</a></h2>

<p>A great location with plenty of hits on my list of important qualities. There’s outdoor seating, free wifi, good collaborative space, great music, and more. The disadvantages are major, though - seating is impossible on the weekends and there’s no nearby metro stop.</p>

<h2 id="chinatown_coffee_co"><a href="http://chinatowncoffee.com/">Chinatown Coffee Co.</a></h2>

<p>Excellent coffee can be found here. There’s generally enough seating and free wifi, and it’s right next to the Chinatown metro stop. You’re not likely to overhear any juicy conversations, though; most people keep to themselves at tables meant for one or two.</p>

<h2 id="filter_coffeehouse_and_espresso_bar"><a href="http://www.filtercoffeehouse.com/">Filter Coffeehouse and Espresso Bar</a></h2>

<p>Great coffee here, too, and it’s a short walk from the Dupont metro stop. There’s outdoor seating as well, though that and all of the few seats indoors are generally taken. With better seating options or fewer patrons, this place would be a real winner.</p>

<h2 id="midcity_caffe"><a href="http://www.midcitycaffe.com/">MidCity Caffe</a></h2>

<p>The winner at this point is MidCity. They always have enough seating (though all seats are really close to each other, so you’ll probably make a friend), free wifi, great coffee, and excellent music. I’ve even seen a live performance or two there. It’s not too far from the U St metro stop. Another great thing about this place is the owners have made a special effort to put power strips everywhere.</p>

<p>There are plenty of mediocre places I’ve left off (Jolt n Bolt Coffee &amp; Tea House, <a href="http://windowscafedc.com/">Windows Cafe &amp; Market</a>, and many more not worth mentioning), so this short list is by no means comprehensive. I’ll add to it if I find any other locations worth a plug.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;I moved to DC about four months ago, and since then, my weekends have been frequently occupied with one quest: find the best DC coffee shop. When I was in Charleston, the answer used to be easy (&lt;a href='http://kuducharleston.com/'&gt;Kudu Coffee&lt;/a&gt;, if you&amp;#8217;re wondering). In Baltimore, it was even easier (&lt;a href='http://redemmas.org'&gt;Red Emma&amp;#8217;s Bookstore Coffeehouse&lt;/a&gt;). In The District, however, I&amp;#8217;ve had a much harder time. There are many meretricious options to choose from, and few are real winners. There are quite a few convenience stores, bars, and restaurants that call themselves a &amp;#8220;cafe&amp;#8221; and really shouldn&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;What&amp;#8217;s a winner? Admittedly, it has a lot to do with a place that I can break out a laptop, drink some coffee, do some work, and be just distracted enough by nearby conversation that I don&amp;#8217;t mind the fact that I&amp;#8217;m working. Here are the metrics I take into consideration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;outdoor seating&lt;/li&gt;

&lt;li&gt;free wifi&lt;/li&gt;

&lt;li&gt;ample seating&lt;/li&gt;

&lt;li&gt;power outlets&lt;/li&gt;

&lt;li&gt;eavesdropping payoffs (audible interesting conversations, often philosophical in nature)&lt;/li&gt;

&lt;li&gt;quality music or live performances&lt;/li&gt;

&lt;li&gt;good collaborative space (big tables, etc)&lt;/li&gt;

&lt;li&gt;proximity to public transportation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So here are some top performers on this list, with a final entry of what I believe to be the winner.&lt;/p&gt;

&lt;h2 id='ebenezers_coffeehouse'&gt;&lt;a href='http://ebenezerscoffeehouse.com/'&gt;Ebenezer&amp;#8217;s Coffeehouse&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;This is the first place I went to in DC. It&amp;#8217;s right next to Union Station, so it&amp;#8217;s quite accessible. I didn&amp;#8217;t realize it at first, but this establishment is owned and operated by a Christian church. This, naturally, leads to a rather homogeneous clientele makeup, which often consists of small Bible study groups and prayer groups. Seating is generally available, the coffee is alright, and there is free timed wifi (with a purchase) - but don&amp;#8217;t expect an interesting space, interesting characters, or any stimulating conversation.&lt;/p&gt;

&lt;h2 id='tryst'&gt;&lt;a href='http://www.trystdc.com/'&gt;Tryst&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;This place is more of a restaurant / cafe. It&amp;#8217;s generally completely packed on the weekends with hungover college students looking for food and coffee. This is not a good work place, even if you decide to wait for a seat.&lt;/p&gt;

&lt;h2 id='big_bear_cafe'&gt;&lt;a href='http://www.bigbearcafe-dc.com'&gt;Big Bear Cafe&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;A great location with plenty of hits on my list of important qualities. There&amp;#8217;s outdoor seating, free wifi, good collaborative space, great music, and more. The disadvantages are major, though - seating is impossible on the weekends and there&amp;#8217;s no nearby metro stop.&lt;/p&gt;

&lt;h2 id='chinatown_coffee_co'&gt;&lt;a href='http://chinatowncoffee.com/'&gt;Chinatown Coffee Co.&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Excellent coffee can be found here. There&amp;#8217;s generally enough seating and free wifi, and it&amp;#8217;s right next to the Chinatown metro stop. You&amp;#8217;re not likely to overhear any juicy conversations, though; most people keep to themselves at tables meant for one or two.&lt;/p&gt;

&lt;h2 id='filter_coffeehouse_and_espresso_bar'&gt;&lt;a href='http://www.filtercoffeehouse.com/'&gt;Filter Coffeehouse and Espresso Bar&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Great coffee here, too, and it&amp;#8217;s a short walk from the Dupont metro stop. There&amp;#8217;s outdoor seating as well, though that and all of the few seats indoors are generally taken. With better seating options or fewer patrons, this place would be a real winner.&lt;/p&gt;

&lt;h2 id='midcity_caffe'&gt;&lt;a href='http://www.midcitycaffe.com/'&gt;MidCity Caffe&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;The winner at this point is MidCity. They always have enough seating (though all seats are really close to each other, so you&amp;#8217;ll probably make a friend), free wifi, great coffee, and excellent music. I&amp;#8217;ve even seen a live performance or two there. It&amp;#8217;s not too far from the U St metro stop. Another great thing about this place is the owners have made a special effort to put power strips everywhere.&lt;/p&gt;

&lt;p&gt;There are plenty of mediocre places I&amp;#8217;ve left off (Jolt n Bolt Coffee &amp;amp; Tea House, &lt;a href='http://windowscafedc.com/'&gt;Windows Cafe &amp;amp; Market&lt;/a&gt;, and many more not worth mentioning), so this short list is by no means comprehensive. I&amp;#8217;ll add to it if I find any other locations worth a plug.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/dc/coffee/2011/06/19/good-dc-coffeeshops.html</feedburner:origLink></entry>
  
  <entry>
    <title>Asynchronous MySQL in Python: Twistar 1.0</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/XxTy8CgIqm4/asynchronous-mysql-in-python%3A-twistar-1.0.html" />
    <id>tag:findingscience.com:/python/twisted/twistar_project/2011/06/18/asynchronous-mysql-in-python:-twistar-1.0</id>
    <updated>2011-06-18T16:13:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>After a few more updates and contributions, I’ve finally decided to release version 1.0 of <a href="http://findingscience.com/twistar">Twistar</a>. The recent work and contributions have made the library feature-rich enough for what I consider a version 1.0 release.</p>

<p>Description from the website:</p>

<p>Twistar is a Python implementation of the <a href="http://en.wikipedia.org/wiki/Active_record_pattern">active record pattern</a> (also known as an object-relational mapper or ORM) that uses the <a href="http://twistedmatrix.com/trac/">Twisted</a> framework’s <a href="http://twistedmatrix.com/documents/current/core/howto/rdbms.html">RDBMS support</a> to provide a non-blocking interface to relational databases.</p>

<p>Twistar currently features:</p>

<ul>
<li>A thoroughly asynchronous API</li>

<li>Object validations (and support for the easy creation of new validation methods)</li>

<li>Support for callbacks before saving / creating / updating / deleting</li>

<li>Support for object relational models that can be queried asynchronously</li>

<li>A simple interface to <a href="http://www.python.org/dev/peps/pep-0249/">DBAPI</a> objects</li>

<li>A framework to support any relational database that has a module that implements the <a href="http://www.python.org/dev/peps/pep-0249/">Python Database API Specification v2.0</a> (MySQL, PostgreSQL, and SQLite are all supported now)</li>

<li>Support for object polymorphism</li>

<li>Unit tests</li>
</ul>

<p>For more information, check out <a href="http://findingscience.com/twistar">the website</a> or <a href="http://github.com/bmuller/twistar">the github page</a>.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;After a few more updates and contributions, I&amp;#8217;ve finally decided to release version 1.0 of &lt;a href='http://findingscience.com/twistar'&gt;Twistar&lt;/a&gt;. The recent work and contributions have made the library feature-rich enough for what I consider a version 1.0 release.&lt;/p&gt;

&lt;p&gt;Description from the website:&lt;/p&gt;

&lt;p&gt;Twistar is a Python implementation of the &lt;a href='http://en.wikipedia.org/wiki/Active_record_pattern'&gt;active record pattern&lt;/a&gt; (also known as an object-relational mapper or ORM) that uses the &lt;a href='http://twistedmatrix.com/trac/'&gt;Twisted&lt;/a&gt; framework&amp;#8217;s &lt;a href='http://twistedmatrix.com/documents/current/core/howto/rdbms.html'&gt;RDBMS support&lt;/a&gt; to provide a non-blocking interface to relational databases.&lt;/p&gt;

&lt;p&gt;Twistar currently features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A thoroughly asynchronous API&lt;/li&gt;

&lt;li&gt;Object validations (and support for the easy creation of new validation methods)&lt;/li&gt;

&lt;li&gt;Support for callbacks before saving / creating / updating / deleting&lt;/li&gt;

&lt;li&gt;Support for object relational models that can be queried asynchronously&lt;/li&gt;

&lt;li&gt;A simple interface to &lt;a href='http://www.python.org/dev/peps/pep-0249/'&gt;DBAPI&lt;/a&gt; objects&lt;/li&gt;

&lt;li&gt;A framework to support any relational database that has a module that implements the &lt;a href='http://www.python.org/dev/peps/pep-0249/'&gt;Python Database API Specification v2.0&lt;/a&gt; (MySQL, PostgreSQL, and SQLite are all supported now)&lt;/li&gt;

&lt;li&gt;Support for object polymorphism&lt;/li&gt;

&lt;li&gt;Unit tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more information, check out &lt;a href='http://findingscience.com/twistar'&gt;the website&lt;/a&gt; or &lt;a href='http://github.com/bmuller/twistar'&gt;the github page&lt;/a&gt;.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/python/twisted/twistar_project/2011/06/18/asynchronous-mysql-in-python%3A-twistar-1.0.html</feedburner:origLink></entry>
  
  <entry>
    <title>Fun with Ruby Symbol Expressions</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/TEQ-vhDhDmg/fun-with-ruby-symbol-expressions.html" />
    <id>tag:findingscience.com:/ruby/metaprogramming/2011/05/24/fun-with-ruby-symbol-expressions</id>
    <updated>2011-05-24T19:02:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>Groupon released an interesting extension to the <strong>Symbol#to_proc</strong> method named <a href="https://github.com/groupon/symbol_expressions">symbol_expressions</a> over a year ago (I didn’t notice it until recently). It allows you to compose procs based on combinations of existing methods. For instance, to split and then join strings:</p>
<div class="highlight"><pre><code class="ruby"><span class="o">[</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="o">].</span><span class="n">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:split</span><span class="o">[</span><span class="s1">''</span><span class="o">]+</span><span class="ss">:join</span><span class="o">[</span><span class="s1">'_'</span><span class="o">]</span><span class="p">)</span>
<span class="c1"># =&gt; ["f_o_o", "b_a_r"]</span>
</code></pre>
</div>
<p>I thought this was nifty, but the syntax is a bit odd (brackets are not generally used as argument list boundaries). Additionally, this sort of <strong>Proc</strong> composition is something a <strong>Proc</strong> should know how to create, but it doesn’t make sense to have a <strong>Symbol</strong> keeping track of a list of other <strong>Symbol</strong>s that have been “added” to it (especially via an <a href="https://github.com/groupon/symbol_expressions/blob/master/lib/symbol_expressions.rb#L80">internal array class</a>). It just seems like a bit of a hack to have <strong>Symbol</strong>s acting as lists of other <strong>Symbol</strong>s.</p>

<p>Based on these ideas, I reduced the <em>symbol_expressions</em> lib to the following lines:</p>
<script src="https://gist.github.com/989964.js">
</script>
<p>With this little bit of code (which simply prefixes argument lists with a <strong>|</strong> symbol), you can now do stuff like this:</p>
<div class="highlight"><pre><code class="ruby"><span class="c1"># composition using Proc (rather than Symbols that have lists of Symbols in them)</span>
<span class="n">splitjoin</span> <span class="o">=</span> <span class="no">Proc</span><span class="o">.</span><span class="n">from_sym</span><span class="p">(</span><span class="ss">:split</span> <span class="o">|</span> <span class="s1">''</span><span class="p">,</span> <span class="ss">:join</span> <span class="o">|</span> <span class="s2">" "</span><span class="p">,</span> <span class="ss">:upcase</span><span class="p">)</span>
<span class="n">splitjoin</span><span class="o">.</span><span class="n">call</span> <span class="s2">"what"</span>
<span class="c1"># =&gt; "W H A T"</span>

<span class="o">[</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="o">].</span><span class="n">map</span><span class="p">(</span><span class="o">&amp;</span><span class="n">splitjoin</span><span class="p">)</span>
<span class="c1"># =&gt; ["F O O", "B A R"]</span>

<span class="o">[</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="o">].</span><span class="n">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:split</span> <span class="o">|</span> <span class="s1">''</span><span class="p">)</span>
<span class="c1"># =&gt; [["f", "o", "o"], ["b", "a", "r"]]</span>
</code></pre>
</div>
<p>Fun stuff. Ruby consistently amazes me with its expressiveness.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;Groupon released an interesting extension to the &lt;strong&gt;Symbol#to_proc&lt;/strong&gt; method named &lt;a href='https://github.com/groupon/symbol_expressions'&gt;symbol_expressions&lt;/a&gt; over a year ago (I didn&amp;#8217;t notice it until recently). It allows you to compose procs based on combinations of existing methods. For instance, to split and then join strings:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;bar&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;].&lt;/span&gt;&lt;span class='n'&gt;map&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='ss'&gt;:split&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;]+&lt;/span&gt;&lt;span class='ss'&gt;:join&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;_&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='c1'&gt;# =&amp;gt; [&amp;quot;f_o_o&amp;quot;, &amp;quot;b_a_r&amp;quot;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;I thought this was nifty, but the syntax is a bit odd (brackets are not generally used as argument list boundaries). Additionally, this sort of &lt;strong&gt;Proc&lt;/strong&gt; composition is something a &lt;strong&gt;Proc&lt;/strong&gt; should know how to create, but it doesn&amp;#8217;t make sense to have a &lt;strong&gt;Symbol&lt;/strong&gt; keeping track of a list of other &lt;strong&gt;Symbol&lt;/strong&gt;s that have been &amp;#8220;added&amp;#8221; to it (especially via an &lt;a href='https://github.com/groupon/symbol_expressions/blob/master/lib/symbol_expressions.rb#L80'&gt;internal array class&lt;/a&gt;). It just seems like a bit of a hack to have &lt;strong&gt;Symbol&lt;/strong&gt;s acting as lists of other &lt;strong&gt;Symbol&lt;/strong&gt;s.&lt;/p&gt;

&lt;p&gt;Based on these ideas, I reduced the &lt;em&gt;symbol_expressions&lt;/em&gt; lib to the following lines:&lt;/p&gt;
&lt;script src='https://gist.github.com/989964.js'&gt;
&lt;/script&gt;
&lt;p&gt;With this little bit of code (which simply prefixes argument lists with a &lt;strong&gt;|&lt;/strong&gt; symbol), you can now do stuff like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='c1'&gt;# composition using Proc (rather than Symbols that have lists of Symbols in them)&lt;/span&gt;
&lt;span class='n'&gt;splitjoin&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;Proc&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;from_sym&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='ss'&gt;:split&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='ss'&gt;:join&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot; &amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='ss'&gt;:upcase&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='n'&gt;splitjoin&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;call&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;what&amp;quot;&lt;/span&gt;
&lt;span class='c1'&gt;# =&amp;gt; &amp;quot;W H A T&amp;quot;&lt;/span&gt;

&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;bar&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;].&lt;/span&gt;&lt;span class='n'&gt;map&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;splitjoin&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='c1'&gt;# =&amp;gt; [&amp;quot;F O O&amp;quot;, &amp;quot;B A R&amp;quot;]&lt;/span&gt;

&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;bar&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;].&lt;/span&gt;&lt;span class='n'&gt;map&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='ss'&gt;:split&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='c1'&gt;# =&amp;gt; [[&amp;quot;f&amp;quot;, &amp;quot;o&amp;quot;, &amp;quot;o&amp;quot;], [&amp;quot;b&amp;quot;, &amp;quot;a&amp;quot;, &amp;quot;r&amp;quot;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Fun stuff. Ruby consistently amazes me with its expressiveness.&lt;/p&gt;
    </content>
  <feedburner:origLink>http://findingscience.com/ruby/metaprogramming/2011/05/24/fun-with-ruby-symbol-expressions.html</feedburner:origLink></entry>
  
  <entry>
    <title>HiveSwarm: Additional User Defined Functions for Hive</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/74m89z3esY0/hiveswarm%3A-additional-user-defined-functions-for-hive.html" />
    <id>tag:findingscience.com:/hive/hadoop/2011/04/09/hiveswarm:-additional-user-defined-functions-for-hive</id>
    <updated>2011-04-09T16:17:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>There are a number of user defined functions that would be quite useful in Hive but that have not been created and added to the library. Hive does provide the ability to define custom functions, but, as I’ve <a href="/hadoop/hive/2011/01/07/compiling-user-defined-functions-for-hive-on-hadoop.html">noted before</a>, the documentation is sparse and sometimes simply wrong. For instance, the instructions for creating a user defined table generating function (found <a href="http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF">here</a>) incorrectly show the <em>close</em> method calling <em>forward</em>, which will cause an error when you try to run the function, even in Hive 0.5.0.</p>

<p>In an effort both to collect useful functions that we are writing at <a href="http://livingsocial.com">LivingSocial</a> and to make the compiling process easier, we’ve created a new open source project on GitHub called <a href="https://github.com/livingsocial/HiveSwarm">HiveSwarm</a>. There are only a few functions there now, but more will be added over time.</p>
<table>
  <tr class="thead">
    <td>server</td>
    <td>page_load</td>
  </tr>
  <tr>
    <td>10.0.0.1</td><td>2011-04-01 10:01:01</td>
  </tr>
  <tr>
    <td>10.0.0.1</td><td>2011-04-01 10:01:05</td>
  </tr>
  <tr>
    <td>10.0.0.1</td><td>2011-04-01 10:03:00</td>
  </tr>
  <tr>
    <td>10.0.0.2</td><td>2011-04-01 10:01:02</td>
  </tr>
  <tr>
    <td>10.0.0.2</td><td>2011-04-01 10:01:05</td>
  </tr>
</table>
<p>One of the most useful new functions is called <strong>intervals</strong>. The function will generate a table with the intervals between consecutive values of one column, grouped by another column of the input table. For instance, let’s say you have a table that has one column for server IP addresses and another that has dates and times for page loads (shown in the first table). Imagine you wish to know the intervals between page loads per server.</p>

<p>After compiling <a href="https://github.com/livingsocial/HiveSwarm">HiveSwarm</a>, you can load the jar and add the function:</p>
<div class="highlight"><pre><code class="sql"><span class="k">add</span> <span class="n">jar</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="k">to</span><span class="o">/</span><span class="n">HiveSwarm</span><span class="p">.</span><span class="n">jar</span><span class="p">;</span>
<span class="k">create</span> <span class="k">temporary</span> <span class="k">function</span> <span class="n">intervals</span> <span class="k">as</span> <span class="s1">'com.livingsocial.hive.udtf.Intervals'</span><span class="p">;</span>
</code></pre>
</div>
<p>Then, to select the intervals, just specify the grouping column and the column you wish to get intervals from:</p>
<div class="highlight"><pre><code class="sql"><span class="k">select</span> <span class="n">intervals</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">page_load</span><span class="p">)</span> <span class="k">as</span> <span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">intervals</span><span class="p">)</span> <span class="k">from</span> <span class="n">server_page_loads</span><span class="p">;</span>
</code></pre>
</div>
<p>This will produce the results shown in the second table (with intervals in seconds).</p>
<table>
  <tr class="thead">
    <td>server</td>
    <td>intervals</td>
  </tr>
  <tr>
    <td>10.0.0.1</td><td>4.0</td>
  </tr>
  <tr>
    <td>10.0.0.1</td><td>115.0</td>
  </tr>
  <tr>
    <td>10.0.0.2</td><td>3.0</td>
  </tr>
</table>
<p>The column to pull intervals from can be either numeric or a string type. If it is a string, then it will be converted into a timestamp (so the resulting difference will be calculated in seconds). All numerical types (including timestamps from strings) will be converted into floats.</p>
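
<p>To make the semantics concrete, here is a rough Python sketch of the same calculation outside of Hive. It is not the HiveSwarm implementation; the function name, the timestamp format, and the choice to sort values within each group are assumptions made for illustration:</p>

```python
from collections import defaultdict
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def intervals(rows):
    """Return (group, difference) pairs for consecutive values in each group."""
    grouped = defaultdict(list)
    for group, value in rows:
        # Strings are parsed as timestamps and turned into seconds;
        # numeric values are used as they are. Everything becomes a float.
        if isinstance(value, str):
            parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            value = (parsed - EPOCH).total_seconds()
        grouped[group].append(float(value))

    result = []
    for group, values in grouped.items():
        values.sort()  # assumption: intervals are taken in ascending order
        for earlier, later in zip(values, values[1:]):
            result.append((group, later - earlier))
    return result


rows = [
    ("10.0.0.1", "2011-04-01 10:01:01"),
    ("10.0.0.1", "2011-04-01 10:01:05"),
    ("10.0.0.1", "2011-04-01 10:03:00"),
    ("10.0.0.2", "2011-04-01 10:01:02"),
    ("10.0.0.2", "2011-04-01 10:01:05"),
]
print(intervals(rows))
# [('10.0.0.1', 4.0), ('10.0.0.1', 115.0), ('10.0.0.2', 3.0)]
```

<p>A group with n values yields n - 1 intervals, which is why five page loads across two servers produce three result rows.</p>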

<p>Pull requests are welcomed if you have a function you’d like to see added.</p>

<p>Additional information can be found on the <a href="https://github.com/livingsocial/HiveSwarm">github page</a>.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;There are a number of user defined functions that would be quite useful in Hive but that have not been created and added to the library. Hive does provide the ability to define custom functions, but, as I&amp;#8217;ve &lt;a href='/hadoop/hive/2011/01/07/compiling-user-defined-functions-for-hive-on-hadoop.html'&gt;noted before&lt;/a&gt;, the documentation is sparse and sometimes simply wrong. For instance, the instructions for creating a user defined table generating function (found &lt;a href='http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF'&gt;here&lt;/a&gt;) incorrectly show the &lt;em&gt;close&lt;/em&gt; method calling &lt;em&gt;forward&lt;/em&gt;, which will cause an error when you try to run the function, even in Hive 0.5.0.&lt;/p&gt;

&lt;p&gt;In an effort both to collect useful functions that we are writing at &lt;a href='http://livingsocial.com'&gt;LivingSocial&lt;/a&gt; and to make the compiling process easier, we&amp;#8217;ve created a new open source project on GitHub called &lt;a href='https://github.com/livingsocial/HiveSwarm'&gt;HiveSwarm&lt;/a&gt;. There are only a few functions there now, but more will be added over time.&lt;/p&gt;
&lt;table&gt;
  &lt;tr class='thead'&gt;
    &lt;td&gt;server&lt;/td&gt;
    &lt;td&gt;page_load&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.1&lt;/td&gt;&lt;td&gt;2011-04-01 10:01:01&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.1&lt;/td&gt;&lt;td&gt;2011-04-01 10:01:05&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.1&lt;/td&gt;&lt;td&gt;2011-04-01 10:03:00&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.2&lt;/td&gt;&lt;td&gt;2011-04-01 10:01:02&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.2&lt;/td&gt;&lt;td&gt;2011-04-01 10:01:05&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;One of the most useful new functions is called &lt;strong&gt;intervals&lt;/strong&gt;. The function will generate a table with the intervals between consecutive values of one column, grouped by another column of the input table. For instance, let&amp;#8217;s say you have a table that has one column for server IP addresses and another that has dates and times for page loads (shown in the first table). Imagine you wish to know the intervals between page loads per server.&lt;/p&gt;

&lt;p&gt;After compiling &lt;a href='https://github.com/livingsocial/HiveSwarm'&gt;HiveSwarm&lt;/a&gt;, you can load the jar and add the function:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='k'&gt;add&lt;/span&gt; &lt;span class='n'&gt;jar&lt;/span&gt; &lt;span class='o'&gt;/&lt;/span&gt;&lt;span class='n'&gt;path&lt;/span&gt;&lt;span class='o'&gt;/&lt;/span&gt;&lt;span class='k'&gt;to&lt;/span&gt;&lt;span class='o'&gt;/&lt;/span&gt;&lt;span class='n'&gt;HiveSwarm&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;jar&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;create&lt;/span&gt; &lt;span class='k'&gt;temporary&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='n'&gt;intervals&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;com.livingsocial.hive.udtf.Intervals&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Then, to select the intervals, just specify the grouping column and the column you wish to get intervals from:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='k'&gt;select&lt;/span&gt; &lt;span class='n'&gt;intervals&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;server&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;page_load&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;server&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;intervals&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;from&lt;/span&gt; &lt;span class='n'&gt;server_page_loads&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This will produce the results shown in the second table (with intervals in seconds).&lt;/p&gt;
&lt;table&gt;
  &lt;tr class='thead'&gt;
    &lt;td&gt;server&lt;/td&gt;
    &lt;td&gt;intervals&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.1&lt;/td&gt;&lt;td&gt;4.0&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.1&lt;/td&gt;&lt;td&gt;115.0&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;10.0.0.2&lt;/td&gt;&lt;td&gt;3.0&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;The column to pull intervals from can be either a numeric or a string type. If it is a string, it will be converted into a timestamp (so the resulting difference will be calculated in seconds). All numerical types (including timestamps from strings) will be converted into floats.&lt;/p&gt;
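&lt;p&gt;As a rough illustration of what the function computes, here is a small Python sketch. It is not the HiveSwarm implementation: the &lt;code&gt;to_float&lt;/code&gt; and &lt;code&gt;intervals&lt;/code&gt; helpers are hypothetical names, and sorting the rows before grouping is an assumption made for the example. It uses the sample rows from the first table.&lt;/p&gt;

```python
from datetime import datetime
from itertools import groupby

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
EPOCH = datetime(1970, 1, 1)


def to_float(value):
    """Convert timestamp strings to seconds and any other number to a float."""
    if isinstance(value, str):
        return (datetime.strptime(value, TIME_FORMAT) - EPOCH).total_seconds()
    return float(value)


def intervals(rows):
    """Yield (group, difference) for consecutive values within each group."""
    # Assumption for this sketch: sort so each group is contiguous and ascending.
    converted = sorted((group, to_float(value)) for group, value in rows)
    for group, members in groupby(converted, key=lambda row: row[0]):
        values = [value for _, value in members]
        for earlier, later in zip(values, values[1:]):
            yield group, later - earlier


rows = [
    ("10.0.0.1", "2011-04-01 10:01:01"),
    ("10.0.0.1", "2011-04-01 10:01:05"),
    ("10.0.0.1", "2011-04-01 10:03:00"),
    ("10.0.0.2", "2011-04-01 10:01:02"),
    ("10.0.0.2", "2011-04-01 10:01:05"),
]

for server, seconds in intervals(rows):
    print(server, seconds)
```

&lt;p&gt;Each server with n page loads contributes n - 1 rows, which is why the five input rows become three output rows.&lt;/p&gt;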

&lt;p&gt;Pull requests are welcomed if you have a function you&amp;#8217;d like to see added.&lt;/p&gt;

&lt;p&gt;Additional information can be found on the &lt;a href='https://github.com/livingsocial/HiveSwarm'&gt;github page&lt;/a&gt;.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/74m89z3esY0" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/hive/hadoop/2011/04/09/hiveswarm%3A-additional-user-defined-functions-for-hive.html</feedburner:origLink></entry>
  
  <entry>
    <title>Statistical Analysis and A/B Testing (Correctly)</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/SzATApjgcHM/statistical-analysis-and-a-b-testing-correctly.html" />
    <id>tag:findingscience.com:/vanity/statistics/testing/2011/03/17/statistical-analysis-and-a-b-testing-correctly</id>
    <updated>2011-03-17T16:01:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>We’ve been playing with <a href="http://vanity.labnotes.org">Vanity</a> recently at <a href="http://livingsocial.com">LivingSocial</a> and have found it to be generally useful. During a recent test, however, we saw the option listed as the “best choice” change with almost every dashboard page load. This should not generally be happening if the test for significance is implemented correctly. The first thing we did was add some numbers to the dashboard showing total views for each option and the number of track events for our conversion metric. That’s when I noticed a problem: what Vanity was claiming as a significant difference (at a 95% confidence level) wasn’t actually significant (based on a <a href="http://en.wikipedia.org/wiki/G-test">G-test</a>). After some digging in the <a href="https://github.com/assaf/vanity">source</a>, I found the following issues.</p>

<h2 id="issue_one_wrong_number_of_tails">Issue One: Wrong Number of Tails</h2>

<p>The first issue relates to the way in which the two-proportion <a href="http://en.wikipedia.org/wiki/Z-test">Z-test</a> is implemented. Vanity links to <a href="http://20bits.com/articles/statistical-analysis-and-ab-testing/">this instructional post</a> on <a href="http://vanity.labnotes.org/ab_testing.html#interpret">their result interpretation page</a>, and I assume it was used as the basis for Vanity’s implementation. While I think there are a few things wrong with the post (see the next issue), I believe one of the biggest issues in Vanity is the improper use of a <a href="http://en.wikipedia.org/wiki/Two-tailed_test">one-tailed test</a>. The instructional post on stats and A/B testing describes the correct use of the one-tailed test in the case where you have identified a “control” (presumably the original page) and an “experiment” page and want to test only whether the new page performs better than the old one. One-tailed tests are used in this sort of case, when one wants to know if a statistic from one defined group is greater than that of another defined group (say, case over control proportion).</p>

<p>Vanity, however, picks the second best performing group and then uses it as the “control” group in a one-tailed test to see whether the best group’s proportion is greater than the second best. This “control” group may be a different group on each dashboard page load. The result is a test to see whether the proportions are equal or not equal, as opposed to a test to see whether or not one specific proportion is greater than another specific proportion. Essentially, a one-tailed test is being used for a two-tailed hypothesis.</p>

<p>Why does this matter? Well, in our case, it mattered quite a bit. Vanity was calling a difference significant when it shouldn’t have been. The counts are in the following table.</p>
<table>
  <tr class="thead">
    <td>Group</td>
    <td>Viewed</td>
    <td>Converted</td>
  </tr>
  <tr>
    <td>A</td><td>409199</td><td>22399</td>
  </tr>
  <tr>
    <td>B</td><td>409351</td><td>22779</td>
  </tr>
</table>
<p>Vanity’s conclusion was:</p>

<pre><code>With 95% probability this result is statistically significant.</code></pre>

<p>For a one-tailed test, this conclusion is correct. For a two-tailed test, however, the confidence level is only 92.5% and is not significant. To see how far off the result is, my <a href="http://en.wikipedia.org/wiki/G-test">G-test</a> produced a <a href="http://en.wikipedia.org/wiki/P-value">p-value</a> of 0.0721, which is not significant. Based on Vanity’s conclusion, though, we might have assumed a difference and then put effort into making changes that would not have actually mattered.</p>

<p>Ultimately, what you generally want to know in A/B testing isn’t just what the post Vanity links to claims, i.e., “Does A perform better than B?” What you actually want to know is “Does A perform better <strong>or worse</strong> than B?” These questions might seem equivalent, but they have very different implications in terms of choosing a statistical hypothesis and the resulting test. The one-tailed test chosen by Vanity is only applicable when you want to specifically test whether or not some well-defined A performs better than a well-defined B. Not only is that not what an A/B tester probably wants to know (rather, they want to know “better or worse”), but the test itself is implemented incorrectly because the A vs B groups can flip back and forth depending on which is currently performing better at the time the dashboard is loaded.</p>

<p>These combined problems result in false positives in terms of identifying significant differences between proportions and can lead to wasted development time in terms of making unnecessary changes. Additionally, because the rate of false positives is high due to the incorrect implementation of a one-tailed test, Vanity will vacillate between calling an option significantly different and not.</p>

<h2 id="issue_two_wrong_test_application">Issue Two: Wrong Test Application</h2>

<p>The second issue is related to the <a href="http://en.wikipedia.org/wiki/Z-test">Z-test</a> itself. The implementation used in Vanity does not pool the sample proportion, which is necessary to produce the best estimate for sample variance. I’ll leave out an explanation as to why pooling the proportion produces a more accurate result (it’s rather involved), but I will say that it is trivial to modify existing code to use a pooled method. For those interested in learning more about the reasoning behind pooling proportions, more information can be found <a href="http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/49013.html">here</a>.</p>

<h2 id="fix">Fix</h2>

<p>To fix the above issues, I’m going to <a href="https://github.com/livingsocial/vanity">fork Vanity</a> and switch to a completely different test. Since the result of an A/B test is categorical data, it’s perfect for a <a href="http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test#Test_of_independence">Pearson’s chi-square test of independence</a> or better yet a <a href="http://en.wikipedia.org/wiki/G-test">G-test</a>. Such a test will show the amount of difference, if any, between any number of testing variations. In addition, we will be adding more information about the extent of the difference, with a recommendation noting whether or not a user should continue running a test.</p>

<p>Our modifications will be available on <a href="https://github.com/livingsocial/vanity">github</a>.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;We&amp;#8217;ve been playing with &lt;a href='http://vanity.labnotes.org'&gt;Vanity&lt;/a&gt; recently at &lt;a href='http://livingsocial.com'&gt;LivingSocial&lt;/a&gt; and have found it to be generally useful. During a recent test, however, we saw the option listed as the &amp;#8220;best choice&amp;#8221; change with almost every dashboard page load. This should not generally be happening if the test for significance is implemented correctly. The first thing we did was add some numbers to the dashboard showing total views for each option and the number of track events for our conversion metric. That&amp;#8217;s when I noticed a problem: what Vanity was claiming as a significant difference (at a 95% confidence level) wasn&amp;#8217;t actually significant (based on a &lt;a href='http://en.wikipedia.org/wiki/G-test'&gt;G-test&lt;/a&gt;). After some digging in the &lt;a href='https://github.com/assaf/vanity'&gt;source&lt;/a&gt;, I found the following issues.&lt;/p&gt;

&lt;h2 id='issue_one_wrong_number_of_tails'&gt;Issue One: Wrong Number of Tails&lt;/h2&gt;

&lt;p&gt;The first issue relates to the way in which the two-proportion &lt;a href='http://en.wikipedia.org/wiki/Z-test'&gt;Z-test&lt;/a&gt; is implemented. Vanity links to &lt;a href='http://20bits.com/articles/statistical-analysis-and-ab-testing/'&gt;this instructional post&lt;/a&gt; on &lt;a href='http://vanity.labnotes.org/ab_testing.html#interpret'&gt;their result interpretation page&lt;/a&gt;, and I assume it was used as the basis for Vanity&amp;#8217;s implementation. While I think there are a few things wrong with the post (see the next issue), I believe one of the biggest issues in Vanity is the improper use of a &lt;a href='http://en.wikipedia.org/wiki/Two-tailed_test'&gt;one-tailed test&lt;/a&gt;. The instructional post on stats and A/B testing describes the correct use of the one-tailed test in the case where you have identified a &amp;#8220;control&amp;#8221; (presumably the original page) and an &amp;#8220;experiment&amp;#8221; page and want to test only whether the new page performs better than the old one. One-tailed tests are used in this sort of case, when one wants to know if a statistic from one defined group is greater than that of another defined group (say, case over control proportion).&lt;/p&gt;

&lt;p&gt;Vanity, however, picks the second best performing group and then uses it as the &amp;#8220;control&amp;#8221; group in a one-tailed test to see whether the best group&amp;#8217;s proportion is greater than the second best. This &amp;#8220;control&amp;#8221; group may be a different group on each dashboard page load. The result is a test to see whether the proportions are equal or not equal, as opposed to a test to see whether or not one specific proportion is greater than another specific proportion. Essentially, a one-tailed test is being used for a two-tailed hypothesis.&lt;/p&gt;

&lt;p&gt;Why does this matter? Well, in our case, it mattered quite a bit. Vanity was calling a difference significant when it shouldn&amp;#8217;t have been. The counts are in the following table.&lt;/p&gt;
&lt;table&gt;
  &lt;tr class='thead'&gt;
    &lt;td&gt;Group&lt;/td&gt;
    &lt;td&gt;Viewed&lt;/td&gt;
    &lt;td&gt;Converted&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;A&lt;/td&gt;&lt;td&gt;409199&lt;/td&gt;&lt;td&gt;22399&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;B&lt;/td&gt;&lt;td&gt;409351&lt;/td&gt;&lt;td&gt;22779&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Vanity&amp;#8217;s conclusion was:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;With 95% probability this result is statistically significant.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For a one-tailed test, this conclusion is correct. For a two-tailed test, however, the confidence level is only 92.5% and is not significant. To see how far off the result is, my &lt;a href='http://en.wikipedia.org/wiki/G-test'&gt;G-test&lt;/a&gt; produced a &lt;a href='http://en.wikipedia.org/wiki/P-value'&gt;p-value&lt;/a&gt; of 0.0721, which is not significant. Based on Vanity&amp;#8217;s conclusion, though, we might have assumed a difference and then put effort into making changes that would not have actually mattered.&lt;/p&gt;
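&lt;p&gt;To make the one-tailed versus two-tailed gap concrete, here is a minimal Python sketch that runs a pooled two-proportion Z-test on the counts from the table. This is not Vanity&amp;#8217;s code; the function names are hypothetical, and the only inputs are the viewed and converted counts shown above.&lt;/p&gt;

```python
from math import erfc, sqrt


def pooled_z(views_a, conv_a, views_b, conv_b):
    """Pooled two-proportion z statistic for group B versus group A."""
    p_a = conv_a / views_a
    p_b = conv_b / views_b
    pooled = (conv_a + conv_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / std_err


def upper_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * erfc(z / sqrt(2))


z = pooled_z(409199, 22399, 409351, 22779)
one_tailed = upper_tail(abs(z))
two_tailed = 2 * one_tailed

print(f"z = {z:.4f}")
print(f"one-tailed p = {one_tailed:.4f}")
print(f"two-tailed p = {two_tailed:.4f}")
```

&lt;p&gt;The two-tailed p-value is exactly twice the one-tailed one, so a result that only just clears the 0.05 threshold in a one-tailed test cannot clear it in a two-tailed test.&lt;/p&gt;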

&lt;p&gt;Ultimately, what you generally want to know in A/B testing isn&amp;#8217;t just what the post Vanity links to claims, i.e., &amp;#8220;Does A perform better than B?&amp;#8221; What you actually want to know is &amp;#8220;Does A perform better &lt;strong&gt;or worse&lt;/strong&gt; than B?&amp;#8221; These questions might seem equivalent, but they have very different implications in terms of choosing a statistical hypothesis and the resulting test. The one-tailed test chosen by Vanity is only applicable when you want to specifically test whether or not some well-defined A performs better than a well-defined B. Not only is that not what an A/B tester probably wants to know (rather, they want to know &amp;#8220;better or worse&amp;#8221;), but the test itself is implemented incorrectly because the A vs B groups can flip back and forth depending on which is currently performing better at the time the dashboard is loaded.&lt;/p&gt;

&lt;p&gt;These combined problems result in false positives in terms of identifying significant differences between proportions and can lead to wasted development time in terms of making unnecessary changes. Additionally, because the rate of false positives is high due to the incorrect implementation of a one-tailed test, Vanity will vacillate between calling an option significantly different and not.&lt;/p&gt;

&lt;h2 id='issue_two_wrong_test_application'&gt;Issue Two: Wrong Test Application&lt;/h2&gt;

&lt;p&gt;The second issue is related to the &lt;a href='http://en.wikipedia.org/wiki/Z-test'&gt;Z-test&lt;/a&gt; itself. The implementation used in Vanity does not pool the sample proportion, which is necessary to produce the best estimate for sample variance. I&amp;#8217;ll leave out an explanation as to why pooling the proportion produces a more accurate result (it&amp;#8217;s rather involved), but I will say that it is trivial to modify existing code to use a pooled method. For those interested in learning more about the reasoning behind pooling proportions, more information can be found &lt;a href='http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/49013.html'&gt;here&lt;/a&gt;.&lt;/p&gt;
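&lt;p&gt;For reference, the two ways of estimating the standard error look roughly like this in Python. This is a sketch of the textbook formulas under hypothetical function names, not a copy of Vanity&amp;#8217;s implementation.&lt;/p&gt;

```python
from math import sqrt


def unpooled_std_err(views_a, conv_a, views_b, conv_b):
    """Standard error that estimates each group's variance separately."""
    p_a = conv_a / views_a
    p_b = conv_b / views_b
    return sqrt(p_a * (1 - p_a) / views_a + p_b * (1 - p_b) / views_b)


def pooled_std_err(views_a, conv_a, views_b, conv_b):
    """Standard error using one shared proportion, as the null hypothesis assumes."""
    pooled = (conv_a + conv_b) / (views_a + views_b)
    return sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))


counts = (409199, 22399, 409351, 22779)
unpooled = unpooled_std_err(*counts)
pooled = pooled_std_err(*counts)

print(f"unpooled = {unpooled:.8f}")
print(f"pooled   = {pooled:.8f}")
```

&lt;p&gt;With group sizes and conversion rates as close as the ones in our test, the two estimates nearly coincide; they diverge more as the groups become less balanced.&lt;/p&gt;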

&lt;h2 id='fix'&gt;Fix&lt;/h2&gt;

&lt;p&gt;To fix the above issues, I&amp;#8217;m going to &lt;a href='https://github.com/livingsocial/vanity'&gt;fork Vanity&lt;/a&gt; and switch to a completely different test. Since the result of an A/B test is categorical data, it&amp;#8217;s perfect for a &lt;a href='http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test#Test_of_independence'&gt;Pearson&amp;#8217;s chi-square test of independence&lt;/a&gt; or better yet a &lt;a href='http://en.wikipedia.org/wiki/G-test'&gt;G-test&lt;/a&gt;. Such a test will show the amount of difference, if any, between any number of testing variations. In addition, we will be adding more information about the extent of the difference, with a recommendation noting whether or not a user should continue running a test.&lt;/p&gt;
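&lt;p&gt;As a sketch of what such a test involves (this is not the code in our fork; the &lt;code&gt;g_test_2x2&lt;/code&gt; name is hypothetical, and I am assuming every converted visitor is also counted as a view and that no cell is zero), a G-test on the counts above can be written in a few lines of Python:&lt;/p&gt;

```python
from math import erfc, log, sqrt


def g_test_2x2(table):
    """Return (G, p-value) for a 2x2 contingency table of non-zero counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            g += 2 * observed * log(observed / expected)
    # With one degree of freedom, the chi-square upper tail is erfc(sqrt(g / 2)).
    return g, erfc(sqrt(g / 2))


# Rows are groups A and B; columns are converted and not converted.
table = [
    [22399, 409199 - 22399],
    [22779, 409351 - 22779],
]
g, p_value = g_test_2x2(table)
print(f"G = {g:.3f}, p = {p_value:.4f}")
```

&lt;p&gt;The same loop works for more than two variations, but the p-value shortcut does not: with more rows or columns the degrees of freedom change, and a general chi-square distribution function is needed.&lt;/p&gt;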

&lt;p&gt;Our modifications will be available on &lt;a href='https://github.com/livingsocial/vanity'&gt;github&lt;/a&gt;.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/SzATApjgcHM" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/vanity/statistics/testing/2011/03/17/statistical-analysis-and-a-b-testing-correctly.html</feedburner:origLink></entry>
  
  <entry>
    <title>Joshua Davis on Watson's Face</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/aoaJPilBRRM/joshua-davis-on-watsons-face.html" />
    <id>tag:findingscience.com:/design/interfaces/2011/02/17/joshua-davis-on-watsons-face</id>
    <updated>2011-02-17T10:40:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>The following is a video describing the process of creating the voice and image of <a href="http://en.wikipedia.org/wiki/Watson_%28artificial_intelligence_software%29">Watson</a>. The excellent techie/designer <a href="http://en.wikipedia.org/wiki/Joshua_Davis_%28web_designer%29">Joshua Davis</a> was tapped to create Watson’s face - his methods and process are pretty interesting.</p>
<iframe title="YouTube video player" src="http://www.youtube.com/embed/WIKM732oEek" frameborder="0" height="390" width="640">
</iframe></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;The following is a video describing the process of creating the voice and image of &lt;a href='http://en.wikipedia.org/wiki/Watson_%28artificial_intelligence_software%29'&gt;Watson&lt;/a&gt;. The excellent techie/designer &lt;a href='http://en.wikipedia.org/wiki/Joshua_Davis_%28web_designer%29'&gt;Joshua Davis&lt;/a&gt; was tapped to create Watson&amp;#8217;s face - his methods and process are pretty interesting.&lt;/p&gt;
&lt;iframe title='YouTube video player' src='http://www.youtube.com/embed/WIKM732oEek' frameborder='0' height='390' width='640'&gt;
&lt;/iframe&gt;
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/aoaJPilBRRM" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/design/interfaces/2011/02/17/joshua-davis-on-watsons-face.html</feedburner:origLink></entry>
  
  <entry>
    <title>Memcache Memoizing in Ruby</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/nxbzbX0tC00/memcache-memoizing-in-ruby.html" />
    <id>tag:findingscience.com:/ruby/memcache/2011/01/26/memcache-memoizing-in-ruby</id>
    <updated>2011-01-26T15:48:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p>I realized recently with some amount of astonishment that the <a href="https://rubygems.org/gems/memcache">ruby memcache gem</a> does not have the ability to <a href="http://en.wikipedia.org/wiki/Memoization">memoize</a> values. In the general case of memoization, it can be suffienct to simply “remember” values by setting some sort of static class variable. In the case, however, where you have multiple machines which may all need to calculate the same value, it is better to have a single location for the collective memory so that if one machine has recently calcuated it all of the other machines can use the result of that calcuation. Enter <a href="http://memcached.org/">memcached</a>.</p>

<p>I’ve written some very simple code to provide memoization within the <strong>MemCache</strong> client class. There are two additional methods - one which allows simple memoization and the other allows you to wrap the key and age for the value into a Proc that can then be called. Here’s the code:</p>
<div class="highlight"><pre><code class="ruby"><span class="nb">require</span> <span class="s1">'memcache'</span>

<span class="k">class</span> <span class="nc">MemCache</span>
  <span class="k">def</span> <span class="nf">memoize</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">value</span><span class="o">.</span><span class="n">nil?</span> <span class="ow">and</span> <span class="nb">block_given?</span>
      <span class="n">value</span> <span class="o">=</span> <span class="k">yield</span>
      <span class="n">add</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">age</span><span class="p">)</span>
    <span class="k">end</span>
    <span class="n">value</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">memoize_proc</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">block</span><span class="p">)</span>
    <span class="no">Proc</span><span class="o">.</span><span class="n">new</span> <span class="p">{</span> <span class="n">memoize</span> <span class="n">key</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">block</span> <span class="p">}</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre>
</div>
<p>Then, if there’s some expensive code that is run quite a bit within a method, you can easily memoize the result:</p>
<div class="highlight"><pre><code class="ruby"><span class="c1"># use a constant: a top-level local variable is not visible inside a method body</span>
<span class="no">CACHE</span> <span class="o">=</span> <span class="no">MemCache</span><span class="o">.</span><span class="n">new</span> <span class="s1">'localhost'</span>

<span class="k">class</span> <span class="nc">Record</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">lookup</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
    <span class="no">CACHE</span><span class="o">.</span><span class="n">memoize</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span> <span class="p">{</span>
      <span class="c1"># perform intensive lookup here - as example, return some dummy data</span>
      <span class="p">{</span> <span class="ss">:name</span> <span class="o">=&gt;</span> <span class="s1">'value'</span><span class="p">,</span> <span class="ss">:age</span> <span class="o">=&gt;</span> <span class="s1">'20'</span> <span class="p">}</span>
    <span class="p">}</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre>
</div>
<p>If you need to fetch the result frequently and don’t want to keep calling the memoize method with the key name and age, you can have your code wrapped in a Proc:</p>
<div class="highlight"><pre><code class="ruby"><span class="n">m</span> <span class="o">=</span> <span class="no">MemCache</span><span class="o">.</span><span class="n">new</span> <span class="s1">'localhost'</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">m</span><span class="o">.</span><span class="n">memoize_proc</span><span class="p">(</span><span class="s1">'key'</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1"># this is your computationally intensive part</span>
  <span class="s2">"something hard to compute"</span>
<span class="p">}</span>

<span class="nb">puts</span> <span class="n">f</span><span class="o">.</span><span class="n">call</span>  <span class="c1"># calls codeblock, stores result in memcache</span>
<span class="nb">puts</span> <span class="n">f</span><span class="o">.</span><span class="n">call</span>  <span class="c1"># gets result from memcache</span>
</code></pre>
</div>
<p>There are probably other examples of this out there - but since I was doing it anyway I thought I’d put it up on the tubes.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;I realized recently with some amount of astonishment that the &lt;a href='https://rubygems.org/gems/memcache'&gt;ruby memcache gem&lt;/a&gt; does not have the ability to &lt;a href='http://en.wikipedia.org/wiki/Memoization'&gt;memoize&lt;/a&gt; values. In the general case of memoization, it can be sufficient to simply &amp;#8220;remember&amp;#8221; values by setting some sort of static class variable. In the case, however, where you have multiple machines which may all need to calculate the same value, it is better to have a single location for the collective memory so that if one machine has recently calculated it, all of the other machines can use the result of that calculation. Enter &lt;a href='http://memcached.org/'&gt;memcached&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve written some very simple code to provide memoization within the &lt;strong&gt;MemCache&lt;/strong&gt; client class. There are two additional methods - one which allows simple memoization and the other allows you to wrap the key and age for the value into a Proc that can then be called. Here&amp;#8217;s the code:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='nb'&gt;require&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;memcache&amp;#39;&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;MemCache&lt;/span&gt;
  &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;memoize&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;key&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;age&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='n'&gt;value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;get&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;key&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;value&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nil?&lt;/span&gt; &lt;span class='ow'&gt;and&lt;/span&gt; &lt;span class='nb'&gt;block_given?&lt;/span&gt;
      &lt;span class='n'&gt;value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;yield&lt;/span&gt;
      &lt;span class='n'&gt;add&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;key&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;value&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;age&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='k'&gt;end&lt;/span&gt;
    &lt;span class='n'&gt;value&lt;/span&gt;
  &lt;span class='k'&gt;end&lt;/span&gt;

  &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;memoize_proc&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;key&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;age&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;block&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='no'&gt;Proc&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;new&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='n'&gt;memoize&lt;/span&gt; &lt;span class='n'&gt;key&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;age&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;block&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
  &lt;span class='k'&gt;end&lt;/span&gt;
&lt;span class='k'&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Then, if there&amp;#8217;s some expensive code that is run quite a bit within a method, you can easily memoize the result:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='c1'&gt;# use a constant: a top-level local variable is not visible inside a method body&lt;/span&gt;
&lt;span class='no'&gt;CACHE&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;MemCache&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;new&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;localhost&amp;#39;&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Record&lt;/span&gt;
  &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nc'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nf'&gt;lookup&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;id&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='no'&gt;CACHE&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;memoize&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;id&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
      &lt;span class='c1'&gt;# perform intensive lookup here - as example, return some dummy data&lt;/span&gt;
      &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='ss'&gt;:name&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;value&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='ss'&gt;:age&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;20&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
  &lt;span class='k'&gt;end&lt;/span&gt;
&lt;span class='k'&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If you need to fetch the result frequently and don&amp;#8217;t want to keep calling the memoize method with the key name and age, you can have your code wrapped in a Proc:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='n'&gt;m&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;MemCache&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;new&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;localhost&amp;#39;&lt;/span&gt;
&lt;span class='n'&gt;f&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;memoize_proc&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;key&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
  &lt;span class='c1'&gt;# this is your computationally intensive part&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;something hard to compute&amp;quot;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nb'&gt;puts&lt;/span&gt; &lt;span class='n'&gt;f&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;call&lt;/span&gt;  &lt;span class='c1'&gt;# calls codeblock, stores result in memcache&lt;/span&gt;
&lt;span class='nb'&gt;puts&lt;/span&gt; &lt;span class='n'&gt;f&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;call&lt;/span&gt;  &lt;span class='c1'&gt;# gets result from memcache&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;There are probably other examples of this out there - but since I was doing it anyway I thought I&amp;#8217;d put it up on the tubes.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/nxbzbX0tC00" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/ruby/memcache/2011/01/26/memcache-memoizing-in-ruby.html</feedburner:origLink></entry>
  
  <entry>
    <title>Sending Email with Python Twisted</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FindingScience/~3/-qDUpsHYcQ4/sending-email-with-python-twisted.html" />
    <id>tag:findingscience.com:/python/twisted/2011/01/22/sending-email-with-python-twisted</id>
    <updated>2011-01-22T13:21:00Z</updated>

    <author>
      <name>Brian Muller</name>
      <uri>http://findingscience.com</uri>
      <email>bamuller@gmail.com</email>
    </author>

    <summary><p><a href="http://twistedmatrix.com">Twisted</a> is a great asynchronous networking library for Python. I was looking for an example of using it to send mail locally, but most examples assume you have access to an SMTP server that you can send through. After some trial and error, I figured out a method that seems to work based on connecting directly to each recipient’s mail host.</p>

<p>The following is an example that does not assume that you have access to an SMTP server. It first resolves the DNS MX record for the recipient’s domain and then attempts to connect to it to send the message:</p>
<div class="highlight"><pre><code class="python"><span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">defer</span>
<span class="kn">from</span> <span class="nn">twisted.mail</span> <span class="kn">import</span> <span class="n">smtp</span><span class="p">,</span> <span class="n">relaymanager</span>
<span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">reactor</span>
<span class="kn">from</span> <span class="nn">cStringIO</span> <span class="kn">import</span> <span class="n">StringIO</span>

<span class="n">MXCALCULATOR</span> <span class="o">=</span> <span class="n">relaymanager</span><span class="o">.</span><span class="n">MXCalculator</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">getMailExchange</span><span class="p">(</span><span class="n">host</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">cbMX</span><span class="p">(</span><span class="n">mxRecord</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">mxRecord</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">MXCALCULATOR</span><span class="o">.</span><span class="n">getMX</span><span class="p">(</span><span class="n">host</span><span class="p">)</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">cbMX</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">sendEmail</span><span class="p">(</span><span class="n">mailFrom</span><span class="p">,</span> <span class="n">mailTo</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">subject</span><span class="o">=</span><span class="s">""</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">dosend</span><span class="p">(</span><span class="n">host</span><span class="p">):</span>
        <span class="k">print</span> <span class="s">"emailing </span><span class="si">%s</span><span class="s"> (using host </span><span class="si">%s</span><span class="s">) from </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">mailTo</span><span class="p">,</span> <span class="n">host</span><span class="p">,</span> <span class="n">mailFrom</span><span class="p">)</span>
        <span class="n">mstring</span> <span class="o">=</span> <span class="s">"From: </span><span class="si">%s</span><span class="se">\n</span><span class="s">To: </span><span class="si">%s</span><span class="se">\n</span><span class="s">Subject: </span><span class="si">%s</span><span class="se">\n\n</span><span class="si">%s</span><span class="se">\n</span><span class="s">"</span>
        <span class="n">msgfile</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">(</span><span class="n">mstring</span> <span class="o">%</span> <span class="p">(</span><span class="n">mailFrom</span><span class="p">,</span> <span class="n">mailTo</span><span class="p">,</span> <span class="n">subject</span><span class="p">,</span> <span class="n">msg</span><span class="p">))</span>
        <span class="n">d</span> <span class="o">=</span> <span class="n">defer</span><span class="o">.</span><span class="n">Deferred</span><span class="p">()</span>
        <span class="n">factory</span> <span class="o">=</span> <span class="n">smtp</span><span class="o">.</span><span class="n">ESMTPSenderFactory</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="n">mailFrom</span><span class="p">,</span> <span class="n">mailTo</span><span class="p">,</span> <span class="n">msgfile</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span>
                                          <span class="n">requireAuthentication</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
                                          <span class="n">requireTransportSecurity</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
        <span class="n">reactor</span><span class="o">.</span><span class="n">connectTCP</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="n">factory</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">d</span>
    <span class="k">return</span> <span class="n">getMailExchange</span><span class="p">(</span><span class="n">mailTo</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">"@"</span><span class="p">)[</span><span class="mi">1</span><span class="p">])</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">dosend</span><span class="p">)</span>


<span class="n">d</span> <span class="o">=</span> <span class="n">sendEmail</span><span class="p">(</span><span class="s">'YOU@localhost'</span><span class="p">,</span> <span class="s">'TO ADDY'</span><span class="p">,</span> <span class="s">'this is a message'</span><span class="p">,</span> <span class="s">'this is a test subject'</span><span class="p">)</span>
<span class="n">d</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="k">lambda</span> <span class="n">_</span><span class="p">:</span> <span class="n">reactor</span><span class="o">.</span><span class="n">stop</span><span class="p">())</span>

<span class="n">reactor</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre>
</div>
<p>You could use this code to send a crap ton of emails all asynchronously. Fun stuff.</p></summary>
    <content type="html" xml:lang="en" xml:base="http://findingscience.com/">
      &lt;p&gt;&lt;a href='http://twistedmatrix.com'&gt;Twisted&lt;/a&gt; is a great asynchronous networking library for Python. I was looking for an example of using it to send mail locally, but most examples assume you have access to an SMTP server that you can send through. After some trial and error, I figured out a method that seems to work based on connecting directly to each recipient&amp;#8217;s mail host.&lt;/p&gt;

&lt;p&gt;The following is an example that does not assume that you have access to an SMTP server. It first resolves the DNS MX record for the recipient&amp;#8217;s domain and then attempts to connect to it to send the message:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.internet&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;defer&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.mail&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;smtp&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;relaymanager&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;twisted.internet&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;reactor&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;cStringIO&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;StringIO&lt;/span&gt;

&lt;span class='n'&gt;MXCALCULATOR&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;relaymanager&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;MXCalculator&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;getMailExchange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;host&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;cbMX&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mxRecord&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nb'&gt;str&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mxRecord&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;MXCALCULATOR&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;getMX&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;host&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;addCallback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;cbMX&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;sendEmail&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mailFrom&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;mailTo&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;msg&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;subject&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;dosend&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;host&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='k'&gt;print&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;emailing &lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='s'&gt; (using host &lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='s'&gt;) from &lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt; &lt;span class='o'&gt;%&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mailTo&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;host&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;mailFrom&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='n'&gt;mstring&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;From: &lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s'&gt;To: &lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s'&gt;Subject: &lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='se'&gt;\n\n&lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt;
        &lt;span class='n'&gt;msgfile&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;StringIO&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mstring&lt;/span&gt; &lt;span class='o'&gt;%&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mailFrom&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;mailTo&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;subject&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;msg&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
        &lt;span class='n'&gt;d&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;defer&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Deferred&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        &lt;span class='n'&gt;factory&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;smtp&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;ESMTPSenderFactory&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;None&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='bp'&gt;None&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;mailFrom&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;mailTo&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;msgfile&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;d&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
                                          &lt;span class='n'&gt;requireAuthentication&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='bp'&gt;False&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
                                          &lt;span class='n'&gt;requireTransportSecurity&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='bp'&gt;False&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='n'&gt;reactor&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;connectTCP&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;host&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;25&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;factory&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;d&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;getMailExchange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;mailTo&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;split&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;@&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)[&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;])&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;addCallback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;dosend&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;


&lt;span class='n'&gt;d&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;sendEmail&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;YOU@localhost&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;TO ADDY&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;this is a message&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;this is a test subject&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='n'&gt;d&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;addCallback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;lambda&lt;/span&gt; &lt;span class='n'&gt;_&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;reactor&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stop&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt;

&lt;span class='n'&gt;reactor&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;run&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;You could use this code to send a crap ton of emails all asynchronously. Fun stuff.&lt;/p&gt;
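&lt;p&gt;If you do send to a lot of recipients, one thing that may help is grouping the addresses by domain first, so that each domain needs only one MX lookup. Here is a minimal sketch of that idea, not part of the original script: &lt;code&gt;group_by_domain&lt;/code&gt; is a hypothetical helper name, and lower-casing the domain is my assumption about how you would want to compare them.&lt;/p&gt;

```python
from collections import defaultdict


def group_by_domain(addresses):
    """Map each recipient domain to the list of addresses at that domain.

    Hypothetical helper for illustration; it is not part of Twisted.
    """
    groups = defaultdict(list)
    for address in addresses:
        # rpartition splits on the last "@", so the domain is whatever follows it.
        local, at, domain = address.rpartition("@")
        if not at or not local or not domain:
            raise ValueError("not an email address: %r" % (address,))
        # Assumption: domains are compared case-insensitively, so normalise them.
        groups[domain.lower()].append(address)
    return dict(groups)
```

&lt;p&gt;Each key could then be handed to &lt;code&gt;getMailExchange&lt;/code&gt; once, with the resulting host reused for every address in that group.&lt;/p&gt;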
    &lt;img src="http://feeds.feedburner.com/~r/FindingScience/~4/-qDUpsHYcQ4" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://findingscience.com/python/twisted/2011/01/22/sending-email-with-python-twisted.html</feedburner:origLink></entry>
  
</feed>

