<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>X-Combinator</title>
	
	<link>http://www.xcombinator.com</link>
	<description>making the human scalable</description>
	<lastBuildDate>Mon, 06 Sep 2010 20:18:14 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/xcombinator" /><feedburner:info uri="xcombinator" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>a crawler using wget and xargs</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/6p9i5Z4sOQQ/</link>
		<comments>http://www.xcombinator.com/2010/09/06/a-crawler-using-wget-and-xargs/#comments</comments>
		<pubDate>Mon, 06 Sep 2010 20:18:14 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[crawling]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=252</guid>
		<description><![CDATA[How long would it take to crawl a billion pages using wget and xargs?
We&#8217;re on a quest to write a scalable web crawler.  Our goal is to build a web crawler that will download a billion pages a week.  We&#8217;ve calculated that to download a billion pages in a week we need to [...]]]></description>
			<content:encoded><![CDATA[<h2>How long would it take to crawl a billion pages using <code>wget</code> and <code>xargs</code>?</h2>
<p>We&#8217;re on a quest to write a scalable web crawler.  Our goal is to build a web crawler that will download a billion pages a week.  We&#8217;ve calculated that to download a billion pages in a week we need to sustain a rate of <em>1653 pages per second</em>.</p>
<p>The problem with these kinds of numbers is that, unless you are familiar with web-crawling, it is not obvious how fast that really is. How fast can a simple crawler go? 10 pages per second? A thousand?</p>
<p>We set out to benchmark the simplest thing that could possibly work: <code>wget</code> and <code>xargs</code>.</p>
<h2>Our Tools</h2>
<p><a href="http://en.wikipedia.org/wiki/Wget"><code>wget</code></a> is a popular tool used for downloading files from the web. It has a flexible set of options and built-in support for crawling. </p>
<p><a href="http://en.wikipedia.org/wiki/Xargs"><code>xargs</code></a> is used to run a command repeatedly over a given set of inputs.  In our case, we&#8217;re using a fixed URL list as our input. We use <code>xargs</code> as our &#8220;thread-pool&#8221; (it&#8217;s actually a &#8220;process-pool&#8221;). With the <code>-P &lt;numprocs&gt;</code> option, <code>xargs</code> runs through the input file of URLs, and each <code>wget</code> process takes the next URL from the list and runs until it finishes the crawl for that domain. The number of concurrent processes is limited by <code>&lt;numprocs&gt;</code>.</p>
<h2>Napkin Calculations</h2>
<p>Before we actually run our jobs, let&#8217;s try to predict the kind of results we&#8217;ll get. I&#8217;m running the jobs on my MacBookPro Intel Core 2 Duo with 4GB RAM. I&#8217;m on my home network where I have AT&amp;T U-Verse with advertised download speed of 18Mbps (mega <em>bits</em> per second).</p>
<p><code>wget</code> measures rate limiting in kilobytes rather than kilobits, so we&#8217;ll use bytes rather than bits:</p>
<pre><code>18Mbps = 2.25 megabytes per second =~ 2300 kilobytes/s
</code></pre>
<p>We&#8217;re just doing rough calculations at this point, so let&#8217;s just guess that the mean size of each page is 10KB. At this page size, the absolute best number we can expect to get is around 230 pages/second before we saturate my connection.</p>
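<p>The napkin math can be reproduced with a small shell function. This is only a sketch: the function name <code>max_pages_per_second</code> is made up for illustration, and it assumes 1 megabit = 1000 kilobits, so it reports 2250 KB/s worth of pages rather than the rounded 2300 KB/s used above.</p>

```shell
# Best-case pages/second for a link of a given speed and a mean page size.
# max_pages_per_second is a hypothetical helper, not part of the crawler.
# Arguments: link speed in megabits/s, mean page size in kilobytes.
max_pages_per_second() {
  awk -v mbps="$1" -v page_kb="$2" \
    'BEGIN { printf "%.0f\n", mbps * 1000 / 8 / page_kb }'
}

max_pages_per_second 18 10   # prints 225
```

<p>The unrounded figure comes out slightly below 230, which is close enough for an upper bound.</p>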
<h2>Politeness</h2>
<p>While we want our crawler as a system to go as fast as possible, we don&#8217;t want to hit any one server too many times. Not only might we get banned, but it isn&#8217;t kind to the site owners. Many smaller servers can&#8217;t handle the load of a crawler thrown against them at full speed. </p>
<p>So if we want to get to 200+ pages/sec we&#8217;re going to have to have many concurrent connections. <code>wget</code> has a number of options that we can set to be nicer to each individual server. So our strategy will be to crawl many servers concurrently, but only hit a particular server lightly.</p>
<p>Here are a few of the relevant <code>wget</code> options we will set:</p>
<ul>
<li><code>--wait=2</code> and <code>--random-wait</code> &#8211; wait a random amount of time between requests averaging 2 seconds. The waiting time is for the server&#8217;s benefit but the random time is for ours. Given that we are going to be running a large number of processes in parallel, we&#8217;d rather have them be out of step with each other.</li>
<li><code>--tries=5</code> &#8211; make at most 5 attempts per URL</li>
<li><code>--timestamping</code> &#8211; If the file exists on disk, send the server a HEAD request and check the <code>Last-Modified</code> header. If the file on disk has a timestamp greater than or equal to the <code>Last-Modified</code> date, we don&#8217;t request the whole page. This extra <code>HEAD</code> request doesn&#8217;t really slow us down because <code>wget</code> will only request it if the file already exists on disk. This is just a little extra protection in case our separate processes start to visit the same sites.</li>
</ul>
<h2>DMOZ Sample Set</h2>
<p>For multiple runs of our test we don&#8217;t want to hit one particular server repeatedly. We&#8217;re going to use <a href="http://www.dmoz.org/">DMOZ</a> to get a random sample of URLs to test and use a few commands to extract some random URLs:</p>
<pre><code>mkdir -p data/dmoz
curl -0 http://rdf.dmoz.org/rdf/content.rdf.u8.gz &gt; data/dmoz/dmoz-content.rdf.u8.gz
cd data/dmoz
gunzip dmoz-content.rdf.u8.gz
cat dmoz-content.rdf.u8 | grep http | grep r:resource | \
    grep -o '&lt;link r:resource=['"'"'"][^"'"'"']*['"'"'"]' | \
    sed -e 's/^&lt;link r:resource=["'"'"']//' -e 's/["'"'"']$//' \
    &gt; urls.txt
ruby random-lines.rb urls.txt 300 &gt; random-urls.txt
</code></pre>
<blockquote>
<p>(You can get the <a href="http://gist.github.com/raw/262758/1c981f2c77d32614da8bfcfe36366a19fccfea4a/random-lines.rb"><code>random-lines.rb</code> script here</a>).</p>
</blockquote>
<p>The DMOZ file is around 300MB so this will take a few minutes. The DMOZ RDF file is well formed, so we&#8217;re just using <code>grep</code> and <code>sed</code> to extract the URLs.</p>
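<p>If Ruby is not handy, the sampling step can be approximated with standard tools. The sketch below is a stand-in and not the contents of <code>random-lines.rb</code>; the function name <code>sample_lines</code> is hypothetical. It tags every line with a random key, sorts by that key, and keeps the first N lines.</p>

```shell
# Print N randomly chosen lines from a file.
# sample_lines is a hypothetical helper, not the random-lines.rb script.
# Arguments: input file, number of lines to keep.
sample_lines() {
  awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' "$1" |
    sort -n |        # order the lines by their random key
    head -n "$2" |   # keep the first N
    cut -f 2-        # drop the key again
}
```

<p>Usage would look like <code>sample_lines urls.txt 300 &gt; random-urls.txt</code>. Note that <code>srand()</code> seeds from the clock, so two runs within the same second can produce the same sample.</p>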
<h2>Shaping <code>wget</code></h2>
<p>Our <code>wget</code> command is below. You can see we aren&#8217;t trying very hard to access a page that doesn&#8217;t respond quickly (the various <code>timeout</code> options). Also, we&#8217;re only following links 5 levels deep per URL. We are not visiting any &#8220;parent&#8221; pages, that is, we&#8217;re not crawling up any directories. We don&#8217;t want any images or binary files (the <code>reject</code> options) and we don&#8217;t care about invalid SSL certificates (<code>no-check-certificate</code>).</p>
<pre><code>wget \
  --tries=5 \
  --dns-timeout=30 \
  --connect-timeout=5 \
  --read-timeout=5 \
  --timestamping \
  --directory-prefix=data/pages \
  --wait=2 \
  --random-wait \
  --recursive \
  --level=5 \
  --no-parent \
  --no-verbose \
  --reject *.jpg --reject *.gif \
  --reject *.png --reject *.css \
  --reject *.pdf --reject *.bz2 \
  --reject *.gz  --reject *.zip \
  --reject *.mov --reject *.fla \
  --reject *.xml \
  --no-check-certificate
</code></pre>
<h2>DNS</h2>
<p>A nice thing about our setup is that each <code>wget</code> process is assigned to one domain. <code>wget</code> caches the DNS lookup so we only need to make one DNS request per process. A problem with this setup is that <code>wget</code> uses <code>gethostbyname</code> (or <code>getaddrinfo</code> depending on your platform). A quick check on <code>man gethostbyname</code> shows that on my BSD-based Mac <code>gethostbyname</code> is thread-safe, i.e., it is synchronized. The result is that there is going to be some resource starvation when we have hundreds of <code>wget</code> processes all trying to call BIND at the same time.  <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6331">1</a></p>
<p>We set a DNS timeout of 30 seconds here, but in practice I found that it didn&#8217;t matter much. All of the processes race to grab the DNS lookup lock at the beginning and a large number time out (waiting for the lock), but the requests even out over a couple of minutes.</p>
<h2>xargs</h2>
<p><code>xargs</code> acts as our thread-pool or, more specifically, our process-pool.   </p>
<p><code>-P</code> specifies the number of processes to use. <code>-I &lt;sub&gt;</code> is a substitution parameter. It means &#8220;for each line in <code>STDIN</code> run <code>CMD</code> substituting the current line for <code>&lt;sub&gt;</code>&#8221;. So below we substitute <code>_URL_</code> with the actual URL contained in the <code>URLS_FILE</code>. </p>
<pre><code>cat $URLS_FILE | xargs -P $CRAWLERS -I _URL_ $WGET_CMD _URL_
</code></pre>
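<p>To see the process pool in action without touching the network, here is a small dry run. It is a sketch only: <code>demo_pool</code> is a made-up name, the <code>example.com</code> URLs are placeholders, and <code>echo</code> stands in for the real <code>wget</code> command.</p>

```shell
# Dry run of the xargs process pool; demo_pool is a hypothetical helper.
# At most 2 commands run at once, and _URL_ is replaced by each input line.
demo_pool() {
  printf '%s\n' \
    http://example.com/a \
    http://example.com/b \
    http://example.com/c |
    xargs -P 2 -I _URL_ echo "would fetch _URL_"
}

demo_pool
```

<p>Because the commands run in parallel, the three output lines may appear in any order.</p>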
<h2>Code</h2>
<p>Our crawler script looks like this:</p>
<pre><code>#!/bin/bash
# a basic crawler in bash
# usage: crawl.sh urlfile.txt &lt;num-procs&gt;
URLS_FILE=$1
CRAWLERS=$2

mkdir -p data/pages

WGET_CMD="wget \
  --tries=5 \
  --dns-timeout=30 \
  --connect-timeout=5 \
  --read-timeout=5 \
  --timestamping \
  --directory-prefix=data/pages \
  --wait=2 \
  --random-wait \
  --recursive \
  --level=5 \
  --no-parent \
  --no-verbose \
  --reject *.jpg --reject *.gif \
  --reject *.png --reject *.css \
  --reject *.pdf --reject *.bz2 \
  --reject *.gz  --reject *.zip \
  --reject *.mov --reject *.fla \
  --reject *.xml \
  --no-check-certificate"

cat "$URLS_FILE" | xargs -P "$CRAWLERS" -I _URL_ $WGET_CMD _URL_
</code></pre>
<p>I&#8217;ve put this <a href="http://github.com/jashmenn/bashpider">code on github</a> with a <code>Rakefile</code> so you can follow along.</p>
<pre><code>git clone git://github.com/jashmenn/bashpider.git
cd bashpider
rake data:get_urls # downloads and parses DMOZ, will take a while
rake crawl:restart # this will run a crawl
</code></pre>
<p>If you want to monitor the downloads per second, in another window type the following:</p>
<pre><code>rake crawl:watch
</code></pre>
<p>When you feel you&#8217;ve gathered enough data, <code>CTRL-C</code> to kill both windows and then type:</p>
<pre><code>rake results:process
</code></pre>
<h2>Results at Home</h2>
<p>As you can see from the chart, on my home computer through U-Verse we max out at about 27 pages/sec with 150 agents, far below our original estimate of 200+ pages/sec.</p>
<p>
<img src="http://www.xcombinator.com/wp-content/uploads/2010/09/wget-pages-per-second-500.png" alt="wget-pages-per-second" title="wget-pages-per-second" width="500" height="335" class="aligncenter size-full wp-image-255" />
</p>
<pre><code>procs pages/sec
10     3.9
25     8.9
50    15.8
75    19.7
100   25.8
125   26.1
150   27.3
175   22.3
200    6.0
</code></pre>
<p>First of all, our initial estimate of 10KB per page was too low. In reality we observed a mean page size of around 37KB. This means on our 2300KB/s connection we can only expect a best-case download rate of 62 pages/sec. Still, 27 pages/sec is not even half that speed. </p>
<p>The other problem could be DNS requests. My home router also serves as my DNS server. It&#8217;s good enough for home use, but I&#8217;m pretty sure it&#8217;s not up to this task. Come to think of it, I&#8217;m not even sure how fast this CAT5 cable is.  </p>
<p>I think it&#8217;s time to try out this setup in a better environment.</p>
<h2>In the Data Center</h2>
<p>For this experiment we loaded our script onto a beefy 8-core machine with a fat bandwidth connection. </p>
<p>The results were much better:</p>
<p>
<img src="http://www.xcombinator.com/wp-content/uploads/2010/09/wget-pages-per-second-datacenter-500.png" alt="wget-pages-per-second-datacenter" title="wget-pages-per-second-datacenter" width="500" height="334" class="aligncenter size-full wp-image-257" /></p>
<pre><code>procs  pages/sec
  150   54
  200   71
  300  107
  400  141
  500  178
  600  214
  700  244
  800  386
  900  327
 1000  203
 1100  222
 1200  392
 1300  202
 1500  249
 1600  485
 2000  577
 3000  679
 3500  459
 4000  336
</code></pre>
<blockquote>
<p>Take these numbers as rough estimates. For each of these entries I only let them run for a few minutes.</p>
</blockquote>
<h3>Processes</h3>
<p>When I started getting into the thousands of processes, I wondered if I would hit the user process limit. <a href="http://yakkstr.com/users/ddonnell">Jay Donnell</a> pointed out to me that <code>ulimit</code> will also give the process limit:</p>
<pre><code> ulimit -a
 max user processes              (-u) 268287
</code></pre>
<p>So with 260k+ processes available, we have no problem there.</p>
<h3>Files</h3>
<p>With <code>wget</code>, each downloaded page gets its own file, which is uncompressed. So we&#8217;ve got a lot of disk IO going on. We&#8217;d probably save a good amount of time if each process just opened one file and appended content to it. We&#8217;d also save the file system from creating hundreds of thousands of inodes.</p>
<p>Also, the decline we see around 3000 agents may be due, in part, to the max number of open files on our system:</p>
<pre><code>$ ulimit -a
open files                      (-n) 1024
</code></pre>
<p>Each crawler waits an average of 2 seconds before making the next request, at which time it makes the request and then downloads the file. So each process is creating one file every 2+ seconds. This means the number of <code>wget</code> processes we can run should be at least twice the max number of open files (2048). </p>
<p>My theory is this: at around 3000 concurrent agents, the time it takes to actually download the content means that we will probably still have enough file descriptors available. However, once we have 4000 concurrent agents, the probability that any two agents will need to write a file at the same time is much higher, and we see a significant performance drop.</p>
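<p>One way to test this theory would be to look at the limits before a run and raise the soft limit if the hard limit allows it. A minimal sketch, assuming <code>bash</code>; the variable names and the value 4096 are made up for illustration. Keep in mind that the <code>ulimit</code> builtin reports the limit for the current shell and the processes it starts.</p>

```shell
# Show the soft and hard limits on open files for this shell.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "open files: soft=$soft_limit hard=$hard_limit"

# To raise the soft limit for this shell and its children (hypothetical value;
# this fails if it exceeds the hard limit):
# ulimit -n 4096
```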
<p>I think we&#8217;re going to need to look at the file system format. Currently we&#8217;re using ext3, but I&#8217;m not sure if we should switch to xfs. While monitoring the file count using <code>find</code> I kept getting the following error: </p>
<pre><code>find: WARNING: Hard link count is wrong for &lt;some file&gt;: this may be a bug
    in your filesystem driver.  Automatically turning on find's -noleaf option.
    Earlier results may have failed to include directories that should have been searched.
</code></pre>
<p>Also <code>kjournald</code> seemed to be working very hard to keep up with all the file writes. I&#8217;m not sure if this is unavoidable or not. I&#8217;m going to leave this problem for future work.</p>
<h2>Summary</h2>
<p>This crawler is just a baseline to see what performance is possible with basic Unix utilities. Obviously, this approach used a list of static URLs and in a &#8220;real&#8221; crawler you probably want to have a mechanism for communicating and prioritizing URLs throughout the system.</p>
<p>That said, if you already know the list of URLs you want to download, you could download tens of millions of pages over a 24-hour window. For instance, if we assume a sustained rate of 600 pages per second you could download <em>51.8 million</em> pages in 24 hours.</p>
<p>So how long would it take to download a billion pages with <code>xargs</code> and <code>wget</code>?</p>
<p>If you had the list of URLs beforehand, according to these numbers it would<br />
take <em>19 days</em>.</p>
<p>To download a billion pages in a week we&#8217;re going to need to figure out a way to download at least 1000 more pages per second. </p>
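<p>For anyone who wants to plug in their own numbers, the arithmetic behind these figures can be sketched as two small shell functions. The names <code>days_to_crawl</code> and <code>required_rate</code> are hypothetical and only exist for this illustration.</p>

```shell
# Days needed to fetch a number of pages at a sustained pages/second rate.
days_to_crawl() {
  awk -v pages="$1" -v rate="$2" \
    'BEGIN { printf "%.1f\n", pages / rate / 86400 }'
}

# Sustained pages/second needed to fetch a number of pages in a number of days.
required_rate() {
  awk -v pages="$1" -v days="$2" \
    'BEGIN { printf "%.0f\n", pages / (days * 86400) }'
}

days_to_crawl 1000000000 600   # prints 19.3
required_rate 1000000000 7     # prints 1653
```

<p>The gap between 1653 and the assumed 600 pages per second is where the &#8220;1000 more&#8221; comes from.</p>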
<p>What we&#8217;ve learned:</p>
<ul>
<li>watch out for file limits</li>
<li>append to a single file rather than creating thousands of tiny files</li>
<li>run your own non-locking DNS server</li>
<li>unix tools are handy and powerful</li>
</ul>
<p>What do you think?</p>
<p>Any suggestions for getting more performance out of <code>wget</code>? Should I try increasing my open file limit and see what happens? Think these numbers are ridiculous? Leave a comment below!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/09/06/a-crawler-using-wget-and-xargs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/09/06/a-crawler-using-wget-and-xargs/</feedburner:origLink></item>
		<item>
		<title>index and working tree do not reflect changes that are now in HEAD</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/NSFHnyDj_VE/</link>
		<comments>http://www.xcombinator.com/2010/09/04/index-and-working-tree-do-not-reflect-changes-that-are-now-in-head/#comments</comments>
		<pubDate>Sat, 04 Sep 2010 17:30:54 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=296</guid>
		<description><![CDATA[After my recent git class someone asked this question:

 I was trying some remote git repo tests last night, and when I went to push my changes, I received this warning message:
warning: updating the currently checked out branch; this may cause confusion, as the index and working tree do not reflect changes that are now [...]]]></description>
			<content:encoded><![CDATA[<p>After my recent git class someone asked this question:</p>
<blockquote><p>
 I was trying some remote git repo tests last night, and when I went to push my changes, I received this warning message:</p>
<p><code>warning: updating the currently checked out branch; this may cause confusion, as the index and working tree do not reflect changes that are now in HEAD.</code></p>
<p>On remote server I created a directory and did &#8220;<code>git init</code>&#8221;. Then cloned it from my local machine, did changes, committed, and then push. All seemed straightforward there. Any thoughts?</p>
</blockquote>
<p>The issue git is warning you about is that your remote has both the <em>repository</em> and a <em>working copy</em>. That is, on the remote server you have a directory <code>project/</code> with files in it (the working copy) and the folder <code>project/.git</code> (the repository).</p>
<p>If you push from your local machine to the remote, you will only be updating files in the repository and <strong>not</strong> the working copy. That is, the non-git files will not be changed. This can be confusing because you might log into the remote after you push and expect the working copy to be different.</p>
<p>To deal with this possible confusion <code>git init</code> provides a <code>--bare</code> option. What this does is create the repository only (no working copy).  You can then <code>push</code> and <code>pull</code> from the remote like you might a central svn server.</p>
<p>Let me show an example. Say I have an existing git repository on my local machine and I want to create a new remote to back it up. My workflow would look like this:<br />
<code><br />
ssh me@myserver.com<br />
mkdir ~/git/newproject.git<br />
cd ~/git/newproject.git<br />
git init --bare<br />
exit<br />
git remote add myserver me@myserver.com:/home/nmurray/git/newproject.git<br />
git push myserver master</code></p>
<p>If you want, you could even chain these together as a single command:</p>
<p><code>ssh me@myserver.com "mkdir ~/git/newproject.git &#038;&#038; cd ~/git/newproject.git &#038;&#038; git init --bare" &#038;&#038; git remote add myserver me@myserver.com:/home/nmurray/git/newproject.git</code></p>
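<p>If you would like to try the bare-repository workflow without a server, the sketch below does the same thing entirely on the local disk. Everything in it is an assumption for illustration: the function name <code>demo_bare_push</code>, the temporary directory, and the throwaway <code>demo</code> identity used for the commit.</p>

```shell
# Push to a bare repository that lives in a temp directory instead of on a server.
# demo_bare_push is a hypothetical helper; it runs in a subshell so the
# caller's working directory is left alone.
demo_bare_push() (
  work=$(mktemp -d)
  git init --quiet --bare "$work/newproject.git"   # the "remote": repository only
  git init --quiet "$work/local"                   # the local repo with a working copy
  cd "$work/local" || exit 1
  git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "first commit"
  git remote add myserver "$work/newproject.git"
  git push --quiet myserver HEAD:refs/heads/master
  # Confirm the commit arrived in the bare repository.
  git --git-dir="$work/newproject.git" log --format=%s master
)

demo_bare_push   # prints: first commit
```

<p>Since the bare repository has no working copy, the push goes through without the warning.</p>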
<p>Hope this helps!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/09/04/index-and-working-tree-do-not-reflect-changes-that-are-now-in-head/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/09/04/index-and-working-tree-do-not-reflect-changes-that-are-now-in-head/</feedburner:origLink></item>
		<item>
		<title>Getting Cascading to Read Sequence Files Created Somewhere Else</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/YhjxENmpopc/</link>
		<comments>http://www.xcombinator.com/2010/09/02/getting-cascading-to-read-sequence-files-created-somewhere-else/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 20:54:09 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=291</guid>
		<description><![CDATA[Sometimes you can&#8217;t control where your data comes from or how it&#8217;s formatted. For instance, where I work a lot of data is stored in SequenceFiles. Unfortunately, the files are not taking advantage of the typing SequenceFiles provide and instead each record is a single field containing a delimited string.
I like to use Cascading (or cascalog) for [...]]]></description>
			<content:encoded><![CDATA[<p><em>Sometimes you can&#8217;t control</em> where your data comes from or how it&#8217;s formatted. For instance, where I work a lot of data is stored in <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html"><code>SequenceFile</code>s</a>. Unfortunately, the files are not taking advantage of the typing <code>SequenceFile</code>s provide and instead each record is a single field containing a delimited string.</p>
<p>I like to use Cascading (or Cascalog) for my Hadoop jobs, but out of the box Cascading doesn&#8217;t support using <code>SequenceFile</code>s that were created outside of Cascading. That is to say, Cascading requires that the values in your <code>SequenceFile</code>s be instances of <code>Tuple</code>.</p>
<p>The solution is to create your own <code>Scheme</code> that parses a <code>SequenceFile</code> according to your own format. In my case I just want to parse each record as a list of strings.</p>
<p>The code is simple but may not be obvious for a first-time Cascading user. I hope this will save someone a few minutes.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">    <span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.xcombinator</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tap.Tap</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Fields</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Tuple</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.TupleEntry</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Tuples</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.scheme.SequenceFile</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.JobConf</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.OutputCollector</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.SequenceFileInputFormat</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.SequenceFileOutputFormat</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * A SequenceFileAsText is a type of {@link SequenceFile}, however the
     * SequenceFile has been created outside of Cascading and is assumed to have a
     * value of a string.
     */</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> SequenceFileAsText <span style="color: #000000; font-weight: bold;">extends</span> SequenceFile
      <span style="color: #009900;">&#123;</span>
      <span style="color: #008000; font-style: italic; font-weight: bold;">/** Field serialVersionUID */</span>
      <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">long</span> serialVersionUID <span style="color: #339933;">=</span> 1L<span style="color: #339933;">;</span>
&nbsp;
      <span style="color: #008000; font-style: italic; font-weight: bold;">/** Protected for use by TempDfs and other subclasses. Not for general consumption. */</span>
      <span style="color: #000000; font-weight: bold;">protected</span> SequenceFileAsText<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">super</span><span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">null</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
      <span style="color: #008000; font-style: italic; font-weight: bold;">/**
       * Creates a new SequenceFileAsText instance that stores the given field names.
       *
       * @param fields
       */</span>
      <span style="color: #000000; font-weight: bold;">public</span> SequenceFileAsText<span style="color: #009900;">&#40;</span> Fields fields <span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">super</span><span style="color: #009900;">&#40;</span> fields <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
      @Override
      <span style="color: #000000; font-weight: bold;">public</span> Tuple source<span style="color: #009900;">&#40;</span> <span style="color: #003399;">Object</span> key, <span style="color: #003399;">Object</span> value <span style="color: #009900;">&#41;</span>
      <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>value <span style="color: #000000; font-weight: bold;">instanceof</span> Tuple<span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
          <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #009900;">&#40;</span>Tuple<span style="color: #009900;">&#41;</span> value<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #000000; font-weight: bold;">else</span> <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>value <span style="color: #000000; font-weight: bold;">instanceof</span> <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
          <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#41;</span> value<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #000000; font-weight: bold;">else</span> <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>value <span style="color: #339933;">!=</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
          <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span>.<span style="color: #006633;">valueOf</span><span style="color: #009900;">&#40;</span>value<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #000000; font-weight: bold;">else</span>
        <span style="color: #009900;">&#123;</span>
          <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#41;</span><span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
      <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/09/02/getting-cascading-to-read-sequence-files-created-somewhere-else/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/09/02/getting-cascading-to-read-sequence-files-created-somewhere-else/</feedburner:origLink></item>
		<item>
		<title>git cheatsheet and class notes</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/rYKdIGfUNpg/</link>
		<comments>http://www.xcombinator.com/2010/09/01/git-cheat-sheet-and-class-notes/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 19:27:48 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=274</guid>
		<description><![CDATA[I recently gave a talk at work about git. I created a cheat sheet based on <a href="http://clojure.org/cheatsheet">Steve Tayon's Clojure Cheatsheet</a>. 

<a href="http://www.xcombinator.com/wp-content/uploads/2010/09/git-class-cheat-sheet.pdf"><img src="http://www.xcombinator.com/wp-content/uploads/2010/09/git-cheat-sheet-preview.jpg" alt="Git Cheat Sheet Preview" title="Git Cheat Sheet Preview" width="496" height="347" class="size-full wp-image-276" /></a>

I realize there are a <a href="http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html">number</a> <a href="http://github.com/guides/git-cheat-sheet">of</a> <a href="http://cheat.errtheblog.com/s/git">cheatsheets</a> for git already. However, I wanted a simple, one-page sheet specifically for my attendees. 

You can download it here:
<ul>
	<li><a href="http://www.xcombinator.com/wp-content/uploads/2010/09/git-class-cheat-sheet.pdf">git cheatsheet pdf</a></li>
	<li><a href="http://github.com/jashmenn/talks/raw/master/git/cheat-sheet/git-class-cheat-sheet.tex">git cheatsheet LaTeX source</a></li>
</ul>

You can find the raw notes of my talk after the jump.




 
]]></description>
			<content:encoded><![CDATA[<p>I recently gave a talk at work about git. I created a cheatsheet based on <a href="http://clojure.org/cheatsheet">Steve Tayon&#8217;s Clojure Cheatsheet</a>. </p>
<div id="attachment_276" class="wp-caption center" style="width: 506px"><a href="http://www.xcombinator.com/wp-content/uploads/2010/09/git-class-cheat-sheet.pdf"><img src="http://www.xcombinator.com/wp-content/uploads/2010/09/git-cheat-sheet-preview.jpg" alt="Git Cheat Sheet Preview" title="Git Cheat Sheet Preview" width="496" height="347" class="size-full wp-image-276" /></a><p class="wp-caption-text">Git Cheat Sheet Preview</p></div>
<p>I realize there are a <a href="http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html">number</a> <a href="http://github.com/guides/git-cheat-sheet">of</a> <a href="http://cheat.errtheblog.com/s/git">cheatsheets</a> for git already. However, I wanted a simple, one-page sheet specifically for my attendees. </p>
<p>You can download it here:</p>
<ul>
<li><a href="http://www.xcombinator.com/wp-content/uploads/2010/09/git-class-cheat-sheet.pdf">git cheatsheet pdf</a></li>
<li><a href="http://github.com/jashmenn/talks/raw/master/git/cheat-sheet/git-class-cheat-sheet.tex">git cheatsheet LaTeX source</a></li>
</ul>
<p>Like it? Hate it? Find a typo? <a href="http://www.xcombinator.com/2010/09/01/git-cheat-sheet-and-class-notes/#comments">Leave your feedback in the comments!</a></p>
<hr/>
<p>Here are my raw notes from the talk:<br />
<code></p>
<pre>
;; -*- mode: Markdown; -*-

# How to read:
commands are indented
actions to perform while presenting are marked with @
left to right

# Welcome
see progit.org
what is version control

why use it:

  * backup/restore
  * synchronization sharing
  * track changes
  * ownership
  * branching and merging

who has used subversion 

git
  * you've heard it's distributed
  * b/c branching and merging

pace - slow, no slides

leave with practical understanding

# Install &amp; Config

    sudo port install git-core +svn
    git config --global user.name "Nate Murray"
    git config --global user.email "nate@natemurray.com"

# Basic Commands

    cd ~
    mkdir -p projects/demo       # explain only a little
    cd projects/demo
    git init
    git status                   # nothing here
    ls -a                        # talk .git repository vs. working copy
    echo "version 1" > README.txt
    git status                   # untracked file
    git add README.txt
    git status                   # changes to be committed
    git commit -m "added version one of the file"
    git status                   # clean

stop, draw the picture of the local operation phases - e.g. svn vs. git

> Principle 1: (almost) everything is local

so now that you know about the staging area, let's do it again

    echo "new file" > sheep.rb
    git status                   # draw untracked
    git add sheep.rb
    git status                   # draw staged
    git commit -m "added"

    cat README.txt                 # draw unmodified
    echo "version 2" > README.txt
    git status                   # draw modified
    git commit -a -m "updated version" # shorthand for git add
    git status

Tips:

    git config --global alias.st status
    git st

# Git Internals

* Before we can talk about branching you *have* to understand how git works (tried to avoid this)
* files and folders

three objects -  @ Draw first commit

  * blob        - raw data
  * tree        - folder (stores blobs and trees)
  * commit      - snapshot of the repo + meta 

You won't need to use `git cat-file` on a daily basis. however, understanding
the concepts we're going to talk about is really important for branching.

    git log # view the log
    git show ----  # first commit, whatever it is

    git cat-file -p  ---- # first commit
    git cat-file -p  ---- # tree
    git cat-file -p  ---- # blob

draw the rest using git `cat-file`

    git log           # show the log again
    git cat-file -p ---- # second commit

draw the picture. point out the parent connection.
note committer / author

    git cat-file -p ---- # tree

note here there are two blobs!

finish drawing out the second commit
* git stores reference to first file.
* snapshot of the *whole project*
* git stores each file once
* filename is in the `tree` 

draw the last commit

     git log
     git cat-file -p ---- # third commit

> Principle #2 : Git commits are snapshots

* A commit in git is a snapshot of the entire project, not just a list of diffs.
* snapshot is based on the SHA hash function. guarantees file integrity

# refs/branches

questions?

@ stop. redraw commits as *linear* . looking only at commits

ready to define a branch
a branch is a pointer to a commit
text file with a sha. that's it. 

start with one branch called `master`

    git branch

bash prompt

    # skip this
    tree .git/refs/
    cat .git/refs/heads/master
    git log
    # compare the SHAs

update diagram by adding a `ref` to our commit. (`master`). 

@ draw circle pointing to commit

create testing branch

# branching

So let's create another branch:

    git branch testing
    git branch

only created, didn't switch. just created a ref pointing to this
commit

@ update diagram

How does git know what branch we are "on"?

special ref called `HEAD` that points to the local branch
since we are still on master HEAD points to master

@ add HEAD

To switch the working copy, use `git checkout`

    git checkout testing
    git branch

HEAD moves from `master` to `testing`
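
to try this outside the talk: a small sketch in a throwaway repo. it assumes git is
installed; the temp directory, the demo identity and the variable names are made up
for illustration and are not part of the talk.

```shell
# Throwaway repo showing that HEAD is a text file naming the current branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Demo identity passed on the command line so no global config is needed.
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"

git branch testing                 # creates the ref, does not switch
head_before=$(cat .git/HEAD)       # still names the branch we started on
git checkout -q testing
head_after=$(cat .git/HEAD)        # now names refs/heads/testing

echo "$head_before"
echo "$head_after"
```

only the contents of `.git/HEAD` change on checkout; the branch refs stay where they were.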

@ update diagram

master and testing point to the same commit, working directory isn't changed

checkout means something different in git than it does in svn.
checkout in git switches our working directory to a particular commit. 

now make changes:

    cat README.txt
    echo "we are on the testing branch!" > README.txt
    cat README.txt
    git commit -a -m "updated the readme"
    git log

@ update diagram, adding new commit. move the testing ref and the HEAD ref with it

add a "test"

    echo "this is a test" > test.rb
    git add test.rb                    # stage it for our commit
    git commit -m "added a test"       # now commit
    git log

@ update diagram - should have two commits

hotfix - scenario: you need to switch back to master

    git checkout master
    ls

@ move HEAD

so notice two things.
1) switching to this branch was fast - everything is local
2) our file test.rb is absent

and if we

    cat README.txt

it says 'version 2' just like we would expect

    echo "applying fix" >> sheep.rb
    cat sheep.rb
    git commit -a -m "applied important fix"
    git log
    git cat-file -p ---- # last commit

@ draw the new commit, and draw its reference back to the parent. move HEAD and master

now fixed, can push into production
and get back to work in `testing`

    git checkout testing
    cat README.txt
    cat test.rb

This is a general pattern:

> Principle #3: Branching is cheap, use it often

If you are working on a particular feature, create a branch. 

If you're coming from svn, making frequent branches might seem unnatural.
in svn, a branch is global -> namespace issues.
vs. git: private branches
name your branch 'test' and it won't collide with anyone else's

But branching itself isn't that useful unless it's easy to merge.

* how many of you have merged a branch in svn?
* how many of you enjoyed it?

merging is one of git's strengths and git makes it relatively easy

# merging

    cat sheep.rb

two branches: `master` and `testing` - need to merge

    git checkout master
    git merge testing
    git show HEAD

instead of a 'parent' we have a line that says 'merge'
a merge commit has more than one parent

@ draw the commit object
@ draw lines to the commits
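
to check the "more than one parent" claim without the whiteboard, here is a sketch
that rebuilds the master/testing situation in a throwaway repo and counts the parent
lines of the merge commit. assumes git is installed; the `g` wrapper, the demo
identity and the temp directory are invented for this example.

```shell
# Throwaway repo; identity, file contents and messages are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo "version 1" > README.txt
g add README.txt
g commit -q -m "base"
base=$(git symbolic-ref --short HEAD)    # whatever the default branch is called

git checkout -q -b testing
echo "this is a test" > test.rb
g add test.rb
g commit -q -m "added a test"

git checkout -q "$base"
echo "applying fix" > sheep.rb
g add sheep.rb
g commit -q -m "applied important fix"

g merge -q --no-edit testing             # histories diverged, so this makes a merge commit
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "$parents"                          # prints 2
```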

    gitx

sometimes merging doesn't go as planned - conflicts

    git checkout -b breaker

this is shorthand for create and then checkout a new branch based on the
current HEAD

    vi sheep.rb # changing fix
    git commit -a -m "changed the fix"
    git checkout master
    vi sheep.rb # improving fix
    git commit -a -m "improved the fix"

@(update diagram, adding breaker and master refs)

    git merge breaker
    git status

there are many diff viewing tools.
* perforce
* opendiff - from apple

    git mergetool -t opendiff

I don't really like using the visual tools.
Sometimes you need character level editing

    vi sheep.rb
    git add sheep.rb
    git commit -a

talk about merge with conflicts

@ update diagram draw new merge commit

    gitx

Questions?

# Remotes

Everything so far on one machine. 

I work offline (I take the train)
If I break something I can roll back and see where I was an hour ago 

want to share our changes.
might seem scary or messy because changes are made to totally independent lines of the code.
but in practice it's not a problem.

svn version numbers are incremental - so two repos would get out of
step
no easy way of merging two separate repositories. 

git blob identifiers are a SHA of the content.
if the same content is created anywhere in the universe you'll still
have the same SHA
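
a sketch of why that holds: a blob ID is the SHA-1 of a short header (`blob`, the
byte count, a NUL byte) followed by the content, so it depends on nothing else.
the `blob_id` helper is hypothetical and assumes `sha1sum` from coreutils and a
repo using SHA-1 object IDs.

```shell
# Compute the ID git would give a blob holding one line of text.
blob_id() {
  content="$1"
  # Byte length of the content including its trailing newline.
  size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
  # Hash "blob SIZE", a NUL byte, then the content itself.
  printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d ' ' -f 1
}

blob_id "version 1"
blob_id "version 2"
```

`printf 'version 1\n' | git hash-object --stdin` should print the same ID as the first call.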

git doesn't care about where your commits come from or how you get them

Protocols:
  * ssh
  * git
  * http
  * local file system

sample project on our github

    cd ..
    open http://XXX/nmurray/simple-echo
    git clone git@XXX:nmurray/simple-echo.git
    cd simple-echo
    git log

svn checkout just HEAD
vs. git - whole repo

To be able to collaborate with others you have to manage 'remote repositories'.
When you clone a project, you have a default remote called 'origin'. 

    git remote -v

Remotes are pointers to other repositories that are _usually_ over the network.
'pull' and 'push' changes.

    vi README.mkd
    # make a change
    git commit -a -m "make a change"
    git push

If someone else makes a change:

    git pull origin master

This means pull from `origin` the branch `master` into local branch `master`. You can often just

    git pull

which means pull from origin whatever branch I'm on (i.e. HEAD) into this branch.

Now let's say someone pushes a change and I make a change
I can't push unless I pull first. This is good.

# remote forks

So that is while we are on the same line. What if we're on different lines?

@(open up webbrowser again)

Bh also has forked my project. But when we say forked, all that means is he has
created his own development line from some of my commits

    git remote add bh git@XXX:bhenderson/simple-echo.git
    git remote -v

Now you shouldn't be surprised to learn that adding the remote doesn't change
anything. First we have to `fetch` his changes

    git fetch bh

`fetch` brings his commits into my repo but again, doesn't change my working copy.

fetch brought branches + commits into repo
work with those branches just like any other branch.

    git branch -a

So you see here we have 

* `master`, which is our local master
* at the bottom we have `origin/master`, which is the master branch on origin, where we pulled from
* and then we have `bh/master`, which is bhenderson's master branch

These are all regular branches: they are just pointers to commits. We
can even check one out like a branch 

    git checkout bh/master

scary message

    git checkout master

So how would we merge bhenderson's changes with our own? I'm sure you could guess by now. Simply:

    git merge bh/master # don't press enter!!!

But let's take it up a notch.
say you didn't want to merge bh's changes in your master branch.
real world, you might not know if his changes would merge cleanly
don't want to mess up your master branch.  

What we are going to do is
* create a new branch,
* merge bh's branch in THAT branch
* then we're going to merge to master.

It will make more sense when we do it. Let's try:

Okay we first want to create a new branch based on our master

    git checkout -b bh-merge
    git branch -a 

Now let's merge his changes

    cat simple-echo.rb
    git merge bh/master
    cat simple-echo.rb
    git log                  # see bh as the author of the commit

okay everything was clean! *phew* now let's go back to master

    git checkout master
    git merge bh-merge
    git log

and there we go! merged nicely.
now I don't need bhenderson's merge branch anymore, so let's delete it

    git branch -d bh-merge
    git branch -a

git is distributed

Instead of one central server, that everyone has to sync to,
* independent lines of work can go on.
* If someone creates something good in their branch, they just tell people about it.
* permission-less 

you can see why it is so good for open-source development

questions about branching?

# Advanced

* tagging
* rebase
* cherry pick
* git bisect
* hooks
* tracking branches
* submodules
* interactive staging
* squashing commits
* git-svn
* setting up your own server
* patches via email
* gitjour
</pre>
<p></code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/09/01/git-cheat-sheet-and-class-notes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/09/01/git-cheat-sheet-and-class-notes/</feedburner:origLink></item>
		<item>
		<title>Desirable Properties for a Web Crawler</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/UF5IJ2avvt0/</link>
		<comments>http://www.xcombinator.com/2010/09/01/desirable-properties-for-a-web-crawler/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 15:15:13 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[crawling]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=247</guid>
		<description><![CDATA[I aim to build a web crawler that can download a billion pages in a week.
Below are some desirable properties any web crawler should have:
Scalability
The web is enormous and continually growing. A crawler should scale
linearly with the number of agent-machines that are added to the
system. This allows us to add more agents as our needs [...]]]></description>
			<content:encoded><![CDATA[<p>I aim to build a web crawler that can download a billion pages in a week.<br />
Below are some desirable properties any web crawler should have:</p>
<h2>Scalability</h2>
<p>The web is enormous and continually growing. A crawler should scale<br />
linearly with the number of agent-machines that are added to the<br />
system. This allows us to add more agents as our needs increase.</p>
<h2>Speed</h2>
<p>Speed is a significant issue at this scale. For example, if we want to crawl 1<br />
billion pages in a week (this is less than 1/1000th of the web), our system<br />
will have to sustain a rate of 1653 downloads per second.</p>
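<p>The arithmetic behind that figure can be checked with a few lines of Java. This is only a sketch; the names <code>CrawlRate</code> and <code>pagesPerSecond</code> are made up for illustration:</p>

```java
// Hypothetical helper: sustained download rate needed to hit a crawl target.
final class CrawlRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Pages per second required to fetch `pages` pages in `days` days.
    static double pagesPerSecond(long pages, int days) {
        return (double) pages / (days * SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        // One billion pages in seven days (604,800 seconds).
        double rate = pagesPerSecond(1_000_000_000L, 7);
        System.out.printf("%.0f pages/second%n", rate); // prints "1653 pages/second"
    }
}
```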
<p>To achieve this speed we need to employ a number of techniques, such as<br />
concurrent connections, data compression, DNS caching, and minimizing disk<br />
seeks.</p>
<h2>Politeness</h2>
<p>While we need a high rate of download, we must be <em>polite</em> and not<br />
overload any one server. Najork et al. propose limiting requests to<br />
a single server by waiting 10 times the time it took to download the<br />
last page. <a href="http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-173.html">1</a></p>
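<p>One way to read that rule, sketched in Java. The class and method names here are hypothetical, and timestamps are assumed to be in milliseconds:</p>

```java
// Hypothetical per-host politeness tracker; all times are in milliseconds.
final class PolitenessPolicy {
    // "waiting 10 times the time it took to download the last page"
    static final int WAIT_FACTOR = 10;

    private final java.util.Map<String, Long> nextAllowed = new java.util.HashMap<>();

    // Record a finished download and compute when the host may be contacted again.
    void recordDownload(String host, long startedAtMs, long finishedAtMs) {
        long elapsed = finishedAtMs - startedAtMs;
        nextAllowed.put(host, finishedAtMs + WAIT_FACTOR * elapsed);
    }

    // True if a request to `host` at time `nowMs` respects the delay.
    boolean mayFetch(String host, long nowMs) {
        return nowMs >= nextAllowed.getOrDefault(host, 0L);
    }
}
```

<p>With this sketch, a page that took 200 ms to download leaves its host untouched for the next 2 seconds, while other hosts can still be fetched in the meantime.</p>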
<h2>Quality</h2>
<p>We aim to build a crawler that visits &#8220;high-quality&#8221; or &#8220;relevant&#8221; pages first,<br />
as judged by a well-known quality metric. However, Najork et al. found that a simple<br />
breadth-first crawl tends to visit high-quality pages first <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.77.6162&amp;rep=rep1&amp;type=pdf">2</a>.<br />
We&#8217;ll eventually build a more intelligent page-selection mechanism, but for now<br />
breadth-first will work.</p>
<h2>Agents as a Distributed System</h2>
<p>Given that the crawling problem cannot be solved by a single machine,<br />
we are required to form our solution as a distributed<br />
system. Distributed systems introduce more room for failures<br />
and errors in coordination. Therefore we define the following desirable<br />
features for our distributed system:</p>
<h3>Fault Tolerance</h3>
<p>Hardware failure is unavoidable. The failure of one node should not<br />
prevent survivors from continuing to operate.</p>
<h3>Even Partitioning</h3>
<p>The URL frontier should be evenly distributed across all agents to<br />
evenly assign the work. Many crawlers use a hashing function to distribute<br />
the URLs among machines.</p>
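<p>A minimal sketch of hash-based assignment in Java. Hashing on the host name rather than the full URL is my assumption here (it keeps every page of a server on the same agent); the class and method names are hypothetical:</p>

```java
// Hypothetical assignment of URLs to agents by hashing the host name.
final class UrlPartitioner {
    private final int agentCount;

    UrlPartitioner(int agentCount) {
        if (agentCount <= 0) throw new IllegalArgumentException("agentCount must be positive");
        this.agentCount = agentCount;
    }

    // Returns the index (0 .. agentCount-1) of the agent responsible for `url`.
    int agentFor(String url) {
        String host = java.net.URI.create(url).getHost();
        if (host == null) throw new IllegalArgumentException("no host in URL: " + url);
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(host.toLowerCase(java.util.Locale.ROOT).hashCode(), agentCount);
    }
}
```

<p>Note that plain modulo hashing reassigns most hosts whenever the number of agents changes, so a real system would probably want a more stable scheme.</p>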
<h3>Minimize Overlap</h3>
<p>Overlap is defined as <code>(n-u)/u</code>, where <code>n</code> is the total number of<br />
crawled pages and <code>u</code> is the number of <em>unique</em> pages (sometimes <code>u &lt; n</code><br />
because the same page has been erroneously fetched several times).<br />
Optimally, we want an overlap of 0. <a href="http://vigna.dsi.unimi.it/ftp/papers/UbiCrawler.pdf">3</a></p>
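<p>As a quick illustration of the formula (the helper name <code>Overlap.of</code> is made up):</p>

```java
// Hypothetical helper computing overlap = (n - u) / u.
final class Overlap {
    // n: total pages crawled; u: unique pages among them (0 < u <= n).
    static double of(long n, long u) {
        if (u <= 0 || u > n) throw new IllegalArgumentException("need 0 < u <= n");
        return (double) (n - u) / u;
    }
}
```

<p>For example, 1,100 fetches that yield 1,000 unique pages give an overlap of 100/1000 = 0.1, and a crawl with no duplicate fetches gives 0.</p>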
<h3>Agent Churn</h3>
<p>During the crawl we may want to add resources. The system<br />
should support agents joining and leaving the group.</p>
<h2>Next Steps</h2>
<p>There are five major parts to a web crawler:</p>
<ul>
<li>The URL frontier</li>
<li>IP address lookup</li>
<li>Page download</li>
<li>Page processing</li>
<li>Tracking URLs encountered</li>
</ul>
<p>Over the next few articles we will be designing each of the five<br />
components. The list of desirable features gives us guidelines<br />
that help shape the decisions about each component.</p>
<ol>
<li>Najork M. High-Performance Web Crawling. Systems Research. 2001.</li>
<li>Najork M, Wiener JL. Breadth-First Search Crawling Yields High-Quality Pages. Systems Research. 2001:114-118.</li>
<li>Boldi P, Codenotti B, Santini M, Vigna S. UbiCrawler: a scalable fully distributed Web crawler. Software: Practice and Experience. 2004;34(8):711-726. </li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/09/01/desirable-properties-for-a-web-crawler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/09/01/desirable-properties-for-a-web-crawler/</feedburner:origLink></item>
		<item>
		<title>a simple netty HTTP server in clojure</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/6JWDW8ge7YI/</link>
		<comments>http://www.xcombinator.com/2010/07/30/a-simple-netty-http-server-in-clojure/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 22:42:45 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=237</guid>
		<description><![CDATA[Recently I&#8217;ve been toying with various clojure wrappers around java web servers. My goal is to write a small evented server that can queue up HTTP requests and then kick off some long-running processes. 
So far I&#8217;ve tried aleph, compojure/ring/jetty and saturnine. 
The compojure stack is by far the cleanest, but it&#8217;s geared more
towards synchronous [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I&#8217;ve been toying with various clojure wrappers around java web servers. My goal is to write a small evented server that can queue up HTTP requests and then kick off some long-running processes. </p>
<p>So far I&#8217;ve tried <a href="http://github.com/ztellman/aleph">aleph</a>, <a href="http://mmcgrana.github.com/2010/07/develop-deploy-clojure-web-applications.html">compojure/ring/jetty</a> and <a href="http://github.com/texodus/saturnine">saturnine</a>. </p>
<p>The <code>compojure</code> stack is by far the cleanest, but it&#8217;s geared more<br />
towards synchronous request/response cycles. I&#8217;d like to use<br />
something like EventMachine, and <code>saturnine</code> seems the closest to<br />
that goal. However, <code>saturnine</code>&#8217;s current HTTP implementation is<br />
lacking.</p>
<p>Now before I could contribute to <code>saturnine</code> I first needed to understand <a href="http://jboss.org/netty">netty</a>. If you&#8217;re trying to learn netty, I recommend you first read the <a href="http://www.jboss.org/netty/documentation.html">users guide</a> and then jump straight to the API docs on <a href="http://docs.jboss.org/netty/3.2/api/org/jboss/netty/channel/ChannelPipeline.html"><code>ChannelPipeline</code></a>.</p>
<p>Next, I needed to write a basic HTTP server using netty. <a href="http://stackoverflow.com/questions/1735776/server-programming-with-clojure">This post</a> on StackOverflow and <a href="http://docs.jboss.org/netty/3.2/xref/org/jboss/netty/example/http/snoop/package-summary.html">this sample code</a> on netty&#8217;s website helped me get a basic HTTP server up and running.</p>
<p>Below is a naive translation of the netty sample code into clojure. Note that this is <em>not</em> listed here as an example of sexy clojure code, but rather as a starting point for someone looking to get their hands dirty with the netty libraries in clojure.</p>
<pre><code>    (ns xcombinator.netty.server
      (:gen-class)
      (:use clojure.contrib.import-static)
      (:import
         [java.net InetSocketAddress]
         [java.util.concurrent Executors]
         [org.jboss.netty.bootstrap ServerBootstrap]
         [org.jboss.netty.channel Channels ChannelPipelineFactory
           SimpleChannelHandler SimpleChannelUpstreamHandler]
         [org.jboss.netty.channel.socket.nio NioServerSocketChannelFactory]
         [org.jboss.netty.buffer ChannelBuffers]
         [org.jboss.netty.handler.codec.http HttpRequestDecoder
           HttpResponseEncoder DefaultHttpResponse]
    ))

    (import-static org.jboss.netty.handler.codec.http.HttpVersion HTTP_1_1)
    (import-static org.jboss.netty.handler.codec.http.HttpResponseStatus OK)
    (import-static org.jboss.netty.handler.codec.http.HttpHeaders$Names CONTENT_TYPE)

    (declare make-handler)

    (defrecord Server [#^ServerBootstrap bootstrap channel])

    (defn start
      "Start a Netty server on `port`. `handler` is a zero-argument function
       that returns a Netty channel handler. Returns a Server record."
      [port handler]
      (let [channel-factory (NioServerSocketChannelFactory.
                              (Executors/newCachedThreadPool)
                              (Executors/newCachedThreadPool))
            bootstrap (ServerBootstrap. channel-factory)
            pipeline (.getPipeline bootstrap)]
        (.addLast pipeline "decoder" (new HttpRequestDecoder))
        (.addLast pipeline "encoder" (new HttpResponseEncoder))
        (.addLast pipeline "handler" (handler))
        (.setOption bootstrap "child.tcpNoDelay", true)
        (.setOption bootstrap "child.keepAlive", true)
        (new Server bootstrap (.bind bootstrap (InetSocketAddress. port))))) 

    (defn stop-server
      {:doc "Stops a Server instance"
       :arglists '([server])}
      [{bootstrap :bootstrap channel :channel}]
      (do (.unbind channel)
          (.releaseExternalResources bootstrap)))

    (defn http-response
      [status]
      (doto (DefaultHttpResponse. HTTP_1_1 status)
        (.setHeader CONTENT_TYPE "text/plain; charset=UTF-8")
        (.setContent (ChannelBuffers/copiedBuffer
                       (str "Success: " status) "UTF-8"))))

    (defn make-handler
      "Returns a Netty handler."
      []
      (proxy [SimpleChannelUpstreamHandler] []
        (messageReceived [ctx e]
          (let [c (.getChannel e)]
            (println "HTTP request from" c)
            (.write c (http-response OK))
            (-&gt; e .getChannel .close)))

        (exceptionCaught
          [ctx e]
          (let [throwable (.getCause e)]
            (println "@exceptionCaught" throwable))
          (-&gt; e .getChannel .close))))

    (comment
      (def *server* (start 3335 make-handler))
      (stop-server *server*)
    )
</code></pre>
<p>Here&#8217;s a <code>project.clj</code> that will load up the right dependencies:</p>
<pre><code>    (defproject server "0.0.1"
      :description "Simple Netty HTTP server"
      :repositories [["JBoss" "http://repository.jboss.org/maven2"]]
      :dependencies
        [[org.clojure/clojure "1.2.0-beta1"]
        [org.clojure/clojure-contrib "1.2.0-beta1"]
        [org.jboss.netty/netty "3.2.0.BETA1"]
        [log4j/log4j           "1.2.14"]]
      :dev-dependencies [[autodoc              "0.7.0"]
                         [lein-clojars         "0.5.0-SNAPSHOT"]
                         [lein-run "1.0.0-SNAPSHOT"]
                         [swank-clojure "1.2.1"]]

      :namespaces [xcombinator.netty.server])
</code></pre>
<p>Then <code>lein swank</code> and evaluate the <code>(def *server*...</code> line in your<br />
REPL. You&#8217;ll have a simple netty HTTP server running on port <code>3335</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/07/30/a-simple-netty-http-server-in-clojure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/07/30/a-simple-netty-http-server-in-clojure/</feedburner:origLink></item>
		<item>
		<title>fileutils for clojure</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/y2Omz3hokb0/</link>
		<comments>http://www.xcombinator.com/2010/07/28/fileutils-for-clojure/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 22:37:08 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=231</guid>
		<description><![CDATA[I just pushed clj-file-utils to clojars. 
I was looking for an easy way to replicate ruby&#8217;s fileutils library in clojure and I came across a version by Mark McGranaghan in his clj-garden project which wraps around the Apache Commons IO library. 
I extended Mark&#8217;s library by using multimethods to allow the use of strings (rather [...]]]></description>
			<content:encoded><![CDATA[<p>I just pushed <a href="http://clojars.org/clj-file-utils"><code>clj-file-utils</code></a> to clojars. </p>
<p>I was looking for an easy way to replicate ruby&#8217;s <code>fileutils</code> library in clojure and I came across a version by Mark McGranaghan in his <a href="http://github.com/mmcgrana/clj-garden"><code>clj-garden</code></a> project which wraps around the Apache Commons IO library. </p>
<p>I extended Mark&#8217;s library by using <a href="http://clojure.org/multimethods">multimethods</a> to allow the use of strings (rather than requiring File objects) for parameters.</p>
<h2>Usage</h2>
<pre><code>user=&gt; (use 'clj-file-utils.core)
nil
user=&gt; (exist "foo.txt")
false
user=&gt; (touch "foo.txt")
nil
user=&gt; (exist "foo.txt")
true
user=&gt; (rm "foo.txt")
nil
user=&gt; (file "foo.txt")
#&lt;File foo.txt&gt;
user=&gt; (.getParent (file "/path/to/foo.txt"))
"/path/to"
</code></pre>
<h2>As A Dependency</h2>
<p>leiningen</p>
<pre><code>[clj-file-utils "0.1.1"]
</code></pre>
<p>maven</p>
<pre><code>&lt;dependency&gt;
  &lt;groupId&gt;clj-file-utils&lt;/groupId&gt;
  &lt;artifactId&gt;clj-file-utils&lt;/artifactId&gt;
  &lt;version&gt;0.1.1&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
<h2>Code</h2>
<ul>
<li><a href="http://github.com/jashmenn/clj-file-utils">jashmenn/clj-file-utils on github</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/07/28/fileutils-for-clojure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/07/28/fileutils-for-clojure/</feedburner:origLink></item>
		<item>
		<title>Cascading, TF-IDF, and BufferedSum (Part 2)</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/W5lola3Wyy8/</link>
		<comments>http://www.xcombinator.com/2010/05/14/cascading-tf-idf-and-bufferedsum-part-2/#comments</comments>
		<pubDate>Fri, 14 May 2010 15:49:55 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=222</guid>
		<description><![CDATA[Introduction
The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. [1]
To calculate tf-idf we need the following four values:

The number of times a term appears [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. [<a href="http://en.wikipedia.org/wiki/Tf-idf" class="external-link">1</a>]</p>
<p>To calculate tf-idf we need the following four values:</p>
<ul>
<li>The number of times a term appears in a given document (<tt>n<sub>i,j</sub></tt>)</li>
<li>The total number of terms in a given document (<tt>&#8721;<sub>k</sub> n<sub>k,j</sub></tt>)</li>
<li>The number of documents that contain a given term (<tt>|{d : t<sub>i</sub> &#8712; d}|</tt>)</li>
<li>The total number of documents in the corpus (<tt>D</tt>)</li>
</ul>
<h2>Mathematical Details</h2>
<p>We want to score the importance of term <tt>t<sub>i</sub></tt> in document <tt>d<sub>j</sub></tt>.</p>
<p>Term frequency is defined by:</p>
<p><img src="http://www.xcombinator.com/wp-content/uploads/2010/05/tf.png" alt="tf" title="tf" width="119" height="43" class="size-full wp-image-225" /></p>
<p>Where <tt>n<sub>i,j</sub></tt> is the number of occurrences of term <tt>t<sub>i</sub></tt> in document <tt>d<sub>j</sub></tt>.</p>
<p>Inverse document frequency is defined by:</p>
<p><img src="http://www.xcombinator.com/wp-content/uploads/2010/05/idf.png" alt="idf" title="idf" width="189" height="47" class="size-full wp-image-223" /></p>
<p>Where:</p>
<ul>
<li><tt>D</tt> is the total number of documents in the corpus and</li>
<li><tt>|{d : t<sub>i</sub> &#8712; d}|</tt> is the number of documents in which the term <tt>t<sub>i</sub></tt> appears.</li>
</ul>
<p>Then:</p>
<p><img src="http://www.xcombinator.com/wp-content/uploads/2010/05/tf-idf.png" alt="tf-idf" title="tf-idf" width="175" height="22" class="size-full wp-image-224" /></p>
<p>Refer to [<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.8343" class="external-link" >2</a>] for more information on tf-idf.</p>
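<p>To make the formulas concrete, here is a small Java sketch. It assumes the standard definitions (tf as a term&#8217;s count divided by the document&#8217;s total term count, idf as the natural log of total documents over documents containing the term); the log base and all names below are my assumptions, not something fixed by the text above:</p>

```java
// Hypothetical helper; assumes the standard tf-idf definitions with a natural log.
final class TfIdf {
    // tf = n_ij / sum_k n_kj
    static double tf(long termCountInDocument, long totalTermsInDocument) {
        return (double) termCountInDocument / totalTermsInDocument;
    }

    // idf = log(D / number of documents containing the term)
    static double idf(long totalDocuments, long documentsContainingTerm) {
        return Math.log((double) totalDocuments / documentsContainingTerm);
    }

    // tf-idf = tf * idf
    static double tfIdf(long termCount, long totalTerms, long totalDocs, long docsWithTerm) {
        return tf(termCount, totalTerms) * idf(totalDocs, docsWithTerm);
    }
}
```

<p>For instance, a term that appears 3 times in a 100-term document and occurs in 1,000 of 10,000,000 documents has tf = 0.03 and idf = ln(10,000), for a score of roughly 0.28. A term that appears in every document gets an idf of 0 and therefore a score of 0.</p>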
<h2>Operation Input</h2>
<p><a href="http://www.xcombinator.com/2009/12/18/cascading-tf-idf-and-bufferedsum-part-1/">Last time</a> we discussed the technique of taking a group of records, calculating a value from that group and emitting each record with the calculated value attached. We called this operation a BufferedSum. We&#8217;re going to build on our previous work and create a reusable component (called a <a href="http://www.cascading.org/javadoc/cascading/pipe/SubAssembly.html" class="external-link">SubAssembly</a>) for calculating tf-idf using Cascading on Hadoop.</p>
<p>To make our tf-idf operation we need to decide what the input arguments will be. Last time, we used an input corpus of the format <tt>(document_id, body)</tt> and emitted <tt>(document_id, term, term_count_in_document)</tt> for all terms in each document. This last tuple will be the input format to our tf-idf operation.</p>
<h2>Creating a SubAssembly</h2>
<p>The general format of a SubAssembly in Cascading is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> MySubAssembly <span style="color: #000000; font-weight: bold;">extends</span> SubAssembly <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> MySubAssembly<span style="color: #009900;">&#40;</span>Pipe pipe<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">// do something with `pipe`</span>
    setTails<span style="color: #009900;">&#40;</span>pipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// must register all assembly tails</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Our operation assumes that the total number of documents in the corpus is known or can be found with a simple MapReduce job. We take that number as an input to our SubAssembly:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> TfIdfIndexSubAssmbly <span style="color: #000000; font-weight: bold;">extends</span> SubAssembly <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #003399;">Integer</span> totalNumberOfDocuments<span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">public</span> TfIdfIndexSubAssmbly<span style="color: #009900;">&#40;</span>Pipe pipe, <span style="color: #003399;">Integer</span> totalNumberOfDocuments<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">totalNumberOfDocuments</span> <span style="color: #339933;">=</span> totalNumberOfDocuments<span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// do something with pipe</span>
    setTails<span style="color: #009900;">&#40;</span>pipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// must register all assembly tails</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<h2>Gathering Variables</h2>
<p>To compute our final tf-idf score, we first need to compute the intermediate variables.</p>
<h3><tt>total_terms_in_document</tt></h3>
<p>Since our input is <tt>(document_id, term, term_count_in_document)</tt>, we already have the first variable, <tt>n<sub>i,j</sub></tt>. We can now calculate the total number of terms in each document:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// input: (document_id, term, term_count_in_document)</span>
<span style="color: #666666; font-style: italic;">// emits: (document_id, term, term_count_in_document, total_terms_in_document)</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe,
    <span style="color: #000000; font-weight: bold;">new</span> BufferedSum<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span><span style="color: #009900;">&#41;</span>,
                   <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span>,
                   <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>,
    Fields.<span style="color: #006633;">SWAP</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Remember that <tt>BufferedSum</tt> takes three arguments:</p>
<ul>
<li>The name of the <tt>Field</tt> to output</li>
<li>The name of the <tt>Field</tt> to sum</li>
<li>The other <tt>Fields</tt> to &#8220;pull through&#8221; the operation</li>
</ul>
<p>So here we are grouping by <tt>document_id</tt>, and summing <tt>term_count_in_document</tt> for each group and placing the value into the field <tt>total_terms_in_document</tt>.</p>
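<p>To make the grouping concrete, here is a minimal plain-Java sketch of the same computation on an in-memory list, with no Cascading or Hadoop involved. The names <tt>TotalTermsSketch</tt> and <tt>withTotals</tt> are hypothetical; the sketch only illustrates what the group-and-sum step produces and is not how <tt>BufferedSum</tt> is implemented.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TotalTermsSketch {
  // Each input row is {document_id, term, term_count_in_document}.
  // Returns rows of {document_id, term, term_count_in_document, total_terms_in_document}.
  static List<String[]> withTotals(List<String[]> rows) {
    // First pass: sum term_count_in_document per document_id (the "group by").
    Map<String, Integer> totals = new HashMap<>();
    for (String[] row : rows) {
      totals.merge(row[0], Integer.parseInt(row[2]), Integer::sum);
    }
    // Second pass: "pull through" the original fields and append the group total.
    List<String[]> out = new ArrayList<>();
    for (String[] row : rows) {
      out.add(new String[] {row[0], row[1], row[2], String.valueOf(totals.get(row[0]))});
    }
    return out;
  }

  public static void main(String[] args) {
    List<String[]> rows = new ArrayList<>();
    rows.add(new String[] {"doc1", "cat", "2"});
    rows.add(new String[] {"doc1", "dog", "1"});
    rows.add(new String[] {"doc2", "dog", "4"});
    for (String[] row : withTotals(rows)) {
      System.out.println(String.join(", ", row));
    }
  }
}
```

<p>In this sample both <tt>doc1</tt> rows carry a total of 3 (2 + 1), while the single <tt>doc2</tt> row carries a total of 4.</p>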
<h3><tt>number_of_documents_containing_term</tt></h3>
<p>Next we need to calculate the number of documents that contain each term. We&#8217;ve already grouped by <tt>document_id</tt> and <tt>term</tt>, so we know there is only one record for each <tt>document_id</tt>/<tt>term</tt> pair.</p>
<p>Rather than counting the number of <tt>document_id</tt>/<tt>term</tt> pairs directly we are simply going to assign a count of 1 to each record and then sum that value. This allows us to reuse the code we&#8217;ve written for <tt>BufferedSum</tt>.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// calculate the number of documents containing each term</span>
<span style="color: #666666; font-style: italic;">// input: (document_id, term, term_count_in_document, total_terms_in_document)</span>
<span style="color: #666666; font-style: italic;">// emit:  (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term)</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Insert<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_in_doc&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span>, Fields.<span style="color: #006633;">ALL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// we're going to sum these, easier than creating BufferedCount</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe,
    <span style="color: #000000; font-weight: bold;">new</span> BufferedSum<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;number_of_documents_containing_term&quot;</span><span style="color: #009900;">&#41;</span>,
                   <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_in_doc&quot;</span><span style="color: #009900;">&#41;</span>,
                   <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span>, <span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span>, <span style="color: #0000ff;">&quot;term_in_doc&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>,
    Fields.<span style="color: #006633;">SWAP</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Here we group on <tt>term</tt> and for every term group we calculate the number of documents that contain that term. Note that if you have a very large corpus some groups may become memory constrained as very common words such as &#8220;the&#8221; have groups containing nearly the entire corpus (it would be a good idea to remove stop-words during pre-processing).</p>
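<p>The &#8220;insert a 1 and sum it&#8221; trick can be sketched in plain Java as follows. This is only an in-memory illustration under the same assumption as above (one record per <tt>document_id</tt>/<tt>term</tt> pair); the class and method names are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DocumentFrequencySketch {
  // Each row is {document_id, term}; assumes at most one row per pair.
  // Returns term -> number_of_documents_containing_term.
  static Map<String, Integer> documentFrequency(List<String[]> rows) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String[] row : rows) {
      int termInDoc = 1; // the constant we insert into every record
      counts.merge(row[1], termInDoc, Integer::sum); // group by term and sum the 1s
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String[]> rows = new ArrayList<>();
    rows.add(new String[] {"doc1", "cat"});
    rows.add(new String[] {"doc1", "dog"});
    rows.add(new String[] {"doc2", "dog"});
    System.out.println(documentFrequency(rows));
  }
}
```

<p>Note that the sum equals the document count only because each pair appears once; duplicate pairs would inflate it.</p>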
<p>After we&#8217;ve calculated the value for <tt>number_of_documents_containing_term</tt> we don&#8217;t need the <tt>term_in_doc</tt> field any longer. Using Cascading&#8217;s <tt>Identity</tt> operation we can reorder and discard fields.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"> pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #666666; font-style: italic;">//  reorder and rm fields</span>
    <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span>, <span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span>, <span style="color: #0000ff;">&quot;number_of_documents_containing_term&quot;</span><span style="color: #009900;">&#41;</span>,
    <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Identity</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<h3><tt>total_documents</tt></h3>
<p>Next we do a hard insert of the number of documents. Again, you can calculate this value with a relatively simple MapReduce job (e.g. use a counter), but this is probably not the best place to do it. Because you have to count every document in the corpus, it is better to calculate the number of documents when you generate the input documents file.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000066; font-weight: bold;">int</span> D <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">totalNumberOfDocuments</span><span style="color: #339933;">;</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Insert<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_documents&quot;</span><span style="color: #009900;">&#41;</span>, D<span style="color: #009900;">&#41;</span>, Fields.<span style="color: #006633;">ALL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The <tt>Insert</tt> operation simply inserts the number of documents into the tuple stream.</p>
<h3>Calculating Tf-idf with a Custom Operation</h3>
<p>Now that we have all four values we can calculate tf-idf. We are going to create a custom operation to do this. We will perform this operation on each <tt>Tuple</tt> in the <tt>Pipe</tt> with the following:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// calculate tf * idf</span>
<span style="color: #666666; font-style: italic;">// input:  (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term, total_documents)</span>
<span style="color: #666666; font-style: italic;">// emit:   (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term, total_documents, tf, idf, tfidf)</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> TfIdfOperation<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, Fields.<span style="color: #006633;">ALL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
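<p>As a rough check of the arithmetic this operation performs on a single tuple, here is a plain-Java sketch that uses the same formulas as the <tt>TfIdfOperation</tt> in this post: <tt>tf = term_count_in_document / total_terms_in_document</tt> and <tt>idf = ln(total_documents / (1 + number_of_documents_containing_term))</tt>. The class name and the sample numbers are made up for illustration.</p>

```java
class TfIdfArithmeticSketch {
  // Term frequency: share of the document's terms taken up by this term.
  static double tf(double termCount, double totalTerms) {
    return termCount / totalTerms;
  }

  // Inverse document frequency with the "1 +" smoothing used in this post.
  static double idf(double totalDocuments, double documentsContainingTerm) {
    return Math.log(totalDocuments / (1 + documentsContainingTerm));
  }

  static double tfidf(double termCount, double totalTerms,
                      double totalDocuments, double documentsContainingTerm) {
    return tf(termCount, totalTerms) * idf(totalDocuments, documentsContainingTerm);
  }

  public static void main(String[] args) {
    // Hypothetical numbers: the term occurs 3 times in a 100-term document
    // and appears in 9 of 1000 documents.
    System.out.println("tf    = " + tf(3, 100));
    System.out.println("idf   = " + idf(1000, 9)); // ln(1000 / 10) = ln(100)
    System.out.println("tfidf = " + tfidf(3, 100, 1000, 9));
    // A term that occurs in every document gets a slightly negative idf
    // with this smoothing, because 1000 / 1001 is less than 1.
    System.out.println("idf for a term in all documents = " + idf(1000, 1000));
  }
}
```

<p>The last line shows a side effect of the <tt>1 +</tt> in the denominator: terms that occur in every document score just below zero, which is one more reason to drop stop-words first.</p>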

<p>To create an operation in Cascading you simply subclass <tt>BaseOperation</tt>. For example:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> MyOperation <span style="color: #000000; font-weight: bold;">extends</span> BaseOperation <span style="color: #000000; font-weight: bold;">implements</span> Function <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> MyOperation<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">super</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;out_field_1&quot;</span>, <span style="color: #0000ff;">&quot;out_field_2&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> operate<span style="color: #009900;">&#40;</span>FlowProcess flowProcess, FunctionCall functionCall<span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    TupleEntry inputTuple <span style="color: #339933;">=</span> functionCall.<span style="color: #006633;">getArguments</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// take values from inputTuple</span>
    <span style="color: #666666; font-style: italic;">// transform them to make outputTuple</span>
    Tuple outputTuple <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;a value 1&quot;</span>, <span style="color: #0000ff;">&quot;a value 2&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    functionCall.<span style="color: #006633;">getOutputCollector</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>outputTuple<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Here are the key things you need to do to create a Cascading operation:</p>
<ul>
<li>subclass <tt>BaseOperation</tt> and implement <tt>Function</tt> (there are other types of operations)</li>
<li>call <tt>super</tt> and declare the names of the Fields this operation will be emitting. See <a href="http://www.cascading.org/javadoc/cascading/operation/BaseOperation.html">BaseOperation</a> for details</li>
<li><tt>functionCall.getArguments()</tt> returns a <a href="http://www.cascading.org/javadoc/cascading/tuple/TupleEntry.html">TupleEntry</a> containing the input <tt>Tuple</tt> and input <tt>Fields</tt>.</li>
<li><tt>functionCall.getOutputCollector()</tt> is the <tt>OutputCollector</tt> you can use to emit <tt>Tuples</tt> from this operation.</li>
<li>call <tt>outputCollector.add()</tt> to emit a <tt>Tuple</tt>. You can emit 0..n <tt>Tuples</tt>.</li>
</ul>
<p>For our tf-idf operation we want to emit three Fields: <tt>tf</tt>, <tt>idf</tt>, and <tt>tfidf</tt>. We do this with the following code.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// note that we're using a private nested class. This is not required.</span>
<span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> TfIdfOperation <span style="color: #000000; font-weight: bold;">extends</span> BaseOperation <span style="color: #000000; font-weight: bold;">implements</span> Function <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> TfIdfOperation<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000000; font-weight: bold;">super</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;tf&quot;</span>, <span style="color: #0000ff;">&quot;idf&quot;</span>, <span style="color: #0000ff;">&quot;tfidf&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> operate<span style="color: #009900;">&#40;</span>FlowProcess flowProcess, FunctionCall functionCall<span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    TupleEntry arguments <span style="color: #339933;">=</span> functionCall.<span style="color: #006633;">getArguments</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// tf</span>
    <span style="color: #003399;">Double</span> termCount  <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">Double</span> totalTerms <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">BigDecimal</span> tf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">BigDecimal</span><span style="color: #009900;">&#40;</span>termCount <span style="color: #339933;">/</span> totalTerms<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// idf</span>
    <span style="color: #003399;">Double</span> totalDocuments  <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_documents&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">Double</span> td <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;number_of_documents_containing_term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">BigDecimal</span> idf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">BigDecimal</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">log</span><span style="color: #009900;">&#40;</span>totalDocuments <span style="color: #339933;">/</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1</span> <span style="color: #339933;">+</span> td<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// tfidf</span>
    <span style="color: #003399;">BigDecimal</span> tfidf <span style="color: #339933;">=</span> tf.<span style="color: #006633;">multiply</span><span style="color: #009900;">&#40;</span>idf<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    functionCall.<span style="color: #006633;">getOutputCollector</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>
        <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span> tf.<span style="color: #006633;">toPlainString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, idf.<span style="color: #006633;">toPlainString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, tfidf.<span style="color: #006633;">toPlainString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>When we convert a <tt>Double</tt> to a string we often get an exponent (e.g. <tt>2.948E-5</tt>). The exponent can be cumbersome to work with, so we wrap the value in a <tt>BigDecimal</tt> and call <tt>toPlainString()</tt> to get a string without one.</p>
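<p>The difference is easy to see in a small standalone program. This is only a sketch with a made-up value; it also shows <tt>BigDecimal.valueOf</tt>, which is not used in the operation above but is a possible alternative if you want the shorter decimal form rather than the exact binary expansion of the double.</p>

```java
import java.math.BigDecimal;

class PlainStringSketch {
  // Same conversion as in the operation above: exact value of the double, no exponent.
  static String plain(double value) {
    return new BigDecimal(value).toPlainString();
  }

  public static void main(String[] args) {
    double score = 0.00002948; // hypothetical small tf-idf score

    // Double.toString switches to scientific notation for small values.
    System.out.println(Double.toString(score)); // 2.948E-5

    // new BigDecimal(double) keeps the exact binary value, so the plain
    // string has no exponent but can be many digits long.
    System.out.println(plain(score));

    // BigDecimal.valueOf goes through Double.toString first, so the plain
    // string is the short decimal form.
    System.out.println(BigDecimal.valueOf(score).toPlainString()); // 0.00002948
  }
}
```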
<h2>Building the Index</h2>
<p>Now that we have the <tt>term</tt>, <tt>document_id</tt>, and <tt>tfidf</tt> score we can build our index. First we strip out the unnecessary fields and move the <tt>term</tt> to the front.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// input:  (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term, total_documents, tf, idf, tfidf)</span>
<span style="color: #666666; font-style: italic;">// emit:   (term, document_id, tfidf)</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #666666; font-style: italic;">// reorder and rm some fields</span>
    <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;tfidf&quot;</span><span style="color: #009900;">&#41;</span>,
    <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Identity</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>In our last step we want to build a single row that tells us &#8220;given a term, what documents are most relevant to it&#8221;. We format this list of <tt>(document_id, score)</tt> pairs as a JSON hash (<tt>JSONObject</tt>). (Formatting our records in this way is called using &#8220;stripes&#8221;.)</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// &quot;stripe&quot; our group e.g.:</span>
<span style="color: #666666; font-style: italic;">// input: (term,</span>
<span style="color: #666666; font-style: italic;">//          (document_id_1, tfidf_1),</span>
<span style="color: #666666; font-style: italic;">//          (document_id_2, tfidf_2),</span>
<span style="color: #666666; font-style: italic;">//          ...)</span>
<span style="color: #666666; font-style: italic;">// emit: (term, {document_id_1:tfidf_1, document_id_2:tfidf_2})</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe,
    <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;tfidf&quot;</span><span style="color: #009900;">&#41;</span>,
    <span style="color: #000000; font-weight: bold;">new</span> JSONTupleAggregator<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;scores&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;JSONObject&quot;</span><span style="color: #009900;">&#41;</span>,
    <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;scores&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>JSONTupleAggregator is an operation that can be found in the <a href="http://github.com/jashmenn/cascading.json/blob/master/src/main/java/cascading/json/operation/aggregator/JSONTupleAggregator.java" class="external-link">cascading.json</a> project. It takes a group of tuples and emits them as either a <tt>JSONArray</tt> (nested list) or <tt>JSONObject</tt> (hash).</p>
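<p>To show the shape of a &#8220;stripe&#8221; without pulling in Cascading or a JSON library, here is a hand-rolled plain-Java sketch. It is an illustration only: the names are hypothetical, the string building does no JSON escaping, and the exact output format of <tt>JSONTupleAggregator</tt> may differ.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StripeSketch {
  // Each row is {term, document_id, tfidf}, with tfidf already a plain string.
  // Returns term -> JSON-like object mapping document_id to tfidf.
  static Map<String, String> stripe(List<String[]> rows) {
    // Group the (document_id, tfidf) pairs by term.
    Map<String, Map<String, String>> grouped = new TreeMap<>();
    for (String[] row : rows) {
      grouped.computeIfAbsent(row[0], k -> new TreeMap<>()).put(row[1], row[2]);
    }
    // Collapse each group into a single row holding one JSON-like object.
    Map<String, String> out = new TreeMap<>();
    for (Map.Entry<String, Map<String, String>> group : grouped.entrySet()) {
      StringBuilder json = new StringBuilder("{");
      for (Map.Entry<String, String> score : group.getValue().entrySet()) {
        if (json.length() > 1) {
          json.append(",");
        }
        // Assumes ids and scores contain no characters that need escaping.
        json.append('"').append(score.getKey()).append("\":\"")
            .append(score.getValue()).append('"');
      }
      out.put(group.getKey(), json.append("}").toString());
    }
    return out;
  }

  public static void main(String[] args) {
    List<String[]> rows = new ArrayList<>();
    rows.add(new String[] {"dog", "doc1", "0.1"});
    rows.add(new String[] {"dog", "doc2", "0.3"});
    rows.add(new String[] {"cat", "doc1", "0.2"});
    stripe(rows).forEach((term, scores) -> System.out.println(term + "\t" + scores));
  }
}
```

<p>Each term ends up on exactly one row, so a later lookup by term reads a single record instead of scanning one record per document.</p>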
<h2>Using the Index</h2>
<p>To use the index, perform a <tt>CoGroup</tt> that joins the term field of your own pipe (the right-hand side) with the <tt>term</tt> field of our index. See Cascading&#8217;s documentation on <a href="http://www.cascading.org/javadoc/cascading/pipe/CoGroup.html" class="external-link" >CoGroup</a> for more information.</p>
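<p>Conceptually that join looks like the following in-memory sketch. It assumes an inner join (terms missing from the index are dropped) and an index that has already been parsed into maps; both the class and its methods are hypothetical and stand in for what the <tt>CoGroup</tt> would do at scale.</p>

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexLookupSketch {
  // index: term -> (document_id -> tfidf). terms: the other side of the join.
  static Map<String, Map<String, Double>> join(List<String> terms,
                                               Map<String, Map<String, Double>> index) {
    Map<String, Map<String, Double>> joined = new LinkedHashMap<>();
    for (String term : terms) {
      Map<String, Double> scores = index.get(term);
      if (scores != null) { // inner join: skip terms the index does not know
        joined.put(term, scores);
      }
    }
    return joined;
  }

  // Picks the document with the highest score from one stripe, or null if it is empty.
  static String bestDocument(Map<String, Double> scores) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, Double> entry : scores.entrySet()) {
      if (entry.getValue() > bestScore) {
        best = entry.getKey();
        bestScore = entry.getValue();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Map<String, Double> dogScores = new HashMap<>();
    dogScores.put("doc1", 0.1);
    dogScores.put("doc2", 0.3);
    Map<String, Map<String, Double>> index = new HashMap<>();
    index.put("dog", dogScores);

    Map<String, Map<String, Double>> joined = join(List.of("dog", "emu"), index);
    joined.forEach((term, scores) ->
        System.out.println(term + " -> best document: " + bestDocument(scores)));
  }
}
```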
<h2>Full Code Listing</h2>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.xcombinator.cascading.pipes</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">com.xcombinator.cascading.operations.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">com.xcombinator.cascading.operations.buffers.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Fields</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.pipe.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.regex.RegexSplitter</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Identity</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.text.DateParser</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Insert</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.FlowProcess</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.TupleEntry</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.BaseOperation</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Function</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.FunctionCall</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.pipe.Each</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.pipe.Every</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.pipe.GroupBy</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Tuple</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Debug</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.json.operation.aggregator.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.math.BigDecimal</span><span style="color: #339933;">;</span>
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 *
 * The goal of this SubAssembly is to create an index that can be used to find
 * the most relevant document given a term.
 * Required: you need to input the number of total documents.
 * Input: list of (document_id, term, count). e.g. :
 *
 *   (document_id,  term,  2)
 *   (document_id,  term2, 1)
 *   (document_id2, term2, 3)
 *   # etc
 *
 * where `count` is the number of times that term appears in that document
 *
 *
 * Emits: list of (term: {document_id_1:score_1,document_id_2:score_2...})
 *
 * Note that the emitted tuple holds String representations of the decimal
 * numbers. This is to allow easy (and correct) parsing if you write to a file
 * after this.
 *
 * It is assumed you've already done any normalization of the terms such as stemming etc.
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> TfIdfIndexSubAssmbly <span style="color: #000000; font-weight: bold;">extends</span> SubAssembly <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> TfIdfOperation <span style="color: #000000; font-weight: bold;">extends</span> BaseOperation <span style="color: #000000; font-weight: bold;">implements</span> Function <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">public</span> TfIdfOperation<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000000; font-weight: bold;">super</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;tf&quot;</span>, <span style="color: #0000ff;">&quot;idf&quot;</span>, <span style="color: #0000ff;">&quot;tfidf&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> operate<span style="color: #009900;">&#40;</span>FlowProcess flowProcess, FunctionCall functionCall<span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
      TupleEntry arguments <span style="color: #339933;">=</span> functionCall.<span style="color: #006633;">getArguments</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">// tf</span>
      <span style="color: #003399;">Double</span> termCount  <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">Double</span> totalTerms <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">BigDecimal</span> tf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">BigDecimal</span><span style="color: #009900;">&#40;</span>termCount <span style="color: #339933;">/</span> totalTerms<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">// idf</span>
      <span style="color: #003399;">Double</span> totalDocuments  <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_documents&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">Double</span> td <span style="color: #339933;">=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;number_of_documents_containing_term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">BigDecimal</span> idf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">BigDecimal</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">log</span><span style="color: #009900;">&#40;</span>totalDocuments <span style="color: #339933;">/</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1</span> <span style="color: #339933;">+</span> td<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">// tfidf</span>
      <span style="color: #003399;">BigDecimal</span> tfidf <span style="color: #339933;">=</span> tf.<span style="color: #006633;">multiply</span><span style="color: #009900;">&#40;</span>idf<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      functionCall.<span style="color: #006633;">getOutputCollector</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>
          <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span> tf.<span style="color: #006633;">toPlainString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, idf.<span style="color: #006633;">toPlainString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, tfidf.<span style="color: #006633;">toPlainString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #003399;">Integer</span> totalNumberOfDocuments<span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">public</span> TfIdfIndexSubAssmbly<span style="color: #009900;">&#40;</span>Pipe pipe, <span style="color: #003399;">Integer</span> totalNumberOfDocuments<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">totalNumberOfDocuments</span> <span style="color: #339933;">=</span> totalNumberOfDocuments<span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// calculate the total terms in each document. note that the input set is</span>
    <span style="color: #666666; font-style: italic;">// smaller because we've already counted the occurrence of each term in</span>
    <span style="color: #666666; font-style: italic;">// each document</span>
    <span style="color: #666666; font-style: italic;">// input: (document_id, term, term_count_in_document)</span>
    <span style="color: #666666; font-style: italic;">// emits: (document_id, term, term_count_in_document, total_terms_in_document)</span>
    <span style="color: #666666; font-style: italic;">// pipe = new Each(pipe, new Debug(true));</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe,
        <span style="color: #000000; font-weight: bold;">new</span> BufferedSum<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span><span style="color: #009900;">&#41;</span>,
                       <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span>,
                       <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>,
        Fields.<span style="color: #006633;">SWAP</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// calculate the number of documents containing each term</span>
    <span style="color: #666666; font-style: italic;">// input: (document_id, term, term_count_in_document, total_terms_in_document)</span>
    <span style="color: #666666; font-style: italic;">// emit:  (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term)</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Insert<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_in_doc&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span>, Fields.<span style="color: #006633;">ALL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// we're going to sum these, easier than creating BufferedCount</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe,
        <span style="color: #000000; font-weight: bold;">new</span> BufferedSum<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;number_of_documents_containing_term&quot;</span><span style="color: #009900;">&#41;</span>,
                       <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_in_doc&quot;</span><span style="color: #009900;">&#41;</span>,
                       <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span>, <span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span>, <span style="color: #0000ff;">&quot;term_in_doc&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>,
        Fields.<span style="color: #006633;">SWAP</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #666666; font-style: italic;">//  reorder and rm some fields</span>
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span>, <span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span>, <span style="color: #0000ff;">&quot;number_of_documents_containing_term&quot;</span><span style="color: #009900;">&#41;</span>,
        <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Identity</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// here we do a hard-insert of the number of documents. again, you</span>
    <span style="color: #666666; font-style: italic;">// could/should calculate this with MR, but this is not the place. It</span>
    <span style="color: #666666; font-style: italic;">// would be better to calculate the number of documents when you are</span>
    <span style="color: #666666; font-style: italic;">// generating the input documents file.</span>
    <span style="color: #000066; font-weight: bold;">int</span> D <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">totalNumberOfDocuments</span><span style="color: #339933;">;</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Insert<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_documents&quot;</span><span style="color: #009900;">&#41;</span>, D<span style="color: #009900;">&#41;</span>, Fields.<span style="color: #006633;">ALL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// now calculate tf * idf</span>
    <span style="color: #666666; font-style: italic;">// input:  (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term, total_documents)</span>
    <span style="color: #666666; font-style: italic;">// emit:   (document_id, term, term_count_in_document, total_terms_in_document, number_of_documents_containing_term, total_documents, tf, idf, tfidf)</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> TfIdfOperation<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, Fields.<span style="color: #006633;">ALL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #666666; font-style: italic;">// reorder and rm some fields</span>
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;tfidf&quot;</span><span style="color: #009900;">&#41;</span>,
        <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Identity</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// &quot;stripe&quot; our group e.g.:</span>
    <span style="color: #666666; font-style: italic;">// input: (term,</span>
    <span style="color: #666666; font-style: italic;">//          (document_id_1, tfidf_1),</span>
    <span style="color: #666666; font-style: italic;">//          (document_id_2, tfidf_2),</span>
    <span style="color: #666666; font-style: italic;">//          ...)</span>
    <span style="color: #666666; font-style: italic;">// emit: (term, {document_id_1:tfidf_1, document_id_2:tfidf_2})</span>
    pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe,
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;tfidf&quot;</span><span style="color: #009900;">&#41;</span>,
        <span style="color: #000000; font-weight: bold;">new</span> JSONTupleAggregator<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;scores&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;JSONObject&quot;</span><span style="color: #009900;">&#41;</span>,
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;scores&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// must register all assembly tails</span>
    setTails<span style="color: #009900;">&#40;</span>pipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
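<p>To make the arithmetic inside <code>TfIdfOperation</code> concrete, here is a minimal plain-Java sketch of the same three formulas, without Cascading. The class name <code>TfIdfSketch</code> and its method names are made up for illustration, and it uses plain <code>double</code> values rather than <code>BigDecimal</code>, so treat it as a sketch rather than a drop-in replacement.</p>

```java
// Illustrative only: mirrors the formulas used in TfIdfOperation with plain doubles.
// The class and method names are hypothetical and not part of the assembly above.
final class TfIdfSketch {

  // tf = term_count_in_document / total_terms_in_document
  static double tf(double termCount, double totalTerms) {
    return termCount / totalTerms;
  }

  // idf = ln(total_documents / (1 + number_of_documents_containing_term))
  static double idf(double totalDocuments, double documentsContainingTerm) {
    return Math.log(totalDocuments / (1 + documentsContainingTerm));
  }

  // tfidf = tf * idf
  static double tfidf(double termCount, double totalTerms,
                      double totalDocuments, double documentsContainingTerm) {
    return tf(termCount, totalTerms) * idf(totalDocuments, documentsContainingTerm);
  }

  public static void main(String[] args) {
    // A term that appears 2 times in a 4-term document and in 1 of 10 documents.
    System.out.println(tfidf(2, 4, 10, 1));
  }
}
```

<p>One consequence of the <code>1 + td</code> denominator is worth knowing: a term found in one of two documents gets an idf of ln(2/2) = 0, and a term found in every document gets a negative idf.</p>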

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2010/05/14/cascading-tf-idf-and-bufferedsum-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2010/05/14/cascading-tf-idf-and-bufferedsum-part-2/</feedburner:origLink></item>
		<item>
		<title>Cascading, TF-IDF, and BufferedSum (Part 1)</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/Rm6ltLmDFMM/</link>
		<comments>http://www.xcombinator.com/2009/12/18/cascading-tf-idf-and-bufferedsum-part-1/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 18:08:17 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[cascading]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=188</guid>
		<description><![CDATA[Introduction
A common technique in MapReduce is to input a group of records, calculate a value from that group, and emit each record with the new value attached. While this is easy to do in raw MR jobs, the solution in Cascading is not very obvious. This tutorial introduces a new operation to Cascading called BufferedSum. [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>A common technique in MapReduce is to input a group of records, calculate a value from that group, and emit each record with the new value attached. While this is easy to do in raw MR jobs, the solution in Cascading is not very obvious. This tutorial introduces a new operation to Cascading called <code>BufferedSum</code>. <code>BufferedSum</code> allows us to calculate values from a group of tuples and emit the group value to individual tuples in a scalable way.</p>
<p>Describing the operation of <code>BufferedSum</code> is clearer when discussed in concrete terms, so let&#8217;s work with an example.</p>
<h2>Example</h2>
<p>When dealing with large numbers of documents in Hadoop, it&#8217;s common for each input file to contain many documents. Our input file in this case will contain two documents, one per line, with the document id and the body separated by a tab:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">a.<span style="color: #006633;">txt</span>\thello world world
b.<span style="color: #006633;">txt</span>\tgoodbye goodbye world</pre></div></div>

<p>Let&#8217;s say we want to calculate <a href="http://en.wikipedia.org/wiki/Tf–idf">tf-idf</a> for these documents.  One of the first values we need is the number of occurrences of a particular term within each document. </p>
<p>First, we will split each line into <code>(document_id, body)</code> pairs:  </p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">  pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, 
      <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;line&quot;</span><span style="color: #009900;">&#41;</span>, 
      <span style="color: #000000; font-weight: bold;">new</span> RegexSplitter<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;body&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>From there we &#8220;tokenize&#8221; the document and extract each term:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">  pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>pipe, <span style="color: #666666; font-style: italic;">// tokenize words by space</span>
      <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;body&quot;</span><span style="color: #009900;">&#41;</span>,
      <span style="color: #000000; font-weight: bold;">new</span> RegexSplitGenerator<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\\</span>s+&quot;</span><span style="color: #009900;">&#41;</span>, 
      <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Now our tuple stream is the following:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">a.<span style="color: #006633;">txt</span> hello
a.<span style="color: #006633;">txt</span> world
a.<span style="color: #006633;">txt</span> world
b.<span style="color: #006633;">txt</span> goodbye
b.<span style="color: #006633;">txt</span> goodbye
b.<span style="color: #006633;">txt</span> world</pre></div></div>
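<p>If it helps to see these two steps outside of Cascading, the following plain-Java sketch does roughly the same thing: split each line on the tab, then split the body on whitespace. The class <code>TokenizeSketch</code> and its <code>tokenize</code> method are hypothetical names for this illustration, and the handling of malformed lines is an assumption, not something Cascading&#8217;s operations are claimed to do.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain-Java stand-in for the two Each pipes above.
final class TokenizeSketch {

  // Turns "document_id<TAB>body" lines into {document_id, term} pairs.
  static List<String[]> tokenize(List<String> lines) {
    List<String[]> pairs = new ArrayList<>();
    for (String line : lines) {
      String[] parts = line.split("\t", 2); // {document_id, body}
      if (parts.length < 2) {
        continue; // assumption: silently skip lines without a tab
      }
      for (String term : parts[1].trim().split("\\s+")) {
        if (!term.isEmpty()) {
          pairs.add(new String[] {parts[0], term});
        }
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    List<String> lines = List.of("a.txt\thello world world", "b.txt\tgoodbye goodbye world");
    for (String[] pair : tokenize(lines)) {
      System.out.println(pair[0] + " " + pair[1]);
    }
  }
}
```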

<h2>Count of <code>term</code> in <code>document_id</code></h2>
<p>We now have <code>(document_id, term)</code> and we want to calculate <code>(document_id, term, term_count_in_document)</code>. With Cascading this is easy: simply group by <code>document_id</code> and <code>term</code> and use the <code>Count()</code> aggregator:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">  <span style="color: #666666; font-style: italic;">// count how many times `term` appears in `document_id`</span>
  pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe, 
      <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term&quot;</span><span style="color: #009900;">&#41;</span>, 
      <span style="color: #000000; font-weight: bold;">new</span> Count<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>, 
      <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
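<p>For the two example documents, the effect of this group-and-count step can be sketched in plain Java as follows. The class <code>TermCountSketch</code> and the tab-joined map key are assumptions made for this illustration only; Cascading itself works on tuples, not on joined strings.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: counts how often each (document_id, term) pair occurs.
final class TermCountSketch {

  // Key is "document_id<TAB>term"; value is term_count_in_document.
  static Map<String, Integer> countTerms(List<String[]> pairs) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String[] pair : pairs) {
      counts.merge(pair[0] + "\t" + pair[1], 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String[]> pairs = List.of(
        new String[] {"a.txt", "hello"},
        new String[] {"a.txt", "world"},
        new String[] {"a.txt", "world"},
        new String[] {"b.txt", "goodbye"},
        new String[] {"b.txt", "goodbye"},
        new String[] {"b.txt", "world"});
    // Prints one line per (document_id, term) pair, followed by its count.
    countTerms(pairs).forEach((key, count) ->
        System.out.println(key.replace('\t', ' ') + " " + count));
  }
}
```

<p>For the sample input, the resulting counts are 1 for <code>(a.txt, hello)</code>, 2 for <code>(a.txt, world)</code>, 2 for <code>(b.txt, goodbye)</code> and 1 for <code>(b.txt, world)</code>.</p>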

<h2>Calculating <code>total_terms_in_document</code></h2>
<p>So far, so good. Up to this point Cascading has provided everything we need.  Next, however, we want to get the total number of terms within each document while keeping the tuples we have calculated thus far. Put another way, we have an input of <code>(document_id, term, term_count_in_document)</code> and we want to emit <code>(document_id, term, term_count_in_document, total_terms_in_document)</code>.</p>
<p>Our first instinct might be to use <code>GroupBy()</code> and <code>Count()</code> like before. But there is a catch: <code>Every</code> operations emit the operator result with the <em>group tuple</em> (see the <a href="http://www.cascading.org/userguide/html/ch03s02.html#N20228">Each and Every Pipes</a> in the Cascading User Guide). </p>
<p>This means that if we group by <code>document_id</code> and <code>Sum()</code> the <code>term_count_in_document</code> values into <code>total_terms_in_document</code>, we will emit only <code>(document_id, total_terms_in_document)</code>.  The number in <code>total_terms_in_document</code> will be accurate, but we lose our <code>term</code> and <code>term_count_in_document</code> fields. </p>
<p>If we try to save our other fields by grouping on all three of them, <code>(document_id, term, term_count_in_document)</code>, then we&#8217;ve &#8220;over-grouped&#8221;: every &#8220;group&#8221; is a single tuple (the input tuple), so we won&#8217;t get the count of terms in the document as a whole. <code>BufferedSum</code> was created to solve this problem. </p>
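<p>The loss of fields is easy to see in a plain-Java analogy. The sketch below sums per <code>document_id</code> the way an ordinary group-and-sum would; the class <code>PlainSumSketch</code> and its <code>Row</code> record are hypothetical names, and the code assumes Java 16 or later for records.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: an ordinary group-and-sum keeps the group key and the sum, nothing else.
final class PlainSumSketch {

  record Row(String documentId, String term, int termCount) {}

  // Emits (document_id, total_terms_in_document); term and termCount are gone.
  static Map<String, Integer> totalTermsPerDocument(List<Row> rows) {
    Map<String, Integer> totals = new LinkedHashMap<>();
    for (Row row : rows) {
      totals.merge(row.documentId(), row.termCount(), Integer::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("a.txt", "hello", 1),
        new Row("a.txt", "world", 2),
        new Row("b.txt", "goodbye", 2),
        new Row("b.txt", "world", 1));
    System.out.println(totalTermsPerDocument(rows)); // prints {a.txt=3, b.txt=3}
  }
}
```

<p>The totals are right, but there is no longer any way to tell which terms contributed to them.</p>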
<h2><code>BufferedSum</code></h2>
<p><code>BufferedSum</code> takes three arguments:</p>
<ul>
<li>The name of the <code>Field</code> to output</li>
<li>The name of the <code>Field</code> to sum</li>
<li>The other <code>Fields</code> to &#8220;pull through&#8221; the operation</li>
</ul>
<p>Here is how we can use <code>BufferedSum</code> to achieve the desired effect:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// input: (document_id, term, term_count_in_document)</span>
<span style="color: #666666; font-style: italic;">// emits: (document_id, term, term_count_in_document, total_terms_in_document) </span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>pipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
pipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>pipe, 
    <span style="color: #000000; font-weight: bold;">new</span> BufferedSum<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;total_terms_in_document&quot;</span><span style="color: #009900;">&#41;</span>, 
                    <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span>,
                    <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;document_id&quot;</span>, <span style="color: #0000ff;">&quot;term&quot;</span>, <span style="color: #0000ff;">&quot;term_count_in_document&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>, 
    Fields.<span style="color: #006633;">SWAP</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<blockquote>
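<p>Note: the output selector <code>Fields.SWAP</code> is critical. It replaces the argument fields with the fields the operation emits; because <code>BufferedSum</code> re-emits the pulled-through fields itself, the incoming copies must be swapped out rather than kept alongside them.</p>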
</blockquote>
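<p>To make the effect concrete, here is a minimal plain-Python sketch of the same idea, outside of Cascading. The function name <code>buffered_sum</code> and the sample rows are made up for illustration; it buffers each group, sums one column, and emits every row with the total appended (as a float, since <code>BufferedSum</code> sums into a <code>Double</code>).</p>

```python
from collections import defaultdict


def buffered_sum(rows, group_index, sum_index):
    """Append each group's total of row[sum_index] to every row in that group."""
    groups = defaultdict(list)
    for row in rows:
        # Buffer rows per grouping key, the job GroupBy does in the pipe.
        groups[row[group_index]].append(row)

    out = []
    for key in sorted(groups):
        buffered = groups[key]
        total = float(sum(row[sum_index] for row in buffered))
        for row in buffered:
            # Pull every original field through, then add the sum.
            out.append(row + (total,))
    return out


# (document_id, term, term_count_in_document) -- sample data, not from a real corpus
rows = [
    ("doc1", "cat", 2),
    ("doc1", "dog", 3),
    ("doc2", "cat", 1),
]

for row in buffered_sum(rows, group_index=0, sum_index=2):
    print(row)
```

<p>Each <code>doc1</code> row comes out with a total of 5.0 and the <code>doc2</code> row with 1.0, which is the <code>total_terms_in_document</code> column we were after.</p>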
<h2>Memory considerations</h2>
<p>When using <code>BufferedSum</code>, try to keep your groups small enough to fit in memory. This is not a hard requirement: <code>BufferedSum</code> uses Cascading&#8217;s <code>SpillableTupleList</code>, which spills to disk once it grows past its threshold (10,000 tuples in the code below). That said, spilling is an expensive operation and should be avoided if possible.</p>
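<p>The spilling idea itself is simple. The sketch below is a toy illustration in Python, not Cascading&#8217;s implementation: the class name <code>SpillableList</code> and its use of <code>pickle</code> and a temporary file are assumptions made for the example. The <code>threshold</code> argument plays the role of the <code>10000</code> passed to <code>SpillableTupleList</code> in the code.</p>

```python
import pickle
import tempfile


class SpillableList:
    """Keeps up to `threshold` items in memory; full batches go to a temp file."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.memory = []
        self.spill_file = None
        self.spilled_count = 0

    def add(self, item):
        self.memory.append(item)
        if len(self.memory) == self.threshold:
            self._spill()

    def _spill(self):
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        # Always append at the end, even if a reader moved the file position.
        self.spill_file.seek(0, 2)
        for item in self.memory:
            # This is the expensive part: serialization plus disk I/O.
            pickle.dump(item, self.spill_file)
        self.spilled_count += len(self.memory)
        self.memory = []

    def __iter__(self):
        # Spilled items were added first, so replay them before the in-memory tail.
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled_count):
                yield pickle.load(self.spill_file)
        for item in self.memory:
            yield item


tuples = SpillableList(threshold=2)
for i in range(5):
    tuples.add(("doc1", i))
print(list(tuples))
```

<p>Every item that crosses the threshold is written out and read back once, which is why small groups are so much cheaper.</p>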
<h2>Summary</h2>
<p><code>BufferedSum</code> is a widely useful operation when dealing with sums in Cascading.  In Part 2 we will use <code>BufferedSum</code> and Cascading to finish calculating tf-idf.</p>
<h2>The Code</h2>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.xcombinator.cascading.operations.buffers</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.FlowProcess</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.BaseOperation</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Buffer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.BufferCall</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Fields</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Tuple</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.TupleEntry</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.SpillableTupleList</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Iterator</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * BufferedSum sums a value for every Tuple in a Group and emits every input
 * Tuple with the sum appended.
 * &lt;p/&gt;
 * 
 * 
 * EXAMPLE:
 *
 * {@code 
 *
 * // input: (document_id, term, term_count_in_document)
 * // emits: (document_id, term, term_count_in_document, total_terms_in_document) 
 *
 *     pipe = new GroupBy(pipe, new Fields(&quot;document_id&quot;));
 *     pipe = new Every(pipe, 
 *         new BufferedSum(new Fields(&quot;total_terms_in_document&quot;), 
 *                        new Fields(&quot;term_count_in_document&quot;),
 *                        new Fields(&quot;document_id&quot;, &quot;term&quot;, &quot;term_count_in_document&quot;)), 
 *         Fields.SWAP);
 * }
 *
 * @see BufferedSum
 * 
 */</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> BufferedSum <span style="color: #000000; font-weight: bold;">extends</span> BaseOperation <span style="color: #000000; font-weight: bold;">implements</span> Buffer
  <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #003399;">Double</span> sum<span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">private</span> SpillableTupleList list<span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">private</span> Fields extrasSelector<span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">private</span> Fields fieldToSum<span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #008000; font-style: italic; font-weight: bold;">/**
   * Returns a BufferedSum Buffer Operation. 
   *
   * @param emittedSumFieldName a {@link Fields} naming the field to emit the sum value
   * @param fieldToSum          a {@link Fields} naming the field to sum
   * @param extrasSelector      a {@link Fields} naming the other fields to &quot;pull through&quot;. These fields *must* be of the same order and size as the input Tuple
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> BufferedSum<span style="color: #009900;">&#40;</span> Fields emittedSumFieldName, Fields fieldToSum, Fields extrasSelector <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">super</span><span style="color: #009900;">&#40;</span> extrasSelector.<span style="color: #006633;">append</span><span style="color: #009900;">&#40;</span> emittedSumFieldName <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">extrasSelector</span> <span style="color: #339933;">=</span> extrasSelector<span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">fieldToSum</span> <span style="color: #339933;">=</span> fieldToSum<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> operate<span style="color: #009900;">&#40;</span> FlowProcess flowProcess, BufferCall bufferCall <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
    Iterator<span style="color: #339933;">&lt;</span>TupleEntry<span style="color: #339933;">&gt;</span> iterator <span style="color: #339933;">=</span> bufferCall.<span style="color: #006633;">getArgumentsIterator</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    sum <span style="color: #339933;">=</span> 0.0D<span style="color: #339933;">;</span>
    list <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> SpillableTupleList<span style="color: #009900;">&#40;</span> <span style="color: #cc66cc;">10000</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">while</span><span style="color: #009900;">&#40;</span> iterator.<span style="color: #006633;">hasNext</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span>
      <span style="color: #009900;">&#123;</span>
      TupleEntry arguments <span style="color: #339933;">=</span> iterator.<span style="color: #006633;">next</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// must be called</span>
      sum <span style="color: #339933;">+=</span> arguments.<span style="color: #006633;">getDouble</span><span style="color: #009900;">&#40;</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">fieldToSum</span>.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      list.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span> arguments.<span style="color: #006633;">getTuple</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span> Tuple tuple <span style="color: #339933;">:</span> list <span style="color: #009900;">&#41;</span>
      <span style="color: #009900;">&#123;</span>
      bufferCall.<span style="color: #006633;">getOutputCollector</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span> tuple.<span style="color: #006633;">append</span><span style="color: #009900;">&#40;</span> <span style="color: #000000; font-weight: bold;">new</span> Tuple<span style="color: #009900;">&#40;</span> sum <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/12/18/cascading-tf-idf-and-bufferedsum-part-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/12/18/cascading-tf-idf-and-bufferedsum-part-1/</feedburner:origLink></item>
		<item>
		<title>How to use Cascading with Hadoop Streaming</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/3NyHyyvudHw/</link>
		<comments>http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 19:45:46 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=152</guid>
		<description><![CDATA[Last time we talked about how to use a raw MapReduce job in Cascading. Now we are going to up the ante by using Hadoop Streaming as a Flow in Cascading. In this example, we hook a python streaming job into a Cascade.
It&#8217;s pretty easy once you know how to do it: 

Create a JobConf [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.xcombinator.com/2009/11/11/how-to-use-a-raw-mapreduce-job-in-cascading/">Last time</a> we talked about how to use a raw MapReduce job in Cascading. Now we are going to up the ante by using Hadoop Streaming as a Flow in Cascading. In this example, we hook a python streaming job into a Cascade.</p>
<p>It&#8217;s pretty easy once you know how to do it: </p>
<ul>
<li>Create a JobConf that defines the parameters for the streaming job</li>
<li>Send up the <code>hadoop-*-streaming.jar</code> with your Cascading job by putting it in your job <code>jar</code></li>
<li>Send up the scripts (python, in this case) by using the <code>-file</code> option</li>
<li>Send up any other dependencies, corpora, etc. by using the <code>-file</code>, <code>-cacheFile</code>, or <code>-cacheArchive</code> options (See the <a href="http://hadoop.apache.org/common/docs/r0.20.0/streaming.html">Hadoop Streaming</a> page for more details)</li>
</ul>
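<p>The options in the list above end up as a flat argument list for the streaming job. As a rough sketch (in Python for brevity; the helper <code>streaming_args</code>, the paths, and the script name <code>synonyms.py</code> are all hypothetical), assembling that list might look like this:</p>

```python
def streaming_args(input_path, output_path, mapper_cmd, files=(), cache_archives=()):
    """Build the argument list for a Hadoop Streaming job."""
    args = ["-input", input_path, "-output", output_path, "-mapper", mapper_cmd]
    for path in files:
        # -file ships a local file (script, module archive) with the job.
        args += ["-file", path]
    for uri in cache_archives:
        # -cacheArchive points at an archive that is already on the cluster.
        args += ["-cacheArchive", uri]
    return args


# Hypothetical paths and script name, for illustration only.
args = streaming_args(
    input_path="/tmp/example/words",
    output_path="/tmp/example/synonyms",
    mapper_cmd="synonyms.py mapper",
    files=["synonyms.py", "nltkandyaml.mod"],
)
print(" ".join(args))
```

<p>Keeping the arguments in one place makes it easier to see exactly which files travel with the job.</p>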
<h2>Resources</h2>
<h3>NLTK</h3>
<p>To generate the <code>nltkandyaml.mod</code> zip file do the following:</p>
<pre><code># download nltk and unzip
cd nltk
zip -r nltkandyaml.zip nltk yaml
mv nltkandyaml.zip nltkandyaml.mod
</code></pre>
<p>Note that this technique is taken from <a href="http://www.cloudera.com/node/48">Cloudera</a>.</p>
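<p>Renaming the archive does not stop Python from reading it, because zip imports go by file contents rather than by extension. Here is a small sketch you can run locally to convince yourself. It uses a made-up module called <code>greeting</code> in place of NLTK, and it puts the archive on <code>sys.path</code> instead of calling <code>zipimport.zipimporter</code> directly, which is a simpler route to the same import machinery in Python 3.</p>

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_archive(archive_path, module_name):
    """Import module_name from a zip archive by putting the archive on sys.path."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a tiny stand-in archive; "greeting" is a made-up module, not part of NLTK.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "example.mod")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("greeting.py", "def hello():\n    return 'hello from a zip'\n")

greeting = import_from_archive(archive, "greeting")
print(greeting.hello())
```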
<h3>WordNet</h3>
<p>The WordNet zip file needs to be flat, i.e. the corpus files must sit at the top level of the archive rather than inside a subdirectory. You could create this file like so:</p>
<pre><code># download and unzip the wordnet corpus
cd wordnet
zip -r ../wordnet-flat.zip *
</code></pre>
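<p>If you would rather build and check the archive from Python, the following sketch does roughly the same thing. The helper names <code>zip_flat</code> and <code>is_flat</code> are made up, and the two files written here are empty placeholders rather than real corpus data. Unlike <code>zip -r</code>, this version skips subdirectories entirely.</p>

```python
import os
import tempfile
import zipfile


def zip_flat(source_dir, zip_path):
    """Zip the regular files directly inside source_dir, with no directory prefix."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(source_dir)):
            full = os.path.join(source_dir, name)
            if os.path.isfile(full):
                # arcname=name drops the directory part from the stored path.
                zf.write(full, arcname=name)


def is_flat(zip_path):
    """True if no entry in the archive sits inside a directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return all("/" not in name for name in zf.namelist())


# Placeholder files standing in for an unzipped corpus directory.
source_dir = tempfile.mkdtemp()
for name in ("index.noun", "data.noun"):
    with open(os.path.join(source_dir, name), "w") as handle:
        handle.write("placeholder\n")

# Write the archive outside source_dir so it does not end up inside itself.
zip_path = os.path.join(tempfile.mkdtemp(), "wordnet-flat.zip")
zip_flat(source_dir, zip_path)
print(is_flat(zip_path))
```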
<h2>Streaming Script</h2>
<p>In Python, we&#8217;ll be using <code>zipimport.zipimporter</code> to import the <code>nltk</code> libraries from a zip file. In our setup, Hadoop 0.20.0 didn&#8217;t decompress our <code>wordnet-flat.zip</code> file automatically (we&#8217;ve heard reports that some versions do, but I&#8217;m not sure which). For us the <code>.zip</code> file was placed in <code>lib</code> relative to the <code>pwd</code> of the script. This allowed us to keep the WordNet corpus as a zip and read it in that format:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">wn = WordNetCorpusReader<span style="color: black;">&#40;</span>nltk.<span style="color: black;">data</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'lib/wordnet-flat.zip'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>(In this code we&#8217;re not using the Python reducer.)</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python </span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">zipimport</span>
&nbsp;
importer = <span style="color: #dc143c;">zipimport</span>.<span style="color: black;">zipimporter</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'nltkandyaml.mod'</span><span style="color: black;">&#41;</span>
yaml = importer.<span style="color: black;">load_module</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'yaml'</span><span style="color: black;">&#41;</span>
nltk = importer.<span style="color: black;">load_module</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'nltk'</span><span style="color: black;">&#41;</span>
punct = <span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'[^<span style="color: #000099; font-weight: bold;">\w</span><span style="color: #000099; font-weight: bold;">\s</span>]+'</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> nltk.<span style="color: black;">corpus</span>.<span style="color: black;">reader</span> <span style="color: #ff7700;font-weight:bold;">import</span> wordnet
<span style="color: #ff7700;font-weight:bold;">from</span> nltk.<span style="color: black;">corpus</span>.<span style="color: black;">reader</span> <span style="color: #ff7700;font-weight:bold;">import</span> WordNetCorpusReader
&nbsp;
nltk.<span style="color: black;">data</span>.<span style="color: black;">path</span> += <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;.&quot;</span><span style="color: black;">&#93;</span>
wn = WordNetCorpusReader<span style="color: black;">&#40;</span>nltk.<span style="color: black;">data</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'lib/wordnet-flat.zip'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> mapper<span style="color: black;">&#40;</span>args<span style="color: black;">&#41;</span>:
  line = <span style="color: #dc143c;">sys</span>.<span style="color: black;">stdin</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">try</span>:
    <span style="color: #ff7700;font-weight:bold;">while</span> line:
      line = line.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
      word = line
      all_synonyms = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
      string_synsets = wn.<span style="color: black;">synsets</span><span style="color: black;">&#40;</span>word<span style="color: black;">&#41;</span>
&nbsp;
      <span style="color: #ff7700;font-weight:bold;">for</span> synset <span style="color: #ff7700;font-weight:bold;">in</span> string_synsets:
        synonyms = <span style="color: black;">&#91;</span>lemma.<span style="color: black;">name</span> <span style="color: #ff7700;font-weight:bold;">for</span> lemma <span style="color: #ff7700;font-weight:bold;">in</span> wn.<span style="color: black;">synset</span><span style="color: black;">&#40;</span>synset.<span style="color: black;">name</span><span style="color: black;">&#41;</span>.<span style="color: black;">lemmas</span><span style="color: black;">&#93;</span>
        synonyms.<span style="color: black;">pop</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> synonym <span style="color: #ff7700;font-weight:bold;">in</span> synonyms:
          synonym = <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;_&quot;</span>, <span style="color: #483d8b;">&quot; &quot;</span>, synonym<span style="color: black;">&#41;</span>
          all_synonyms.<span style="color: black;">append</span><span style="color: black;">&#40;</span>synonym<span style="color: black;">&#41;</span> 
&nbsp;
      <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>word, <span style="color: #483d8b;">','</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>all_synonyms<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
      line = <span style="color: #dc143c;">sys</span>.<span style="color: black;">stdin</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
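  <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">IOError</span>:  <span style="color: #808080; font-style: italic;"># readline() returns '' at EOF and raises nothing; only a failed read lands here</span>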
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">None</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># we're not using this, but we could</span>
<span style="color: #ff7700;font-weight:bold;">def</span> reducer<span style="color: black;">&#40;</span>args<span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">sys</span>.<span style="color: black;">stdin</span>:
    line = line.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> line
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
  <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">&quot;mapper&quot;</span>:
    mapper<span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
  <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">&quot;reducer&quot;</span>:
    reducer<span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>
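<p>Since a streaming mapper is just a program that reads lines and writes lines, its logic can be tried out without Hadoop or WordNet. The sketch below is a Python 3 restatement of the mapper&#8217;s output format with the lookup passed in as a function; <code>run_mapper</code> and the tiny <code>fake_wordnet</code> dictionary are made up for the example and are not part of the script above.</p>

```python
import io


def run_mapper(lines, lookup, out):
    """Write one 'word TAB comma-separated-synonyms' line per input word."""
    for line in lines:
        word = line.strip()
        # WordNet joins multi-word lemmas with underscores; emit spaces instead.
        synonyms = [s.replace("_", " ") for s in lookup(word)]
        out.write("\t".join([word, ",".join(synonyms)]) + "\n")


# A stand-in for the WordNet lookup; the entries here are made up for the demo.
fake_wordnet = {"car": ["auto", "motor_car"], "dog": ["domestic_dog"]}

out = io.StringIO()
run_mapper(["car\n", "dog\n", "xyzzy\n"], lambda w: fake_wordnet.get(w, []), out)
print(out.getvalue(), end="")
```

<p>In a real job the same function would be called with <code>sys.stdin</code> and <code>sys.stdout</code>. A word with no synonyms still produces a line, with an empty second column.</p>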

<h2>Cascading Code</h2>
<p>Here&#8217;s the bulk of the code that will achieve the effect we want. Like last time, we&#8217;re using two intermediate taps as the input and output of the streaming job. Also, we&#8217;re just using TextLine files for simplicity.  If you don&#8217;t want the intermediate files hanging around, look at the comments towards the bottom for some example code on how to remove the files when the job is finished running. </p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.xcombinator.hadoopjobs.cascadingstreamingtest</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.cascade.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.Flow</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.FlowConnector</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.MapReduceFlow</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.aggregator.Count</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.regex.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.pipe.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.scheme.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tap.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Fields</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Identity</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Properties</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.conf.Configuration</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.conf.Configured</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.io.LongWritable</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.io.Text</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.FileInputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.FileOutputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.JobConf</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.TextInputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.TextOutputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.lib.IdentityMapper</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.lib.IdentityReducer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.util.Tool</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.util.ToolRunner</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.log4j.Logger</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Debug</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.streaming.StreamJob</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * An example file to use a Hadoop Streaming job in cascading
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> Main <span style="color: #000000; font-weight: bold;">extends</span> Configured <span style="color: #000000; font-weight: bold;">implements</span> Tool
  <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> Logger LOG <span style="color: #339933;">=</span> Logger.<span style="color: #006633;">getLogger</span><span style="color: #009900;">&#40;</span> Main.<span style="color: #000000; font-weight: bold;">class</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">int</span> run<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    JobConf conf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> JobConf<span style="color: #009900;">&#40;</span>getConf<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">getClass</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">Properties</span> properties <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Properties</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    FlowConnector.<span style="color: #006633;">setApplicationJarClass</span><span style="color: #009900;">&#40;</span>properties, <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">getClass</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    CascadeConnector cascadeConnector <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> CascadeConnector<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    FlowConnector flowConnector <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> FlowConnector<span style="color: #009900;">&#40;</span>properties<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #003399;">String</span> inputPath  <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">String</span> outputPath <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">String</span> intermediatePath1 <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot;-mr-input&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">String</span> intermediatePath2 <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot;-mr-output&quot;</span><span style="color: #339933;">;</span>
&nbsp;
    Scheme textLineScheme <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TextLine<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    Tap sourceTap <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, inputPath<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Tap intermediateTap1 <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, intermediatePath1<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Tap intermediateTap2 <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, intermediatePath2<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Tap sinkTap   <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, outputPath<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// create our first flow, sink to intermediateTap1</span>
    Pipe wsPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;wordsplit&quot;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;line&quot;</span><span style="color: #009900;">&#41;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> RegexSplitGenerator<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\\</span>s+&quot;</span><span style="color: #009900;">&#41;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    Flow parsedLogFlow <span style="color: #339933;">=</span> flowConnector.<span style="color: #006633;">connect</span><span style="color: #009900;">&#40;</span>sourceTap, intermediateTap1, wsPipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Create a pipe and set our mr job for it </span>
    Pipe importPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Pipe<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;mr pipe&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Flow mrFlow<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">try</span> <span style="color: #009900;">&#123;</span>
      JobConf streamConf <span style="color: #339933;">=</span> StreamJob.<span style="color: #006633;">createJob</span><span style="color: #009900;">&#40;</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#123;</span>
          <span style="color: #0000ff;">&quot;-input&quot;</span>, intermediateTap1.<span style="color: #006633;">getPath</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, 
          <span style="color: #0000ff;">&quot;-output&quot;</span>, intermediateTap2.<span style="color: #006633;">getPath</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>,
&nbsp;
          <span style="color: #666666; font-style: italic;">// straight unix</span>
          <span style="color: #666666; font-style: italic;">// &quot;-mapper&quot;, &quot;/bin/cat&quot;,</span>
          <span style="color: #666666; font-style: italic;">// &quot;-reducer&quot;, &quot;/usr/bin/wc&quot;</span>
&nbsp;
          <span style="color: #666666; font-style: italic;">// ruby</span>
          <span style="color: #666666; font-style: italic;">// &quot;-mapper&quot;, &quot;src/main/ruby/word_count_mapper.rb&quot;,</span>
          <span style="color: #666666; font-style: italic;">// &quot;-reducer&quot;, &quot;src/main/ruby/word_count_reducer.rb&quot;,</span>
          <span style="color: #666666; font-style: italic;">// &quot;-file&quot;, &quot;src/main/ruby/word_count_mapper.rb&quot;,</span>
          <span style="color: #666666; font-style: italic;">// &quot;-file&quot;, &quot;src/main/ruby/word_count_reducer.rb&quot;</span>
&nbsp;
          <span style="color: #666666; font-style: italic;">// python</span>
          <span style="color: #0000ff;">&quot;-mapper&quot;</span>, <span style="color: #0000ff;">&quot;python synsets.py mapper&quot;</span>,
          <span style="color: #0000ff;">&quot;-reducer&quot;</span>, <span style="color: #0000ff;">&quot;org.apache.hadoop.mapred.lib.IdentityReducer&quot;</span>,
          <span style="color: #0000ff;">&quot;-file&quot;</span>, <span style="color: #0000ff;">&quot;src/main/python/synsets.py&quot;</span>,
          <span style="color: #0000ff;">&quot;-file&quot;</span>, <span style="color: #0000ff;">&quot;resources/nltkandyaml.mod&quot;</span>,
          <span style="color: #0000ff;">&quot;-file&quot;</span>, <span style="color: #0000ff;">&quot;resources/lib/wordnet-flat.zip&quot;</span>,
          <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      mrFlow <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> MapReduceFlow<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;streaming flow&quot;</span>, streamConf, intermediateTap1,
        intermediateTap2, <span style="color: #000066; font-weight: bold;">false</span>, <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span> <span style="color: #000000; font-weight: bold;">catch</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">IOException</span> ioe<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
       ioe.<span style="color: #006633;">printStackTrace</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
       <span style="color: #666666; font-style: italic;">// report failure through the Tool exit code; main() passes it to System.exit</span>
       <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// create our third &quot;regular&quot; cascading pipe. this is a bit contrived, but</span>
    <span style="color: #666666; font-style: italic;">// the idea is to substitute all 'e's with 'x's. it's just here to show how to</span>
    <span style="color: #666666; font-style: italic;">// take the output of a streaming job back into cascading</span>
    Pipe subPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Pipe<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;subber&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    subPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>subPipe,
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;line&quot;</span><span style="color: #009900;">&#41;</span>,
        <span style="color: #000000; font-weight: bold;">new</span> RegexReplace<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;linx&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;e&quot;</span>, <span style="color: #0000ff;">&quot;x&quot;</span>, <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span>,
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;linx&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Flow subFlow <span style="color: #339933;">=</span> flowConnector.<span style="color: #006633;">connect</span><span style="color: #009900;">&#40;</span>intermediateTap2, sinkTap, subPipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    Cascade cascade <span style="color: #339933;">=</span> cascadeConnector.<span style="color: #006633;">connect</span><span style="color: #009900;">&#40;</span>parsedLogFlow, mrFlow, subFlow<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    cascade.<span style="color: #006633;">complete</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// to get rid of the intermediate files you could do this:</span>
    <span style="color: #666666; font-style: italic;">// Path tmp = tap.getPath();</span>
    <span style="color: #666666; font-style: italic;">// FileSystem fs = tmp.getFileSystem(conf);</span>
    <span style="color: #666666; font-style: italic;">// fs.delete(tmp, true);</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> 
  <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">int</span> res <span style="color: #339933;">=</span> ToolRunner.<span style="color: #006633;">run</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Configuration<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, args<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">System</span>.<span style="color: #006633;">exit</span><span style="color: #009900;">&#40;</span>res<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #009900;">&#125;</span></pre></div></div>
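<p>The commented-out Ruby option in the listing above points at <code>src/main/ruby/word_count_mapper.rb</code>. As a rough idea of what a streaming mapper like that could look like, here is a minimal sketch. It is an assumption for illustration, not necessarily the script that path refers to, and <code>run_mapper</code> is a made-up name. It relies only on the Hadoop Streaming convention that a mapper reads lines on stdin and writes tab-separated key/value lines to stdout.</p>

```ruby
# Hypothetical word-count mapper for Hadoop Streaming (illustration only).

# Read lines from `input` and write one "word<TAB>1" record per
# whitespace-separated word to `output`. Blank lines produce no records.
def run_mapper(input, output)
  input.each_line do |line|
    line.split.each do |word|
      output.puts "#{word}\t1"
    end
  end
end
```

<p>Saved as a script, the file would end with a call such as <code>run_mapper($stdin, $stdout)</code> so that Hadoop Streaming can pipe records through it.</p>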

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/</feedburner:origLink></item>
		<item>
		<title>Interval – a ruby library for musical interval arithmetic</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/a2BRQO16lac/</link>
		<comments>http://www.xcombinator.com/2009/11/17/interval-a-ruby-library-for-musical-interval-arithmetic/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 16:43:45 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[music]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=144</guid>
		<description><![CDATA[Interval
interval is a tiny library that provides simple musical pitch and interval arithmetic. It is intended to do one thing: given a pitch, add (or subtract) an interval and return the resulting pitch.
Observe:

p = Interval::Pitch.from_string&#40;&#34;c&#34;&#41;
i = Interval::Interval.from_string&#40;&#34;M3&#34;&#41;
p2 = p + i
p2.to_short_name # =&#62; &#34;e&#34;
&#160;
i.to_s # =&#62; &#34;Major Third&#34;
&#160;
i2 = Interval::Interval.from_string&#40;&#34;p5&#34;&#41;
i2.to_s # =&#62; &#34;Perfect Fifth&#34;
&#160;
&#40;p2 [...]]]></description>
			<content:encoded><![CDATA[<h1>Interval</h1>
<p><code>interval</code> is a tiny library that provides simple musical pitch and interval arithmetic. It is intended to do one thing: given a pitch, add (or subtract) an interval and return the resulting pitch.</p>
<p>Observe:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">p</span> = <span style="color:#6666ff; font-weight:bold;">Interval::Pitch</span>.<span style="color:#9900CC;">from_string</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;c&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
i = <span style="color:#6666ff; font-weight:bold;">Interval::Interval</span>.<span style="color:#9900CC;">from_string</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;M3&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
p2 = <span style="color:#CC0066; font-weight:bold;">p</span> <span style="color:#006600; font-weight:bold;">+</span> i
p2.<span style="color:#9900CC;">to_short_name</span> <span style="color:#008000; font-style:italic;"># =&gt; &quot;e&quot;</span>
&nbsp;
i.<span style="color:#9900CC;">to_s</span> <span style="color:#008000; font-style:italic;"># =&gt; &quot;Major Third&quot;</span>
&nbsp;
i2 = <span style="color:#6666ff; font-weight:bold;">Interval::Interval</span>.<span style="color:#9900CC;">from_string</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;p5&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
i2.<span style="color:#9900CC;">to_s</span> <span style="color:#008000; font-style:italic;"># =&gt; &quot;Perfect Fifth&quot;</span>
&nbsp;
<span style="color:#006600; font-weight:bold;">&#40;</span>p2 <span style="color:#006600; font-weight:bold;">-</span> i2<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">to_s</span> <span style="color:#008000; font-style:italic;"># =&gt; &quot;a&quot;</span></pre></div></div>
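<p>If you are curious how this kind of arithmetic can work, here is a minimal sketch. It is not the gem's implementation: <code>transpose</code>, the <code>INTERVALS</code> table, and the letter/semitone representation are all assumptions made for illustration, and only a handful of intervals are listed. The idea is to move the letter name by the interval's diatonic steps, move the sounding pitch by its semitones, and then pick the accidental that reconciles the two.</p>

```ruby
# Hypothetical sketch of pitch/interval arithmetic; not the interval gem's code.
LETTERS  = %w[c d e f g a b].freeze
NATURALS = [0, 2, 4, 5, 7, 9, 11].freeze # semitones above c for each natural letter

# Interval name => [diatonic steps, semitones]; deliberately incomplete.
INTERVALS = {
  "m3" => [2, 3],
  "M3" => [2, 4],
  "p4" => [3, 5],
  "p5" => [4, 7]
}.freeze

# Parse "f#" or "eb" into [letter index, accidental offset in semitones].
def parse_pitch(name)
  letter = LETTERS.index(name[0])
  raise ArgumentError, "unknown pitch: #{name}" if letter.nil?
  rest = name[1..-1]
  [letter, rest.count("#") - rest.count("b")]
end

# Move `name` by `interval`; direction is 1 for above, -1 for below.
def transpose(name, interval, direction = 1)
  steps, semitones = INTERVALS.fetch(interval)
  letter, accidental = parse_pitch(name)

  new_letter = (letter + direction * steps) % 7
  target     = (NATURALS[letter] + accidental + direction * semitones) % 12
  # Distance from the natural new letter to the target, folded into -6..5.
  offset     = (target - NATURALS[new_letter] + 6) % 12 - 6

  LETTERS[new_letter] + (offset >= 0 ? "#" * offset : "b" * -offset)
end

transpose("c", "M3")      # => "e"
transpose("e", "p5", -1)  # => "a"
transpose("g#", "M3")     # => "b#"
```

<p>Tracking the letter and the semitone count separately is what keeps the spelling right: a major third above g# comes out as b# rather than c, because the letter has to move exactly two steps.</p>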

<h1>Interval Quiz</h1>
<p><code>interval</code> was written primarily for learning intervals. <code>interval-quiz</code> is a gem built on <code>interval</code> that provides a command-line quiz. Here&#8217;s the output of an <code>interval-quiz</code> session:</p>
<pre><code>$ interval-quiz
Here are the intervals:
unison  p1        a1
second  m2 M2  d2 a2
third   m3 M3  d3 a3
fourth  p4     d4 a4
fifth   d5 p5  d5 a5
sixth   m6 M6  d6 a6
seventh m7 M7  d7 a7
octave  p8     d8
enter the intervals you want (or a blank line to quit):
M3
p5

["M3", "p5"]
1. above
2. below
3. both
do you want to be quizzed on intervals above, below, or both?  3
what is a major third below f# ? d
correct!
what is a major third above g# 1/1 (100%)? b#
correct!
what is a major third below b 2/2 (100%)? g
correct!
what is a perfect fifth below eb 3/3 (100%)? a
wrong. the answer is ab
what is a perfect fifth below c# 3/4 (75%)? d
wrong. the answer is f#
</code></pre>
<h1>Installing</h1>
<pre><code>gem install interval interval-quiz
</code></pre>
<h1>Source</h1>
<p><a href="http://github.com/jashmenn/interval">http://github.com/jashmenn/interval</a><br />
<a href="http://github.com/jashmenn/interval-quiz">http://github.com/jashmenn/interval-quiz</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/11/17/interval-a-ruby-library-for-musical-interval-arithmetic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/11/17/interval-a-ruby-library-for-musical-interval-arithmetic/</feedburner:origLink></item>
		<item>
		<title>Slides for “Introduction to Cascading” Presentation</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/p2xgRuMDDY4/</link>
		<comments>http://www.xcombinator.com/2009/11/13/slides-for-introduction-to-cascading-presentation/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 00:57:39 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/?p=140</guid>
		<description><![CDATA[This week I gave an introductory presentation on Cascading. These are the slides from that presentation.
Intro To Cascading
View more documents from Nate Murray.

]]></description>
			<content:encoded><![CDATA[<p>This week I gave an introductory presentation on <a href="http://www.cascading.org">Cascading</a>. These are the slides from that presentation.</p>
<div style="width:425px;text-align:left" id="__ss_2487571"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/jashmenn/intro-to-cascading" title="Intro To Cascading">Intro To Cascading</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=intro-to-cascading-091112163237-phpapp01&#038;rel=0&#038;stripped_title=intro-to-cascading" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=intro-to-cascading-091112163237-phpapp01&#038;rel=0&#038;stripped_title=intro-to-cascading" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></p>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">documents</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/jashmenn">Nate Murray</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/11/13/slides-for-introduction-to-cascading-presentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/11/13/slides-for-introduction-to-cascading-presentation/</feedburner:origLink></item>
		<item>
		<title>How to use a raw MapReduce job in Cascading</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/HC23im3CdN4/</link>
		<comments>http://www.xcombinator.com/2009/11/11/how-to-use-a-raw-mapreduce-job-in-cascading/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 22:31:18 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[cascading]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://xcombinator.local/?p=135</guid>
		<description><![CDATA[Cascading is a great abstraction over MapReduce.
However, sometimes you may have code for an existing MapReduce job or want to drop directly to Hadoop for efficiency. Even if you&#8217;re using raw MapReduce jobs, Cascading can still be useful in planning the overall data pipeline. 
The code below is an example of how to use a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cascading.org/">Cascading</a> is a great abstraction over MapReduce.</p>
<p>However, sometimes you may have code for an existing MapReduce job or want to drop directly to Hadoop for efficiency. Even if you&#8217;re using raw MapReduce jobs, Cascading can still be useful in planning the overall data pipeline. </p>
<p>The code below is an example of how to use a raw MapReduce job in a Cascade. The key point is that we create intermediate taps, each acting as the sink of one flow and the source of the next, and rely on Cascading to schedule the flows in the correct order.</p>
<blockquote class="normal">
<p>NOTE: the Cascading listing in this post depends on commit <a href="http://github.com/jashmenn/cascading/commit/f0dd84cd89da70c326e7285034e982c33d2d7388">f0dd84cd</a>, a patch to <code>MapReduceFlow.java</code> that lets you explicitly set the Taps for a MapReduceFlow. I&#8217;ve contacted Chris about integrating this into the trunk.</p>
<p>Also note this patch applies to the branch <code>wip-1.1</code> and later.</p>
</blockquote>
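<p>To make "scheduling the flows in the correct order" a bit more concrete, here is a minimal plain-Java sketch of the idea. It does not use Cascading and is not how CascadeConnector is actually implemented; <code>FlowSpec</code>, <code>FlowOrderSketch</code>, <code>order</code> and the path names are all hypothetical, made up for illustration. Each flow is reduced to the path it reads and the path it writes, and a flow is only scheduled after the flow that produces its source.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FlowOrderSketch {
    /** Hypothetical stand-in for a flow: a name, the path it reads, and the path it writes. */
    static final class FlowSpec {
        final String name;
        final String source;
        final String sink;

        FlowSpec(String name, String source, String sink) {
            this.name = name;
            this.source = source;
            this.sink = sink;
        }
    }

    /** Orders flows so that a flow reading a path runs after the flow that writes it. */
    static List<String> order(List<FlowSpec> flows) {
        // Map each sink path to the flow that produces it.
        Map<String, FlowSpec> producerOf = new HashMap<>();
        for (FlowSpec flow : flows) {
            producerOf.put(flow.sink, flow);
        }
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (FlowSpec flow : flows) {
            visit(flow, producerOf, done, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(FlowSpec flow, Map<String, FlowSpec> producerOf,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(flow.name)) {
            return;
        }
        if (!inProgress.add(flow.name)) {
            throw new IllegalStateException("cyclic dependency at flow: " + flow.name);
        }
        // Schedule the producer of this flow's source first, if some flow writes that path.
        FlowSpec producer = producerOf.get(flow.source);
        if (producer != null) {
            visit(producer, producerOf, done, inProgress, ordered);
        }
        inProgress.remove(flow.name);
        done.add(flow.name);
        ordered.add(flow.name);
    }

    public static void main(String[] args) {
        // Flows deliberately listed out of order; the paths mimic the "-mr-input" / "-mr-output" naming.
        List<FlowSpec> flows = new ArrayList<>();
        flows.add(new FlowSpec("count", "out-mr-output", "out"));
        flows.add(new FlowSpec("mrflow", "out-mr-input", "out-mr-output"));
        flows.add(new FlowSpec("wordsplit", "in", "out-mr-input"));
        System.out.println(order(flows)); // prints [wordsplit, mrflow, count]
    }
}
```

<p>The order in which the flows are handed over does not matter in this sketch, only which paths they share. That is the same property the intermediate taps give us in the Cascading listing that follows.</p>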

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.xcombinator.hadoopjobs.mapreducetest</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.cascade.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.Flow</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.FlowConnector</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.flow.MapReduceFlow</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.aggregator.Count</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.regex.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.pipe.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.scheme.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tap.*</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.tuple.Fields</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Identity</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Properties</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.conf.Configuration</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.conf.Configured</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.io.LongWritable</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.io.Text</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.FileInputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.FileOutputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.JobConf</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.TextInputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.TextOutputFormat</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.lib.IdentityMapper</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.lib.IdentityReducer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.util.Tool</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.util.ToolRunner</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.log4j.Logger</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">cascading.operation.Debug</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.hadoop.mapred.KeyValueTextInputFormat</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * An example of running a raw MapReduce job as one flow in a Cascading Cascade
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> Main <span style="color: #000000; font-weight: bold;">extends</span> Configured <span style="color: #000000; font-weight: bold;">implements</span> Tool
  <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> Logger LOG <span style="color: #339933;">=</span> Logger.<span style="color: #006633;">getLogger</span><span style="color: #009900;">&#40;</span> Main.<span style="color: #000000; font-weight: bold;">class</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">int</span> run<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    JobConf conf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> JobConf<span style="color: #009900;">&#40;</span>getConf<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">getClass</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">Properties</span> properties <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Properties</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    FlowConnector.<span style="color: #006633;">setApplicationJarClass</span><span style="color: #009900;">&#40;</span>properties, <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">getClass</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    CascadeConnector cascadeConnector <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> CascadeConnector<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    FlowConnector flowConnector <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> FlowConnector<span style="color: #009900;">&#40;</span>properties<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #003399;">String</span> inputPath  <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">String</span> outputPath <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">String</span> intermediatePath1 <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot;-mr-input&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">String</span> intermediatePath2 <span style="color: #339933;">=</span> args<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot;-mr-output&quot;</span><span style="color: #339933;">;</span>
&nbsp;
    Scheme textLineScheme <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TextLine<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    Tap sourceTap <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, inputPath<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Tap intermediateTap1 <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> TextLine<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;line&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>, intermediatePath1<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Tap intermediateTap2 <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, intermediatePath2<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Tap sinkTap   <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Hfs<span style="color: #009900;">&#40;</span>textLineScheme, outputPath<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// create our first flow: split each line into words and sink them to intermediateTap1</span>
    Pipe wsPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;wordsplit&quot;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;line&quot;</span><span style="color: #009900;">&#41;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> RegexSplitGenerator<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\\</span>s+&quot;</span><span style="color: #009900;">&#41;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    Flow parsedLogFlow <span style="color: #339933;">=</span> flowConnector.<span style="color: #006633;">connect</span><span style="color: #009900;">&#40;</span>sourceTap, intermediateTap1, wsPipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// NOTE: this pipe is declared but never attached to a flow; the raw MR job is configured through the JobConf below</span>
    Pipe importPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Pipe<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;mr pipe&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    JobConf mrconf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> JobConf<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    mrconf.<span style="color: #006633;">setJobName</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;custom mr&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    mrconf.<span style="color: #006633;">setOutputKeyClass</span><span style="color: #009900;">&#40;</span>LongWritable.<span style="color: #000000; font-weight: bold;">class</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    mrconf.<span style="color: #006633;">setOutputValueClass</span><span style="color: #009900;">&#40;</span>Text.<span style="color: #000000; font-weight: bold;">class</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// the IdentityMapper, in this case, passes through the key it gets from</span>
    <span style="color: #666666; font-style: italic;">// TextInputFormat, which is the byte offset of each line (a long). Not what</span>
    <span style="color: #666666; font-style: italic;">// we'd usually want, but we'll leave it in for now.</span>
    mrconf.<span style="color: #006633;">setMapperClass</span><span style="color: #009900;">&#40;</span>IdentityMapper.<span style="color: #000000; font-weight: bold;">class</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    mrconf.<span style="color: #006633;">setReducerClass</span><span style="color: #009900;">&#40;</span>IdentityReducer.<span style="color: #000000; font-weight: bold;">class</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// note that the input is plain text lines (TextInputFormat is also Hadoop's default).</span>
    <span style="color: #666666; font-style: italic;">// In a real mr job you'd most likely need to split each line by some convention</span>
    mrconf.<span style="color: #006633;">setInputFormat</span><span style="color: #009900;">&#40;</span>TextInputFormat.<span style="color: #000000; font-weight: bold;">class</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// NOTE: the paths are set both here and via the taps passed to MapReduceFlow below</span>
    FileInputFormat.<span style="color: #006633;">setInputPaths</span><span style="color: #009900;">&#40;</span>mrconf, intermediateTap1.<span style="color: #006633;">getPath</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>  
    FileOutputFormat.<span style="color: #006633;">setOutputPath</span><span style="color: #009900;">&#40;</span>mrconf, intermediateTap2.<span style="color: #006633;">getPath</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// likewise</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// create our second flow, which wraps the raw mr job. Note that we pass the source and sink taps explicitly</span>
    Flow mrFlow <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> MapReduceFlow<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;mrflow&quot;</span>, 
      mrconf, intermediateTap1, intermediateTap2, <span style="color: #000066; font-weight: bold;">false</span>, <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// create the pipe for our third flow, a &quot;regular&quot; cascading pipe</span>
    Pipe countPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Pipe<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;count&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// because the identity job emits the byte offset as the key, each line is</span>
    <span style="color: #666666; font-style: italic;">// offset, tab, word. Strip the offset out. You wouldn't have to do this if</span>
    <span style="color: #666666; font-style: italic;">// you had a smarter Mapper class.</span>
    countPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Each<span style="color: #009900;">&#40;</span>countPipe, 
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;line&quot;</span><span style="color: #009900;">&#41;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> RegexParser<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;.*?<span style="color: #000099; font-weight: bold;">\\</span>t(.*)&quot;</span><span style="color: #009900;">&#41;</span>, 
        <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    countPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> GroupBy<span style="color: #009900;">&#40;</span>countPipe, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    countPipe <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Every<span style="color: #009900;">&#40;</span>countPipe, <span style="color: #000000; font-weight: bold;">new</span> Count<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;count&quot;</span>, <span style="color: #0000ff;">&quot;word&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// create the flow for the last count pipe</span>
    Flow countFlow <span style="color: #339933;">=</span> flowConnector.<span style="color: #006633;">connect</span><span style="color: #009900;">&#40;</span>intermediateTap2, sinkTap, countPipe<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    Cascade cascade <span style="color: #339933;">=</span> cascadeConnector.<span style="color: #006633;">connect</span><span style="color: #009900;">&#40;</span>parsedLogFlow, mrFlow, countFlow<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    cascade.<span style="color: #006633;">complete</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// if you want to get rid of the intermediate files you </span>
    <span style="color: #666666; font-style: italic;">// could do something like the following here:</span>
    <span style="color: #666666; font-style: italic;">// Path tmp = tap.getPath();</span>
    <span style="color: #666666; font-style: italic;">// FileSystem fs = tmp.getFileSystem(conf);</span>
    <span style="color: #666666; font-style: italic;">// fs.delete(tmp, true);</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> 
  <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">int</span> res <span style="color: #339933;">=</span> ToolRunner.<span style="color: #006633;">run</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Configuration<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, args<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">System</span>.<span style="color: #006633;">exit</span><span style="color: #009900;">&#40;</span>res<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #009900;">&#125;</span></pre></div></div>
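<p>The offset-stripping step is easier to follow with concrete data. The following is a small plain-Java sketch (no Hadoop or Cascading involved) of what the third flow sees: the identity job writes each record as text, with the byte-offset key and the word separated by a tab, and the same regex used with RegexParser above captures everything after the first tab. The class and method names (<code>OffsetStripSketch</code>, <code>identityJobOutput</code>, <code>stripKey</code>, <code>countWords</code>) are hypothetical, and the offset arithmetic assumes single-byte characters, a one-byte newline, and a single input file.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetStripSketch {
    // Same pattern the listing passes to RegexParser: skip up to the first tab, capture the rest.
    private static final Pattern KEY_TAB_VALUE = Pattern.compile(".*?\\t(.*)");

    /** Mimics the identity job's text output: one "byteOffset, tab, line" record per input line. */
    static List<String> identityJobOutput(List<String> lines) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            out.add(offset + "\t" + line);
            // Assumes single-byte characters and a one-byte newline terminator.
            offset += line.length() + 1;
        }
        return out;
    }

    /** Strips the leading offset key, returning only the original word. */
    static String stripKey(String record) {
        Matcher m = KEY_TAB_VALUE.matcher(record);
        if (!m.matches()) {
            throw new IllegalArgumentException("no tab-separated key in: " + record);
        }
        return m.group(1);
    }

    /** Counts words after stripping keys, roughly what the GroupBy + Count step does. */
    static Map<String, Integer> countWords(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            counts.merge(stripKey(record), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One word per line, as the first flow would write them.
        List<String> words = new ArrayList<>();
        for (String w : "to be or not to be".split("\\s+")) {
            words.add(w);
        }
        List<String> records = identityJobOutput(words);
        System.out.println(records.get(1).replace("\t", "[TAB]")); // prints 3[TAB]be
        System.out.println(countWords(records));                   // prints {be=2, not=1, or=1, to=2}
    }
}
```

<p>With a mapper that emitted the word itself as the key, the regex step would be unnecessary, as the comment in the listing points out.</p>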

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/11/11/how-to-use-a-raw-mapreduce-job-in-cascading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/11/11/how-to-use-a-raw-mapreduce-job-in-cascading/</feedburner:origLink></item>
		<item>
		<title>install enchant dictionaries</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/MvHF6pcFA9g/</link>
		<comments>http://www.xcombinator.com/2009/11/11/install-enchant-dictionaries/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 22:29:57 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[shell]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://xcombinator.local/?p=133</guid>
		<description><![CDATA[I had a bit of a hard time getting enchant installed with dictionaries on Mac OS X. Here&#8217;s how to do it:

sudo port install enchant # you'll need MacPorts 1.8.* for this

Then download the dictionaries from the OpenOffice wiki.
Next, place the files in ~/.enchant like so:


$ tree .enchant/
.enchant/
`-- myspell
    &#124;-- en_US.aff
  [...]]]></description>
			<content:encoded><![CDATA[<p>I had a bit of a hard time getting enchant installed with dictionaries on Mac OS X. Here&#8217;s how to do it:</p>
<p><code><br />
sudo port install enchant # you'll need MacPorts 1.8.* for this<br />
</code></p>
<p>Then download the dictionaries from the <a href="http://wiki.services.openoffice.org/wiki/Dictionaries">OpenOffice wiki</a>.</p>
<p>Next, place the files in <code>~/.enchant</code> like so:</p>
<pre>
$ tree .enchant/
.enchant/
`-- myspell
    |-- en_US.aff
    `-- en_US.dic
</pre>
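<p>The copy step can be scripted. This is a minimal sketch, assuming the downloaded dictionary has been unpacked into a directory holding the <code>.aff</code>/<code>.dic</code> pair; the function name <code>install_dicts</code> and its arguments are made up for illustration and are not part of enchant or MacPorts.</p>
<pre>
```shell
# Hypothetical helper: copy a MySpell .aff/.dic pair into a dictionary directory.
#   $1 = directory containing the unpacked dictionary files (assumed layout)
#   $2 = destination directory, e.g. "$HOME/.enchant/myspell"
install_dicts() {
    # Create the destination (and any missing parents) first.
    mkdir -p "$2" || return 1
    # Copy both halves of each dictionary; enchant needs the .aff and the .dic.
    cp "$1"/*.aff "$1"/*.dic "$2"/
}
```
</pre>
<p>For example, <code>install_dicts ~/Downloads/en_US ~/.enchant/myspell</code> (the download path is an assumption) should produce the layout shown above.</p>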
<p><b>UPDATE</b><br />
If you compile <code>enchant</code> yourself, you can also choose where the dictionaries are kept by passing the <code>--with-myspell-dir</code> option to <code>configure</code> (or the corresponding option for <code>aspell</code>, <code>hspell</code>, etc.).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/11/11/install-enchant-dictionaries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/11/11/install-enchant-dictionaries/</feedburner:origLink></item>
		<item>
		<title>testing erlang gen_server with gen_server_mock</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/t960H_b54BI/</link>
		<comments>http://www.xcombinator.com/2009/08/11/testing-erlang-gen_server-with-gen_server_mock/#comments</comments>
		<pubDate>Tue, 11 Aug 2009 15:42:49 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/08/11/testing-erlang-gen_server-with-gen_server_mock/</guid>
		<description><![CDATA[Testing by synchronous pattern matching
Testing multi-process erlang gen_servers can be tricky. Typically one relies simply on pattern matching to verify that the response matches what you would expect.

&#123;expected, Response&#125; = gen_server:call&#40;Pid, hi&#41;.

As long as the gen_server call hi returns expected as the first element of the tuple, then the tests pass.
The technique is also the [...]]]></description>
			<content:encoded><![CDATA[<h2>Testing by synchronous pattern matching</h2>
<p>Testing multi-process Erlang <code>gen_server</code>s can be tricky. Typically you rely on pattern matching to verify that a response matches what you expect.</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #109ab8;">&#123;</span>expected<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Response</span><span style="color: #109ab8;">&#125;</span> <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Pid</span><span style="color: #6bb810;">,</span> hi<span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">.</span></pre></div></div>

<p>As long as the <code>gen_server</code> call <code>hi</code> returns a tuple whose first element is <code>expected</code>, the match succeeds and the test passes.</p>
<p>The technique is the same when building client-server code where both client and server are <code>gen_server</code>s. The common approach is to test one side at a time: test the responses of all client calls and then (independently) test the responses of the server calls.</p>
<h2>What about asynchronous <code>cast</code>?</h2>
<p><code>gen_server:call</code> is convenient because it is synchronous and returns a value.<br />
<code>gen_server:cast</code>, on the other hand, is asynchronous and always returns the atom <code>ok</code>. This can make <code>cast</code>s difficult to test.</p>
<p><code>gen_server_mock</code> is a library for creating mock <code>gen_server</code> processes that expect a specific, ordered sequence of messages. It lets you unit test code that talks to a <code>gen_server</code> by verifying that the mock receives the expected messages.</p>
<h2>Example 1</h2>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;">     <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Mock</span><span style="color: #109ab8;">&#125;</span> <span style="color: #014ea4;">=</span> gen_server_mock:<span style="color: #ff3c00;">new</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> call<span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#123;</span>foo<span style="color: #6bb810;">,</span> hi<span style="color: #109ab8;">&#125;</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_From</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_State</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #006600;">ok</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect_call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#123;</span>bar<span style="color: #6bb810;">,</span> bye<span style="color: #109ab8;">&#125;</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_From</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_State</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #006600;">ok</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
&nbsp;
     ok <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #109ab8;">&#123;</span>foo<span style="color: #6bb810;">,</span> hi<span style="color: #109ab8;">&#125;</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>  
     ok <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #109ab8;">&#123;</span>bar<span style="color: #6bb810;">,</span> bye<span style="color: #109ab8;">&#125;</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>  
&nbsp;
     ok <span style="color: #014ea4;">=</span> gen_server_mock:<span style="color: #ff3c00;">assert_expectations</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #109ab8;">&#41;</span></pre></div></div>

<p>This <code>Mock</code> expects two <code>call</code>s: <code>{foo, hi}</code> and <code>{bar, bye}</code>. Since <code>Mock</code> receives both of these messages, <code>assert_expectations</code> does not raise any errors.</p>
<h2>What is verified</h2>
<p><code>gen_server_mock:assert_expectations(Mock)</code> verifies that:</p>
<ol>
<li>all expected messages were received</li>
<li>no messages were received that were not expected</li>
</ol>
<p>If either check fails, <code>assert_expectations</code> exits. You can catch the <code>exit</code> like this:</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;">     <span style="color: #45b3e6;">Result</span> <span style="color: #014ea4;">=</span> <span style="color: #186895;">try</span> gen_server_mock:<span style="color: #ff3c00;">assert_expectations</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #109ab8;">&#41;</span>
     <span style="color: #186895;">catch</span>
         exit:<span style="color: #006600;">Exception</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #45b3e6;">Exception</span>
     <span style="color: #186895;">end</span><span style="color: #6bb810;">,</span>
     <span style="color: #666666; font-style: italic;">% etc...</span></pre></div></div>

<h2>Example 2</h2>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;">     <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Mock</span><span style="color: #109ab8;">&#125;</span> <span style="color: #014ea4;">=</span> gen_server_mock:<span style="color: #ff3c00;">new</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
&nbsp;
     gen_server_mock:<span style="color: #ff3c00;">expect_call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span>one<span style="color: #6bb810;">,</span>  <span style="color: #45b3e6;">_From</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_State</span><span style="color: #109ab8;">&#41;</span>            <span style="color: #6bb810;">-&gt;</span> <span style="color: #006600;">ok</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect_call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span>two<span style="color: #6bb810;">,</span>  <span style="color: #45b3e6;">_From</span><span style="color: #6bb810;">,</span>  <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#41;</span>            <span style="color: #6bb810;">-&gt;</span> <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#125;</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect_call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span>three<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_From</span><span style="color: #6bb810;">,</span>  <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#41;</span>           <span style="color: #6bb810;">-&gt;</span> <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> good<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#125;</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect_call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#123;</span>echo<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Response</span><span style="color: #109ab8;">&#125;</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">_From</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Response</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#125;</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect_cast</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span>fish<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#125;</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     gen_server_mock:<span style="color: #ff3c00;">expect_info</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span>cat<span style="color: #6bb810;">,</span>  <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">State</span><span style="color: #109ab8;">&#125;</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
&nbsp;
     ok <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> one<span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     ok <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> two<span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     good <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> three<span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     tree <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">call</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> <span style="color: #109ab8;">&#123;</span>echo<span style="color: #6bb810;">,</span> tree<span style="color: #109ab8;">&#125;</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     ok <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">gen_server</span>:<span style="color: #ff3c00;">cast</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #6bb810;">,</span> fish<span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">,</span>
     <span style="color: #45b3e6;">Mock</span> <span style="color: #014ea4;">!</span> cat<span style="color: #6bb810;">,</span>
&nbsp;
     gen_server_mock:<span style="color: #ff3c00;">assert_expectations</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Mock</span><span style="color: #109ab8;">&#41;</span></pre></div></div>

<p>Currently three types of messages are supported: <code>call</code>, <code>cast</code>, and <code>info</code>.</p>
<p>The signature of the <code>fun</code> of each expectation is the same as the corresponding<br />
<code>gen_server:handle_*</code>. So the <code>fun</code> for <code>expect_call</code> has the same signature as <code>handle_call</code>: <code>fun(Request, From, State)</code>. See <code>man gen_server</code> for more information.</p>
<p>However, the return value of the <code>fun</code> <em>must</em> be one of:</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;">    ok |
    <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">NewState</span><span style="color: #109ab8;">&#125;</span> |
    <span style="color: #109ab8;">&#123;</span>ok<span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">ResponseValue</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">NewState</span><span style="color: #109ab8;">&#125;</span></pre></div></div>

<p>Anything else will be an error. Note that you can change the state of your <code>Mock</code> by returning <code>NewState</code>.</p>
<p>Arbitrary, non-<code>gen_server</code> messages are handled with <code>expect_info</code>, e.g. <code>Mock ! cat</code> fulfills the <code>expect_info</code> in the example above.</p>
<h2>References</h2>
<ul>
<li><a href="http://github.com/jashmenn/gen_server_mock">GitHub repo</a> (patches readily accepted)</li>
<li>Work inspired by <a href="http://erlang.org/pipermail/erlang-questions/2008-April/034140.html">this post</a></li>
<li><a href="http://martinfowler.com/articles/mocksArentStubs.html">Mocks Aren&#8217;t Stubs</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/08/11/testing-erlang-gen_server-with-gen_server_mock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/08/11/testing-erlang-gen_server-with-gen_server_mock/</feedburner:origLink></item>
		<item>
		<title>rough cut: erlang code coverage with rake</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/iZfmJveORb8/</link>
		<comments>http://www.xcombinator.com/2009/08/05/rough-cut-erlang-code-coverage-with-rake/#comments</comments>
		<pubDate>Wed, 05 Aug 2009 16:32:13 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[erlang]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/08/05/rough-cut-erlang-code-coverage-with-rake/</guid>
		<description><![CDATA[Here&#8217;s a quick, ugly version of how to get Erlang code coverage using rake. This is a rough first draft, and the gist below doesn&#8217;t include all the needed dependencies. To see the rest of the Rakefile, check out skelerl.

]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a quick, ugly version of how to get Erlang code coverage using rake. This is a rough first draft, and the gist below doesn&#8217;t include all the needed dependencies. To see the rest of the Rakefile, check out <a href="http://github.com/jashmenn/skelerl/blob/b4f64f6ab9ddf3e444aab2e5f2342ac056459469/tasks/build.rake">skelerl</a>.</p>
<p><script src="http://gist.github.com/162794.js"></script></p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/08/05/rough-cut-erlang-code-coverage-with-rake/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/08/05/rough-cut-erlang-code-coverage-with-rake/</feedburner:origLink></item>
		<item>
		<title>fixing the disabled snapshot button in vmware fusion</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/w0I3FhwDO0k/</link>
		<comments>http://www.xcombinator.com/2009/07/17/fixing-the-disabled-snapshot-button-in-vmware-fusion/#comments</comments>
		<pubDate>Fri, 17 Jul 2009 18:22:55 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/07/17/fixing-the-disabled-snapshot-button-in-vmware-fusion/</guid>
		<description><![CDATA[I recently created a new CentOS 5 vmware image. Unfortunately the &#8220;Take Snapshot&#8221; button was disabled while the system was running. After searching around I found this post.
What you need to do is open up your vmx file and comment out (disable) the following line:

scsi0:0.mode = &#34;independent-persistent&#34;

Others have also reported finding the following line in [...]]]></description>
			<content:encoded><![CDATA[<p>I recently created a new CentOS 5 vmware image. Unfortunately the &#8220;Take Snapshot&#8221; button was disabled while the system was running. After searching around I found <a href="http://communities.vmware.com/message/589046#589046">this post</a>.</p>
<p>What you need to do is open up your <code>vmx</code> file and comment out (disable) the following line:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">scsi0:0.mode = <span style="color: #ff0000;">&quot;independent-persistent&quot;</span></pre></div></div>

<p>Others have also reported finding the following line in their <code>vmx</code> file:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">snapshot.disabled = <span style="color: #ff0000;">&quot;TRUE&quot;</span></pre></div></div>
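<p>If you would rather script the edit than do it by hand, the following is a minimal sketch. <code>disable_snapshot_blockers</code> is a hypothetical helper name, and the sketch assumes that a leading <code>#</code> is treated as a comment in the <code>vmx</code> file and that the virtual machine is shut down while the file is changed. A <code>.bak</code> copy of the original file is left next to it.</p>

```shell
# Hypothetical helper: comment out the two settings mentioned above.
# Usage: disable_snapshot_blockers /path/to/your.vmx
disable_snapshot_blockers() {
  vmx_file="$1"
  # -i.bak edits in place and keeps a backup copy (accepted by GNU and BSD sed).
  # Lines that are already commented out do not match and are left alone.
  sed -i.bak -E \
    -e 's/^[[:space:]]*(scsi0:0\.mode[[:space:]]*=.*independent-persistent.*)$/# \1/' \
    -e 's/^[[:space:]]*(snapshot\.disabled[[:space:]]*=.*TRUE.*)$/# \1/' \
    "$vmx_file"
}
```

<p>All other lines in the file are left untouched, so running the helper a second time changes nothing.</p>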

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/07/17/fixing-the-disabled-snapshot-button-in-vmware-fusion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/07/17/fixing-the-disabled-snapshot-button-in-vmware-fusion/</feedburner:origLink></item>
		<item>
		<title>automate installing tripwire using expect</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/z6Bokq1QaMg/</link>
		<comments>http://www.xcombinator.com/2009/07/10/automate-installing-tripwire-using-expect/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 21:48:06 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[deployment]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/07/10/automate-installing-tripwire-using-expect/</guid>
		<description><![CDATA[tripwire is a handy part of an intrusion detection system. It&#8217;s a nice piece of software but the installer is interactive which makes it a pain to install automatically (e.g. when using PoolParty/EC2). Below is a simple expect script I whipped up to solve the make install problem. Hopefully this will save someone two or [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://sourceforge.net/projects/tripwire/"><tt>tripwire</tt></a> is a handy part of an intrusion detection system. It&#8217;s a nice piece of software, but the installer is interactive, which makes it a pain to install automatically (e.g. when using PoolParty/EC2). Below is a simple <tt>expect</tt> script I whipped up to solve the <tt>make install</tt> problem. Hopefully this will save someone two or three minutes.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;"># Usage: expect install-tripwire.tcl pass1 pass2 
set PASS1 [lindex $argv 0]
set PASS2 [lindex $argv 1]
&nbsp;
spawn make install
&nbsp;
expect &quot;Press ENTER to view the License Agreement.&quot;
send &quot;\r&quot;
send &quot;q&quot;
expect &quot;license agreement. \[do not accept\]&quot;
send &quot;accept\r&quot;
expect &quot;Continue with installation? \[y/n\]&quot;
send &quot;y\r&quot;
&nbsp;
expect &quot;Enter the site keyfile passphrase:&quot;
send &quot;$PASS1\r&quot;
expect &quot;Verify the site keyfile passphrase:&quot;
send &quot;$PASS1\r&quot;
&nbsp;
expect &quot;Enter the local keyfile passphrase:&quot;
send &quot;$PASS2\r&quot;
expect &quot;Verify the local keyfile passphrase:&quot;
send &quot;$PASS2\r&quot;
&nbsp;
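# The installer prompts for the site passphrase twice, so the same reply is sent twice.
expect &quot;Please enter your site passphrase:&quot;
send &quot;$PASS1\r&quot;
&nbsp;
expect &quot;Please enter your site passphrase:&quot;
send &quot;$PASS1\r&quot;
&nbsp;
# Wait (up to the default timeout) for make install to exit instead of closing the session right away.
expect eof</pre></div></div>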

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/07/10/automate-installing-tripwire-using-expect/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/07/10/automate-installing-tripwire-using-expect/</feedburner:origLink></item>
		<item>
		<title>“Easily” setup a monitored Hadoop / Hive Cluster in EC2 with PoolParty</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/LzbeXhUOZtg/</link>
		<comments>http://www.xcombinator.com/2009/07/08/easily-setup-a-monitored-hadoop-hive-cluster-in-ec2-with-poolparty/#comments</comments>
		<pubDate>Wed, 08 Jul 2009 14:13:34 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[hadoop]]></category>
		<category><![CDATA[poolparty]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/07/08/easily-setup-a-monitored-hadoop-hive-cluster-in-ec2-with-poolparty/</guid>
		<description><![CDATA[
Summary
Setting up a scalable Hadoop cluster isn&#8217;t easy, but PoolParty makes it easier and more manageable.
By the time we&#8217;re done with this tutorial you&#8217;ll have a Hadoop cluster consisting of one master node and two slaves.  The slaves are formatted with HDFS and process MapReduce jobs that are delegated to them from the master. 

The whole [...]]]></description>
			<content:encoded><![CDATA[
<h1>Summary</h1>
<p>Setting up a scalable Hadoop cluster isn&#8217;t easy, but PoolParty makes it easier and more manageable.</p>
<p>By the time we&#8217;re done with this tutorial you&#8217;ll have a Hadoop cluster consisting of one master node and two slaves. The slaves are formatted with HDFS and process MapReduce jobs that are delegated to them from the master.</p>
<p>The whole cluster is monitored by Ganglia.</p>
<p> <a href='http://www.xcombinator.com/wp-content/uploads/2009/07/picture-8.png' title='ganglia cluster monitoring'><img src='http://www.xcombinator.com/wp-content/uploads/2009/07/picture-8.thumbnail.png' alt='ganglia cluster monitoring' /></a></p>
<h1>Benefits of PoolParty</h1>
<p>The nodes are very interdependent. By that I mean that each node needs to have 2 or 3 configuration files that are based on the other currently running nodes in the cluster. As nodes join and leave the cluster, each of these files on every node needs to be updated. PoolParty handles this process for you more-or-less automatically. The benefit is that you don&#8217;t have to roll your own methods to do this every time you want to set up a cluster.</p>
<p>In PoolParty, plugins are first-class citizens. This means you can write your own plugins and they are every bit as powerful as the resources that make up PoolParty core itself. This makes it easy to break up server functionality into <em>modules of code</em>. PoolParty, in a sense, gives you object-oriented server configurations. You can, for instance, take a Ganglia object, call a few methods, and PoolParty takes care of executing the required commands to deploy a configured Ganglia cluster.</p>
<h1>Architecture</h1>
<p>PoolParty is built around the notion of <em>pools</em> and <em>clouds</em>. A pool is simply a collection of clouds. A cloud is a homogeneous set of nodes, i.e. <strong>every node in a cloud is <em>configured</em> the same way</strong>. Obviously, nodes in a cloud will have different sets of working data as they run, but the idea is that any node in a cloud could be substituted for any other node in that same cloud.</p>
<p>PoolParty itself is designed to be fully distributed and masterless. There is no required concept of &#8220;master&#8221; and &#8220;slave&#8221; in PoolParty itself. That said, many pieces of software, such as Hadoop, do have this concept and PoolParty can be configured to take advantage of that. </p>
<p>We&#8217;ll be setting up our pool as two clouds: <code>hadoop_master</code> and <code>hadoop_slave</code>. <code>hadoop_slave</code> will be a cloud (cluster) of nodes configured to be Hadoop slaves, and <code>hadoop_master</code> will be a cloud of masters. In our example we&#8217;re only going to use one node as the master, but you could relatively easily configure everything to have more than one master.</p>
<h1>Software involved</h1>
<ul>
<li><a href="http://hadoop.apache.org/core/">Hadoop</a> </li>
<li><a href="http://wiki.apache.org/hadoop/Hive">Hive</a></li>
<li><a href="http://ganglia.info/">Ganglia</a></li>
<li><a href="http://poolpartyrb.com">PoolParty</a></li>
</ul>
<h1>Prerequisites</h1>
<p>This tutorial assumes that:</p>
<ol>
<li><strong>You have Amazon EC2 java tools installed</strong>. See <a href="http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/index.html?StartCLI.html">EC2: Getting Started with the Command Line Tools</a></li>
<li><strong>You have the proper EC2 environment variables set up</strong>. See <a href="http://auser.github.com/poolparty/amazon_ec2_setup.html">Setting up EC2</a> on the PoolParty website. For instance, a typical PoolParty install would have these variables in <code>$HOME/.ec2/keys_and_secrets.sh</code>.</li>
<li><strong>You have PoolParty installed from source</strong>. In theory, you should be able to install the gem. However, <em>today</em> you should probably install from source. Make sure you have <code>git://github.com/auser/poolparty.git</code> checked out and then follow the &#8220;Installing&#8221; directions on <a href="http://wiki.github.com/auser/poolparty/installing">the PoolParty wiki</a>. You only need to complete the two sections <strong>Dependencies required to build gem locally</strong> and <strong>Instructions</strong>. This will install all the development dependency gems and then make sure you have all of the submodules. <strong>NOTE:</strong> PoolParty deploys Ruby gem versions based on the versions on your <em>local</em> machine, so make sure you have the most recent versions of the required gems installed locally.</li>
<li><strong>You have the <a href="http://github.com/jashmenn/poolparty-examples/tree/master">jashmenn/poolparty-examples</a> repository</strong>. <code>git clone git://github.com/jashmenn/poolparty-examples.git /path/to/poolparty-examples</code> </li>
<li><strong>You have the <a href="http://github.com/jashmenn/poolparty-extensions/tree/master">jashmenn/poolparty-extensions</a> repository</strong>. Note that this directory must be a <em>sibling</em> directory to the <code>poolparty-examples</code> directory. <code>git clone git://github.com/jashmenn/poolparty-extensions.git /path/to/poolparty-extensions</code></li>
</ol>
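<p>Because the two repositories have to be siblings, it can be worth verifying the layout before starting anything. This is a minimal sketch; <code>check_sibling_repos</code> is a hypothetical helper, not something PoolParty provides, and it assumes both checkouts keep their default directory names.</p>

```shell
# Hypothetical pre-flight check: confirm poolparty-extensions sits next to poolparty-examples.
# Usage: check_sibling_repos /path/to/poolparty-examples
check_sibling_repos() {
  examples_dir="$1"
  parent_dir="$(dirname "$examples_dir")"
  if [ ! -d "$examples_dir" ]; then
    echo "missing: $examples_dir"
    return 1
  fi
  if [ ! -d "$parent_dir/poolparty-extensions" ]; then
    echo "missing: $parent_dir/poolparty-extensions"
    return 1
  fi
  echo "ok: both repositories found under $parent_dir"
}
```

<p>The function returns a non-zero status when either directory is missing, so it can guard the later steps in a script.</p>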
<h1>EC2 Security</h1>
<p>Now that the code is in place, we need to deal with Amazon&#8217;s security. (See <a href="http://auser.github.com/poolparty/amazon.html">here</a> if you are unclear on how EC2 security works.)</p>
<h2>Setup Keypairs</h2>
<hr />
<p>Every cloud in PoolParty must have its own unique keypair. That&#8217;s important enough that it&#8217;s worth repeating: <em>every cloud in PoolParty must have its own unique keypair</em>.</p>
<p>So run the following commands:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">ec2-add-keypair cloud_hadoop_slave &gt; ~/.ssh/cloud_hadoop_slave
ec2-add-keypair cloud_hadoop_master &gt; ~/.ssh/cloud_hadoop_master
chmod 600 ~/.ssh/cloud_hadoop_*</pre></div></div>

<h2>Security Groups</h2>
<hr />
<p>You&#8217;ll also want to create a security group for our <em>pool</em>.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">ec2-add-group hadoop_pool -d &quot;the pool of hadoop masters and slaves&quot;</pre></div></div>

<p><strong>NOTICE:</strong> Hadoop requires a crazy number of ports. The ports below will <em>work</em> but may not be the most secure configuration. If you understand this better than I do, please recommend better settings. Otherwise, proceed knowing that these ports are probably a little <em>too</em> open.</p>
<p>We also need to open a number of ports for this security group:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">ec2-authorize -p 22 hadoop_pool               # ssh
ec2-authorize -p 8642 hadoop_pool             # poolparty internal daemons
ec2-authorize -P icmp -t -1:-1 hadoop_pool    # if you want to ping (optional, i guess)
ec2-authorize -p 80 hadoop_pool               # apache
&nbsp;
ec2-authorize -p 8649 -P udp hadoop_pool      # ganglia UDP
ec2-authorize hadoop_pool -o hadoop_pool -u xxxxxxxxxxxx # xxxxxxxxxxxx is your amazon account id. ugly but true</pre></div></div>

<h1>Start your cloud</h1>
<p><strong>NOTE:</strong> A number of configurations rely on the whole cloud being booted. This means that the first time you run <code>cloud-start</code> you may see a few shell errors. This is okay as long as they go away after subsequent configures. The idea is that all nodes need to be started before the whole configuration will work properly.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">cd /path/to/poolparty-examples/hadoop
cloud-list # sanity check, no instances should show up, no exceptions should be raised
cloud-start -vd</pre></div></div>

<p><em>Tons</em> of information will fly by. Be patient, this could take upwards of 15 minutes.</p>
<p>Everything done? Good. Now you&#8217;re going to need to configure a second time. Now that all the nodes are booted they can be configured to talk to each other properly.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">cloud-configure -vd</pre></div></div>

<p>Again, tons of output should fly by. Wait for it to finish.</p>
<p>Now what we want to do is actually run our Hadoop sample job. Open up <code>hadoop/clouds.rb</code> and find the lines that look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">hadoop <span style="color:#9966CC; font-weight:bold;">do</span>
  configure_master
  prep_example_job
  <span style="color:#008000; font-style:italic;"># run_example_job</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>Uncomment the <code>run_example_job</code> line and configure again. This time we only need to configure the master.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">cloud-configure -vd -c hadoop_master</pre></div></div>

<p>This <em>should</em> work, but there is a chance that HDFS won&#8217;t be started in time to load the sample job. If that happens, just configure one more time.<br />
You know it worked if you see output like the following (it won&#8217;t be at the bottom):</p>
<pre>[Fri, 26 Jun 2009 20:09:50 +0000] DEBUG: STDERR: 09/06/26 20:09:11 INFO input.FileInputFormat: Total input paths to process : 3
09/06/26 20:09:12 INFO mapred.JobClient: Running job: job_200906262006_0001
09/06/26 20:09:13 INFO mapred.JobClient:  map 0% reduce 0%
09/06/26 20:09:32 INFO mapred.JobClient:  map 66% reduce 0%
09/06/26 20:09:38 INFO mapred.JobClient:  map 100% reduce 0%
09/06/26 20:09:47 INFO mapred.JobClient:  map 100% reduce 100%
09/06/26 20:09:49 INFO mapred.JobClient: Job complete: job_200906262006_0001
09/06/26 20:09:49 INFO mapred.JobClient: Counters: 17
</pre>
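<p>The &#8220;configure one more time if the job did not load&#8221; step can also be scripted. The following is a minimal sketch, not part of PoolParty: <code>retry_until_output</code> is a hypothetical helper name, and it assumes the string <code>Job complete</code> from the sample output above is a reliable success marker.</p>

```shell
# Hypothetical helper: rerun a command until its output contains a marker string.
# Usage: retry_until_output MAX_TRIES MARKER COMMAND [ARGS...]
retry_until_output() {
  max_tries="$1"
  marker="$2"
  shift 2
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # Let the command run to completion and capture stdout and stderr together,
    # since the job log in the sample output arrives via STDERR.
    output="$("$@" 2>&1)"
    case "$output" in
      *"$marker"*)
        echo "found '$marker' on try $try"
        return 0
        ;;
    esac
    try=$((try + 1))
  done
  echo "gave up after $max_tries tries"
  return 1
}
```

<p>For the step above, the call would be <code>retry_until_output 3 "Job complete" cloud-configure -vd -c hadoop_master</code>. The command&#8217;s own output is captured rather than shown, so print <code>$output</code> inside the loop if you want to watch it.</p>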
<p>Congratulations! You now have a scalable Hadoop cluster at your disposal!</p>
<h1>What to do when something goes wrong</h1>
<ul>
<li>Check out the <a href="http://auser.github.com/poolparty/community.html">PoolParty IRC channel</a> (#poolpartyrb). We&#8217;re always around and ready to help.</li>
</ul>
<p>This plugin was based on a number of helpful sites on the web. Check out the following links:</p>
<h2>Hadoop</h2>
<hr />
<ul>
<li><a href="http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)">Michael Noll&#8217;s Hadoop Tutorial</a></li>
</ul>
<h2>Hive</h2>
<hr />
<ul>
<li><a href="http://wiki.apache.org/hadoop/Hive">Apache&#8217;s Hive website</a></li>
</ul>
<h2>Ganglia</h2>
<hr />
<ul>
<li><a href="http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia">IBM&#8217;s Ganglia Tutorial</a></li>
</ul>
<h1>References</h1>
<ul>
<li><a href="http://auser.github.com/poolparty/docs/index.html">PoolParty Documentation</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/07/08/easily-setup-a-monitored-hadoop-hive-cluster-in-ec2-with-poolparty/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/07/08/easily-setup-a-monitored-hadoop-hive-cluster-in-ec2-with-poolparty/</feedburner:origLink></item>
		<item>
		<title>git-style-binaries screencast</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/5kfwwahUV3E/</link>
		<comments>http://www.xcombinator.com/2009/06/10/git-style-binaries-screencast/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 18:16:44 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[gems]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/06/10/git-style-binaries-screencast/</guid>
		<description><![CDATA[Just released: git-style-binaries ruby gem. Check out the README on github.
Check out the screencast.
]]></description>
			<content:encoded><![CDATA[<p>Just released: git-style-binaries ruby gem. Check out <a href="http://github.com/jashmenn/git-style-binaries">the README on github</a>.</p>
<p>Check out <a href="http://www.xcombinator.com/movies/git-style-binaries.mov">the screencast</a>.</p>
<p><a href='http://www.xcombinator.com/movies/git-style-binaries.mov' title='gsb-screencast'><img src='http://www.xcombinator.com/wp-content/uploads/2009/06/gsb-screencast1.png' alt='gsb-screencast1.png' /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/06/10/git-style-binaries-screencast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>

		<feedburner:origLink>http://www.xcombinator.com/2009/06/10/git-style-binaries-screencast/</feedburner:origLink><enclosure url="http://feedproxy.google.com/~r/xcombinator/~5/jmBJXkECgRw/git-style-binaries.mov" length="135240332" type="video/quicktime" /><feedburner:origEnclosureLink>http://www.xcombinator.com/movies/git-style-binaries.mov</feedburner:origEnclosureLink></item>
		<item>
		<title>adding macruby to multiruby versions</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/MVC0Bdn9dPk/</link>
		<comments>http://www.xcombinator.com/2009/05/22/adding-macruby-to-multiruby-versions/#comments</comments>
		<pubDate>Fri, 22 May 2009 21:11:20 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[ruby]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/05/22/adding-macruby-to-multiruby-versions/</guid>
		<description><![CDATA[Modified from Rob Seaman&#8217;s post.
multiruby is a great way to make sure your ruby code runs on the multitude of ruby versions (it&#8217;s part of ZenTest). It doesn&#8217;t install macruby by default. Here are instructions on how to set it up.
Since I installed macruby from the package installer my macruby files are in /Library/Frameworks/MacRuby.framework/Versions/0.4/. If [...]]]></description>
			<content:encoded><![CDATA[<p>Modified from <a href="http://blog.robseaman.com/2008/12/21/adding-jruby-to-multiruby-versions">Rob Seaman&#8217;s post</a>.</p>
<p>multiruby is a great way to make sure your ruby code runs on the multitude of ruby versions (it&#8217;s part of ZenTest). It doesn&#8217;t install macruby by default. Here are instructions on how to set it up.</p>
<p>Since I installed macruby from the <a href="http://www.macruby.org/">package installer</a> my macruby files are in <code>/Library/Frameworks/MacRuby.framework/Versions/0.4/</code>. If your macruby files are somewhere else, adjust accordingly.</p>
<p>First make sure you already have multiruby set up for other versions (don&#8217;t proceed if this doesn&#8217;t work):<br />
<code><br />
multiruby_setup the_usual<br />
</code></p>
<p>One problem I had with the above command was that I had <code>RUBYOPT</code> set in my .profile, so each of these ruby versions was being called with <code>RUBYOPT=rubygems</code> during installation. This won&#8217;t work because rubygems is one of the things you&#8217;re trying to install. Make sure you unset this variable before trying to set up multiruby.</p>
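<p>As a small illustration of the point above (a sketch, not part of the original setup: the <code>sh -c</code> command is only a stand-in for <code>multiruby_setup the_usual</code>), <code>env -u</code> runs a single command with <code>RUBYOPT</code> removed from its environment, without editing your .profile:</p>

```shell
#!/bin/sh
# Simulate the problematic setting from .profile.
RUBYOPT=rubygems
export RUBYOPT

# env -u removes the variable for this one command only.
# The sh -c is a stand-in; in practice you would run:
#   env -u RUBYOPT multiruby_setup the_usual
seen=$(env -u RUBYOPT sh -c 'echo "RUBYOPT=${RUBYOPT:-unset}"')
echo "$seen"
```

<p>Running <code>unset RUBYOPT</code> in the current shell has the same effect for the rest of that session. With that out of the way, the commands below link the MacRuby install into multiruby&#8217;s directory layout.</p>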

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">ln <span style="color:#006600; font-weight:bold;">-</span>s <span style="color:#006600; font-weight:bold;">/</span>Library<span style="color:#006600; font-weight:bold;">/</span>Frameworks<span style="color:#006600; font-weight:bold;">/</span>MacRuby.<span style="color:#9900CC;">framework</span><span style="color:#006600; font-weight:bold;">/</span>Versions<span style="color:#006600; font-weight:bold;">/</span><span style="color:#006666;">0.4</span><span style="color:#006600; font-weight:bold;">/</span>usr ~<span style="color:#006600; font-weight:bold;">/</span>.<span style="color:#9900CC;">multiruby</span><span style="color:#006600; font-weight:bold;">/</span>install<span style="color:#006600; font-weight:bold;">/</span>macruby<span style="color:#006600; font-weight:bold;">-</span>0.0.4
sudo ln <span style="color:#006600; font-weight:bold;">-</span>s ~<span style="color:#006600; font-weight:bold;">/</span>.<span style="color:#9900CC;">multiruby</span><span style="color:#006600; font-weight:bold;">/</span>install<span style="color:#006600; font-weight:bold;">/</span>macruby<span style="color:#006600; font-weight:bold;">-</span>0.0.4<span style="color:#006600; font-weight:bold;">/</span>bin<span style="color:#006600; font-weight:bold;">/</span>macruby ~<span style="color:#006600; font-weight:bold;">/</span>.<span style="color:#9900CC;">multiruby</span><span style="color:#006600; font-weight:bold;">/</span>install<span style="color:#006600; font-weight:bold;">/</span>macruby<span style="color:#006600; font-weight:bold;">-</span>0.0.4<span style="color:#006600; font-weight:bold;">/</span>bin<span style="color:#006600; font-weight:bold;">/</span>ruby
sudo ln <span style="color:#006600; font-weight:bold;">-</span>s ~<span style="color:#006600; font-weight:bold;">/</span>.<span style="color:#9900CC;">multiruby</span><span style="color:#006600; font-weight:bold;">/</span>install<span style="color:#006600; font-weight:bold;">/</span>macruby<span style="color:#006600; font-weight:bold;">-</span>0.0.4<span style="color:#006600; font-weight:bold;">/</span>bin<span style="color:#006600; font-weight:bold;">/</span>macgem ~<span style="color:#006600; font-weight:bold;">/</span>.<span style="color:#9900CC;">multiruby</span><span style="color:#006600; font-weight:bold;">/</span>install<span style="color:#006600; font-weight:bold;">/</span>macruby<span style="color:#006600; font-weight:bold;">-</span>0.0.4<span style="color:#006600; font-weight:bold;">/</span>bin<span style="color:#006600; font-weight:bold;">/</span>gem
touch ~<span style="color:#006600; font-weight:bold;">/</span>.<span style="color:#9900CC;">multiruby</span><span style="color:#006600; font-weight:bold;">/</span>versions<span style="color:#006600; font-weight:bold;">/</span>macruby<span style="color:#006600; font-weight:bold;">-</span>0.0.4.<span style="color:#9900CC;">tar</span>.<span style="color:#9900CC;">gz</span> <span style="color:#008000; font-style:italic;"># fake-out</span></pre></div></div>

<p>Now try:<br />
<code><br />
multiruby -e "p 1+1"<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/05/22/adding-macruby-to-multiruby-versions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/05/22/adding-macruby-to-multiruby-versions/</feedburner:origLink></item>
		<item>
		<title>tweet twitter util – cli feedback on tweet length</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/6pPhbO_VsrQ/</link>
		<comments>http://www.xcombinator.com/2009/05/21/tweet-twitter-util-cli-feedback-on-tweet-length/#comments</comments>
		<pubDate>Thu, 21 May 2009 15:01:21 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[ruby]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/05/21/tweet-twitter-util-cli-feedback-on-tweet-length/</guid>
		<description><![CDATA[One of the nice things about the Twitter web-interface is that it gives you feedback on the number of characters you&#8217;ve typed so far.

 Since I use the command-line twitter gem (by John Nunemaker) I miss out on this feature and find myself using echo and wc to make sure I&#8217;m not over the limit.
To [...]]]></description>
			<content:encoded><![CDATA[<p>One of the nice things about the Twitter web-interface is that it gives you feedback on the number of characters you&#8217;ve typed so far.<a href='http://www.xcombinator.com/wp-content/uploads/2009/05/twitter-warning.png' title='twitter-warning.png'><img src='http://www.xcombinator.com/wp-content/uploads/2009/05/twitter-warning.thumbnail.png' alt='twitter-warning.png' /></a>
</p>
<p> Since I use the command-line <a href="http://twitter.rubyforge.org/"><tt>twitter</tt> gem</a> (by John Nunemaker) I miss out on this feature and find myself using <tt>echo</tt> and <tt>wc</tt> to make sure I&#8217;m not over the limit.</p>
<p>To fix this problem, I&#8217;ve written a small, immediate-feedback utility called <tt>tweet</tt>. It shows you the number of characters typed so far and colorizes them accordingly.</p>
<p> Screenshots: </p>
<p>warning:<br />
<a href='http://www.xcombinator.com/wp-content/uploads/2009/05/tweet-warning.png' title='tweet-warning.png'><img src='http://www.xcombinator.com/wp-content/uploads/2009/05/tweet-warning.png' alt='tweet-warning.png' /></a>
</p>
<p>over:<br />
<a href='http://www.xcombinator.com/wp-content/uploads/2009/05/tweet-overage.png' title='tweet-overage.png'><img src='http://www.xcombinator.com/wp-content/uploads/2009/05/tweet-overage.png' alt='tweet-overage.png' /></a>
</p>
<p>
Just download this script and save it as <tt>tweet</tt> somewhere in your <tt>$PATH</tt><br />
<script src="http://gist.github.com/112317.js"></script>
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/05/21/tweet-twitter-util-cli-feedback-on-tweet-length/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/05/21/tweet-twitter-util-cli-feedback-on-tweet-length/</feedburner:origLink></item>
		<item>
		<title>temporarily undo commit(s) on a remote server</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/v8hrCUth3M8/</link>
		<comments>http://www.xcombinator.com/2009/05/20/temporarily-undo-commits-on-a-remote-server/#comments</comments>
		<pubDate>Wed, 20 May 2009 23:51:20 +0000</pubDate>
		<dc:creator>brian</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/05/20/temporarily-undo-commits-on-a-remote-server/</guid>
		<description><![CDATA[I do not claim to be a wiz at git, and I can&#8217;t guarantee that what I am writing about is correct, but it seemed to work for me, and I appreciate any comments.
My goal was to temporarily revert one or many commits that I had pushed to the remote server.
http://cheat.errtheblog.com/s/git &#8212; specifically the &#8220;Fix mistakes / [...]]]></description>
			<content:encoded><![CDATA[<p>I do not claim to be a wiz at git, and I can&#8217;t guarantee that what I am writing about is correct, but it seemed to work for me, and I appreciate any comments.</p>
<p>My goal was to temporarily revert one or many commits that I had pushed to the remote server.</p>
<p><a href="http://cheat.errtheblog.com/s/git">http://cheat.errtheblog.com/s/git</a> &#8212; specifically the &#8220;<span highlight="Search">Fix</span> mistakes / Undo&#8221; section was helpful.</p>
<p>What I found:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">git revert <span style="color: #660033;">-n</span> &lt;sha&gt;</pre></div></div>

<p>Run this for each commit you would like to &#8220;undo&#8221;.</p>
<p>(The -n means that you are not actually creating a commit, but staging the reverse of the changes made by your &lt;sha&gt; commit in your index; git status will show you this.)</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">git commit <span style="color: #660033;">-a</span> <span style="color: #666666; font-style: italic;"># commit the staged reversal</span>
git push <span style="color: #666666; font-style: italic;"># to master on origin (these steps create one commit that reverses all the commits you picked in the git revert -n step)</span></pre></div></div>

<p>Now your history (as shown by git log) looks like:</p>
<ul>
<li>&lt;sha1&gt;&#8230; revision of &lt;commit&#8230;s&gt;</li>
<li>&lt;sha2&gt;&#8230; commit4</li>
<li>&lt;sha3&gt;&#8230; commit3</li>
</ul>
<p>Now, let&#8217;s say the time has come to reapply your commits. Because you didn&#8217;t just do</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">git reset <span style="color: #660033;">--hard</span> &lt;commit3&gt;</pre></div></div>

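<p>or something like that, the original commits are still in the history, so all you have to do is git revert &lt;sha1&gt; (that is, revert the revert commit), which will &#8220;undo your undo&#8221; by adding one more commit on top.</p>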
<p>then</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">git push <span style="color: #666666; font-style: italic;">#origin master</span></pre></div></div>

<p> again and you are back to where you were.</p>
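<p>The whole round trip can be tried safely in a throwaway repository. This is only a sketch under some assumptions: the file name, commit messages, and placeholder identity are made up, there is no remote (so the <tt>git push</tt> steps are left out), and the undo is reapplied by reverting the revert commit with <tt>git revert</tt>, which adds a new commit rather than rewriting history:</p>

```shell
#!/bin/sh
# Throwaway repository; nothing here touches a real remote.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"

# Three commits, each adding one line to the same (hypothetical) file.
echo one > file.txt
git add file.txt
git commit -q -m "commit3"
echo two >> file.txt
git commit -q -am "commit4"
echo three >> file.txt
git commit -q -am "commit5"

# Stage the reverse of the last two commits, newest first, without committing.
git revert -n HEAD
git revert -n HEAD~1
git commit -q -m "temporarily undo commit4 and commit5"
undone=$(cat file.txt)

# Later: revert the revert commit to bring the changes back.
git revert --no-edit HEAD
restored=$(cat file.txt)

echo "while undone: $undone"
echo "after restoring:"
echo "$restored"
```

<p>Because every step only adds commits, a plain <tt>git push</tt> would be enough after each stage in a real repository; no forced push is involved.</p>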
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/05/20/temporarily-undo-commits-on-a-remote-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/05/20/temporarily-undo-commits-on-a-remote-server/</feedburner:origLink></item>
		<item>
		<title>runaway process… on a mac</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/6RqcEmGeiOY/</link>
		<comments>http://www.xcombinator.com/2009/04/18/runaway-process-on-a-mac/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 23:58:59 +0000</pubDate>
		<dc:creator>brian</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/04/18/runaway-process-on-a-mac/</guid>
		<description><![CDATA[wrestling with the command line, and some cool cli programs I learned along the way
I want desperately to know how to detach a process from one terminal, and re-attach it to another, without using screen from the get-go. The more I read about it, the more I figure that the answer (you can&#8217;t) probably has [...]]]></description>
			<content:encoded><![CDATA[<p>wrestling with the command line, and some cool cli programs I learned along the way</p>
<p>I want desperately to know how to detach a process from one terminal, and re-attach it to another, without using screen from the get-go. The more I read about it, the more I figure that the answer (you can&#8217;t) probably has more to do with my lack of understanding of how processes and terminals work. I read a great post <a href="http://www.xaprb.com/blog/2008/08/01/how-to-leave-a-program-running-after-you-log-out/">here</a> that introduced me to the bash builtin <tt>disown -h</tt> (careful) and the <tt>nohup</tt> command, two really great tools. I thought, ok, let&#8217;s try it. This is where I got stuck.</p>
<p>I tried</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">while</span> <span style="color: #c20cb9; font-weight: bold;">true</span>; <span style="color: #000000; font-weight: bold;">do</span> <span style="color: #c20cb9; font-weight: bold;">sleep</span> <span style="color: #000000;">10</span>; <span style="color: #000000; font-weight: bold;">done</span> <span style="color: #000000; font-weight: bold;">&amp;</span>
<span style="color: #7a0874; font-weight: bold;">disown</span> <span style="color: #660033;">-h</span> <span style="color: #000000; font-weight: bold;">%</span>1
<span style="color: #7a0874; font-weight: bold;">exit</span></pre></div></div>

<p>The <tt>disown</tt> builtin handles a problem with background processes. From the bash man page (and also the blog post above):</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">The shell exits by default upon receipt of a SIGHUP</pre></div></div>

<p>and, before exiting, an interactive shell resends the SIGHUP to its jobs, thereby killing the process you &#8216;thought&#8217; you had put in the background before you logged out to go home for the night. With no options, <tt>disown</tt> removes the job from the shell&#8217;s jobs table; the <tt>-h</tt> switch instead leaves it in the table but marks it so that the shell does not send it a SIGHUP. I thought, cool, maybe if it stays in the jobs table, I could also transfer it to the jobs table of another tty. (No, you can&#8217;t&#8230;)<br />
But now I had a process on my hands that wasn&#8217;t attached to a terminal and would just run forever unless I rebooted.</p>
<p>The while loop itself didn&#8217;t show up in <tt>ps</tt> under a name of its own (it runs inside a forked bash process), which is interesting, and because of the nature of while, the sleep command&#8217;s PID kept changing, so a normal <tt>ps aux | grep slee[p] | awk '{print $2}' | xargs kill -9</tt> wasn&#8217;t working. (This post is losing focus fast, but the <tt>slee[p]</tt> in the above command was a cool trick I learned so that I didn&#8217;t need a <tt>grep -v grep</tt> in there.)</p>
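<p>The bracket trick is easy to check without touching real processes. The sketch below feeds <tt>grep</tt> two hand-written lines that stand in for <tt>ps</tt> output; the sample lines and the variable names are made up for illustration.</p>

```shell
# Simulated ps output when searching with the plain pattern: the second line
# stands in for the grep process itself, whose command line contains "sleep".
plain_matches=$(printf '%s\n' 'sleep 10' 'grep sleep' | grep -c 'sleep')

# Simulated ps output when searching with the bracket pattern: grep's own
# command line now contains the literal text "slee[p]", which the regular
# expression slee[p] (s, l, e, e, then the single character p) does not match.
bracket_matches=$(printf '%s\n' 'sleep 10' 'grep slee[p]' | grep -c 'slee[p]')

echo "plain: $plain_matches, bracket: $bracket_matches"
```

<p>The plain pattern counts the stand-in for grep as well as the real target, while the bracket pattern counts only the target, which is why no <tt>grep -v grep</tt> is needed.</p>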
<p>I *did* find that I could use <tt>ps</tt> to figure out the ppid (parent process ID) and just kill -9 that, but I was also interested in knowing for sure that it wasn&#8217;t in charge of doing something else important. A little digging around, and I came across the UNIX utility <tt>pstree</tt> which of course didn&#8217;t come on my mac, but I quickly figured out that it could be installed with</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> port <span style="color: #c20cb9; font-weight: bold;">install</span> <span style="color: #c20cb9; font-weight: bold;">pstree</span></pre></div></div>

<p>Yesterday, I had done a similar thing with the UNIX command <tt>watch</tt>, which also installed nicely using <tt>port</tt>.<br />
And, for those who don&#8217;t know, <tt>watch</tt> is a great polling utility that displays the first screen&#8217;s worth of output of any command and updates it at a regular interval.</p>
<p>I used <tt>ps | grep</tt> to find the ppid of the sleep process, then ran this command (with the PID I had found in place of <tt>$PPID</tt>; in bash, the variable <tt>$PPID</tt> itself expands to the parent of the current shell):</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">watch <span style="color: #ff0000;">&quot;pstree <span style="color: #007800;">$PPID</span>&quot;</span></pre></div></div>

<p>This was way cool, as every ten seconds, I watched as the PID of sleep (the child process of this bash process I had just found) changed.</p>
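<p>For anyone who wants to reproduce the hunt safely, here is a rough sketch that starts the same kind of loop, looks up the parent of the current <tt>sleep</tt>, and then cleans up. It assumes <tt>pgrep</tt> and <tt>ps</tt> are available, and it cheats by reading the loop&#8217;s PID from <tt>$!</tt>, which I no longer had after exiting the original shell. The variable names are made up for illustration.</p>

```shell
# Start the same kind of loop in the background; $! is the PID of the
# forked shell that runs the loop, not of any individual sleep.
while true; do sleep 10; done &
loop_pid=$!

sleep 1  # give the loop a moment to start its first sleep

# Find the current sleep child and ask ps for its parent PID.
child_pid=$(pgrep -P "$loop_pid" | head -n 1)
child_ppid=$(ps -o ppid= -p "$child_pid" | tr -d ' ')
echo "loop shell: $loop_pid, parent of sleep $child_pid: $child_ppid"

# Kill the loop first so it cannot start another sleep, then the leftover sleep.
kill "$loop_pid"
kill "$child_pid" 2>/dev/null || true
wait "$loop_pid" 2>/dev/null || true
```

<p>Killing only the <tt>sleep</tt> would not help, because the loop would immediately start a new one; killing the parent shell is what actually stops the runaway.</p>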
<p>Take away:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> port <span style="color: #c20cb9; font-weight: bold;">install</span> watch
<span style="color: #c20cb9; font-weight: bold;">sudo</span> port <span style="color: #c20cb9; font-weight: bold;">install</span> <span style="color: #c20cb9; font-weight: bold;">pstree</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/04/18/runaway-process-on-a-mac/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/04/18/runaway-process-on-a-mac/</feedburner:origLink></item>
		<item>
		<title>starting screen as a login shell</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/iMySJuy-74Y/</link>
		<comments>http://www.xcombinator.com/2009/04/14/starting-screen-as-a-login-shell/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 00:21:52 +0000</pubDate>
		<dc:creator>brian</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/04/14/starting-screen-as-a-login-shell/</guid>
		<description><![CDATA[I realized a while ago that I wanted to invoke my shell as a login shell when starting screen. This of course gives me immediate access to things in my .profile like aliases, etc.

shell -bash

Everywhere I read said to put that in your .screenrc file. On my Mac, OS X 10.5, however it seemed to [...]]]></description>
			<content:encoded><![CDATA[<p>I realized a while ago that I wanted to invoke my shell as a login shell when starting screen. This of course gives me immediate access to things in my .profile like aliases, etc.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">shell <span style="color: #660033;">-bash</span></pre></div></div>

<p>Everywhere I read said to put that in your .screenrc file. On my Mac (OS X 10.5), however, it seemed to have the (extremely) unwanted side effect of changing my cwd to my $HOME folder whenever I started screen.</p>
<p>I was able to do some testing by remoting into a *fairly* vanilla Fedora box and running the command:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #000000; font-weight: bold;">/</span>; <span style="color: #c20cb9; font-weight: bold;">screen</span> <span style="color: #660033;">-s</span> <span style="color: #660033;">-bash</span></pre></div></div>

<p>This started bash as a login shell</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">shopt</span> login_shell <span style="color: #666666; font-style: italic;">#=&gt; login_shell    	on</span></pre></div></div>

<p>but did not change my cwd.</p>
<p>I tried moving my .profile file to see if it was the culprit.</p>
<p>I finally came up with this hackish solution.</p>
<p>Create the file</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">~<span style="color: #000000; font-weight: bold;">/</span>.screen_shell</pre></div></div>

<p>and place in it the line</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #007800;">$SHELL</span> <span style="color: #660033;">-l</span></pre></div></div>

<p>I knew from other testing that</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">shell <span style="color: #ff0000;">'bash -l'</span></pre></div></div>

<p>didn&#8217;t work. Of course, now that I remember having a similar experience with vi&#8230; I wonder if</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">shell <span style="color: #c20cb9; font-weight: bold;">bash</span>\ <span style="color: #660033;">-l</span></pre></div></div>

<p>would work&#8230; (nope&#8230;)</p>
<p>So I then changed my .screenrc file to read</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">shell <span style="color: #000000; font-weight: bold;">/</span>Users<span style="color: #000000; font-weight: bold;">/</span>bhenderson<span style="color: #000000; font-weight: bold;">/</span>.screen_shell</pre></div></div>

<p>made sure that I did a</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">chmod</span> u+x ~<span style="color: #000000; font-weight: bold;">/</span>.screen_shell</pre></div></div>

<p>and off I went.</p>
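<p>Put together, the wrapper steps above look roughly like the sketch below. It writes the file into a temporary directory rather than <tt>~/.screen_shell</tt> so that it is safe to try; the temporary path, the variable names, the shebang line, and the <tt>exec</tt> are additions of this sketch and not part of what I originally used.</p>

```shell
# Hypothetical demo location; the post keeps the wrapper at ~/.screen_shell.
demo_dir=$(mktemp -d)
wrapper="$demo_dir/screen_shell"

# Write the wrapper. The shebang and the exec are additions for this sketch;
# the file described in the post contains only the line: $SHELL -l
cat > "$wrapper" <<'EOF'
#!/bin/sh
exec "$SHELL" -l
EOF

# screen can only run the wrapper if it is executable.
chmod u+x "$wrapper"
echo "wrapper written to $wrapper"
```

<p>Using <tt>exec</tt> replaces the wrapper process with the login shell instead of leaving an extra <tt>sh</tt> process sitting between screen and the shell. The <tt>shell</tt> line in .screenrc would then point at wherever the wrapper really lives.</p>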
<p>If anyone would like to add to my understanding of what was going on, or suggest a better solution, please feel free to comment. Thanks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/04/14/starting-screen-as-a-login-shell/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/04/14/starting-screen-as-a-login-shell/</feedburner:origLink></item>
		<item>
		<title>rsync vs. cp</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/8VWYIHsy3Qo/</link>
		<comments>http://www.xcombinator.com/2009/03/03/rsync-vs-cp/#comments</comments>
		<pubDate>Tue, 03 Mar 2009 21:40:49 +0000</pubDate>
		<dc:creator>brian</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/03/03/rsync-vs-cp/</guid>
		<description><![CDATA[
rsync -a /path/to/bar/ /path/to/foo/  # =&#62; /path/to/foo/[contents of bar]
rsync -a /path/to/bar /path/to/foo/   # =&#62; /path/to/foo/bar/[contents of bar]
&#160;
cp -a /path/to/bar/ /path/to/foo/     # =&#62; /path/to/foo/bar
cp -a /path/to/bar /path/to/foo/      # =&#62; /path/to/foo/bar

why can&#8217;t cp have the same source behavior as rsync?
btw, I found the solution on [...]]]></description>
			<content:encoded><![CDATA[
<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">rsync <span style="color: #660033;">-a</span> <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>bar<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>foo<span style="color: #000000; font-weight: bold;">/</span>  <span style="color: #666666; font-style: italic;"># =&gt; /path/to/foo/[contents of bar]</span>
rsync <span style="color: #660033;">-a</span> <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>bar <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>foo<span style="color: #000000; font-weight: bold;">/</span>   <span style="color: #666666; font-style: italic;"># =&gt; /path/to/foo/bar/[contents of bar]</span>
&nbsp;
<span style="color: #c20cb9; font-weight: bold;">cp</span> <span style="color: #660033;">-a</span> <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>bar<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>foo<span style="color: #000000; font-weight: bold;">/</span>     <span style="color: #666666; font-style: italic;"># =&gt; /path/to/foo/bar</span>
<span style="color: #c20cb9; font-weight: bold;">cp</span> <span style="color: #660033;">-a</span> <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>bar <span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>foo<span style="color: #000000; font-weight: bold;">/</span>      <span style="color: #666666; font-style: italic;"># =&gt; /path/to/foo/bar</span></pre></div></div>

<p>Why can&#8217;t cp have the same source behavior as rsync?</p>
<p>By the way, I found the solution on this <a href="http://blog.maisnam.com/archives/000109.php">blog</a>:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">mkdir</span> target_directory
<span style="color: #7a0874; font-weight: bold;">cd</span> source_directory
<span style="color: #c20cb9; font-weight: bold;">cp</span> <span style="color: #660033;">-a</span> . ..<span style="color: #000000; font-weight: bold;">/</span>target_directory</pre></div></div>

<p>(on the blog, the author does cp -ap, but that looks redundant according to the man page)</p>
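<p>A variant that skips the <tt>cd</tt> is to name the source as <tt>bar/.</tt>, which makes cp copy the contents of <tt>bar</tt> (hidden files included) rather than <tt>bar</tt> itself. The sketch below tries this in a throwaway directory; the directory and file names are made up for illustration.</p>

```shell
# Build a small source tree and an empty target in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/bar/sub" "$work/foo"
touch "$work/bar/file.txt" "$work/bar/.hidden" "$work/bar/sub/nested.txt"

# "bar/." refers to the directory's contents, so nothing named "bar"
# is created inside foo.
cp -a "$work/bar/." "$work/foo/"

ls -A "$work/foo"
# The scratch directory is left in place; remove "$work" when done.
```

<p>This gives the same result as the rsync form with a trailing slash on the source: the files end up directly under <tt>foo</tt>.</p>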
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/03/03/rsync-vs-cp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/03/03/rsync-vs-cp/</feedburner:origLink></item>
		<item>
		<title>what does this return? or ‘why i love ruby’</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/B4IlXAEvhY4/</link>
		<comments>http://www.xcombinator.com/2009/02/19/what-does-this-return-or-why-i-love-ruby/#comments</comments>
		<pubDate>Thu, 19 Feb 2009 17:18:51 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/02/19/what-does-this-return-or-why-i-love-ruby/</guid>
		<description><![CDATA[
def x
  5
end
&#160;
if false
  x = 3
end
&#160;
puts x # =&#62; ?

]]></description>
			<content:encoded><![CDATA[
<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">def</span> x
  <span style="color:#006666;">5</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">if</span> <span style="color:#0000FF; font-weight:bold;">false</span>
  x = <span style="color:#006666;">3</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#CC0066; font-weight:bold;">puts</span> x <span style="color:#008000; font-style:italic;"># =&gt; ?</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/02/19/what-does-this-return-or-why-i-love-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/02/19/what-does-this-return-or-why-i-love-ruby/</feedburner:origLink></item>
		<item>
		<title>Ruby’s #each_with_index for Erlang</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/FudXB0UIysU/</link>
		<comments>http://www.xcombinator.com/2009/02/10/rubys-each_with_index-for-erlang/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 19:30:49 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[erlang]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/02/10/rubys-each_with_index-for-erlang/</guid>
		<description><![CDATA[
Ruby has a great method for enumerations called #each_with_index. It&#8217;s a handy way to iterate over a list of elements and know where you&#8217;re at while doing it.

Erlang has a group of highly optimized list operations in the lists module (Try erl -man lists to see them all). We&#8217;re going to be using those [...]]]></description>
			<content:encoded><![CDATA[<p>
Ruby has a great method for enumerations called <tt>#each_with_index</tt>. It&#8217;s a handy way to iterate over a list of elements and know where you&#8217;re at while doing it.</p>
<p>
Erlang has a group of highly optimized list operations in the <tt>lists</tt> module (Try <tt>erl -man lists</tt> to see them all). We&#8217;re going to be using those to build our Erlang version of <tt>#each_with_index</tt>.
</p>
<p>
There are two functions from <tt>lists</tt> we&#8217;re going to use: <tt>lists:seq</tt> and <tt>lists:zip</tt>. <tt>lists:seq</tt> simply returns a list of the integers in a given range (both ends included). For instance:</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #ff9600;">1</span><span style="color: #014ea4;">&gt;</span> <span style="color: #45b3e6;">Seq</span> <span style="color: #014ea4;">=</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">seq</span><span style="color: #109ab8;">&#40;</span><span style="color: #ff9600;">0</span><span style="color: #6bb810;">,</span> <span style="color: #ff9600;">9</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">.</span>
<span style="color: #109ab8;">&#91;</span><span style="color: #ff9600;">0</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">1</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">2</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">3</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">4</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">5</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">6</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">7</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">8</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">9</span><span style="color: #109ab8;">&#93;</span></pre></div></div>

<p><tt>lists:zip</tt> takes two lists (of equal length) and returns a list of tuples. Like so:</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #ff9600;">2</span><span style="color: #014ea4;">&gt;</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">zip</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#91;</span>a<span style="color: #6bb810;">,</span>b<span style="color: #6bb810;">,</span>c<span style="color: #109ab8;">&#93;</span><span style="color: #6bb810;">,</span> <span style="color: #109ab8;">&#91;</span><span style="color: #ff9600;">1</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">2</span><span style="color: #6bb810;">,</span><span style="color: #ff9600;">3</span><span style="color: #109ab8;">&#93;</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">.</span>
<span style="color: #109ab8;">&#91;</span><span style="color: #109ab8;">&#123;</span>a<span style="color: #6bb810;">,</span><span style="color: #ff9600;">1</span><span style="color: #109ab8;">&#125;</span><span style="color: #6bb810;">,</span><span style="color: #109ab8;">&#123;</span>b<span style="color: #6bb810;">,</span><span style="color: #ff9600;">2</span><span style="color: #109ab8;">&#125;</span><span style="color: #6bb810;">,</span><span style="color: #109ab8;">&#123;</span>c<span style="color: #6bb810;">,</span><span style="color: #ff9600;">3</span><span style="color: #109ab8;">&#125;</span><span style="color: #109ab8;">&#93;</span></pre></div></div>

<p>The last thing we need is a list comprehension, which is similar to <tt>each</tt> with a block in Ruby. For instance, in Ruby:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#006600; font-weight:bold;">&gt;&gt;</span> <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:a</span>, <span style="color:#ff3333; font-weight:bold;">:b</span>, <span style="color:#ff3333; font-weight:bold;">:c</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">each</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>element<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#CC0066; font-weight:bold;">puts</span> element <span style="color:#006600; font-weight:bold;">&#125;</span>
a
b
c
<span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:a</span>, <span style="color:#ff3333; font-weight:bold;">:b</span>, <span style="color:#ff3333; font-weight:bold;">:c</span><span style="color:#006600; font-weight:bold;">&#93;</span></pre></div></div>

<p>In Erlang we would do the following:</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #ff9600;">3</span><span style="color: #014ea4;">&gt;</span> <span style="color: #109ab8;">&#91;</span><span style="color: #ff4e18;">io</span>:<span style="color: #ff3c00;">format</span><span style="color: #109ab8;">&#40;</span><span style="color: #ff7800;">&quot;~p~n&quot;</span><span style="color: #6bb810;">,</span> <span style="color: #109ab8;">&#91;</span><span style="color: #45b3e6;">Element</span><span style="color: #109ab8;">&#93;</span><span style="color: #109ab8;">&#41;</span> <span style="color: #014ea4;">||</span> <span style="color: #45b3e6;">Element</span> <span style="color: #014ea4;">&lt;-</span> <span style="color: #109ab8;">&#91;</span>a<span style="color: #6bb810;">,</span>b<span style="color: #6bb810;">,</span>c<span style="color: #109ab8;">&#93;</span><span style="color: #109ab8;">&#93;</span><span style="color: #6bb810;">.</span>
a
b
c
<span style="color: #109ab8;">&#91;</span>ok<span style="color: #6bb810;">,</span>ok<span style="color: #6bb810;">,</span>ok<span style="color: #109ab8;">&#93;</span></pre></div></div>

<p>There is one difference here: the Erlang list comprehension returns a new list holding the return value of each call to <tt>io:format</tt> (here, <tt>ok</tt>). In this way an Erlang list comprehension is closer to Ruby&#8217;s <tt>#collect</tt> than to <tt>#each</tt>.</p>
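<p>The distinction between the two Ruby methods can be seen in a small sketch. The helper names <tt>with_each</tt> and <tt>with_collect</tt> are made up for this example.</p>

```ruby
# each runs the block for its side effects and returns the original array.
# (with_each and with_collect are hypothetical names for illustration.)
def with_each(list)
  list.each { |element| element.to_s }
end

# collect (also known as map) returns a new array of the block results,
# which is what an Erlang list comprehension does.
def with_collect(list)
  list.collect { |element| element.to_s }
end

p with_each([:a, :b, :c])
p with_collect([:a, :b, :c])
```

<p>The first call hands back the symbols it was given; the second hands back the strings produced by the block.</p>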
<p>
We now have all the pieces to build our Erlang version of <tt>#each_with_index</tt>.</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #ff9600;">4</span><span style="color: #014ea4;">&gt;</span> <span style="color: #45b3e6;">EachWithIndex</span> <span style="color: #014ea4;">=</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">X</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #109ab8;">&#91;</span>
  <span style="color: #45b3e6;">X</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Element</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Index</span><span style="color: #109ab8;">&#41;</span> <span style="color: #014ea4;">||</span> 
  <span style="color: #109ab8;">&#123;</span><span style="color: #45b3e6;">Element</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Index</span><span style="color: #109ab8;">&#125;</span> <span style="color: #014ea4;">&lt;-</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">zip</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #6bb810;">,</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">seq</span><span style="color: #109ab8;">&#40;</span><span style="color: #ff9600;">1</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">length</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#41;</span>
   <span style="color: #109ab8;">&#93;</span> <span style="color: #186895;">end</span><span style="color: #6bb810;">.</span></pre></div></div>

<p>What&#8217;s going on here?</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">zip</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #6bb810;">,</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">seq</span><span style="color: #109ab8;">&#40;</span><span style="color: #ff9600;">1</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">length</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#41;</span></pre></div></div>

<p>creates a list of tuples with each tuple containing an element of <tt>L</tt> and a companion integer.</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #109ab8;">&#91;</span><span style="color: #45b3e6;">X</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Element</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Index</span><span style="color: #109ab8;">&#41;</span> <span style="color: #014ea4;">||</span> 
  <span style="color: #109ab8;">&#123;</span><span style="color: #45b3e6;">Element</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Index</span><span style="color: #109ab8;">&#125;</span> <span style="color: #014ea4;">&lt;-</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">zip</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #6bb810;">,</span> <span style="color: #ff4e18;">lists</span>:<span style="color: #ff3c00;">seq</span><span style="color: #109ab8;">&#40;</span><span style="color: #ff9600;">1</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">length</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#41;</span><span style="color: #109ab8;">&#93;</span></pre></div></div>

<p>runs a list comprehension over that list of tuples, calling <tt>fun X</tt> with each element and its index in turn.</p>
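<p>To make the zip-then-comprehend shape concrete, here is roughly the same pipeline written in Ruby. This is a sketch under assumptions: <tt>each_with_index_from_one</tt> is a hypothetical name, and <tt>collect</tt> stands in for the list comprehension.</p>

```ruby
# Hypothetical helper mirroring the Erlang comprehension:
# zip the list with the integers 1..length, then collect the
# block result for each (element, index) pair.
def each_with_index_from_one(list)
  indexes = (1..list.length).to_a
  list.zip(indexes).collect { |element, index| yield(element, index) }
end

labels = each_with_index_from_one([:a, :b, :c]) do |element, index|
  "#{element} at #{index}"
end
p labels
```

<p>As in the Erlang version, the result is a new list made of whatever the passed-in block returns.</p>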

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #45b3e6;">EachWithIndex</span> <span style="color: #014ea4;">=</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">L</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">X</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #6bb810;">...</span> <span style="color: #186895;">end</span><span style="color: #6bb810;">.</span></pre></div></div>

<p>creates a <tt>fun</tt> (think <i>lambda</i>) and assigns it to <tt>EachWithIndex</tt>. You could just as easily create this as a regular function in a <tt>.erl</tt> file somewhere.
</p>
<p>
Now we can call our new <tt>EachWithIndex</tt> function by passing in a list and a <tt>fun</tt>:
</p>

<div class="wp_syntax"><div class="code"><pre class="erlang" style="font-family:monospace;"><span style="color: #ff9600;">5</span><span style="color: #014ea4;">&gt;</span> <span style="color: #45b3e6;">E</span><span style="color: #ff3c00;">achWithIndex</span><span style="color: #109ab8;">&#40;</span><span style="color: #109ab8;">&#91;</span>a<span style="color: #6bb810;">,</span>b<span style="color: #6bb810;">,</span>c<span style="color: #6bb810;">,</span>d<span style="color: #6bb810;">,</span>e<span style="color: #109ab8;">&#93;</span><span style="color: #6bb810;">,</span> <span style="color: #ff3c00;">fun</span><span style="color: #109ab8;">&#40;</span><span style="color: #45b3e6;">Element</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Index</span><span style="color: #109ab8;">&#41;</span> <span style="color: #6bb810;">-&gt;</span> <span style="color: #ff4e18;">io</span>:<span style="color: #ff3c00;">format</span><span style="color: #109ab8;">&#40;</span><span style="color: #ff7800;">&quot;~p at ~p~n&quot;</span><span style="color: #6bb810;">,</span> <span style="color: #109ab8;">&#91;</span><span style="color: #45b3e6;">Element</span><span style="color: #6bb810;">,</span> <span style="color: #45b3e6;">Index</span><span style="color: #109ab8;">&#93;</span><span style="color: #109ab8;">&#41;</span> <span style="color: #186895;">end</span><span style="color: #109ab8;">&#41;</span><span style="color: #6bb810;">.</span>
a at <span style="color: #ff9600;">1</span>
b at <span style="color: #ff9600;">2</span>
c at <span style="color: #ff9600;">3</span>
d at <span style="color: #ff9600;">4</span>
e at <span style="color: #ff9600;">5</span>
<span style="color: #109ab8;">&#91;</span>ok<span style="color: #6bb810;">,</span>ok<span style="color: #6bb810;">,</span>ok<span style="color: #6bb810;">,</span>ok<span style="color: #6bb810;">,</span>ok<span style="color: #109ab8;">&#93;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/02/10/rubys-each_with_index-for-erlang/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/02/10/rubys-each_with_index-for-erlang/</feedburner:origLink></item>
		<item>
		<title>PoolParty s3fs Plugin</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/hveQQVHGQcM/</link>
		<comments>http://www.xcombinator.com/2009/02/10/poolparty-s3fs-plugin/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 17:51:41 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[deployment]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/02/10/poolparty-s3fs-plugin/</guid>
		<description><![CDATA[Goal
A new plugin to install and mount an s3fs volume on an instance using PoolParty.
Getting it
git clone git://github.com/jashmenn/poolparty-s3fs-plugin.git
Usage
NOTICE You must have already created your S3 bucket. This plugin will not
create it for you.

cloud&#40;:app&#41; do
  ...
  s3fs&#40;:bucket =&#62; &#34;my-fun-bucket&#34;&#41;
  ...
end

This will be mounted at /mnt/my-fun-bucket
You can mount multiple buckets and/or change the mount point:

s3fs do
  [...]]]></description>
			<content:encoded><![CDATA[<h2>Goal</h2>
<p>A new plugin to install and mount an s3fs volume on an instance using PoolParty.</p>
<h2>Getting it</h2>
<p><tt>git clone git://github.com/jashmenn/poolparty-s3fs-plugin.git</tt></p>
<h2>Usage</h2>
<p><em>NOTICE</em> You must have already created your S3 bucket. This plugin will <em>not</em><br />
create it for you.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">cloud<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:app</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span>
  ...
  <span style="color:#9900CC;">s3fs</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:bucket</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">&quot;my-fun-bucket&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  ...
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>The bucket will be mounted at <code>/mnt/my-fun-bucket</code>.<br />
You can mount multiple buckets and/or change the mount point:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">s3fs <span style="color:#9966CC; font-weight:bold;">do</span>
  bucket <span style="color:#996600;">&quot;my-fun-bucket&quot;</span>
  bucket <span style="color:#996600;">&quot;my-other-bucket&quot;</span>, <span style="color:#996600;">&quot;/mnt/a-disk&quot;</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<h2>References:</h2>
<ul>
<li><a href="http://code.google.com/p/s3fs/wiki/FuseOverAmazon">http://code.google.com/p/s3fs/wiki/FuseOverAmazon</a></li>
<li><a href="http://groups.google.com/group/ec2ubuntu/browse_thread/thread/9093236bc07d220b?pli=1">http://groups.google.com/group/ec2ubuntu/browse_thread/thread/9093236bc07d220b?pli=1</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/02/10/poolparty-s3fs-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/02/10/poolparty-s3fs-plugin/</feedburner:origLink></item>
		<item>
		<title>PoolParty MRTG Plugin</title>
		<link>http://feedproxy.google.com/~r/xcombinator/~3/NTy5rHhP40Q/</link>
		<comments>http://www.xcombinator.com/2009/02/03/poolparty-mrtg-plugin/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 04:28:06 +0000</pubDate>
		<dc:creator>Nate Murray</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[poolparty]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://www.xcombinator.com/2009/02/03/poolparty-mrtg-plugin/</guid>
		<description><![CDATA[I&#8217;ve created a PoolParty plugin that makes it dead-simple to install ]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve created a <a href="http://www.poolpartyrb.com">PoolParty</a> plugin that makes it dead-simple to install <a href="http://oss.oetiker.ch/mrtg/">MRTG</a> on the nodes in your cloud.</p>
<p>The <a href="http://github.com/jashmenn/poolparty-mrtg-plugin/tree/master">GitHub Repo is here</a>.</p>
<p>or <tt>git clone git://github.com/jashmenn/poolparty-mrtg-plugin.git</tt></p>
<p>Usage:</p>
<p>Declare <tt>mrtg</tt> in your <tt>cloud</tt> block</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">mrtg <span style="color:#9966CC; font-weight:bold;">do</span>
   monitor <span style="color:#ff3333; font-weight:bold;">:cpu</span>, <span style="color:#ff3333; font-weight:bold;">:uptime</span>, <span style="color:#ff3333; font-weight:bold;">:memory</span>, <span style="color:#ff3333; font-weight:bold;">:open_files</span>, <span style="color:#ff3333; font-weight:bold;">:processes</span>, <span style="color:#ff3333; font-weight:bold;">:apache</span>, <span style="color:#ff3333; font-weight:bold;">:network_interfaces</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p><img src='http://www.xcombinator.com/wp-content/uploads/2009/02/picture-11.png' alt='picture-11.png' /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.xcombinator.com/2009/02/03/poolparty-mrtg-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.xcombinator.com/2009/02/03/poolparty-mrtg-plugin/</feedburner:origLink></item>
	</channel>
</rss>
