<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Mike Perham</title>
	
	<link>http://www.mikeperham.com</link>
	<description>On Ruby, software and the Internet</description>
	<pubDate>Tue, 23 Jun 2009 01:23:17 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/mikeperham" type="application/rss+xml" /><item>
		<title>Slabs, Pages, Chunks and Memcached</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/JQmGhgFYuyg/</link>
		<comments>http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/#comments</comments>
		<pubDate>Tue, 23 Jun 2009 01:23:17 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=300</guid>
		<description><![CDATA[Many people don&#8217;t know this but the latest memcached release (1.2.8 right now) can be about 15% more efficient in its memory usage than older releases.  If you have a 600MB memcached server, upgrading will magically &#8220;gain&#8221; you roughly 90MB of RAM.  Why is this?
When you ask memcached to store a value, it looks [...]]]></description>
			<content:encoded><![CDATA[<p>Many people don&#8217;t know this, but the latest memcached release (1.2.8 right now) can be about 15% more efficient in its memory usage than older releases.  If you have a 600MB memcached server, upgrading will magically &#8220;gain&#8221; you roughly 90MB of RAM.  Why is this?</p>
<p>When you ask memcached to store a value, it looks up the &#8220;slab&#8221; associated with that value.  A slab holds values within a particular size range.  Slabs are composed of 1MB pages, which are broken into chunks of the slab&#8217;s size.  Let&#8217;s say your value is 1001 bytes; memcached will look up the slab which holds values between 1001 and 2000 bytes.  It then finds a page with an empty chunk and inserts the value into that chunk.  Note that a chunk is fixed in size: it must be 2000 bytes in order to store the largest value for the slab.</p>
<p>Now you know why memcached limits values to one megabyte: the value must be stored in a chunk and a page needs to hold the chunk.  Since a page is hardcoded as 1MB, it follows that a chunk must also be limited to 1MB.</p>
<p>So we understand the &#8220;object model&#8221; for memcached memory allocation: a slab has many pages, each of which has many chunks.  Each chunk is a fixed size, based on the maximum size for the slab, so the 2000 byte slab will hold values between 1001 and 2000 bytes.  Older versions of memcached used slab sizes based on powers of two, so you&#8217;d have a 1KB slab, 2KB slab, 4KB slab, &#8230;, all the way to 1MB.  If your memcached server was full of 1001 byte values, your memory efficiency would be 50% (1001 / 2000) in the worst case.  Assuming you have an even distribution of value sizes, you&#8217;ll get 75% efficiency (1500 / 2000).  Your 600MB memcached server will only hold 450MB of actual data!</p>
<p><a href="http://www.mikeperham.com/wp-content/uploads/2009/06/memcached-memory-allocation.png"><img src="http://www.mikeperham.com/wp-content/uploads/2009/06/memcached-memory-allocation.png" alt="memcached-memory-allocation" title="memcached-memory-allocation" width="526" height="481" class="alignnone size-full wp-image-303" /></a></p>
<p>In this image, we see a single slab with two pages.  Each page has several chunks: the green chunks are empty and the others hold values, shown in orange.  The yellow area is the waste we are talking about.</p>
<p>One of the improvements Facebook made to memcached last year was moving to a smaller growth factor so there is not as much waste in storing values in chunks.  Instead of 2^n for the slab allocation, the latest versions of memcached use slab sizes that grow as 1.25^n, so you will see slabs with sizes 1KB, 1.25KB, 1.56KB, etc.  This means that instead of 25% waste on average, you should see closer to 10%.  Effectively you regain 15% of your memcached memory just by installing the latest version!</p>
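<p>The arithmetic behind those percentages can be sketched in a few lines of Ruby.  This is only an illustration, not memcached code: it assumes value sizes are spread evenly across each slab&#8217;s range, as in the example earlier, and the method name is made up.</p>

```ruby
# Average fraction of a chunk that holds real data when value sizes are
# spread evenly between the previous slab's chunk size and this slab's.
# growth_factor is the ratio between consecutive slab chunk sizes.
def average_chunk_efficiency(growth_factor)
  smallest = 1.0 / growth_factor # smallest value, as a fraction of the chunk
  largest  = 1.0                 # the largest value fills the chunk
  (smallest + largest) / 2
end

old_slabs = average_chunk_efficiency(2.0)   # powers-of-two slabs
new_slabs = average_chunk_efficiency(1.25)  # 1.25^n slabs

puts "2^n slabs:    #{(old_slabs * 100).round}% efficient"
puts "1.25^n slabs: #{(new_slabs * 100).round}% efficient"
puts "600MB server holds #{(600 * old_slabs).round}MB vs #{(600 * new_slabs).round}MB of data"
```

<p>Real workloads rarely have an even size distribution, so treat the result as a rough estimate rather than a guarantee.</p>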
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/</feedburner:origLink></item>
		<item>
		<title>memcached vs memcache-client Performance</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/n-yEEV3iPdY/</link>
		<comments>http://www.mikeperham.com/2009/06/16/memcached-vs-memcache-client-performance/#comments</comments>
		<pubDate>Tue, 16 Jun 2009 23:08:24 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=296</guid>
		<description><![CDATA[memcached is Evan Weaver&#8217;s Ruby wrapper around the libmemcached C library and widely regarded as quite fast.  After an hour of trying, I finally got a build of memcached to actually compile and install on my machine (the trick: you need to download the custom packages Evan links on his blog, nothing else seems [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://github.com/fauna/memcached">memcached</a> is Evan Weaver&#8217;s Ruby wrapper around the libmemcached C library and is widely regarded as quite fast.  After an hour of trying, I finally got a build of memcached to actually compile and install on my machine (the trick: you need to download the custom packages Evan links on his blog; nothing else seems to work).  Here are the results:</p>
<pre>
== memcached 0.13 + libmemcached 0.25.4 versus memcache-client 1.7.4

                                     user     system      total        real
set:plain:noblock:memcached      0.090000   0.030000   0.120000 (  0.277929)
set:plain:memcached              0.220000   0.270000   0.490000 (  1.251547)
set:plain:memcache-client        0.610000   0.270000   0.880000 (  1.670718)
set:ruby:noblock:memcached       0.150000   0.020000   0.170000 (  0.309201)
set:ruby:memcached               0.300000   0.290000   0.590000 (  1.390354)
set:ruby:memcache-client         0.670000   0.270000   0.940000 (  1.713558)
get:plain:memcached              0.240000   0.270000   0.510000 (  1.169909)
get:plain:memcache-client        0.850000   0.270000   1.120000 (  1.885270)
get:ruby:memcached               0.270000   0.280000   0.550000 (  1.229705)
get:ruby:memcache-client         0.890000   0.260000   1.150000 (  1.861660)
multiget:ruby:memcached          0.190000   0.090000   0.280000 (  0.396264)
multiget:ruby:memcache-client    0.530000   0.100000   0.630000 (  0.901016)
missing:ruby:memcached           0.280000   0.290000   0.570000 (  1.254400)
missing:ruby:memcached:inline    0.300000   0.290000   0.590000 (  1.235122)
missing:ruby:memcache-client     0.570000   0.250000   0.820000 (  1.461293)
mixed:ruby:noblock:memcached     0.540000   0.620000   1.160000 (  2.429200)
mixed:ruby:memcached             0.580000   0.570000   1.150000 (  2.610819)
mixed:ruby:memcache-client       1.580000   0.540000   2.120000 (  3.632775)
</pre>
<p>In most of the blocking tests, memcache-client takes roughly 20-60% more wall-clock time than memcached; multiget is the outlier at a little over 2x.  This is amazing for a (mostly) pure Ruby library performing a lot of network IO against a C library which has been tuned for speed!  I hope that puts to bed any lingering doubts that memcache-client is slow.</p>
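<p>If you want to compare the two libraries test by test, a short Ruby sketch like the following turns the &#8220;real&#8221; column of the table into ratios.  The numbers are copied from the blocking rows above; the constant and method names are just for illustration.</p>

```ruby
# Wall-clock ("real") seconds copied from the benchmark table above,
# as [memcached, memcache-client] pairs for the blocking tests.
REAL_SECONDS = {
  "set:plain"     => [1.251547, 1.670718],
  "set:ruby"      => [1.390354, 1.713558],
  "get:plain"     => [1.169909, 1.885270],
  "get:ruby"      => [1.229705, 1.861660],
  "multiget:ruby" => [0.396264, 0.901016],
  "missing:ruby"  => [1.254400, 1.461293],
  "mixed:ruby"    => [2.610819, 3.632775],
}

# How many times longer memcache-client took than memcached.
def slowdown(memcached, memcache_client)
  memcache_client / memcached
end

REAL_SECONDS.each do |name, (c_time, ruby_time)|
  puts format("%-14s %.2fx", name, slowdown(c_time, ruby_time))
end
```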
<p>Remember: if you are using Rails 2.3, just &#8220;gem install memcache-client&#8221; and Rails will pick up the latest version with all these performance improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/06/16/memcached-vs-memcache-client-performance/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/06/16/memcached-vs-memcache-client-performance/</feedburner:origLink></item>
		<item>
		<title>Adding multi-get support to Rails</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/lAfR58RgRSU/</link>
		<comments>http://www.mikeperham.com/2009/06/13/adding-multi-get-support-to-rails/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 20:14:34 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Rails]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=292</guid>
		<description><![CDATA[Memcache-client has the ability to fetch multiple keys in one request but Rails does not expose this functionality.  It&#8217;s really easy to add it yourself though:
config/initializers/rails_patches.rb

Rails.cache.instance_eval &#60;&#60;-EOM
  def read_multi&#40;*keys&#41;
    @data.get_multi&#40;*keys&#41;
  end
EOM

Rails uses read/write for its API naming so we name the method read_multi rather than get_multi.  Here&#8217;s a [...]]]></description>
			<content:encoded><![CDATA[<p>Memcache-client has the ability to fetch multiple keys in one request but Rails does not expose this functionality.  It&#8217;s really easy to add it yourself though:</p>
<p><strong>config/initializers/rails_patches.rb</strong></p>

<div class="wp_syntax"><div class="code"><pre class="ruby ruby" style="font-family:monospace;">Rails.<span style="color:#9900CC;">cache</span>.<span style="color:#9900CC;">instance_eval</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;-</span>EOM
  <span style="color:#9966CC; font-weight:bold;">def</span> read_multi<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">*</span>keys<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0066ff; font-weight:bold;">@data</span>.<span style="color:#9900CC;">get_multi</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">*</span>keys<span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
EOM</pre></div></div>

<p>Rails uses read/write for its API naming so we name the method <code>read_multi</code> rather than <code>get_multi</code>.  Here&#8217;s a sample usage in script/console:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby ruby" style="font-family:monospace;"><span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Rails.<span style="color:#9900CC;">cache</span>.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'a'</span>, <span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Rails.<span style="color:#9900CC;">cache</span>.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'b'</span>, <span style="color:#006666;">2</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Rails.<span style="color:#9900CC;">cache</span>.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'c'</span>, <span style="color:#006666;">3</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Rails.<span style="color:#9900CC;">cache</span>.<span style="color:#9900CC;">read_multi</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'a'</span>, <span style="color:#996600;">'b'</span>, <span style="color:#996600;">'c'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;a&quot;</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">1</span>, <span style="color:#996600;">&quot;b&quot;</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">2</span>, <span style="color:#996600;">&quot;c&quot;</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">3</span><span style="color:#006600; font-weight:bold;">&#125;</span></pre></div></div>
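<p>If you want to see how the patch behaves without a Rails app or a memcached server, here is a minimal sketch using stand-in classes.  <code>FakeMemCache</code>, <code>FakeStore</code> and <code>add_read_multi</code> are hypothetical names invented for this example; the only things taken from the real setup are that the store keeps its client in <code>@data</code> and that <code>get_multi</code> returns a hash of the keys it found.</p>

```ruby
# Stand-in for the memcache-client object (hypothetical, illustration only).
class FakeMemCache
  def initialize
    @values = {}
  end

  def set(key, value)
    @values[key] = value
  end

  # Returns a hash containing only the requested keys that were found.
  def get_multi(*keys)
    @values.select { |key, _| keys.include?(key) }
  end
end

# Stand-in for the Rails cache store, which keeps its client in @data.
class FakeStore
  def initialize(client)
    @data = client
  end

  def write(key, value)
    @data.set(key, value)
  end
end

# Applies the same patch as the initializer, but to any store object.
def add_read_multi(store)
  store.instance_eval <<-EOM
    def read_multi(*keys)
      @data.get_multi(*keys)
    end
  EOM
  store
end

cache = add_read_multi(FakeStore.new(FakeMemCache.new))
cache.write('a', 1)
cache.write('b', 2)
p cache.read_multi('a', 'b', 'missing')
```

<p>Because <code>instance_eval</code> defines the method on that one object only, other store instances are left untouched.</p>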

<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/06/13/adding-multi-get-support-to-rails/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/06/13/adding-multi-get-support-to-rails/</feedburner:origLink></item>
		<item>
		<title>Scraping with Typhoeus and Nokogiri</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/d8N9edoHS8s/</link>
		<comments>http://www.mikeperham.com/2009/06/12/scraping-with-typhoeus-and-nokogiri/#comments</comments>
		<pubDate>Fri, 12 Jun 2009 20:06:05 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=286</guid>
		<description><![CDATA[I&#8217;ve been working on some cool new functionality at OneSpot.  We want to provide a widget that can give the reader more context about a given article.  Zemanta takes the article text and hands us back a set of semantic entities, including links to their Wikipedia page, but we wanted to get a [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on some cool new functionality at <a href="http://www.onespot.com">OneSpot</a>.  We want to provide a widget that can give the reader more context about a given article.  <a href="http://www.zemanta.com">Zemanta</a> takes the article text and hands us back a set of semantic entities, including links to their Wikipedia page, but we wanted to get a nice blurb about each entity and figured that the opening paragraph from the Wikipedia page would be reasonable.</p>
<p>To do this, we use <a href="http://github.com/pauldix/typhoeus">Typhoeus</a> to fetch the Wikipedia pages in parallel and <a href="http://github.com/tenderlove/nokogiri">Nokogiri</a> to pull the relevant content using a custom XPath expression for Wikipedia&#8217;s page layout.</p>
<p>Some notes:</p>
<ul>
<li>We configure Typhoeus to use Rails&#8217;s cache store for its own cache store.  We cache the Wikipedia response for 7 days in order to be good Netizens and not overburden their servers.</li>
<li>Wikipedia links do not specify a hostname, so we make them absolute so that they work when embedded in another page.</li>
<li>We tried Curl::Multi but it was giving us occasional bus errors.</li>
<li>My WordPress syntax highlighter is obviously subpar when it comes to regular expressions.</li>
</ul>

<div class="wp_syntax"><div class="code"><pre class="ruby ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'typhoeus'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'nokogiri'</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">class</span> Wikipedia
  <span style="color:#9966CC; font-weight:bold;">include</span> Typhoeus
  <span style="color:#008000; font-style:italic;">#self.cache = Rails.cache.instance_variable_get(:@data)</span>
&nbsp;
  remote_defaults <span style="color:#ff3333; font-weight:bold;">:cache_responses</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">7</span><span style="color:#006600; font-weight:bold;">*</span><span style="color:#006666;">24</span><span style="color:#006600; font-weight:bold;">*</span><span style="color:#006666;">60</span><span style="color:#006600; font-weight:bold;">*</span><span style="color:#006666;">60</span>, 
      <span style="color:#ff3333; font-weight:bold;">:user_agent</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">'typhoeus crawler'</span>, 
      <span style="color:#ff3333; font-weight:bold;">:timeout</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">5</span>
&nbsp;
  define_remote_method <span style="color:#ff3333; font-weight:bold;">:extract</span>, 
      <span style="color:#ff3333; font-weight:bold;">:on_success</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span>|response| Wikipedia.<span style="color:#9900CC;">extract_first_paragraph</span><span style="color:#006600; font-weight:bold;">&#40;</span>response.<span style="color:#9900CC;">body</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">extract_first_paragraph</span><span style="color:#006600; font-weight:bold;">&#40;</span>content<span style="color:#006600; font-weight:bold;">&#41;</span>
    nh = <span style="color:#6666ff; font-weight:bold;">Nokogiri::HTML</span><span style="color:#006600; font-weight:bold;">&#40;</span>content<span style="color:#006600; font-weight:bold;">&#41;</span>
    str = nh.<span style="color:#9900CC;">xpath</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;//div[@id='bodyContent']/p[1]&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">inner_html</span>
    str.<span style="color:#CC0066; font-weight:bold;">gsub</span> <span style="color:#006600; font-weight:bold;">/</span>href=<span style="color:#996600;">&quot;<span style="color:#000099;">\/</span>wiki/, 'href=&quot;</span>http:<span style="color:#006600; font-weight:bold;">//</span>en.<span style="color:#9900CC;">wikipedia</span>.<span style="color:#9900CC;">org</span><span style="color:#006600; font-weight:bold;">/</span>wiki<span style="color:#996600;">'
  end
end</span></pre></div></div>
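<p>Since the highlighter makes the link rewriting hard to read, here is that one step on its own as a small sketch.  It uses the same substitution as <code>extract_first_paragraph</code> but runs on a hard-coded HTML fragment that I made up, so it needs neither Typhoeus nor Nokogiri; the method name is also just for illustration.</p>

```ruby
# Rewrite root-relative /wiki/ links so they point at en.wikipedia.org.
# Same substitution as in extract_first_paragraph, applied to a plain string.
def absolutize_wiki_links(html)
  html.gsub(/href="\/wiki/, 'href="http://en.wikipedia.org/wiki')
end

# Invented sample fragment, not real Wikipedia markup.
fragment = '<p>A <a href="/wiki/Bus_error">bus error</a> is a fault.</p>'
puts absolutize_wiki_links(fragment)
```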

<p>And here&#8217;s how you use it.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby ruby" style="font-family:monospace;">    entities = <span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>
http:<span style="color:#006600; font-weight:bold;">//</span>en.<span style="color:#9900CC;">wikipedia</span>.<span style="color:#9900CC;">org</span><span style="color:#006600; font-weight:bold;">/</span>wiki<span style="color:#006600; font-weight:bold;">/</span>Garth_Marenghi<span style="color:#996600;">'s_Darkplace
http://en.wikipedia.org/wiki/Bus_error
http://en.wikipedia.org/wiki/Washington
)
    content = entities.map do |url|
      Wikipedia.extract(:base_uri =&gt; url)
    end
    p content</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/06/12/scraping-with-typhoeus-and-nokogiri/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/06/12/scraping-with-typhoeus-and-nokogiri/</feedburner:origLink></item>
		<item>
		<title>Memory-hungry Ruby daemons</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/QhsUzk-3Ueg/</link>
		<comments>http://www.mikeperham.com/2009/05/25/memory-hungry-ruby-daemons/#comments</comments>
		<pubDate>Mon, 25 May 2009 18:03:42 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=271</guid>
		<description><![CDATA[We&#8217;ve had a perplexing issue with our Ruby daemons at OneSpot: they seem to grow to 300-400MB each within about 30 minutes, at which point our Monit scripts restart them.  We suspected a memory leak and so upgraded from stock Ruby 1.8.5 shipped with CentOS to the latest REE 1.8.6 but nothing changed.  [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve had a perplexing issue with our Ruby daemons at <a href="http://www.onespot.com">OneSpot</a>: they seem to grow to 300-400MB each within about 30 minutes, at which point our Monit scripts restart them.  We suspected a memory leak and so upgraded from stock Ruby 1.8.5 shipped with CentOS to the latest REE 1.8.6 but nothing changed.  I also saw a very similar issue at <a href="http://www.fiveruns.com">FiveRuns</a>.  Why is this problem seemingly endemic, even with completely different source code?  After some thought and research I think I understand the root cause of the problem: it&#8217;s part of Ruby&#8217;s history and design.</p>
<p><strong>Memory Management in Ruby</strong></p>
<p>Ruby uses 5 constants to control how it manages an application&#8217;s heap, 3 of which are important to this discussion.  From the <a href="http://www.rubyenterpriseedition.com/documentation.html#_garbage_collector_performance_tuning">REE user&#8217;s guide</a>:</p>
<blockquote>
<ul>
<li>
<strong>RUBY_HEAP_MIN_SLOTS</strong><br />
This specifies the initial number of heap slots. The default is 10000.
</li>
<li>
<strong>RUBY_HEAP_SLOTS_INCREMENT</strong><br />
The number of additional heap slots to allocate when Ruby needs to allocate new heap slots for the first time. The default is 10000.</p>
<p>For example, suppose that the default GC settings are in effect, and 10000 Ruby objects exist on the heap (= 10000 used heap slots). When the program creates another object, Ruby will allocate a new heap with 10000 heap slots in it. There are now 20000 heap slots in total, of which 10001 are used and 9999 are unused.
</li>
<li>
<strong>RUBY_HEAP_SLOTS_GROWTH_FACTOR</strong><br />
Multiplicator used for calculating the number of new heaps slots to allocate next time Ruby needs new heap slots. The default is 1.8.</p>
<p>Take the program in the last example. Suppose that the program creates 10000 more objects. Upon creating the 10000th object, Ruby needs to allocate another heap. This heap will have 10000 * 1.8 = 18000 heap slots. There are now 20000 + 18000 = 38000 heap slots in total, of which 20001 are used and 17999 are unused.</p>
<p>The next time Ruby needs to allocate a new heap, that heap will have 18000 * 1.8 = 32400 heap slots.
</li>
</ul>
</blockquote>
<p>So MRI will initially allocate the application RUBY_HEAP_MIN_SLOTS or 10,000 slots.  Let&#8217;s assume for ease of math that this corresponds to 1MB of memory.  Now Rails and our application code can&#8217;t fit into anything less than 50MB so Ruby will need to allocate additional heaps for the necessary objects.  It does this by using RUBY_HEAP_SLOTS_INCREMENT and RUBY_HEAP_SLOTS_GROWTH_FACTOR each time.  So we allocate 1.8MB, 3.24, 5.83, 10.5, 18.9, 34, 61, 110, 198, &#8230; where the size of the newest heap is expanded by 1.8x each time.  As you can see, just to get us to our 50MB minimum, we&#8217;re now allocating 34MB for the latest heap.  Once the app starts actually processing data, we&#8217;ll allocate 61 and then 110 MB!</p>
<p>This is the core of the problem: loading Rails expands the Ruby process so much that additional memory allocation grows much larger than we actually need, due to the exponential growth factor.  And since MRI never gives back unused memory, our daemon can easily be taking 300-400MB when it&#8217;s only using 100-200.</p>
<p>It&#8217;s important to note that this is essentially by design.  Ruby&#8217;s history is mostly as a command line tool for text processing and therefore it values quick startup and a small memory footprint.  It was not designed for long-running daemon/server processes.  Java makes a similar tradeoff in its client and server VMs.</p>
<p>Our solution was to move to <a href="http://www.rubyenterpriseedition.com">Ruby Enterprise Edition</a>.  It allows those constants to be modified via environment variables, so that you can greatly increase MIN_SLOTS and greatly reduce GROWTH_FACTOR.  Our settings:</p>

<div class="wp_syntax"><div class="code"><pre class="sh" style="font-family:monospace;">export RUBY_HEAP_MIN_SLOTS=800000 
export RUBY_HEAP_SLOTS_INCREMENT=100000 
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1</pre></div></div>

<p>That gives our daemon ~80MB to start and each heap is a fixed 10MB.  Our daemon stabilizes at ~120MB and the memory usage doesn&#8217;t change, even after hours of processing.  My takeaway: if you own a Ruby daemon, you need to tune the heap to ensure it does not take too much memory!</p>
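<p>For completeness, here is the same back-of-the-envelope math for the tuned settings.  It reuses the earlier assumption that 10,000 slots is about 1MB, which is only a convenience figure, and the helper names are invented for this sketch.</p>

```ruby
# Assumption carried over from earlier in the post: 10,000 slots is about 1MB.
SLOTS_PER_MB = 10_000.0

def slots_to_mb(slots)
  slots / SLOTS_PER_MB
end

# With a growth factor of 1 every additional heap is the same size, so the
# number of extra heaps needed to reach target_mb is a simple division.
def extra_heaps_needed(target_mb, initial_mb, increment_mb)
  [((target_mb - initial_mb) / increment_mb).ceil, 0].max
end

initial_mb   = slots_to_mb(800_000)  # RUBY_HEAP_MIN_SLOTS
increment_mb = slots_to_mb(100_000)  # RUBY_HEAP_SLOTS_INCREMENT

puts "start: #{initial_mb}MB, each extra heap: #{increment_mb}MB"
puts "extra heaps to reach 120MB: #{extra_heaps_needed(120, initial_mb, increment_mb)}"
```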
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/05/25/memory-hungry-ruby-daemons/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/05/25/memory-hungry-ruby-daemons/</feedburner:origLink></item>
		<item>
		<title>A Guide to Varnish VCL</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/FXIy2P0qcRU/</link>
		<comments>http://www.mikeperham.com/2009/05/19/a-guide-to-varnish-vcl/#comments</comments>
		<pubDate>Wed, 20 May 2009 02:22:20 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=259</guid>
		<description><![CDATA[I&#8217;ve been working with Varnish 2.0 for the last two weeks, going from complete n00b to someone who knows enough to feel I can improve the terrible lack of documentation for Varnish and VCL.  There&#8217;s not a lot out there and what&#8217;s there is hard to find and sometimes erroneous.  I&#8217;m hoping this [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working with Varnish 2.0 for the last two weeks, going from complete n00b to someone who knows enough to feel I can improve the terrible lack of documentation for Varnish and VCL.  There&#8217;s not a lot out there and what&#8217;s there is hard to find and sometimes erroneous.  I&#8217;m hoping this post will help others like me who are struggling with Varnish and VCL.</p>
<p><strong>Basics</strong></p>
<p>VCL is essentially a set of stubs which you can override to provide your own behavior.  It is very limited in what it can do, primarily for performance reasons.  You don&#8217;t have access to the filesystem and the language has no variables or loops.</p>
<p>The two stubs you will most often use:</p>
<ul>
<li><strong>vcl_recv</strong> - called at the start of a request.  This is primarily used to canonicalize the input URL and headers, determine whether to bypass the cache, etc.</li>
<li><strong>vcl_fetch</strong> - called when the response has been gathered from the backend before placing it in the cache.  You can configure a grace period, enable ESI processing, configure different TTLs, remove user-specific cookies, etc before inserting the response into the cache.</li>
</ul>
<p><strong>Examples</strong></p>
<p>The Varnish VCL examples are rather sparse; here are a few more which may fill in some gaps.  These work with Varnish 2.0.4.</p>

<div class="wp_syntax"><div class="code"><pre class="c c" style="font-family:monospace;"><span style="color: #339933;"># If the requested URL starts like &quot;/link/?&quot; then immediately pass it to the given </span>
<span style="color: #339933;"># backend and DO NOT cache the result (&quot;pass&quot; basically means &quot;bypass the cache&quot;).</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">url</span> ~ <span style="color: #ff0000;">&quot;^/link/<span style="color: #000099; font-weight: bold;">\?</span>&quot;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  set req.<span style="color: #202020;">backend</span> <span style="color: #339933;">=</span> web;
  pass;                                                                                                    
<span style="color: #009900;">&#125;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="c c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">url</span> ~ <span style="color: #ff0000;">&quot;/$&quot;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> 
  <span style="color: #339933;"># Handle URLs with a trailing slash by appending index.html</span>
  <span style="color: #339933;"># (Useful if you are pulling from S3 which does not have default document logic)</span>
  <span style="color: #339933;"># Note there's no explicit string append operator.</span>
  set req.<span style="color: #202020;">url</span> <span style="color: #339933;">=</span> req.<span style="color: #202020;">url</span> <span style="color: #ff0000;">&quot;index.html&quot;</span>;
<span style="color: #009900;">&#125;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="c c" style="font-family:monospace;"><span style="color: #339933;"># strip port from the Host header</span>
<span style="color: #339933;"># (useful when testing against a local Varnish instance on port 6081)</span>
set req.<span style="color: #202020;">http</span>.<span style="color: #202020;">Host</span> <span style="color: #339933;">=</span> regsub<span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">http</span>.<span style="color: #202020;">Host</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;:[0-9]+&quot;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #009900;">&#41;</span>;
&nbsp;
<span style="color: #339933;"># /foo/bar.embed -&gt; /foo/bar/embed.js</span>
set req.<span style="color: #202020;">url</span> <span style="color: #339933;">=</span> regsub<span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">url</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;(.*)<span style="color: #000099; font-weight: bold;">\.</span>embed$&quot;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\1</span>/embed.js&quot;</span><span style="color: #009900;">&#41;</span>;
&nbsp;
<span style="color: #339933;"># Support feed URLs of the form &quot;/foo/bar.atom&quot; --&gt; &quot;/foo/bar/feed.atom&quot;</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">url</span> ~ <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\.</span>(rss|atom)$&quot;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #339933;">!</span><span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">url</span> ~ <span style="color: #ff0000;">&quot;feed<span style="color: #000099; font-weight: bold;">\.</span>(atom|rss)$&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    set req.<span style="color: #202020;">url</span> <span style="color: #339933;">=</span> regsub<span style="color: #009900;">&#40;</span>req.<span style="color: #202020;">url</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;(.*)<span style="color: #000099; font-weight: bold;">\.</span>(.*)$&quot;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\1</span>/feed.<span style="color: #000099; font-weight: bold;">\2</span>&quot;</span><span style="color: #009900;">&#41;</span>;
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The biggest pain in all of this was how little logic you can perform on req.url.  VCL doesn&#8217;t have variables, so when you need to restructure a URL you have to think in terms of regular expression groups, as in the RSS/Atom regexp above.</p>
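<p>If the group juggling is hard to follow, here is the same feed rewrite sketched in Ruby.  This is only an illustration of the regexp logic, not something Varnish runs; <code>rewrite_feed_url</code> is a made-up name and I&#8217;m assuming Ruby&#8217;s <code>sub</code> treats these simple patterns the same way VCL&#8217;s <code>regsub</code> does.</p>

```ruby
# Ruby stand-in for the VCL feed rewrite above (hypothetical helper name).
def rewrite_feed_url(url)
  # Same guard as the VCL: ends in .rss/.atom but is not already feed.rss/feed.atom
  if url =~ /\.(rss|atom)$/ && url !~ /feed\.(atom|rss)$/
    # Group 1 is everything before the last dot, group 2 is the extension.
    url.sub(/(.*)\.(.*)$/, '\1/feed.\2')
  else
    url
  end
end

rewrite_feed_url("/foo/bar.atom")       # => "/foo/bar/feed.atom"
rewrite_feed_url("/foo/bar/feed.atom")  # => "/foo/bar/feed.atom" (left alone)
```

<p>The two capture groups do the job a pair of local variables would do in a normal language.</p>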

<div class="wp_syntax"><div class="code"><pre class="c c" style="font-family:monospace;">  <span style="color: #339933;"># use this in vcl_fetch, don't want 404s filling up our cache, so just </span>
  <span style="color: #339933;"># immediately return a client error and bypass the cache.</span>
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>obj.<span style="color: #202020;">status</span> <span style="color: #339933;">==</span> <span style="color: #0000dd;">404</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    error <span style="color: #0000dd;">404</span> <span style="color: #ff0000;">&quot;No such file&quot;</span>;
  <span style="color: #009900;">&#125;</span></pre></div></div>

<p><strong>Resources</strong></p>
<p>Here are the best VCL resources I could find:</p>
<ul>
<li><a href="http://varnish.projects.linpro.no/wiki/VCL">Varnish VCL</a></li>
<li><a href="http://varnish.projects.linpro.no/wiki/VCLExamples">Varnish VCL Examples</a></li>
<li><a href="http://phk.freebsd.dk/pubs/varnish_vcl.pdf">VCL Overview (PDF)</a></li>
</ul>
<p>Good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/05/19/a-guide-to-varnish-vcl/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/05/19/a-guide-to-varnish-vcl/</feedburner:origLink></item>
		<item>
		<title>Engines in Rails 2.3</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/eVvE-B2Eheo/</link>
		<comments>http://www.mikeperham.com/2009/04/18/engines-in-rails-23/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 20:53:29 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Rails]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=248</guid>
		<description><![CDATA[Engines have been around Rails for years but it wasn&#8217;t until the recent 2.3 release that Rails officially supported Engines.  So what is an Engine?  An Engine is a Rails plugin with full MVC capabilities.  In essence, that means your Engine has an app directory with helpers, controllers, models and views just [...]]]></description>
			<content:encoded><![CDATA[<p>Engines have been around Rails for years but it wasn&#8217;t until the recent 2.3 release that Rails officially supported Engines.  So what is an Engine?  An Engine is a Rails plugin with full MVC capabilities.  In essence, that means your Engine has an app directory with helpers, controllers, models and views just like a standard Rails application.  You add an engine to <code>vendor/plugins</code> or through <code>config.gem</code> in your application, just like a plugin, but additionally its app directory is effectively overlaid on top of your application&#8217;s app directory.</p>
<p>Let&#8217;s spelunk through the code:</p>
<p><code>rails-2.3.2/lib/rails/plugin/loader.rb</code></p>

<div class="wp_syntax"><div class="code"><pre class="ruby ruby" style="font-family:monospace;">        <span style="color:#9966CC; font-weight:bold;">def</span> configure_engines
          <span style="color:#9966CC; font-weight:bold;">if</span> engines.<span style="color:#9900CC;">any</span>?
            add_engine_routing_configurations
            add_engine_controller_paths
            add_engine_view_paths
          <span style="color:#9966CC; font-weight:bold;">end</span>
        <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
        <span style="color:#9966CC; font-weight:bold;">def</span> add_engine_routing_configurations
          engines.<span style="color:#CC0066; font-weight:bold;">select</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&amp;</span>:routed?<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">collect</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&amp;</span>:routing_file<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> |routing_file|
            <span style="color:#6666ff; font-weight:bold;">ActionController::Routing::Routes</span>.<span style="color:#9900CC;">add_configuration_file</span><span style="color:#006600; font-weight:bold;">&#40;</span>routing_file<span style="color:#006600; font-weight:bold;">&#41;</span>
          <span style="color:#9966CC; font-weight:bold;">end</span>
        <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
        <span style="color:#9966CC; font-weight:bold;">def</span> add_engine_controller_paths
          <span style="color:#6666ff; font-weight:bold;">ActionController::Routing</span>.<span style="color:#9900CC;">controller_paths</span> <span style="color:#006600; font-weight:bold;">+</span>= engines.<span style="color:#9900CC;">collect</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&amp;</span>:controller_path<span style="color:#006600; font-weight:bold;">&#41;</span>
        <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
        <span style="color:#9966CC; font-weight:bold;">def</span> add_engine_view_paths
          <span style="color:#008000; font-style:italic;"># reverse it such that the last engine can overwrite view paths from the first, like with routes</span>
          paths = <span style="color:#6666ff; font-weight:bold;">ActionView::PathSet</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>engines.<span style="color:#9900CC;">collect</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&amp;</span>:view_path<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">reverse</span><span style="color:#006600; font-weight:bold;">&#41;</span>
          <span style="color:#6666ff; font-weight:bold;">ActionController::Base</span>.<span style="color:#9900CC;">view_paths</span>.<span style="color:#9900CC;">concat</span><span style="color:#006600; font-weight:bold;">&#40;</span>paths<span style="color:#006600; font-weight:bold;">&#41;</span>
          <span style="color:#6666ff; font-weight:bold;">ActionMailer::Base</span>.<span style="color:#9900CC;">view_paths</span>.<span style="color:#9900CC;">concat</span><span style="color:#006600; font-weight:bold;">&#40;</span>paths<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">if</span> configuration.<span style="color:#9900CC;">frameworks</span>.<span style="color:#9966CC; font-weight:bold;">include</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:action_mailer</span><span style="color:#006600; font-weight:bold;">&#41;</span>
        <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>For each engine, we add any routes, any controllers and any views.  Additionally, the directories within app will be added to Ruby&#8217;s global <code>$LOAD_PATH</code>, as with a normal application.  Note that engines are processed in order exactly like plugins: alphabetically or based on the order they are listed in <code>config/environment.rb</code>.</p>
<p>There are some limitations you should be aware of:</p>
<ul>
<li>No migration support: while the engine can add models, it is not obvious how to manage any database structure needed by the engine.  I would imagine the engine should use the install.rb hook to copy migrations to the app&#8217;s <code>db/migrate</code> directory.</li>
<li>No public asset support: like migrations, any stylesheets, javascripts or images must be copied as part of the install.rb hook to the app&#8217;s public directory.</li>
<li>Like plugins, naming becomes a concern.  An engine can have a User model but this will lead to problems with the 90% of Rails applications that have a model of the same name.  You can put your models within a module but I&#8217;ve heard of problems when trying to mix Rails autoloading with modularized classes.  As with plugins, be sure to err on the side of safety and use a unique name for your classes.  I&#8217;m building an engine called <strong>Queso</strong> and it provides a model called <strong>QuesoSearch</strong>, which is unlikely to collide with application classes unless you are building an application for a Mexican cheese provider.  <img src='http://www.mikeperham.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </li>
</ul>
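<p>To make the install.rb idea from the first two bullets concrete, here is a minimal sketch of a copy helper.  It is an assumption about how such a hook could work, not something Rails provides: <code>copy_engine_files</code> and its arguments are hypothetical names, and it simply skips files that already exist in the application.</p>

```ruby
require "fileutils"

# Hypothetical helper for an engine's install.rb: copy the plain files in
# relative_dir (e.g. "db/migrate") from the engine into the application,
# without overwriting anything the application already has.
def copy_engine_files(engine_root, app_root, relative_dir)
  source = File.join(engine_root, relative_dir)
  target = File.join(app_root, relative_dir)
  FileUtils.mkdir_p(target)

  Dir.glob(File.join(source, "*")).sort.map do |path|
    next unless File.file?(path)
    destination = File.join(target, File.basename(path))
    FileUtils.cp(path, destination) unless File.exist?(destination)
    destination
  end.compact
end
```

<p>From install.rb you might call it as <code>copy_engine_files(File.dirname(__FILE__), RAILS_ROOT, "db/migrate")</code>, and again with a directory under <code>public</code> for assets.</p>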
<p>So while Engines do have some limitations to be aware of, they do fill a valuable niche; engines provide a good framework for building full-stack generic application functionality.  <a href="http://activescaffold.com">ActiveScaffold</a> is one example of a Rails plugin that would be an excellent choice to rewrite as an Engine.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/04/18/engines-in-rails-23/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/04/18/engines-in-rails-23/</feedburner:origLink></item>
		<item>
		<title>memcache-client rdoc</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/95NKgudGhUY/</link>
		<comments>http://www.mikeperham.com/2009/03/30/memcache-client-rdoc/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 02:42:09 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=245</guid>
		<description><![CDATA[I&#8217;ve put up the memcache-client rdoc by request of my coworker Chris.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve put up the <a href="/memcache-client">memcache-client rdoc</a> by request of my coworker Chris.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/03/30/memcache-client-rdoc/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/03/30/memcache-client-rdoc/</feedburner:origLink></item>
		<item>
		<title>Caching and Rails</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/9KhXuDVgh94/</link>
		<comments>http://www.mikeperham.com/2009/03/25/caching-and-rails/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 14:44:32 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Rails]]></category>

		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=237</guid>
		<description><![CDATA[Here&#8217;s the slides from my AOR talk last night: Caching, Memcached and Rails (600KB).
I was a little unhappy with my wrapup - the one thing I wanted to teach people was when to use each different caching mechanism provided by Rails and I didn&#8217;t really revisit and summarize that content.  So [...]]]></description>
			<content:encoded><![CDATA[<p>Here are the slides from my AOR talk last night: <a href='http://www.mikeperham.com/wp-content/uploads/2009/03/caching-memcached-and-rails.key'>Caching, Memcached and Rails</a> (600KB).</p>
<div style="width:425px;text-align:left" id="__ss_1196725"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/guestac752c/caching-memcached-and-rails?type=presentation" title="Caching, Memcached And Rails">Caching, Memcached And Rails</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cachingmemcachedandrails-090325105739-phpapp02&#038;stripped_title=caching-memcached-and-rails" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cachingmemcachedandrails-090325105739-phpapp02&#038;stripped_title=caching-memcached-and-rails" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;"></div>
</div>
<p>I was a little unhappy with my wrapup - the one thing I wanted to teach people was when to use each different caching mechanism provided by Rails and I didn&#8217;t really revisit and summarize that content.  So here&#8217;s a quick summary:</p>
<ul>
<li>HTTP caching - prefer this over all other mechanisms.  This is really the only mechanism that prevents the request from ever hitting Ruby.  This topic is big enough for a book so I won&#8217;t cover it here, but review the Expires, ETag and Cache-Control headers to understand how HTTP caching works.  You&#8217;ll need to configure Varnish, Squid, mod_cache or some other HTTP caching proxy.</li>
<li>Page caching - I believe this is really legacy from before Rails supported HTTP caching properly.  Stick with HTTP caching and proper headers.</li>
<li>Action caching - useful when the entire page contents can be cached but you need to run before_filters (e.g. to ensure the user is logged in).  Use AJAX/javascript to do minor customization to the cached content.</li>
<li>Fragment caching - useful when various boxes of content on the page can be cached, but have different dependencies and need to be expired at different times.</li>
<li>Object caching (the Rails.cache.fetch method) - the most granular mechanism.  Good for caching the results of intensive logic or queries.</li>
</ul>
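<p>Since HTTP caching is the one I&#8217;d reach for first, here is a rough sketch of what the ETag and Cache-Control headers buy you.  It is plain Ruby rather than Rails code, and <code>conditional_response</code>, the MD5-based ETag and the five-minute <code>max-age</code> are all assumptions chosen for illustration.</p>

```ruby
require "digest/md5"

# Hypothetical helper: answer 304 with an empty body when the client (or the
# caching proxy) already holds the current version of the content.
def conditional_response(body, if_none_match = nil, max_age = 300)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  headers = {
    "ETag"          => etag,
    "Cache-Control" => "public, max-age=#{max_age}"
  }

  if if_none_match == etag
    [304, headers, ""]    # nothing to resend
  else
    [200, headers, body]  # full response, cacheable for max_age seconds
  end
end
```

<p>A first request gets a 200 with the headers set; a repeat request that sends the same ETag back in If-None-Match gets a 304 and no body.  A proxy that honors <code>max-age</code> can skip even that round trip until the content goes stale.</p>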
<p>I hope this helps demystify the myriad of caching mechanisms Rails supports.  If you want to learn even more,  Gregg Pollack has an amazing set of videos on <a href="http://railslab.newrelic.com/scaling-rails">Scaling Rails</a> which covers caching in great depth.  Happy Caching!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/03/25/caching-and-rails/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/03/25/caching-and-rails/</feedburner:origLink></item>
		<item>
		<title>Socket Timeouts in Ruby</title>
		<link>http://feedproxy.google.com/~r/mikeperham/~3/KX1VdSL9V2U/</link>
		<comments>http://www.mikeperham.com/2009/03/15/socket-timeouts-in-ruby/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 01:14:28 +0000</pubDate>
		<dc:creator>mperham</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=220</guid>
		<description><![CDATA[One of Ruby&#8217;s weaknesses is its poor networking performance.  Much of that has to do with the net/http implementation, which uses Ruby&#8217;s awful Timeout library.  The issues with Timeout are well documented.  SystemTimer provides a reliable alternative that also performs better.
However I started today wondering if there was a better way.  [...]]]></description>
			<content:encoded><![CDATA[<p>One of Ruby&#8217;s weaknesses is its poor networking performance.  Much of that has to do with the net/http implementation, which uses Ruby&#8217;s awful Timeout library.  The issues with Timeout are <a href="http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html">well documented</a>.  <a href="http://ph7spot.com/articles/system_timer">SystemTimer</a> provides a reliable alternative that also performs better.</p>
<p>However I started today wondering if there was a better way.  Enabling timeouts has a huge performance hit on my memcache-client library and reducing the overhead would go a long way to making it perform safely <strong>and</strong> quickly.  Since C programs need socket timeouts also,  I figured there had to be a low-level alternative, and indeed there is: the <code>SO_SNDTIMEO</code> and <code>SO_RCVTIMEO</code> socket options.  It&#8217;s a bit involved to create a proper socket with these options but possible:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby ruby" style="font-family:monospace;">    <span style="color:#9966CC; font-weight:bold;">def</span> connect_to<span style="color:#006600; font-weight:bold;">&#40;</span>host, port, timeout=<span style="color:#0000FF; font-weight:bold;">nil</span><span style="color:#006600; font-weight:bold;">&#41;</span>
      addr = Socket.<span style="color:#9900CC;">getaddrinfo</span><span style="color:#006600; font-weight:bold;">&#40;</span>host, <span style="color:#0000FF; font-weight:bold;">nil</span><span style="color:#006600; font-weight:bold;">&#41;</span>
      sock = Socket.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>Socket.<span style="color:#9900CC;">const_get</span><span style="color:#006600; font-weight:bold;">&#40;</span>addr<span style="color:#006600; font-weight:bold;">&#91;</span>0<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#91;</span>0<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#6666ff; font-weight:bold;">Socket::SOCK_STREAM</span>, 0<span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
      <span style="color:#9966CC; font-weight:bold;">if</span> timeout
        secs = <span style="color:#CC0066; font-weight:bold;">Integer</span><span style="color:#006600; font-weight:bold;">&#40;</span>timeout<span style="color:#006600; font-weight:bold;">&#41;</span>
        usecs = <span style="color:#CC0066; font-weight:bold;">Integer</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#40;</span>timeout <span style="color:#006600; font-weight:bold;">-</span> secs<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1</span>_000_000<span style="color:#006600; font-weight:bold;">&#41;</span>
        optval = <span style="color:#006600; font-weight:bold;">&#91;</span>secs, usecs<span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">pack</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;l_2&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
        sock.<span style="color:#9900CC;">setsockopt</span> <span style="color:#6666ff; font-weight:bold;">Socket::SOL_SOCKET</span>, <span style="color:#6666ff; font-weight:bold;">Socket::SO_RCVTIMEO</span>, optval
        sock.<span style="color:#9900CC;">setsockopt</span> <span style="color:#6666ff; font-weight:bold;">Socket::SOL_SOCKET</span>, <span style="color:#6666ff; font-weight:bold;">Socket::SO_SNDTIMEO</span>, optval
      <span style="color:#9966CC; font-weight:bold;">end</span>
      sock.<span style="color:#9900CC;">connect</span><span style="color:#006600; font-weight:bold;">&#40;</span>Socket.<span style="color:#9900CC;">pack_sockaddr_in</span><span style="color:#006600; font-weight:bold;">&#40;</span>port, addr<span style="color:#006600; font-weight:bold;">&#91;</span>0<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">3</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
      sock
    <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>There are a few complexities in the code:</p>
<ul>
<li>We use the low-level operations, <code>Socket.new</code> and <code>connect</code> rather than just <code>TCPSocket.new(host, port)</code> because otherwise we can&#8217;t set the socket options before the connection is attempted; we want to ensure the connection attempt itself is timed out also.</li>
<li>We have to look up the host via DNS by hand as some systems (*cough*, OSX) can return either IPv6 or IPv4 addresses and the address family constant used in Socket.new must match the address used in the connect statement.</li>
<li>The <code>setsockopt</code> method takes a native C struct so we need to construct it using the <code>Array#pack</code> method.</li>
</ul>
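<p>The <code>Array#pack</code> step is the least obvious part, so here it is on its own.  This is just the packing logic from the method above pulled into a helper; <code>timeval_for</code> is a made-up name, and it assumes <code>struct timeval</code> is laid out as two native longs, which is platform-dependent.</p>

```ruby
# Hypothetical helper: build the bytes setsockopt expects for a timeout.
def timeval_for(timeout)
  secs  = Integer(timeout)                       # whole seconds (truncates)
  usecs = Integer((timeout - secs) * 1_000_000)  # leftover microseconds
  [secs, usecs].pack("l_2")                      # two native longs
end

optval = timeval_for(1.5)
optval.unpack("l_2")  # => [1, 500000]
```

<p>Unpacking with the same format string is a quick way to check that the seconds and microseconds survived the round trip.</p>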
<p>Here are the results, from worst to best:</p>
<pre>
== memcache-client 1.7.0 with Ruby 1.8.6, normal Ruby timeouts
                                     user     system      total        real
mixed:ruby:memcache-client      14.240000   7.470000  21.710000 ( 22.173267)
</pre>
<pre>
== memcache-client 1.7.0 with Ruby 1.8.6, SystemTimer 1.1.1
                                     user     system      total        real
mixed:ruby:memcache-client      12.400000   1.960000  14.360000 ( 14.857924)
</pre>
<pre>
== memcache-client 1.7.0 with Ruby 1.8.6, raw socket timeouts
                                     user     system      total        real
mixed:ruby:memcache-client       2.750000   0.620000   3.370000 (  5.841545)
</pre>
<pre>
== memcache-client 1.7.0 with Ruby 1.8.6, no socket timeouts
                                     user     system      total        real
mixed:ruby:memcache-client       2.760000   0.620000   3.380000 (  5.902549)</pre>
<p>Awesome.  <strong>With raw socket timeouts, there is no performance impact!</strong>  SystemTimer provides an excellent replacement for Timeout if you want to guarantee a ceiling on the time spent in an arbitrary block, but if you just need timeouts for low-level socket operations, nothing beats the operating system&#8217;s native socket timeout support.</p>
<p>There is a caveat in the paragraph above: <strong>low-level</strong> socket operations.  memcache-client uses three IO methods: read, write and gets.  The first two are low-level and time out properly, but gets is built on top of the low-level read operation and has to ignore the EAGAIN error in order to ensure it returns a full line of text, so the raw timeout never surfaces.  So we use a hybrid approach: read and write use the raw socket timeouts and gets uses SystemTimer.  It&#8217;s not quite as fast as with no/raw timeouts but it&#8217;s definitely an improvement:</p>
<pre>
== memcache-client 1.7.0 with Ruby 1.8.6, raw socket timeouts and SystemTimer
                                     user     system      total        real
mixed:ruby:memcache-client       7.490000   1.270000   8.760000 (  9.361547)
</pre>
<p>So we&#8217;ve gone from 22 sec with Timeout to 15 sec with SystemTimer to 9 sec using raw socket timeouts where possible (<a href="http://github.com/mperham/memcache-client/commit/9f5201b77ccb6ef0d021e741cab8468151f2535d">Github commit</a>).  For my next trick, I figure I&#8217;ll rewrite <code>gets</code> to use read so I can remove the need for SystemTimer and Timeout altogether.</p>
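<p>Here is roughly what I have in mind for that rewrite.  Treat it as a sketch, not the code that will land in memcache-client: <code>LineReader</code> is a hypothetical class, it assumes lines end in <code>\r\n</code> as in the memcached text protocol, and it assumes the underlying <code>sysread</code> raises when the socket timeout fires instead of being retried.</p>

```ruby
require "stringio"

# Hypothetical line reader built only on a low-level read call, so any error
# raised by that read (timeout, EOF) propagates straight to the caller.
class LineReader
  TERMINATOR = "\r\n"

  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @buffer = "".dup  # unfrozen buffer holding bytes read but not yet returned
  end

  # Return the next full line, including its terminator.
  def gets
    until (index = @buffer.index(TERMINATOR))
      @buffer.concat(@io.sysread(@chunk_size))
    end
    @buffer.slice!(0, index + TERMINATOR.size)
  end
end

reader = LineReader.new(StringIO.new("VALUE foo 0 3\r\nbar\r\nEND\r\n"), 5)
reader.gets  # => "VALUE foo 0 3\r\n"
reader.gets  # => "bar\r\n"
```

<p>The StringIO and the tiny chunk size are only there to show that a line split across several reads is reassembled correctly; with a real socket the reader would wrap the socket returned by <code>connect_to</code>.</p>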
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2009/03/15/socket-timeouts-in-ruby/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.mikeperham.com/2009/03/15/socket-timeouts-in-ruby/</feedburner:origLink></item>
	</channel>
</rss>
