<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" xml:lang="en" xml:base="http://yehudakatz.com/wp-atom.php">
	<title type="text">Katz Got Your Tongue?</title>
	<subtitle type="text">Random Geek-Related Thoughts</subtitle>

	<updated>2010-09-08T01:08:27Z</updated>

	<link rel="alternate" type="text/html" href="http://yehudakatz.com" />
	<id>http://yehudakatz.com/feed/atom/</id>
	

			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/KatzGotYourTongue" /><feedburner:info uri="katzgotyourtongue" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by-sa/2.0/" /><entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[Automatic Flushing: The Rails 3.1 Plan]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/RU9BuoerKvo/" />
		<id>http://yehudakatz.com/?p=545</id>
		<updated>2010-09-08T01:08:27Z</updated>
		<published>2010-09-08T01:05:34Z</published>
		<category scheme="http://yehudakatz.com" term="Other" />		<summary type="html"><![CDATA[preamble: this post explains, in some detail, how we will implement a nice performance boost for Rails developers. Understanding the details might help gain the full benefits of the optimization, but you will gain some benefits even if you have no idea how it works. As you&#8217;ve probably seen, DHH announced that we&#8217;d be looking [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/09/07/automatic-flushing-the-rails-3-1-plan/">&lt;p&gt;&lt;strong&gt;preamble&lt;/strong&gt;: this post explains, in some detail, how we will implement a nice performance boost for Rails developers. Understanding the details might help gain the full benefits of the optimization, but you will gain some benefits even if you have no idea how it works.&lt;/p&gt;
&lt;p&gt;As you&amp;#8217;ve probably seen, DHH announced that we&amp;#8217;d be looking at flushing in Rails 3.1 to improve the client-side performance of typical Rails applications.&lt;/p&gt;
&lt;p&gt;The most obvious solution, and one that already exists in &lt;a href="http://github.com/oggy/template_streaming"&gt;plugin form&lt;/a&gt;, is to allow a layout to have a new &lt;code&gt;flush&lt;/code&gt; method, which would immediately flush the contents of the layout to the browser. By putting the &lt;code&gt;flush&lt;/code&gt; method below the JavaScript and CSS includes, the browser could begin downloading and evaluating those static assets while the server continues building the page.&lt;/p&gt;
&lt;p&gt;Unfortunately, this solution has a major problem: it requires a fairly significant change in the current model of how people build applications. In general, for performance optimizations (including client-side optimizations), we like to make the default as fast as possible, without asking people to understand a brand new paradigm, centered around the optimization.&lt;/p&gt;
&lt;p&gt;The problem lies in the fact that a Rails layout is essentially a template with a bunch of holes to fill in.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="erb" style="font-family:monospace;"&gt;&amp;lt;html&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;title&amp;gt;&amp;lt;%= yield :title %&amp;gt;&amp;lt;/title&amp;gt;
    &amp;lt;%= javascript_include_tag :defaults %&amp;gt;
    &amp;lt;%= yield :extra_javascripts %&amp;gt;
    &amp;lt;%= stylesheet_link_tag :defaults %&amp;gt;
    &amp;lt;%= yield :extra_stylesheets %&amp;gt;
  &amp;lt;/head&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;%= yield :sidebar %&amp;gt;
    &amp;lt;%= yield %&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this simple example, each &lt;code&gt;yield&lt;/code&gt; is a slot that is filled in by the template (usually via &lt;code&gt;content_for&lt;/code&gt;). In order to achieve this, Rails evaluates the template first, which populates a Hash with each piece of content. Next, it renders the layout, and each &lt;code&gt;yield&lt;/code&gt; checks the Hash for that content. In short, because of the way layouts work, Rails renders the template first, and then the layout.&lt;/p&gt;
&lt;p&gt;To get around this, one option would be to say that everything before the &lt;code&gt;flush&lt;/code&gt; must not use &lt;code&gt;yield&lt;/code&gt;, and must be able to run before the template. Unfortunately, it&amp;#8217;s somewhat common for people to set up a &lt;code&gt;content_for(:javascripts)&lt;/code&gt; in a template, to keep the JavaScript needed for a particular snippet of HTML close to the HTML. This means that not only does the user have to be careful about what can go above and below the &lt;code&gt;flush&lt;/code&gt;, they can no longer use &lt;code&gt;content_for&lt;/code&gt; for anything that appears above the &lt;code&gt;flush&lt;/code&gt; in the layout, which is a fairly significant change to the overall design of Rails applications.&lt;/p&gt;
&lt;p&gt;For Rails 3.1, we wanted a mostly-compatible solution with the same programmer benefits as the existing model, but with all the benefits of automatic flushing. After a number of very long discussions on the topic, José Valim came up with the idea of using Ruby 1.9 fibers to jump back and forth between the template and layout.&lt;/p&gt;
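&lt;p&gt;For readers who have not used fibers before, here is a minimal sketch of the primitive itself, independent of Rails. The &lt;code&gt;worker&lt;/code&gt; and &lt;code&gt;events&lt;/code&gt; names are made up for illustration; only &lt;code&gt;Fiber.new&lt;/code&gt;, &lt;code&gt;Fiber#resume&lt;/code&gt; and &lt;code&gt;Fiber.yield&lt;/code&gt; come from Ruby.&lt;/p&gt;

```ruby
events = []

# The fiber body runs only when resumed, and pauses at each Fiber.yield.
worker = Fiber.new do
  events.push("fiber: step 1")
  Fiber.yield                  # hand control back to whoever resumed us
  events.push("fiber: step 2")
  :done                        # becomes the return value of the last resume
end

events.push("main: before first resume")
worker.resume                  # runs the body up to Fiber.yield
events.push("main: between resumes")
result = worker.resume         # continues after Fiber.yield, runs to the end
events.push("main: fiber returned #{result.inspect}")

puts events
```

&lt;p&gt;The recorded events alternate between the main program and the fiber, which is the kind of back-and-forth the rendering pipeline needs between the layout and the template.&lt;/p&gt;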
&lt;p&gt;Let&amp;#8217;s start by taking a look at a very simplified version of the current Rails rendering pipeline. First, we set up a Buffer object purely for logging purposes, so we can see what&amp;#8217;s happening as we push things onto the buffer.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Basic
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; Buffer &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#CC0066; font-weight:bold;"&gt;String&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; initialize&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name, context&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@name&lt;/span&gt;    = name
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;value&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
&amp;nbsp;
      &lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;#{@name} is pushing #{value.inspect}&amp;quot;&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we create a simple version of &lt;code&gt;ActionView::Base&lt;/code&gt;. We implement the &lt;code&gt;content_for&lt;/code&gt; method simply, to print out a bit of logging information and stash the value into the &lt;code&gt;@content_for&lt;/code&gt; Hash. Note that the real version is pretty similar, with some added logic for capturing the value of the block from ERB.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Basic
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; ViewContext
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; initialize
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt;      = Buffer.&lt;span style="color:#9900CC;"&gt;new&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:main&lt;/span&gt;, &lt;span style="color:#0000FF; font-weight:bold;"&gt;self&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@content_for&lt;/span&gt; = &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#123;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#125;&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; content_for&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name, value = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      value = &lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; block_given?
      &lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;Setting #{name} to #{value.inspect}&amp;quot;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@content_for&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#91;&lt;/span&gt;name&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#93;&lt;/span&gt; = value
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; read_content&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@content_for&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#91;&lt;/span&gt;name&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#93;&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we create a number of methods on the &lt;code&gt;ViewContext&lt;/code&gt; that look like compiled ERB templates. In real life, the ERB (or Haml) compiler would define these methods.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Basic
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; ViewContext
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; layout
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;quot;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:javascripts&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;to_s&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:stylesheets&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;to_s&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;&amp;lt;/head&amp;gt;&amp;lt;body&amp;gt;&amp;quot;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;to_s&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:not_existant&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;to_s&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&amp;quot;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; template
      buffer =  Buffer.&lt;span style="color:#9900CC;"&gt;new&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:template&lt;/span&gt;, &lt;span style="color:#0000FF; font-weight:bold;"&gt;self&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      content_for&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:javascripts&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;do&lt;/span&gt;
        &lt;span style="color:#996600;"&gt;&amp;quot;&amp;lt;script src='application.js'&amp;gt;&amp;lt;/script&amp;gt;&amp;quot;&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
      content_for&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:stylesheets&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;do&lt;/span&gt;
        &lt;span style="color:#996600;"&gt;&amp;quot;&amp;lt;link href='application.css' rel='stylesheet' /&amp;gt;&amp;quot;&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
      &lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;Making a SQL call&amp;quot;&lt;/span&gt;
      &lt;span style="color:#CC0066; font-weight:bold;"&gt;sleep&lt;/span&gt; &lt;span style="color:#006666;"&gt;1&lt;/span&gt; &lt;span style="color:#008000; font-style:italic;"&gt;# Emulate a slow SQL call&lt;/span&gt;
      buffer &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;Hello world!&amp;quot;&lt;/span&gt;
      content_for&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:body&lt;/span&gt;, buffer&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, we define the basic rendering logic:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Basic
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; ViewContext
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; render
      template
      layout &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#123;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;|&lt;/span&gt;value&lt;span style="color:#006600; font-weight:bold;"&gt;|&lt;/span&gt; read_content&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;value &lt;span style="color:#006600; font-weight:bold;"&gt;||&lt;/span&gt; &lt;span style="color:#ff3333; font-weight:bold;"&gt;:body&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#125;&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, we first render the template, which will fill up the &lt;code&gt;@content_for&lt;/code&gt; Hash, and then call the layout method, with a block which pulls the value from that Hash. This is how &lt;code&gt;yield :javascripts&lt;/code&gt; in a layout works.&lt;/p&gt;
&lt;p&gt;Unfortunately, this means that the entire template must be rendered first, including the (fake) slow SQL query. We&amp;#8217;d prefer to flush the buffer after the JavaScripts and CSS are determined, but before the SQL query is made. Doing so requires running half of the template method, then continuing with the layout method, while retaining the ability to resume the template method later.&lt;/p&gt;
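&lt;p&gt;To make the ordering concrete, here is a condensed, self-contained sketch of the template-first pipeline. It is not the code from this post: &lt;code&gt;TemplateFirstView&lt;/code&gt; is a hypothetical stand-in, the &lt;code&gt;"[js]"&lt;/code&gt; string is a placeholder for a real script tag, and the slow SQL call is only recorded as an event rather than simulated with &lt;code&gt;sleep&lt;/code&gt;.&lt;/p&gt;

```ruby
# Hypothetical, condensed stand-in for the Basic::ViewContext walkthrough.
class TemplateFirstView
  attr_reader :events

  def initialize
    @content_for = {}
    @events = []   # records the order in which things happen
  end

  def content_for(name, value)
    @events.push("template sets #{name}")
    @content_for[name] = value
  end

  # Stands in for a compiled template.
  def template
    content_for(:javascripts, "[js]")   # placeholder for a script tag
    @events.push("slow SQL call")
    content_for(:body, "Hello world!")
  end

  # Stands in for a compiled layout; each yield asks for one slot.
  def layout
    output = String.new
    [:javascripts, :body].each do |slot|
      @events.push("layout reads #{slot}")
      output.concat(yield(slot).to_s)
    end
    output
  end

  # Template first, layout second: nothing can be sent until both are done.
  def render
    template
    layout { |slot| @content_for[slot] }
  end
end

view = TemplateFirstView.new
html = view.render
puts view.events
puts html
```

&lt;p&gt;In the recorded events, the slow SQL call happens before the layout reads its first slot, so the script tag cannot reach the browser any earlier than the rest of the page.&lt;/p&gt;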
&lt;p&gt;You can think of the way that templates are currently rendered (in Rails 2.x and 3.0) like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="http://yehudakatz.com/wp-content/uploads/2010/09/flush.001.png" alt="flush.001.png" border="0" width="600" height="450" /&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, this makes it very hard to get any more performance juice out without asking the end-developer to make some hard choices. The solution we came up with is to use Ruby 1.9 fibers to allow the rendering to jump back and forth between the template and layout.&lt;/p&gt;
&lt;p&gt;&lt;img src="http://yehudakatz.com/wp-content/uploads/2010/09/flush.002.png" alt="flush.002.png" border="0" width="600" height="450" /&gt;&lt;/p&gt;
&lt;p&gt;Instead of starting with the template and only rendering the layout when ready, we&amp;#8217;ll start with the layout, and jump over to the template when a &lt;code&gt;yield&lt;/code&gt; is called. Once the template provides the &lt;code&gt;content_for&lt;/code&gt; for that piece, we can jump back to the layout, flush, and continue rendering. As we need more pieces, we can jump back and forth between the template and layout, flushing as we fill in the holes specified by the &lt;code&gt;yield&lt;/code&gt; statements.&lt;/p&gt;
&lt;p&gt;The implementation is mostly straightforward:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#CC0066; font-weight:bold;"&gt;require&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;fiber&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Fibered
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; ViewContext &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#6666ff; font-weight:bold;"&gt;Basic::ViewContext&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; initialize
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;       = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; content_for&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name, value = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;resume&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; == name
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; read_content&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      content = &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
      &lt;span style="color:#0000FF; font-weight:bold;"&gt;return&lt;/span&gt; content &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; content
&amp;nbsp;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;begin&lt;/span&gt;
        &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; = name
        Fiber.&lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;ensure&lt;/span&gt;
        &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; layout
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt; = Fiber.&lt;span style="color:#9900CC;"&gt;new&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;do&lt;/span&gt;
        &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;resume&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; render
      layout &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#123;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;|&lt;/span&gt;value&lt;span style="color:#006600; font-weight:bold;"&gt;|&lt;/span&gt; read_content&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;value &lt;span style="color:#006600; font-weight:bold;"&gt;||&lt;/span&gt; &lt;span style="color:#ff3333; font-weight:bold;"&gt;:body&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#125;&lt;/span&gt;
      template
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;resume&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;while&lt;/span&gt; &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;alive&lt;/span&gt;?
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For our fibered implementation, we&amp;#8217;ll inherit from &lt;code&gt;Basic::ViewContext&lt;/code&gt;, because we want to be able to use the same templates as we used in the original implementation. We update the &lt;code&gt;content_for&lt;/code&gt;, &lt;code&gt;read_content&lt;/code&gt;, &lt;code&gt;layout&lt;/code&gt; and &lt;code&gt;render&lt;/code&gt; methods to be fiber-aware. Let&amp;#8217;s take them one at a time.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; layout
  &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt; = Fiber.&lt;span style="color:#9900CC;"&gt;new&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;do&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;resume&lt;/span&gt;
  &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, we wrap the original implementation of layout in a Fiber, and start it right away. Next, we modify the &lt;code&gt;read_content&lt;/code&gt; method to become Fiber-aware:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; read_content&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
  content = &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
  &lt;span style="color:#0000FF; font-weight:bold;"&gt;return&lt;/span&gt; content &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; content
&amp;nbsp;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;begin&lt;/span&gt;
    &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; = name
    Fiber.&lt;span style="color:#9966CC; font-weight:bold;"&gt;yield&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;ensure&lt;/span&gt;
    &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the &lt;code&gt;@content_for&lt;/code&gt; Hash already has the content, return it right away. Otherwise, say that we&amp;#8217;re waiting for the key in question, and yield out of the Fiber. We modify the &lt;code&gt;render&lt;/code&gt; method so that the layout is rendered first, followed by the template. As a result, yielding out of the layout will start the template&amp;#8217;s rendering.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; render
  layout &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#123;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;|&lt;/span&gt;value&lt;span style="color:#006600; font-weight:bold;"&gt;|&lt;/span&gt; read_content&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;value &lt;span style="color:#006600; font-weight:bold;"&gt;||&lt;/span&gt; &lt;span style="color:#ff3333; font-weight:bold;"&gt;:body&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#125;&lt;/span&gt;
  template
  &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;resume&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;while&lt;/span&gt; &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;alive&lt;/span&gt;?
  &lt;span style="color:#0066ff; font-weight:bold;"&gt;@buffer&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we modify the &lt;code&gt;content_for&lt;/code&gt; method so that when the content we&amp;#8217;re waiting for is provided, we jump back into the layout.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; content_for&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name, value = &lt;span style="color:#0000FF; font-weight:bold;"&gt;nil&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
  &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fiber&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;resume&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; &lt;span style="color:#0066ff; font-weight:bold;"&gt;@waiting_for&lt;/span&gt; == name
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this setup, the layout and template will ping-pong back and forth, with the layout requesting data, and the template rendering only as far as it needs to go to provide the data requested.&lt;/p&gt;
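&lt;p&gt;The ping-pong can be observed in a condensed, self-contained sketch. This is an illustration under simplifying assumptions rather than the code above: &lt;code&gt;FiberedView&lt;/code&gt; is a hypothetical class, &lt;code&gt;"[js]"&lt;/code&gt; is a placeholder for a real script tag, a &lt;code&gt;"flush ..."&lt;/code&gt; event stands in for writing to the socket, and the &lt;code&gt;ensure&lt;/code&gt; handling is left out for brevity.&lt;/p&gt;

```ruby
require "fiber"

# Hypothetical, condensed stand-in for the Fibered::ViewContext walkthrough.
class FiberedView
  def initialize
    @content_for = {}
    @events = []        # records the order in which things happen
    @waiting_for = nil
    @fiber = nil
  end

  def content_for(name, value)
    @events.push("template sets #{name}")
    @content_for[name] = value
    # If the layout is paused waiting for this slot, jump back into it.
    @fiber.resume if @waiting_for == name
  end

  def read_content(name)
    unless @content_for.key?(name)
      @waiting_for = name
      Fiber.yield       # pause the layout until the template catches up
      @waiting_for = nil
    end
    @content_for[name]
  end

  # Stands in for a compiled template.
  def template
    content_for(:javascripts, "[js]")   # placeholder for a script tag
    @events.push("slow SQL call")
    content_for(:body, "Hello world!")
  end

  # Stands in for a compiled layout; "flush" marks output being sent.
  def layout
    [:javascripts, :body].each do |slot|
      @events.push("flush #{read_content(slot)}")
    end
  end

  # Layout first, inside a fiber; the template runs on the main fiber.
  def render
    @fiber = Fiber.new { layout }
    @fiber.resume
    template
    @fiber.resume while @fiber.alive?
    @events
  end
end

events = FiberedView.new.render
puts events
```

&lt;p&gt;Here the script placeholder is flushed before the slow SQL call is recorded, because the template is interrupted as soon as it has supplied the slot the layout was waiting for.&lt;/p&gt;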
&lt;p&gt;Finally, let&amp;#8217;s update the Buffer to take our fibered implementation into consideration.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Basic
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; Buffer &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#CC0066; font-weight:bold;"&gt;String&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; initialize&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;name, context&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@name&lt;/span&gt;    = name
      &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fibered&lt;/span&gt; = context.&lt;span style="color:#9900CC;"&gt;fibered&lt;/span&gt;?
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;value&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;super&lt;/span&gt;
&amp;nbsp;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; &lt;span style="color:#0066ff; font-weight:bold;"&gt;@fibered&lt;/span&gt;
        &lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;Flushing #{value.inspect}&amp;quot;&lt;/span&gt;
      &lt;span style="color:#9966CC; font-weight:bold;"&gt;else&lt;/span&gt;
        &lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;#{@name} is pushing #{value.inspect}&amp;quot;&lt;/span&gt;
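      &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
      &lt;span style="color:#008000; font-style:italic;"&gt;# Return the buffer itself, as String#&amp;lt;&amp;lt; does, so appends can be chained.&lt;/span&gt;
      &lt;span style="color:#0000FF; font-weight:bold;"&gt;self&lt;/span&gt;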
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; ViewContext
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; fibered?
      &lt;span style="color:#0000FF; font-weight:bold;"&gt;false&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;module&lt;/span&gt; Fibered
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;class&lt;/span&gt; ViewContext
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;def&lt;/span&gt; fibered?
      &lt;span style="color:#0000FF; font-weight:bold;"&gt;true&lt;/span&gt;
    &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
  &lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that we&amp;#8217;re rendering the layout in order, we can flush as we go, instead of being forced to wait for the entire template to render before we can start flushing.&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s worth mentioning that optimal flushing performance depends on the order of the &lt;code&gt;content_for&lt;/code&gt; calls in your template. If you run your queries first, then do the expensive template rendering, and only call &lt;code&gt;content_for(:javascript)&lt;/code&gt; at the very end, the flushing behavior will look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="http://yehudakatz.com/wp-content/uploads/2010/09/flush.003.png" alt="flush.003.png" border="0" width="600" height="450" /&gt;&lt;/p&gt;
&lt;p&gt;In this case, instead of flushing quickly, before the SQL call, things are barely better than they are in Rails 2.3, where the entire template must be rendered before the first flush. Because things are no worse even in this worst-case scenario, we can make this the default behavior. Most people will see some benefit from it, and people interested in the best performance can order their &lt;code&gt;content_for&lt;/code&gt; blocks so they cause the most beneficial flushing.&lt;/p&gt;
&lt;p&gt;Even for people willing to put in the effort, this API is better than forcing a manual flush, because you can still put your &lt;code&gt;content_for&lt;/code&gt; blocks alongside the templates that they are related to.&lt;/p&gt;
&lt;p&gt;Look for this feature in Rails 3.1!&lt;/p&gt;
&lt;h2&gt;Small Caveat&lt;/h2&gt;
&lt;p&gt;For the purposes of this simplified example, I assumed that &lt;code&gt;content_for&lt;/code&gt; can only be run once, immediately setting the value in the &lt;code&gt;@content_for&lt;/code&gt; Hash. However, in some cases, people want to accumulate a String for a particular value. Obviously, we won&amp;#8217;t be able to flush until the full String for that value is accumulated.&lt;/p&gt;
&lt;p&gt;As a result, we&amp;#8217;ll be adding a new API (likely called &lt;code&gt;provide&lt;/code&gt;), which will behave exactly the same as &lt;code&gt;content_for&lt;/code&gt;, but without the ability to accumulate. In the vast majority of cases, people will want to use &lt;code&gt;provide&lt;/code&gt; (for instance, &lt;code&gt;provide :sidebar&lt;/code&gt;), and get all the benefits of autoflushing. In a few cases, people will want to be able to continue accumulating a String, and will still be able to use &lt;code&gt;content_for&lt;/code&gt;, but the template will not return control to the layout when that happens.&lt;/p&gt;
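&lt;p&gt;Here is a toy model of the difference, assuming the new method does end up being called &lt;code&gt;provide&lt;/code&gt;. It is a sketch only: &lt;code&gt;ToyContent&lt;/code&gt;, &lt;code&gt;flushed&lt;/code&gt; and &lt;code&gt;finish&lt;/code&gt; are illustrative names, and &amp;#8220;flushing&amp;#8221; is nothing more than appending to an array.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;# Toy model only: the method names mirror the post, but this is not Rails code.
class ToyContent
  attr_reader :flushed

  def initialize
    @content = {}
    @flushed = []
  end

  # Accumulates. More pieces may follow, so nothing can be flushed yet.
  def content_for(name, value)
    @content[name] = @content.fetch(name, '') + value
  end

  # Single-shot. The value is final, so it can be flushed right away.
  def provide(name, value)
    @content[name] = value
    flush(name)
  end

  # Called once the whole template has rendered: flush whatever is left.
  def finish
    @content.each_key do |name|
      flush(name) unless @flushed.any? { |flushed_name, _value| flushed_name == name }
    end
    @flushed
  end

  private

  def flush(name)
    @flushed.push([name, @content[name]])
  end
end

view = ToyContent.new
view.content_for(:javascript, 'a.js ')
view.provide(:sidebar, 'links')        # flushed immediately
view.content_for(:javascript, 'b.js')  # still accumulating
p view.finish
# prints [[:sidebar, "links"], [:javascript, "a.js b.js"]]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The sidebar goes out as soon as it is provided, while the accumulated JavaScript has to wait until the template is done.&lt;/p&gt;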
&lt;p&gt;Also note that this (fairly detailed) explanation is not something that you will need to understand as a Rails developer. Instead, you will continue to go about your development as before, using &lt;code&gt;provide&lt;/code&gt; if you have just a single piece of content to add, and &lt;code&gt;content_for&lt;/code&gt; if you have multiple pieces of content to add, and Rails will automatically optimize flushing for you as well as we can.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/09/07/automatic-flushing-the-rails-3-1-plan/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/09/07/automatic-flushing-the-rails-3-1-plan/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/09/07/automatic-flushing-the-rails-3-1-plan/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[A Tale of Abort Traps (or Always Question Your Assumptions)]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/Cbir1AF5DFc/" />
		<id>http://yehudakatz.com/?p=537</id>
		<updated>2010-08-25T15:13:20Z</updated>
		<published>2010-08-25T01:19:24Z</published>
		<category scheme="http://yehudakatz.com" term="Ruby" />		<summary type="html"><![CDATA[For a few months now, the bundler team has been getting intermittent reports of segfaults in C extensions that happen when using bundler with rvm. A cursory investigation revealed that the issue was that the C extensions were compiled for the wrong version of Ruby. For instance, we would get reports of segfaults in nokogiri [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/08/24/a-tale-of-abort-traps-or-always-question-your-assumptions/">&lt;p&gt;For a few months now, the bundler team has been getting intermittent reports of segfaults in C extensions that happen when using bundler with rvm. A cursory investigation revealed that the issue was that the C extensions were compiled for the wrong version of Ruby.&lt;/p&gt;
&lt;p&gt;For instance, we would get reports of segfaults in &lt;code&gt;nokogiri&lt;/code&gt; when using Ruby 1.9 that resulted from the &lt;code&gt;$GEM_HOME&lt;/code&gt; in Ruby 1.9 containing a &lt;code&gt;.bundle&lt;/code&gt; file compiled for Ruby 1.8. We got a &lt;strong&gt;lot&lt;/strong&gt; of &lt;a href="http://github.com/carlhuda/bundler/issues/closed#issue/583"&gt;really angry bug reports&lt;/a&gt;, and a lot of speculation that we were doing something wrong that could be obviously fixed.&lt;/p&gt;
&lt;p&gt;I finally ran into the issue myself, on my own machine a couple days ago, and tracked it down, deep into the fires of Mount Doom. A word of warning: this story may shock you.&lt;/p&gt;
&lt;h2&gt;It Begins&lt;/h2&gt;
&lt;p&gt;I usually use rvm for my day-to-day work, but I tend to not use gemsets unless I want a guaranteed clean environment for debugging something. Bundler takes care of isolation, even in the face of a lot of different gems mixed together in a global gemset. Last week, I was deep in the process of debugging another issue, so I was making pretty heavy use of gemsets (and &lt;code&gt;rvm gemset empty&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;I had switched to Ruby 1.9.2 (final, which had just come out), created a new gemset, and run &lt;code&gt;bundle install&lt;/code&gt; on the project I was working on, which included &lt;code&gt;thin&lt;/code&gt; in the &lt;code&gt;Gemfile&lt;/code&gt;. Among other things, bundler installed thin &amp;#8220;with native extensions&amp;#8221;. I use bundler a lot, and I had run this exact command probably thousands of times.&lt;/p&gt;
&lt;p&gt;This time, however, I got a segfault in Ruby 1.9 that pointed at the &lt;code&gt;require&lt;/code&gt; call to the &lt;code&gt;rubyeventmachine.bundle&lt;/code&gt; file.&lt;/p&gt;
&lt;h2&gt;Debugging&lt;/h2&gt;
&lt;p&gt;Now that I had the bug on a physical machine, I started debugging. Our working hypothesis was that bundler or Rubygems was somehow compiling gems against Ruby 1.8, even when on Ruby 1.9, but we couldn&amp;#8217;t figure out exactly how it could be happening.&lt;/p&gt;
&lt;p&gt;The first thing I did was run &lt;code&gt;otool -L&lt;/code&gt; on the binary (&lt;code&gt;rubyeventmachine.bundle&lt;/code&gt;):&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="plain" style="font-family:monospace;"&gt;$ otool -L rubyeventmachine.bundle 
rubyeventmachine.bundle:
  /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/libruby.1.dylib
  (compatibility version 1.8.0, current version 1.8.7)
  /usr/lib/libssl.0.9.8.dylib (compatibility version 0.9.8, current version 0.9.8)
  /usr/lib/libcrypto.0.9.8.dylib (compatibility version 0.9.8, current version 0.9.8)
  /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.0.1)
  /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.9.0)&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So that&amp;#8217;s weird. I had run bundle install from inside an rvm gemset, yet it was being compiled against the &lt;code&gt;libruby&lt;/code&gt; that ships with OS X. I had always suspected that the problem was a leak from another rvm-installed Ruby, so this was definitely a surprise.&lt;/p&gt;
&lt;p&gt;Just out of curiosity, I ran &lt;code&gt;which bundle&lt;/code&gt;:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="plain" style="font-family:monospace;"&gt;$ which bundle
/usr/bin/bundle&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Ok, now I knew something was rotten. I printed out the &lt;code&gt;$PATH&lt;/code&gt;:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="plain" style="font-family:monospace;"&gt;$ echo $PATH
/Users/wycats/.rvm/gems/ruby-1.9.2-p0/bin:...
$ ls /Users/wycats/.rvm/gems/ruby-1.9.2-p0/bin
asdf*             erubis*           rake2thor*        sc-build-number*
autospec*         prettify_json.rb* redcloth*         sc-docs*
bundle*           rackup*           ruby-prof*        sc-gen*
edit_json.rb*     rake*             sc-build*         sc-init*&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In other words, a big fat &lt;strong&gt;WTF&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I asked around, and some people had the vague idea that there was a &lt;code&gt;$PATH&lt;/code&gt; cache in Unix shells. Someone pointed me at &lt;a href="http://blog.byohosting.com/2008/01/17/how-to-clear-shell-command-path-cache/"&gt;this post about clearing the cache&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sure enough, running &lt;code&gt;hash -r&lt;/code&gt; fixed the output of &lt;code&gt;which&lt;/code&gt;. I alerted Wayne of rvm to this problem, and he threw in &lt;a href="http://github.com/wayneeseguin/rvm/commit/f6473f29458d60201c47cf65ae7dde465ef0adfc"&gt;a fix&lt;/a&gt; to rvm that cleared the cache when switching rvms. Momentarily, everything seemed fine.&lt;/p&gt;
&lt;h2&gt;Digging Further&lt;/h2&gt;
&lt;p&gt;I still didn&amp;#8217;t exactly understand how this condition could happen in the first place. When I went digging, I discovered that shells almost uniformly clear the path cache when modifying the &lt;code&gt;$PATH&lt;/code&gt;. Since rvm pushes its &lt;code&gt;bin&lt;/code&gt; directory onto the &lt;code&gt;$PATH&lt;/code&gt; when you switch to a different rvm, I couldn&amp;#8217;t understand how exactly this problem was happening.&lt;/p&gt;
&lt;p&gt;I read through Chapter 3 of the &lt;a href="http://zsh.sourceforge.net/Guide/zshguide03.html"&gt;zsh guide&lt;/a&gt;, and it finally clicked:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;
The way commands are stored has other consequences. In particular, zsh won&amp;#8217;t look for a new command if it already knows where to find one. &lt;strong&gt;If I put a new ls command in /usr/local/bin in the above example, zsh would continue to use /bin/ls&lt;/strong&gt; (assuming it had already been found). To fix this, there is the command rehash, which actually empties the command hash table, so that finding commands starts again from scratch. Users of csh may remember having to type rehash quite a lot with new commands: it&amp;#8217;s not so bad in zsh, because if no command was already hashed, or the existing one disappeared, zsh will automatically scan the path again; furthermore, zsh performs a rehash of its own accord if $path is altered. So adding a new duplicate command somewhere towards the head of $path is the main reason for needing rehash.
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;By using the &lt;code&gt;hash&lt;/code&gt; command (which prints out all the entries in this cache), I was able to confirm that the same behavior exists in bash. The &lt;code&gt;which&lt;/code&gt; command (which I was using for testing the problem) does not reveal it there, though: in bash, &lt;code&gt;which&lt;/code&gt; is an external program that searches the &lt;code&gt;$PATH&lt;/code&gt; itself and never consults the shell&amp;#8217;s cache.&lt;/p&gt;
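&lt;p&gt;The caching behavior is easy to model in a few lines of Ruby. The sketch below is a toy, not a description of how any real shell is implemented: &lt;code&gt;ToyShell&lt;/code&gt;, &lt;code&gt;lookup&lt;/code&gt; and &lt;code&gt;install&lt;/code&gt; are made-up names, and the two directories are throwaway stand-ins for &lt;code&gt;/usr/bin&lt;/code&gt; and a gemset&amp;#8217;s &lt;code&gt;bin&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;require 'tmpdir'
require 'fileutils'

# Toy model of a shell's command cache. Real shells differ in detail.
class ToyShell
  def initialize(path)
    @path = path
    @cache = {}
  end

  # Changing the path empties the cache, as the zsh guide describes.
  def path=(dirs)
    @path = dirs
    @cache.clear
  end

  # A cached command is never looked up again, even if a better match
  # has since appeared earlier on the path.
  def lookup(command)
    @cache[command] ||= begin
      candidates = @path.map { |dir| File.join(dir, command) }
      candidates.find { |file| File.executable?(file) }
    end
  end

  # Equivalent in spirit to `hash -r` or `rehash`.
  def rehash
    @cache.clear
  end
end

# Drops a trivial executable named `command` into `dir`.
def install(dir, command)
  file = File.join(dir, command)
  File.write(file, "#!/bin/sh\n")
  FileUtils.chmod(0755, file)
  file
end

Dir.mktmpdir do |root|
  system_bin = File.join(root, 'usr_bin')
  gemset_bin = File.join(root, 'gemset_bin')
  FileUtils.mkdir_p([system_bin, gemset_bin])

  shell = ToyShell.new([gemset_bin, system_bin])
  install(system_bin, 'bundle')
  puts shell.lookup('bundle')   # the copy under usr_bin
  install(gemset_bin, 'bundle')
  puts shell.lookup('bundle')   # still the copy under usr_bin (stale cache)
  shell.rehash
  puts shell.lookup('bundle')   # now the copy under gemset_bin
end&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second lookup is the interesting one: the new executable sits earlier on the path, but the cached entry wins until something clears the cache.&lt;/p&gt;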
&lt;p&gt;During this time, I also took a look at &lt;code&gt;/usr/bin/bundle&lt;/code&gt;, which looks like this:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;$ cat &lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt;usr&lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt;bin&lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt;bundle
&lt;span style="color:#008000; font-style:italic;"&gt;#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# This file was generated by RubyGems.&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# The application 'bundler' is installed as part of a gem, and&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# this file is here to facilitate running it.&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;require&lt;/span&gt; &lt;span style="color:#996600;"&gt;'rubygems'&lt;/span&gt;
&amp;nbsp;
version = &lt;span style="color:#996600;"&gt;&amp;quot;&amp;gt;= 0&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; ARGV.&lt;span style="color:#9900CC;"&gt;first&lt;/span&gt; =~ &lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt;^_&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;.&lt;span style="color:#006600; font-weight:bold;"&gt;*&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;_$&lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;and&lt;/span&gt; &lt;span style="color:#6666ff; font-weight:bold;"&gt;Gem::Version&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;correct&lt;/span&gt;? $&lt;span style="color:#006666;"&gt;1&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;then&lt;/span&gt;
  version = $&lt;span style="color:#006666;"&gt;1&lt;/span&gt;
  ARGV.&lt;span style="color:#9900CC;"&gt;shift&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
gem &lt;span style="color:#996600;"&gt;'bundler'&lt;/span&gt;, version
&lt;span style="color:#CC0066; font-weight:bold;"&gt;load&lt;/span&gt; Gem.&lt;span style="color:#9900CC;"&gt;bin_path&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;'bundler'&lt;/span&gt;, &lt;span style="color:#996600;"&gt;'bundle'&lt;/span&gt;, version&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, executable wrappers created by Rubygems hardcode the path of the Ruby that was used to install them. This is one of the reasons that &lt;code&gt;rvm&lt;/code&gt; keeps its own directory for installed executables.&lt;/p&gt;
&lt;h2&gt;A Working Hypothesis&lt;/h2&gt;
&lt;p&gt;It took me some time to figure all this out, during which time I was speaking to a bunch of friends and the guys in #zsh. I formed a working hypothesis. I figured that people were doing something like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install bundler onto their system, and use it there&lt;/li&gt;
&lt;li&gt;Need to work on a new project, so switch to rvm, and create a new gemset&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;bundle install&lt;/code&gt;, resulting in an error&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;gem install bundler&lt;/code&gt;, to install bundler&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;bundle install&lt;/code&gt;, which works, but has a subtle bug&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;#8217;s walk through each of these steps, and unpack exactly what happens.&lt;/p&gt;
&lt;h4&gt;Install bundler on their system&lt;/h4&gt;
&lt;p&gt;This results in an executable at &lt;code&gt;/usr/bin/bundle&lt;/code&gt; that looks like this:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# This file was generated by RubyGems.&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# The application 'bundler' is installed as part of a gem, and&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# this file is here to facilitate running it.&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;require&lt;/span&gt; &lt;span style="color:#996600;"&gt;'rubygems'&lt;/span&gt;
&amp;nbsp;
version = &lt;span style="color:#996600;"&gt;&amp;quot;&amp;gt;= 0&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; ARGV.&lt;span style="color:#9900CC;"&gt;first&lt;/span&gt; =~ &lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt;^_&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;.&lt;span style="color:#006600; font-weight:bold;"&gt;*&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;_$&lt;span style="color:#006600; font-weight:bold;"&gt;/&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;and&lt;/span&gt; &lt;span style="color:#6666ff; font-weight:bold;"&gt;Gem::Version&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;correct&lt;/span&gt;? $&lt;span style="color:#006666;"&gt;1&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;then&lt;/span&gt;
  version = $&lt;span style="color:#006666;"&gt;1&lt;/span&gt;
  ARGV.&lt;span style="color:#9900CC;"&gt;shift&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;
&amp;nbsp;
gem &lt;span style="color:#996600;"&gt;'bundler'&lt;/span&gt;, version
&lt;span style="color:#CC0066; font-weight:bold;"&gt;load&lt;/span&gt; Gem.&lt;span style="color:#9900CC;"&gt;bin_path&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;'bundler'&lt;/span&gt;, &lt;span style="color:#996600;"&gt;'bundle'&lt;/span&gt;, version&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are two important things here. First, it hardcodes the shebang to the system Ruby location. Second, it uses Rubygems to look up the location of the executable shipped with bundler, which will respect &lt;code&gt;GEM_HOME&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;Switch to rvm, and create a new gemset&lt;/h4&gt;
&lt;p&gt;Switching to an rvm gemset does a few relevant things. First, it prepends &lt;code&gt;~/.rvm/gems/ruby-1.9.2-p0@gemset/bin&lt;/code&gt; onto the &lt;code&gt;$PATH&lt;/code&gt;. This effectively resets the shell&amp;#8217;s built-in command cache. Second, it sets &lt;code&gt;$GEM_HOME&lt;/code&gt; to &lt;code&gt;~/.rvm/gems/ruby-1.9.2-p0@gemset&lt;/code&gt;. The &lt;code&gt;$GEM_HOME&lt;/code&gt; is where Rubygems both looks for gems and installs them.&lt;/p&gt;
&lt;h4&gt;Run &lt;code&gt;bundle install&lt;/code&gt;, resulting in an error&lt;/h4&gt;
&lt;p&gt;Specifically,&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="plain" style="font-family:monospace;"&gt;$ bundle install
/Library/Ruby/Site/1.8/rubygems.rb:777:in `report_activate_error': 
Could not find RubyGem bundler (&amp;gt;= 0) (Gem::LoadError)
	from /Library/Ruby/Site/1.8/rubygems.rb:211:in `activate'
	from /Library/Ruby/Site/1.8/rubygems.rb:1056:in `gem'
	from /usr/bin/bundle:18&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Most people don&amp;#8217;t take such a close look at this error, interpreting it as the equivalent of &lt;code&gt;command not found: bundle&lt;/code&gt;. What&amp;#8217;s actually happening is a bit different. Since &lt;code&gt;bundle&lt;/code&gt; is installed at &lt;code&gt;/usr/bin/bundle&lt;/code&gt;, the shell finds it, and runs it. It uses the system Ruby (hardcoded in its shebang), and the &lt;code&gt;$GEM_HOME&lt;/code&gt; set by rvm. Since I just created a brand new gemset, the &lt;code&gt;bundler&lt;/code&gt; gem is not found in &lt;code&gt;$GEM_HOME&lt;/code&gt;. As a result, the line &lt;code&gt;gem 'bundler', version&lt;/code&gt; in the executable fails with the error I showed above.&lt;/p&gt;
&lt;p&gt;However, because the shell found the executable, it ends up in the shell&amp;#8217;s command cache. You can see the command cache by typing &lt;code&gt;hash&lt;/code&gt;. In both bash and zsh, the command cache will include an entry for &lt;code&gt;bundle&lt;/code&gt; pointing at &lt;code&gt;/usr/bin/bundle&lt;/code&gt;. zsh is more aggressive about populating the cache, so you&amp;#8217;ll see the list of all commands in the system in the command cache (which, you&amp;#8217;ll recall, was reset when you first switched into the gemset, because &lt;code&gt;$PATH&lt;/code&gt; was altered).&lt;/p&gt;
&lt;h4&gt;Run &lt;code&gt;gem install bundler&lt;/code&gt;, to install bundler&lt;/h4&gt;
&lt;p&gt;This will install the &lt;code&gt;bundle&lt;/code&gt; executable to &lt;code&gt;~/.rvm/gems/ruby-1.9.2-p0@gemset/bin&lt;/code&gt;, which is the first entry on the &lt;code&gt;$PATH&lt;/code&gt;. It will also install bundler to the &lt;code&gt;$GEM_HOME&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;Run &lt;code&gt;bundle install&lt;/code&gt;, which works, but has a subtle bug&lt;/h4&gt;
&lt;p&gt;Here&amp;#8217;s where the fun happens. Since we didn&amp;#8217;t modify the &lt;code&gt;$PATH&lt;/code&gt; or call &lt;code&gt;hash -r&lt;/code&gt;, the cache is untouched, and the next call to &lt;code&gt;bundle&lt;/code&gt; still picks up the bundle in &lt;code&gt;/usr/bin/bundle&lt;/code&gt;, which is hardcoded to use system Ruby. However, it will use the version of Bundler we just installed, since it was installed to &lt;code&gt;$GEM_HOME&lt;/code&gt;, and Rubygems uses that to look for gems. In other words, even though we&amp;#8217;re using system Ruby, the &lt;code&gt;/usr/bin/bundle&lt;/code&gt; executable will use the rvm gemset&amp;#8217;s Bundler gem we just installed.&lt;/p&gt;
&lt;p&gt;Additionally, because &lt;code&gt;$GEM_HOME&lt;/code&gt; points to the rvm gemset, bundler will install all its gems to the rvm gemset. Taken together, these factors make it seem almost completely irrelevant that the system Ruby was used to install gems. After all, who cares which Ruby installed the gems, as long as they end up in the right place for the rvm&amp;#8217;s Ruby?&lt;/p&gt;
&lt;p&gt;There are actually two problems. First, as we&amp;#8217;ve seen, the version of Ruby that installs a gem also hardcodes itself into the shebang of the executables for the gem.&lt;/p&gt;
&lt;p&gt;Try this experiment:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="plain" style="font-family:monospace;"&gt;$ rvm use 1.9.2
$ rvm gemset create experiment
$ rvm gemset use experiment
$ /usr/bin/gem install rack
$ cat ~/.rvm/gems/ruby-1.9.2-p0@experiment/bin/rackup
#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
#
# This file was generated by RubyGems.
#
# The application 'rack' is installed as part of a gem, and
# this file is here to facilitate running it.
#
&amp;nbsp;
require 'rubygems'
&amp;nbsp;
version = &amp;quot;&amp;gt;= 0&amp;quot;
&amp;nbsp;
if ARGV.first =~ /^_(.*)_$/ and Gem::Version.correct? $1 then
  version = $1
  ARGV.shift
end
&amp;nbsp;
gem 'rack', version
load Gem.bin_path('rack', 'rackup', version)&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As a result, any gems that bundler installed will put their executables in the gemset&amp;#8217;s location, but still be hardcoded to use the system&amp;#8217;s Ruby. This is not actually the problem we&amp;#8217;re encountering here, but it could cause some weird results.&lt;/p&gt;
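&lt;p&gt;If you suspect this has happened to you, a few lines of Ruby can list the wrappers whose shebang points at a different interpreter than the one you are running. This is a rough sketch: &lt;code&gt;shebang_interpreter&lt;/code&gt; and &lt;code&gt;foreign_wrappers&lt;/code&gt; are hypothetical helper names, and the comparison is a plain string match against &lt;code&gt;RbConfig.ruby&lt;/code&gt; (the path of the running Ruby), so symlinked or &lt;code&gt;env&lt;/code&gt;-style shebangs will be reported as well.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;require 'rbconfig'

# Returns the interpreter named on a script's shebang line, or nil if the
# file does not start with one.
def shebang_interpreter(path)
  first_line = File.foreach(path).first.to_s
  return nil unless first_line.start_with?('#!')
  first_line[2..-1].strip.split(/\s+/).first
end

# Lists the files in bin_dir whose shebang names an interpreter other than
# current_ruby. Files without a shebang are ignored.
def foreign_wrappers(bin_dir, current_ruby = RbConfig.ruby)
  Dir.glob(File.join(bin_dir, '*')).sort.reject do |file|
    interpreter = File.file?(file) ? shebang_interpreter(file) : nil
    [nil, current_ruby].include?(interpreter)
  end
end&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;foreign_wrappers(Gem.bindir)&lt;/code&gt; from inside the gemset should return an empty list; after the experiment above, it would be expected to include the &lt;code&gt;rackup&lt;/code&gt; wrapper.&lt;/p&gt;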
&lt;p&gt;More concerningly, since our system Ruby installed the gems, it will also link any C extensions it compiles against its own copy of &lt;code&gt;libruby&lt;/code&gt;. This shouldn&amp;#8217;t be a major issue if you&amp;#8217;re using a version of Ruby 1.8 in rvm, but it &lt;strong&gt;is&lt;/strong&gt; a major issue if you&amp;#8217;re using Ruby 1.9.&lt;/p&gt;
&lt;p&gt;Now, you have a gem stored in your rvm gemset that has a &lt;code&gt;.bundle&lt;/code&gt; in it that is linked against the wrong Ruby. When running your application, Ruby 1.9 will try to load it, and kaboom: segfault.&lt;/p&gt;
&lt;h2&gt;Postscript&lt;/h2&gt;
&lt;p&gt;The crazy thing about this story is that it&amp;#8217;s a lot of factors conspiring to cause problems. The combination of the shell command cache, &lt;code&gt;$GEM_HOME&lt;/code&gt; making it look like &lt;strong&gt;SYSTEM&lt;/strong&gt; Ruby was doing the right thing, and hardcoding the version of Ruby in the shebang line of installed gems made this segfault possible.&lt;/p&gt;
&lt;p&gt;Thankfully, now that we figured the problem out, the latest version of rvm fixes it. The solution: wrap the gem command in a shell function that calls &lt;code&gt;hash -r&lt;/code&gt; afterward.&lt;/p&gt;
&lt;p&gt;Amazing.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/08/24/a-tale-of-abort-traps-or-always-question-your-assumptions/#comments" thr:count="20" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/08/24/a-tale-of-abort-traps-or-always-question-your-assumptions/feed/atom/" thr:count="20" />
		<thr:total>20</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/08/24/a-tale-of-abort-traps-or-always-question-your-assumptions/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[Using &gt;= Considered Harmful (or, What&#8217;s Wrong With &gt;=)]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/gwnyphwdm-I/" />
		<id>http://yehudakatz.com/?p=531</id>
		<updated>2010-08-21T16:14:00Z</updated>
		<published>2010-08-21T14:54:02Z</published>
		<category scheme="http://yehudakatz.com" term="Ruby" />		<summary type="html"><![CDATA[TL;DR Use ~&#62; instead. Having spent far, far too much time with Rubygems dependencies, and the problems that arise with unusual combinations, I am ready to come right out and say it: you basically never, ever want to use a >= dependency in your gems. When you specify a dependency for your gem, it should [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/08/21/using-considered-harmful-or-whats-wrong-with/">&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; Use &lt;code&gt;~&amp;gt;&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;Having spent far, far too much time with Rubygems dependencies, and the problems that arise with unusual combinations, I am ready to come right out and say it: you basically never, ever want to use a &lt;code&gt;&amp;gt;=&lt;/code&gt; dependency in your gems.&lt;/p&gt;
&lt;p&gt;When you specify a dependency for your gem, it &lt;strong&gt;should&lt;/strong&gt; mean that you are fairly sure that the unmodified code in the released gem will continue to work with any future version of the dependency that matches the version you specified. So for instance, let&amp;#8217;s take a look at the dependencies listed in the &lt;code&gt;actionpack&lt;/code&gt; gem:&lt;/p&gt;
&lt;pre&gt;
activemodel (= 3.0.0.rc, runtime)
activesupport (= 3.0.0.rc, runtime)
builder (~&gt; 2.1.2, runtime)
erubis (~&gt; 2.6.6, runtime)
i18n (~&gt; 0.4.1, runtime)
rack (~&gt; 1.2.1, runtime)
rack-mount (~&gt; 0.6.9, runtime)
rack-test (~&gt; 0.5.4, runtime)
tzinfo (~&gt; 0.3.22, runtime)
&lt;/pre&gt;
&lt;p&gt;Since we release the Rails gems as a unit, we declare hard dependencies on &lt;code&gt;activemodel&lt;/code&gt; and &lt;code&gt;activesupport&lt;/code&gt;. We declare soft dependencies on &lt;code&gt;builder&lt;/code&gt;, &lt;code&gt;erubis&lt;/code&gt;, &lt;code&gt;i18n&lt;/code&gt;, &lt;code&gt;rack&lt;/code&gt;, &lt;code&gt;rack-mount&lt;/code&gt;, &lt;code&gt;rack-test&lt;/code&gt;, and &lt;code&gt;tzinfo&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You might not know exactly what the &lt;code&gt;~&amp;gt;&lt;/code&gt; version specifier means. Essentially, it decomposes into two specifiers. So &lt;code&gt;~&amp;gt; 2.1.2&lt;/code&gt; means &lt;code&gt;&amp;gt;= 2.1.2, &amp;lt; 2.2.0&lt;/code&gt;. In other words, it means &amp;#8220;2.1.x, but not less than 2.1.2&amp;#8221;. Specifying &lt;code&gt;~&amp;gt; 1.0&lt;/code&gt;, as many people do for Rack, means &amp;#8220;any 1.x&amp;#8221;.&lt;/p&gt;
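&lt;p&gt;As a rough sketch of that decomposition (a hand-rolled illustration, not how RubyGems itself implements the operator), the range can be derived with &lt;code&gt;Gem::Version#bump&lt;/code&gt;. The helper names &lt;code&gt;pessimistic_range&lt;/code&gt; and &lt;code&gt;satisfies_pessimistic?&lt;/code&gt; are made up for this example, and prerelease versions are not treated specially.&lt;/p&gt;
&lt;pre&gt;
```ruby
require "rubygems"

# Hypothetical helper: returns the inclusive lower bound and the exclusive
# upper bound implied by a pessimistic constraint on `base`.
# Gem::Version#bump drops the last segment and increments the one before it,
# so "2.1.2" bumps to "2.2" and "1.0" bumps to "2".
def pessimistic_range(base)
  lower = Gem::Version.new(base)
  [lower, lower.bump]
end

# Hypothetical helper: true when `candidate` is at least `base` and below the
# bumped upper bound. Prerelease versions are not handled specially here.
def satisfies_pessimistic?(base, candidate)
  lower, upper = pessimistic_range(base)
  version = Gem::Version.new(candidate)
  # between? is inclusive on both ends, so exclude the upper bound explicitly.
  return false unless version.between?(lower, upper)
  version != upper
end

lower, upper = pessimistic_range("2.1.2")
puts "from #{lower} up to (but not including) #{upper}"
puts satisfies_pessimistic?("2.1.2", "2.1.9")
puts satisfies_pessimistic?("2.1.2", "2.2.0")
```
&lt;/pre&gt;
&lt;p&gt;In real code you would let RubyGems do this comparison for you; the sketch only spells out the two bounds that the specifier stands for.&lt;/p&gt;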
&lt;p&gt;You should make your dependencies as soft as the &lt;a href="http://semver.org/"&gt;versioning scheme&lt;/a&gt; and release practices of your dependencies will allow. If you&amp;#8217;re monkey-patching a library outside of its public API (not a very good practice for libraries), you should probably stick with an &lt;code&gt;=&lt;/code&gt; dependency.&lt;/p&gt;
&lt;p&gt;One thing is certain, though: you &lt;strong&gt;cannot&lt;/strong&gt; be sure that your gem works with every future version of your dependencies. Sanely versioned gems take the opportunity of a major release to break things, and until you have actually tested against the new versions, it&amp;#8217;s madness to claim compatibility. One example: a number of gems have dependencies on &lt;code&gt;activesupport &amp;gt;= 2.3&lt;/code&gt;. In a large number of cases, these gems do not work correctly with ActiveSupport 3.0, since we changed how components of ActiveSupport get loaded to make it easier to cherry-pick.&lt;/p&gt;
&lt;p&gt;Now, instead of receiving a version conflict, users of these gems will get cryptic runtime error messages. Even worse, everything might &lt;strong&gt;appear&lt;/strong&gt; to work until some weird edge case, which your tests would have caught, is exercised in production.&lt;/p&gt;
&lt;h2&gt;But What Happens When a New Version is Released?&lt;/h2&gt;
&lt;p&gt;One reason that people use &lt;code&gt;activesupport &amp;gt;= 2.3&lt;/code&gt; is that, assuming Rails maintains backward-compatibility, their gem will continue to work in newer Rails environments without any difficulty. If everything happens to work, it saves them the time of running their unit tests against newer versions of dependencies and cutting a new release.&lt;/p&gt;
&lt;p&gt;As I said before, this is a deadly practice. By specifying appropriate dependencies (based on your confidence in the underlying library&amp;#8217;s versioning scheme), you will have a natural opportunity to run your test suite against the new versions, and release a new gem that you know actually works.&lt;/p&gt;
&lt;p&gt;This &lt;strong&gt;does&lt;/strong&gt; mean that you will likely want to release patch releases of old versions of your gem. For instance, if I have AuthMagic 1.0, which worked against Rails 2.3, and I release AuthMagic 2.0 once Rails 3.0 comes out, it makes sense to continue patching AuthMagic 1.0 for a little while, so my Rails 2.3 users aren&amp;#8217;t left out in the cold.&lt;/p&gt;
&lt;h2&gt;Applications and Gemfiles&lt;/h2&gt;
&lt;p&gt;I should be clear that this versioning advice doesn&amp;#8217;t necessarily apply to an application using Bundler. That&amp;#8217;s because the &lt;code&gt;Gemfile.lock&lt;/code&gt;, which &lt;a href="http://gembundler.com/v1.0/rationale.html"&gt;you should check into version control&lt;/a&gt;, essentially converts all &lt;code&gt;&amp;gt;=&lt;/code&gt; dependencies into hard dependencies. However, because you may want to run &lt;code&gt;bundle update&lt;/code&gt; at some point in the future, which will update &lt;strong&gt;everything&lt;/strong&gt; to the latest possible versions, you might want to use version specifiers in your &lt;code&gt;Gemfile&lt;/code&gt; that seem likely to work into the future.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/08/21/using-considered-harmful-or-whats-wrong-with/#comments" thr:count="17" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/08/21/using-considered-harmful-or-whats-wrong-with/feed/atom/" thr:count="17" />
		<thr:total>17</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/08/21/using-considered-harmful-or-whats-wrong-with/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[Threads (in Ruby): Enough Already]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/u8IY8IwSdIs/" />
		<id>http://yehudakatz.com/?p=517</id>
		<updated>2010-08-14T20:25:44Z</updated>
		<published>2010-08-14T19:16:55Z</published>
		<category scheme="http://yehudakatz.com" term="Merb" /><category scheme="http://yehudakatz.com" term="Rails 3" /><category scheme="http://yehudakatz.com" term="Ruby" /><category scheme="http://yehudakatz.com" term="Ruby on Rails" />		<summary type="html"><![CDATA[For a while now, the Ruby community has become enamored in the latest new hotness, evented programming and Node.js. It&#8217;s gone so far that I&#8217;ve heard a number of prominent Rubyists saying that JavaScript and Node.js are the only sane way to handle a number of concurrent users. I should start by saying that I [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/">&lt;p&gt;For a while now, the Ruby community has been enamored of the latest new hotness, evented programming and Node.js. It&amp;#8217;s gone so far that I&amp;#8217;ve heard a number of prominent Rubyists saying that JavaScript and Node.js are the only sane way to handle a number of concurrent users.&lt;/p&gt;
&lt;p&gt;I should start by saying that I personally love writing evented JavaScript in the browser, and have been giving talks (for years) about using evented JavaScript to sanely organize client-side code. I think that for the browser environment, events are where it&amp;#8217;s at. Further, I don&amp;#8217;t have any major problem with Node.js or other ways of writing server-side evented code. For instance, if I needed to write a chat server, I would almost certainly write it using Node.js or EventMachine.&lt;/p&gt;
&lt;p&gt;However, I&amp;#8217;m pretty tired of hearing that threads (and especially Ruby threads) are completely useless, and that if you don&amp;#8217;t use evented code, you may as well be using a single process per concurrent user. To be fair, this was somewhat the party line of the Rails team years ago, but Rails has been threadsafe since Rails 2.2, and Rails users have been taking advantage of it for some time.&lt;/p&gt;
&lt;p&gt;Before I start, I should be clear that this post is talking about requests that spend a non-tiny amount of their time utilizing the CPU (normal web requests), even if they do spend a fair amount of time in blocking operations (disk IO, database). I am decidedly &lt;strong&gt;not&lt;/strong&gt; talking about situations, like chat servers, where requests sit idle for huge amounts of time with tiny amounts of intermittent CPU usage.&lt;/p&gt;
&lt;h2&gt;Threads and IO Blocking&lt;/h2&gt;
&lt;p&gt;I&amp;#8217;ve heard a common misperception that Ruby inherently &amp;#8220;blocks&amp;#8221; when doing disk IO or making database queries. In reality, Ruby switches to another thread whenever it needs to block for IO. In other words, if a thread needs to wait, but isn&amp;#8217;t using any CPU, Ruby&amp;#8217;s built-in methods allow another waiting thread to use the CPU while the original thread waits.&lt;/p&gt;
&lt;p&gt;If every one of your web requests uses the CPU for 30% of the time, and waits for IO for the rest of the time, you should be able to serve three requests in parallel, coming close to maxing out your CPU.&lt;/p&gt;
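&lt;p&gt;A minimal sketch of this effect, assuming &lt;code&gt;sleep&lt;/code&gt; as a stand-in for blocking IO. The method names below are hypothetical, and real requests also spend time on the CPU, which this sketch does not model. Three simulated requests that each wait 0.1 seconds should finish in roughly the time of one when run in threads, instead of taking about three times as long when run one after another.&lt;/p&gt;
&lt;pre&gt;
```ruby
require "benchmark"

# Hypothetical stand-in for the blocking part of a request (a database query,
# a disk read). sleep is used because Ruby lets other threads run during it.
def fake_blocking_call(seconds)
  sleep(seconds)
  :done
end

# Handle the simulated requests one after another.
def serve_serially(count, seconds)
  Array.new(count) { fake_blocking_call(seconds) }
end

# Handle each simulated request in its own thread and wait for all of them.
def serve_threaded(count, seconds)
  threads = Array.new(count) { Thread.new { fake_blocking_call(seconds) } }
  threads.map { |thread| thread.value }
end

serial_time = Benchmark.realtime { serve_serially(3, 0.1) }
threaded_time = Benchmark.realtime { serve_threaded(3, 0.1) }

puts format("serial:   %.2fs", serial_time)
puts format("threaded: %.2fs", threaded_time)
```
&lt;/pre&gt;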
&lt;p&gt;Here&amp;#8217;s a couple of diagrams. The first shows how people imagine requests work in Ruby, even in threadsafe mode. The second is how an optimal Ruby environment will actually operate. This example is extremely simplified, showing only a few parts of the request, and assuming equal time spent in areas that are not necessarily equal.&lt;/p&gt;
&lt;hr style="border: 1px solid #aaa" /&gt;
&lt;p&gt;&lt;img src="http://yehudakatz.com/wp-content/uploads/2010/08/Untitled.001.png" alt="Untitled.001.png" border="0" width="600" height="338" /&gt;&lt;/p&gt;
&lt;hr style="border: 1px solid #aaa" /&gt;
&lt;p&gt;&lt;img src="http://yehudakatz.com/wp-content/uploads/2010/08/Untitled.002.png" alt="Untitled.002.png" border="0" width="600" height="338" /&gt;&lt;/p&gt;
&lt;hr style="border: 1px solid #aaa" /&gt;
&lt;p&gt;I should be clear that Ruby 1.8 spends too much time context-switching between its green threads. However, if you&amp;#8217;re not switching between threads extremely often, even Ruby 1.8&amp;#8217;s overhead will amount to a small fraction of the total time needed to serve a request. A lot of the threading benchmarks you&amp;#8217;ll see are testing pathological cases that involve huge numbers of threads, not very similar to the profile of a web server.&lt;/p&gt;
&lt;p&gt;(if you&amp;#8217;re thinking that there are caveats to my &amp;#8220;optimal Ruby environment&amp;#8221;, keep reading)&lt;/p&gt;
&lt;h2&gt;&amp;#8220;Threads are just HARD&amp;#8221;&lt;/h2&gt;
&lt;p&gt;Another common gripe that pushes people to evented programming is that working with threads is just too hard. Working hard to avoid sharing state and using locks where necessary is just too tricky for the average web developer, the argument goes.&lt;/p&gt;
&lt;p&gt;I agree with this argument in the general case. Web development, on the other hand, has an extremely clean concurrency primitive: the request. In a threadsafe Rails application, the framework manages threads and uses an environment hash (one per request) to store state. When you work inside a Rails controller, you&amp;#8217;re working inside an object that is inherently &lt;strong&gt;unshared&lt;/strong&gt;. When you instantiate a new instance of an ActiveRecord model inside the controller, it is rooted to that controller, and is therefore not used between live threads.&lt;/p&gt;
&lt;p&gt;It is, of course, &lt;strong&gt;possible&lt;/strong&gt; to use global state, but the vast majority of normal, day-to-day Rails programming (and for that matter, programming in any web framework in any language with a request model) is inherently threadsafe. This means that Ruby will transparently handle switching back and forth between active requests when you do something blocking (file, database, or memcache access, for instance), and you don&amp;#8217;t need to personally manage the problems that arise when doing concurrent programming.&lt;/p&gt;
&lt;p&gt;This is significantly less true about applications, like chat servers, that keep open a huge number of requests. In those cases, a lot of the application logic happens &lt;strong&gt;outside&lt;/strong&gt; the individual request, so you need to personally manage shared state.&lt;/p&gt;
&lt;h2&gt;Historical Ruby Issues&lt;/h2&gt;
&lt;p&gt;What I&amp;#8217;ve been talking about so far is how stock Ruby &lt;strong&gt;ought&lt;/strong&gt; to operate. Unfortunately, a group of things have historically conspired to make Ruby&amp;#8217;s concurrency story look much worse than it actually ought to be.&lt;/p&gt;
&lt;p&gt;Most obviously, early versions of Rails were not threadsafe. As a result, all Rails users were operating with a mutex around the entire request, forcing Rails to behave like the first &amp;#8220;Imagined&amp;#8221; diagram above. Annoyingly, Mongrel, the most common Ruby web server for a few years, hardcoded this mutex into its Rails handler. As a result, if you spun up Rails in &amp;#8220;threadsafe&amp;#8221; mode a year ago using Mongrel, you would have gotten exactly zero concurrency. Also, even in threadsafe mode (when not using the built-in Rails support) Mongrel spins up a new thread for every request, which is not exactly optimal.&lt;/p&gt;
&lt;p&gt;Second, the most common database driver, &lt;code&gt;mysql&lt;/code&gt;, is a very poorly behaved C extension. While built-in I/O (file or pipe access) correctly alerts Ruby to switch to another thread when it hits a blocking region, other C extensions don&amp;#8217;t always do so. For safety, Ruby does not allow a context switch while in C code unless the C code explicitly tells the VM that it&amp;#8217;s ok to do so.&lt;/p&gt;
&lt;p&gt;All of the Data Objects drivers, which we built for DataMapper, correctly cause a context switch when entering a blocking area of their C code. The &lt;code&gt;mysqlplus&lt;/code&gt; gem, released in March 2009, was designed to be a drop-in replacement for the &lt;code&gt;mysql&lt;/code&gt; gem that fixes this problem. The new &lt;code&gt;mysql2&lt;/code&gt; gem, written by Brian Lopez, is a drop-in replacement for the old gem that also correctly handles encodings in Ruby 1.9, and it is the new default MySQL driver in Rails.&lt;/p&gt;
&lt;p&gt;Because Rails shipped with the (broken) &lt;code&gt;mysql&lt;/code&gt; gem by default, even people running on working web servers (i.e. not mongrel) in threadsafe mode would have seen a large amount of their potential concurrency eaten away because their database driver wasn&amp;#8217;t alerting Ruby that concurrent operation was possible. With &lt;code&gt;mysql2&lt;/code&gt; as the default, people should see real gains on threadsafe Rails applications.&lt;/p&gt;
&lt;p&gt;A lot of people talk about the GIL (global interpreter lock) in Ruby 1.9 as a death knell for concurrency. For the uninitiated, the GIL disallows multiple CPU cores from running Ruby code simultaneously. That &lt;strong&gt;does&lt;/strong&gt; mean that you&amp;#8217;ll need one Ruby process (or thereabouts) per CPU core, but it also means that if your multithreaded code is running correctly, you should need &lt;strong&gt;only one&lt;/strong&gt; process per CPU core. I&amp;#8217;ve heard tales of six or more processes per core. Since it&amp;#8217;s possible to fully utilize a CPU with a single process (even in Ruby 1.8), these applications could get a 4-6x improvement in RAM usage (depending on context-switching overhead) by switching to threadsafe mode and using modern drivers for blocking operations.&lt;/p&gt;
&lt;h2&gt;JRuby, Ruby 1.9 and Rubinius, and the Future&lt;/h2&gt;
&lt;p&gt;Finally, JRuby already runs without a global interpreter lock, allowing your code to run in true parallel, and to fully utilize all available CPUs with a single JRuby process. A future version of Rubinius will likely ship without a GIL (the work has already begun), also opening the door to utilizing all CPUs with a single Ruby process.&lt;/p&gt;
&lt;p&gt;And &lt;strong&gt;all&lt;/strong&gt; modern Ruby VMs that run Rails (Ruby 1.9&amp;#8217;s YARV, Rubinius, and JRuby) use native threads, eliminating the annoying tax that you need to pay for using threads in Ruby 1.8. Again, though, since that tax is small relative to the time for your requests, you&amp;#8217;d likely see a non-trivial improvement in latency in applications that spend time in the database layer.&lt;/p&gt;
&lt;p&gt;To be honest, a big part of the reason for the poor practical concurrency story in Ruby has been that the Rails project didn&amp;#8217;t take it seriously, which made it difficult to get traction for efforts to fix a part of the problem (like the &lt;code&gt;mysql&lt;/code&gt; driver).&lt;/p&gt;
&lt;p&gt;We took concurrency very seriously in the Merb project, leading to the development of proper database drivers for DataMapper (Merb&amp;#8217;s ORM), and a top-to-bottom understanding of parts of the stack that could run in parallel (even on Ruby 1.8), but which weren&amp;#8217;t. Rails 3 doesn&amp;#8217;t bring anything new to the threadsafety of Rails itself (Rails 2.3 was threadsafe too), but by making the &lt;code&gt;mysql2&lt;/code&gt; driver the default, we have eliminated a large barrier to Rails applications performing well in threadsafe mode without any additional research.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; It&amp;#8217;s worth pointing to Charlie Nutter&amp;#8217;s &lt;a href="http://blog.headius.com/2008/08/qa-what-thread-safe-rails-means.html"&gt;2008 threadsafety post&lt;/a&gt;, where he talked about how he expected threadsafe Rails would impact the landscape. Unfortunately, the blocking MySQL driver held back some of the promise of the improvement for the vast majority of Rails users.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/#comments" thr:count="22" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/feed/atom/" thr:count="22" />
		<thr:total>22</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[Everyone Who Tried to Convince Me to use Vim was Wrong]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/tpUIMvGYy1U/" />
		<id>http://yehudakatz.com/?p=512</id>
		<updated>2010-07-29T06:27:56Z</updated>
		<published>2010-07-29T06:27:53Z</published>
		<category scheme="http://yehudakatz.com" term="TextMate" />		<summary type="html"><![CDATA[A couple weeks ago, I took the plunge and switched to vim (MacVIM, to be precise). It wasn&#8217;t the first time I tried to make the switch, and I had pretty much written it off entirely. Why? Because the past few times I tried switching to vim, I took the advice of a master vim [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/07/29/everyone-who-tried-to-convince-me-to-use-vim-was-wrong/">&lt;p&gt;A couple weeks ago, I took the plunge and switched to vim (MacVIM, to be precise). It wasn&amp;#8217;t the first time I tried to make the switch, and I had pretty much written it off entirely.&lt;/p&gt;
&lt;p&gt;Why? Because the past few times I tried switching to vim, I took the advice of a master vim user, and quickly sank into the quicksand of trying to learn a new tool. In every prior attempt, I gave vim a few days before I gave up. And every time, I managed to get virtually no work done the entire time, spending about 90 percent of my day fighting with my editor (a more charitable way to put it would be &amp;#8220;learning my editor&amp;#8221;).&lt;/p&gt;
&lt;p&gt;Invariably, the master vim users that were helping me make the switch would encourage me to stick it out. &amp;#8220;If you just give it a few weeks, you&amp;#8217;ll never want to switch back.&amp;#8221;&lt;/p&gt;
&lt;p&gt;The trouble was, I had work to do. I could only switch editors if the new editor did not significantly impede my day-to-day work. I can already hear the responses: &amp;#8220;That&amp;#8217;s simply impossible. It&amp;#8217;s a new editor designed for advanced users. You&amp;#8217;ll just have to put up with the pain until you get used to it.&amp;#8221;&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s the thing, though: I didn&amp;#8217;t really have to put up with a huge amount of pain when switching to Textmate for the first time. In fact, it was downright pleasant.&lt;/p&gt;
&lt;p&gt;The last few times someone tried to get me to switch to vim, I issued them a simple challenge: can you tell me a way to switch that will not significantly reduce my productivity for the first few weeks? It wasn&amp;#8217;t a challenge that was intended to fully shut down discussion. When I really thought about it, Textmate wasn&amp;#8217;t doing all that much for me. It was a glorified Notepad which had working syntax highlighting and understood where to put the cursor when I hit enter (most of the time).&lt;/p&gt;
&lt;p&gt;I don&amp;#8217;t actually use &amp;#8220;snippets&amp;#8221; all that often, or all that many &amp;#8220;commands&amp;#8221;. I don&amp;#8217;t mind the extensibility of Textmate, but I&amp;#8217;m not a hardcore Textmate hacker myself, meaning that I&amp;#8217;m ok with any editor that has the same level of extensibility that Textmate has (namely, all of them).&lt;/p&gt;
&lt;p&gt;Despite what I considered a relatively reasonable request, my challenge was met with disdain and even anger by most of the people I talked to. &amp;#8220;If you feel that way, Vim probably isn&amp;#8217;t for you.&amp;#8221; &amp;#8220;You&amp;#8217;re learning a new EDITOR for God&amp;#8217;s sakes. Of COURSE there&amp;#8217;s going to be a learning curve.&amp;#8221;&lt;/p&gt;
&lt;p&gt;I had written off the entire sorry affair.&lt;/p&gt;
&lt;p&gt;A few weeks ago, Carl told me that he was playing with Vim. His explanation was that he had seen a number of people be really productive with it, and he was curious. Carl is definitely willing to put up with more pain to learn something new than I am, so I issued the same challenge to him.&lt;/p&gt;
&lt;p&gt;Perhaps because he wasn&amp;#8217;t steeped in hardcore vim hacker lore, he didn&amp;#8217;t angrily dismiss the entire premise of my question. Thinking about it a bit more, I realized that most of the people who had tried to get me into vim had suggested that I dive in head first. &amp;#8220;First thing: turn off the arrow keys.&amp;#8221; &amp;#8220;Don&amp;#8217;t use the mouse. Force yourself to use the keyboard.&amp;#8221;&lt;/p&gt;
&lt;p&gt;Carl convinced me to use vim for the first couple of days pretty much exactly as I use Textmate (with the exception of having to switch between normal and insert modes). I installed NERDTree on MacVIM, grabbed the most common vim &amp;#8220;packages&amp;#8221;, and was off to the races. (I should note that I installed topfunky&amp;#8217;s PeepOpen, which definitely helped with a very common workflow that I find hard to live without).&lt;/p&gt;
&lt;p&gt;For the first day, I clunked around by using my mouse&amp;#8217;s scroll wheel, clicking and highlighting things, and spending most of my time in insert mode. It was slightly less productive than Textmate, but mostly in the range of what I&amp;#8217;d expect switching to a new tool. In short, while I felt a bit out of sorts, I was able to get plenty of work done that first day.&lt;/p&gt;
&lt;p&gt;As the days went on, I learned a few commands here and there. The first big one for me was &lt;code&gt;ci&lt;/code&gt; as in &lt;code&gt;ci "&lt;/code&gt; (it means: replace what&amp;#8217;s inside the next set of &lt;code&gt;"&lt;/code&gt; and go into insert mode). This singlehandedly made up for most of the productivity losses I was feeling from learning a new tool. Throw in &lt;code&gt;o&lt;/code&gt;, &lt;code&gt;O&lt;/code&gt;, &lt;code&gt;A&lt;/code&gt;, &lt;code&gt;:N&lt;/code&gt; and &lt;code&gt;/search&lt;/code&gt; and I was already quite a bit more productive than I had been in Textmate.&lt;/p&gt;
&lt;p&gt;Sure, I&amp;#8217;m still plodding around in some cases, but only a handful of days later, using Textmate for anything feels clunky (most commonly, I try to use &lt;code&gt;o&lt;/code&gt; or &lt;code&gt;O&lt;/code&gt; to insert a new line above or below the one I&amp;#8217;m currently on).&lt;/p&gt;
&lt;p&gt;I was able to get here because I used my mouse wheel and button, arrow keys, &lt;code&gt;apple-f&lt;/code&gt; to find text, &lt;code&gt;apple-s&lt;/code&gt; to save files, and a whole slew of other common idioms, instead of grinding to a halt and trying to switch all of my practices at once.&lt;/p&gt;
&lt;p&gt;To those who would say &amp;#8220;that&amp;#8217;s obvious; of &lt;strong&gt;course&lt;/strong&gt; you learn vim incrementally&amp;#8221;, I would simply say that having spoken to a number of vim users in the past, I never got that advice. Instead, I got a lot of advice about turning off my arrow keys, disallowing the use of the mouse, and learning the (MORE EFFICIENT!!!) vim ways to do &lt;strong&gt;everything&lt;/strong&gt;, all at once. People just couldn&amp;#8217;t stomach the idea of me continuing to use an outmoded practice (like &lt;code&gt;apple-f&lt;/code&gt;) when vim had much better tools available just a (huge volume of) memorization away.&lt;/p&gt;
&lt;p&gt;To those who are considering using vim, my recommendation is to use MacVIM, NERDTree, PeepOpen (or &lt;a href="http://github.com/wincent/Command-T"&gt;command-t&lt;/a&gt;), and use the mouse, arrow keys, and familiar OSX&amp;#8217;isms all you want. Very quickly, it will become obvious that there&amp;#8217;s a better way to do all kinds of things, and you can pile on the newly found efficiency once you&amp;#8217;ve successfully made the switch without losing the ability to do work in the short-run.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/07/29/everyone-who-tried-to-convince-me-to-use-vim-was-wrong/#comments" thr:count="169" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/07/29/everyone-who-tried-to-convince-me-to-use-vim-was-wrong/feed/atom/" thr:count="169" />
		<thr:total>169</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/07/29/everyone-who-tried-to-convince-me-to-use-vim-was-wrong/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[What&#8217;s New in Bundler 1.0.0.rc.1]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/RNr2rbM2o3M/" />
		<id>http://yehudakatz.com/?p=506</id>
		<updated>2010-07-26T19:22:17Z</updated>
		<published>2010-07-26T19:19:10Z</published>
		<category scheme="http://yehudakatz.com" term="Rails 3" /><category scheme="http://yehudakatz.com" term="Ruby" />		<summary type="html"><![CDATA[Taking into consideration the huge amount of feedback we received during the Bundler 0.9 series, we streamlined Bundler 1.0 significantly, and made it fit user expectations better. Whether you have used bundler before or not, the easiest way to get up to speed is to read the following notes and go to http://gembundler.com/v1.0 for more [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/07/26/whats-new-in-bundler-1-0-0-rc-1/">&lt;p&gt;Taking into consideration the huge amount of feedback we received during the Bundler 0.9 series, we streamlined Bundler 1.0 significantly, and made it fit user expectations better.&lt;/p&gt;
&lt;p&gt;Whether you have used bundler before or not, the easiest way to get up to speed is to read the following notes and go to http://gembundler.com/v1.0 for more in-depth information.&lt;/p&gt;
&lt;p&gt;(note that gembundler.com is still being updated for the 1.0 changes, and should be ready for the final release).&lt;/p&gt;
&lt;h2&gt;Starting a new project with bundler&lt;/h2&gt;
&lt;p&gt;When you generate a new Rails application, Rails will create a &lt;code&gt;Gemfile&lt;/code&gt; for you, which has everything needed to boot your application.&lt;/p&gt;
&lt;p&gt;Otherwise, you can use &lt;code&gt;bundle init&lt;/code&gt; to create a stub &lt;code&gt;Gemfile&lt;/code&gt;, ready to go.&lt;/p&gt;
&lt;p&gt;First, run &lt;code&gt;bundle install&lt;/code&gt; to make sure that you have all the needed dependencies. If you already do, this process will happen instantaneously.&lt;/p&gt;
&lt;p&gt;Bundler will automatically create a file called &lt;code&gt;Gemfile.lock&lt;/code&gt;. This file is a snapshot of your application&amp;#8217;s dependencies at that time.&lt;/p&gt;
&lt;p&gt;You &lt;strong&gt;SHOULD&lt;/strong&gt; check both files into version control. This will ensure that all team members (as well as your production server) are working with identical dependencies.&lt;/p&gt;
&lt;h2&gt;Checking out an existing project using bundler&lt;/h2&gt;
&lt;p&gt;After checking out an existing project using bundler, check to make sure that the &lt;code&gt;Gemfile.lock&lt;/code&gt; snapshot is checked in. If it is not, you may end up using different dependencies than the person who last used and tested the project.&lt;/p&gt;
&lt;p&gt;Next, run &lt;code&gt;bundle install&lt;/code&gt;. This command will check whether you already have all the required dependencies in your system. If you do not, it will fetch the dependencies and install them.&lt;/p&gt;
&lt;h2&gt;Updating dependencies&lt;/h2&gt;
&lt;p&gt;If you modify the dependencies in your &lt;code&gt;Gemfile&lt;/code&gt;, first try to run &lt;code&gt;bundle install&lt;/code&gt;, as usual. Bundler will attempt to update only the gems you have modified, leaving the rest of the snapshot intact.&lt;/p&gt;
&lt;p&gt;This may not be possible, if the changes conflict with other gems in the snapshot (or their dependencies). If this happens, Bundler will instruct you to run &lt;code&gt;bundle update&lt;/code&gt;. This will re-resolve all dependencies from scratch.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;bundle update&lt;/code&gt; command will update the versions of all gems in your &lt;code&gt;Gemfile&lt;/code&gt;, while &lt;code&gt;bundle install&lt;/code&gt; will only update the gems that have changed since the last &lt;code&gt;bundle install&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;After modifying dependencies, make sure to check your &lt;code&gt;Gemfile&lt;/code&gt; and &lt;code&gt;Gemfile.lock&lt;/code&gt; into version control.&lt;/p&gt;
&lt;h2&gt;By default, gems are installed to your system&lt;/h2&gt;
&lt;p&gt;If you follow the instructions above, Bundler will install the gems into the same place as &lt;code&gt;gem install&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If necessary, Bundler will prompt you for your &lt;code&gt;sudo&lt;/code&gt; password.&lt;/p&gt;
&lt;p&gt;You can see the location of a particular gem with &lt;code&gt;bundle show [GEM_NAME]&lt;/code&gt;. You can open it in your default editor with &lt;code&gt;bundle open [GEM_NAME]&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Bundler will still isolate your application from other gems. Installing your gems into a shared location allows multiple projects to avoid downloading the same gem over and over.&lt;/p&gt;
&lt;p&gt;You might want to install your bundled gems to a different location, such as a directory in the application itself. This will ensure that each application has its own copies of the gems, and provides an extra level of isolation.&lt;/p&gt;
&lt;p&gt;To do this, run the install command with &lt;code&gt;bundle install /path/to/location&lt;/code&gt;. You can use a relative path as well: &lt;code&gt;bundle install vendor&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In RC1, this command will use gems from the system, if they are already there (it only affects &lt;em&gt;new&lt;/em&gt; gems). To ensure that all of your gems are located in the path you specified, run &lt;code&gt;bundle install path --disable-shared-gems&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In Bundler 1.0 final, &lt;code&gt;bundle install path&lt;/code&gt; will default to &lt;code&gt;--disable-shared-gems&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Deployment&lt;/h2&gt;
&lt;p&gt;When deploying, we strongly recommend that you isolate your gems into a local path (using &lt;code&gt;bundle install path --disable-shared-gems&lt;/code&gt;). The final version of bundler will come with a &lt;code&gt;--production&lt;/code&gt; flag, encapsulating all of the best deployment practices.&lt;/p&gt;
&lt;p&gt;For now, please follow these recommendations (described using Capistrano concepts):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make sure to always check in a &lt;code&gt;Gemfile.lock&lt;/code&gt; that is up to date. This means that after modifying your &lt;code&gt;Gemfile&lt;/code&gt;, you should &lt;em&gt;ALWAYS&lt;/em&gt; run &lt;code&gt;bundle install&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Symlink the vendor/bundle directory into the application&amp;#8217;s shared location (symlink release_path/current/vendor/bundle to release_path/shared/bundled_gems)&lt;/li&gt;
&lt;li&gt;Install your bundle by running &lt;code&gt;bundle install vendor/bundle --disable-shared-gems&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/07/26/whats-new-in-bundler-1-0-0-rc-1/#comments" thr:count="13" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/07/26/whats-new-in-bundler-1-0-0-rc-1/feed/atom/" thr:count="13" />
		<thr:total>13</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/07/26/whats-new-in-bundler-1-0-0-rc-1/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[Encodings, Unabridged]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/UD1uh4wQFOM/" />
		<id>http://yehudakatz.com/?p=492</id>
		<updated>2010-05-20T11:29:04Z</updated>
		<published>2010-05-17T07:57:39Z</published>
		<category scheme="http://yehudakatz.com" term="Ruby" />		<summary type="html"><![CDATA[I wrote somewhat extensively about the problem of encodings in Ruby 1.9 in general last week. For those who didn&#8217;t read that post, let me start with a quick refresher. What&#8217;s an Encoding? An encoding specifies how to take a list of characters (such as &#8220;hello&#8221;) and persist them onto disk as a sequence of [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/05/17/encodings-unabridged/">&lt;p&gt;I wrote somewhat extensively about the problem of encodings in Ruby 1.9 in general last week.&lt;/p&gt;
&lt;p&gt;For those who didn&amp;#8217;t read that post, let me start with a quick refresher.&lt;/p&gt;
&lt;h2&gt;What&amp;#8217;s an Encoding?&lt;/h2&gt;
&lt;p&gt;An encoding specifies how to take a list of characters (such as &amp;#8220;hello&amp;#8221;) and persist them onto disk as a sequence of bytes. You&amp;#8217;re probably familiar with the ASCII encoding, which specifies how to store English characters in a single byte each (taking up the space in 0-127, leaving 128-255 empty).&lt;/p&gt;
&lt;p&gt;Another common encoding is ISO-8859-1 (or Latin-1), which uses ASCII&amp;#8217;s designation for the first 128 characters (0-127), and designates the numbers 128-255 for accented Latin characters (such as &amp;#8220;é&amp;#8221; or &amp;#8220;ü&amp;#8221;).&lt;/p&gt;
&lt;p&gt;Obviously, 256 characters isn&amp;#8217;t enough for all languages, so there are a number of ISO-8859-* encodings which each designate numbers 128-255 for their own purposes (for instance, ISO-8859-5 uses that space for Cyrillic characters, such as those used in Russian).&lt;/p&gt;
&lt;p&gt;Unfortunately, the raw bytes themselves do not contain an &amp;#8220;encoding specifier&amp;#8221; of any kind, and the exact same bytes can mean something in Western characters, Russian, Japanese, or any other language, depending on the character set that was originally used to store the characters as bytes.&lt;/p&gt;
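&lt;p&gt;To make this concrete, here is a minimal sketch that takes a single raw byte and interprets it under two of the ISO-8859-* encodings. The &lt;code&gt;interpret&lt;/code&gt; helper is a name I made up for illustration; it is not part of Ruby or of any library.&lt;/p&gt;

```ruby
# interpret is a hypothetical helper: it tags raw bytes with the given
# encoding and then transcodes the result to UTF-8 so it can be printed.
def interpret(bytes, encoding_name)
  # dup so that the caller's String is not retagged in place
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

raw = "\xE9".b  # one byte (value 233) with no encoding information

puts interpret(raw, "ISO-8859-1")  # a Latin "e" with an acute accent
puts interpret(raw, "ISO-8859-5")  # the Cyrillic letter "shcha"
```

&lt;p&gt;The byte never changes. Only the table used to read it does, which is why the bytes alone cannot tell you what text they hold.&lt;/p&gt;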
&lt;p&gt;As a general rule, protocols (such as HTTP) provide a mechanism for specifying the encoding. For instance, in HTTP, you can specify the encoding in the &lt;code&gt;Content-Type&lt;/code&gt; header, like this: &lt;code&gt;Content-Type: text/html; charset=UTF-8&lt;/code&gt;. However, this is not a requirement, so it is possible to receive some content over HTTP and not know its encoding.&lt;/p&gt;
&lt;p&gt;This brings us to an important point: Strings have no &lt;strong&gt;inherent&lt;/strong&gt; encoding. By default, Strings are just &lt;code&gt;BINARY&lt;/code&gt; data. Since the data could be encoded using any number of different incompatible encodings, simply combining BINARY data from different sources could easily result in a corrupted String.&lt;/p&gt;
&lt;p&gt;When you see a diamond with a question mark inside on the web, or gibberish characters (like a weird A with a 3/4 symbol), you&amp;#8217;re seeing a mistaken attempt to combine binary data encoded differently into a single String.&lt;/p&gt;
&lt;h2&gt;What&amp;#8217;s Unicode?&lt;/h2&gt;
&lt;p&gt;Unicode is an effort to map every known character (to a point) to a number. Unicode does not define an encoding (how to represent those numbers in bytes). It simply provides a unique number for each known character.&lt;/p&gt;
&lt;p&gt;Unicode tries to unify characters from different encodings that represent the same character. For instance, the &lt;code&gt;A&lt;/code&gt; in ASCII, the &lt;code&gt;A&lt;/code&gt; in ISO-8859-1, and the &lt;code&gt;A&lt;/code&gt; in the Japanese encoding SHIFT-JIS all map to the same Unicode character.&lt;/p&gt;
&lt;p&gt;Unicode also takes pains to ensure round-tripping between existing encodings and Unicode. Theoretically, this should mean that it&amp;#8217;s possible to take some data encoded using any known encoding, use Unicode tables to map the characters to Unicode numbers, and then use the reverse versions of those tables to map the Unicode numbers back into the original encoding.&lt;/p&gt;
&lt;p&gt;Unfortunately, both of these characteristics cause some problems for Asian character sets. First, there have been some historical errors in the process of unification, which requires the Unicode committee to properly identify which characters in different existing Chinese, Japanese and Korean (CJK) character sets actually represent the same character.&lt;/p&gt;
&lt;p&gt;In Japanese, personal names use slight variants of the non-personal-name version of the same character. This would be equivalent to the difference (in English) between &amp;#8220;Cate&amp;#8221; and &amp;#8220;Kate&amp;#8221;. Many of these characters (sometimes called Gaiji) cannot be represented in Unicode at all.&lt;/p&gt;
&lt;p&gt;Second, there are still hundreds of characters in some Japanese encodings (such as CP932 or Windows-31J, Microsoft&amp;#8217;s variant of SHIFT-JIS) that &lt;a href="http://support.microsoft.com/kb/170559"&gt;simply do not round-trip through Unicode&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To make matters worse, &lt;a href="http://www.w3.org/Submission/japanese-xml/#sjis"&gt;Java and MySQL use a different mapping table&lt;/a&gt; than the standard Unicode mapping tables (making &amp;#8220;This costs ¥5&amp;#8221; come out in Unicode as &amp;#8220;This costs \5&amp;#8221;). The standard Unicode mapping tables handle this particular case correctly (although they cannot fully solve the round-tripping problem), but these quirks only serve to further raise doubts about Unicode in the minds of Japanese developers.&lt;/p&gt;
&lt;p&gt;For a lot more information on these issues, check out the &lt;a href="http://www.w3.org/Submission/japanese-xml"&gt;XML Japanese Profile&lt;/a&gt; document created by the W3C to explain how to deal with some of these problems in XML documents.&lt;/p&gt;
&lt;p&gt;In the Western world, none of the encodings in common use have these problems. For instance, it is trivial to take a String encoded as ISO-8859-1, convert it into Unicode, and then convert it back into ISO-8859-1 when needed.&lt;/p&gt;
&lt;p&gt;This means that for most of the Western world, it is a good idea to use Unicode as the &amp;#8220;one true character set&amp;#8221; inside programming languages. This means that programmers can treat Strings as simple sequences of Unicode code points (several code points may add up to a single character, such as the ¨ code point, which can be applied to other code points to form characters like ü). In the Asian world, while this can sometimes be a good strategy, it is often significantly simpler to use the original encoding and handle merging Strings in different encodings together manually (when an appropriate decision about the tradeoffs around fidelity can be made).&lt;/p&gt;
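&lt;p&gt;As a rough illustration of code points adding up to a character, the sketch below spells ü two ways. I am assuming the combining diaeresis (U+0308) here, and &lt;code&gt;code_point_count&lt;/code&gt; is a hypothetical helper name. Note that &lt;code&gt;unicode_normalize&lt;/code&gt; only arrived in Ruby 2.2, so it is not available in Ruby 1.9.&lt;/p&gt;

```ruby
# code_point_count is a hypothetical helper used only for this example.
def code_point_count(string)
  string.codepoints.length
end

precomposed = "\u00FC"   # a single code point: u with diaeresis
combining   = "u\u0308"  # "u" followed by the combining diaeresis

puts code_point_count(precomposed)  # 1
puts code_point_count(combining)    # 2
puts precomposed == combining       # false: the code points differ

# unicode_normalize (Ruby 2.2 and later) composes the pair into one code point
puts precomposed == combining.unicode_normalize(:nfc)  # true
```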
&lt;p&gt;Before I continue, I would note that the above is a vast simplification of the issues surrounding Unicode and Japanese. I believe it to be a fair characterization, but die-hard Unicode folks, and die-hard anti-Unicode folks would possibly disagree with some elements of it. If I have made any factual errors, please let me know.&lt;/p&gt;
&lt;h2&gt;A Digression: UTF-*&lt;/h2&gt;
&lt;p&gt;Until now, I have talked only about &amp;#8220;Unicode&amp;#8221;, which simply maps characters to numbers (code points). Because Unicode uses counting numbers, it can accommodate as many code points as it wants.&lt;/p&gt;
&lt;p&gt;However, it is not an encoding. In other words, it does not specify how to store the numbers on disk. The most obvious solution would be to use a few bytes for each character. This is the solution that UTF-32 uses, specifying that each Unicode character be stored as four bytes (accommodating over 4 billion characters). While this has the advantage of being simple, it also uses huge amounts of memory and disk space compared to the original encodings (like ASCII, ISO-8859-1 and SHIFT-JIS) that it is replacing.&lt;/p&gt;
&lt;p&gt;On the other side of the spectrum is UTF-8. UTF-8 uses a single byte for English characters, using the exact same mapping as ASCII. This means that a UTF-8 string that contains only characters found in ASCII will have the identical bytes as a String stored in ASCII.&lt;/p&gt;
&lt;p&gt;It then uses the high bit (the bytes representing 128-255) to specify a series of escape characters that can specify a multibyte character. This means that Strings using Western characters use relatively few bytes (often comparable with the original encodings Unicode replaces), because they are in the low area of the Unicode space, while the large number of characters in the Asian languages use more bytes than their native encodings, because they use characters with larger Unicode numbers.&lt;/p&gt;
&lt;p&gt;This is another reason some Asian developers resent Unicode; while it does not significantly increase the memory requirements for most Western documents, it does so for Asian documents.&lt;/p&gt;
&lt;p&gt;For the curious, UTF-16 uses 16 bits for the most common characters (the BMP, or basic multilingual plane), and 32 bits to represent characters from planes 1 through 16. This means that UTF-8 is most efficient for Strings containing mostly ASCII characters. UTF-8 and UTF-16 are approximately equivalent for Strings containing mostly characters outside ASCII but inside the BMP. For Strings containing mostly characters outside the BMP, UTF-8, UTF-16, and UTF-32 are approximately equivalent. Note that when I say &amp;#8220;approximately equivalent&amp;#8221;, I&amp;#8217;m not saying that they&amp;#8217;re exactly the same, just that the differences are small in large Strings.&lt;/p&gt;
&lt;p&gt;Of the Unicode encodings, only UTF-8 is compatible with ASCII. By this I mean that if a String is valid ASCII, it is also valid UTF-8. UTF-16 and UTF-32 encode ASCII characters using two or four bytes.&lt;/p&gt;
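&lt;p&gt;If you want to check these size claims yourself, something like the following sketch should do. &lt;code&gt;byte_sizes&lt;/code&gt; is a hypothetical helper, and the sample characters are ones I picked for illustration. I use the big-endian variants of UTF-16 and UTF-32 so that Ruby does not prepend a byte order mark and inflate the counts.&lt;/p&gt;

```ruby
# byte_sizes is a hypothetical helper: it reports how many bytes one
# character needs in each of the three Unicode encodings.
def byte_sizes(character)
  %w[UTF-8 UTF-16BE UTF-32BE].map do |name|
    [name, character.encode(name).bytesize]
  end.to_h
end

samples = {
  "ASCII letter (U+0041)"   => "A",
  "Latin-1 letter (U+00FC)" => "\u00FC",
  "BMP kanji (U+65E5)"      => "\u65E5",
  "outside BMP (U+1F600)"   => "\u{1F600}"
}

samples.each do |label, character|
  puts "#{label}: #{byte_sizes(character)}"
end
```

&lt;p&gt;The ASCII letter takes 1, 2 and 4 bytes respectively, the kanji takes 3, 2 and 4, and the character outside the BMP takes 4 bytes in all three.&lt;/p&gt;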
&lt;h2&gt;What Ruby 1.9 Does&lt;/h2&gt;
&lt;p&gt;Accepting that there are two very different ways of handling this problem, Ruby 1.9 has a String API that is somewhat different from most other languages, mostly influenced by the issues I described above in dealing with Japanese in Unicode.&lt;/p&gt;
&lt;p&gt;First, Ruby does not mandate that all Strings be stored in a single internal encoding. Unfortunately, this is not possible to do reliably with common Japanese encodings (CP932, aka Windows-31J, has 300 characters that cannot round-trip through Unicode without corrupting data). It is possible that the Unicode committee will some day fully solve these problems to everyone&amp;#8217;s satisfaction, but that day has not yet come.&lt;/p&gt;
&lt;p&gt;Instead, Ruby 1.9 stores Strings as the original sequence of bytes, but allows a String to be tagged with its encoding. It then provides a rich API for converting Strings from one encoding to another.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#CC0066; font-weight:bold;"&gt;string&lt;/span&gt; = &lt;span style="color:#996600;"&gt;&amp;quot;hello&amp;quot;&lt;/span&gt;                     &lt;span style="color:#008000; font-style:italic;"&gt;# by default, string is encoded as &amp;quot;ASCII&amp;quot;&lt;/span&gt;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;string&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;force_encoding&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;ISO-8859-1&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;  &lt;span style="color:#008000; font-style:italic;"&gt;# this simply retags the String as ISO-8859-1&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# this will work since ISO-8859-1&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# is a superset of ASCII.&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;string&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;UTF-8&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;               &lt;span style="color:#008000; font-style:italic;"&gt;# this will ask Ruby to convert the bytes in&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# the current encoding to bytes in&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# the target encoding, and retag it with the&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# new encoding&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;#&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# this is usually a lossless conversion, but&lt;/span&gt;
                                     &lt;span style="color:#008000; font-style:italic;"&gt;# can sometimes be lossy&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
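&lt;p&gt;The difference between retagging and transcoding is easiest to see by looking at the bytes. This is a small sketch, with &lt;code&gt;retag&lt;/code&gt; and &lt;code&gt;transcode&lt;/code&gt; as hypothetical wrapper names around &lt;code&gt;force_encoding&lt;/code&gt; and &lt;code&gt;encode&lt;/code&gt;:&lt;/p&gt;

```ruby
# retag and transcode are hypothetical wrappers, named for this example.
def retag(string, encoding_name)
  string.dup.force_encoding(encoding_name)  # same bytes, new tag
end

def transcode(string, encoding_name)
  string.encode(encoding_name)              # new bytes, new tag
end

utf8 = "\u00F6"  # o with diaeresis, which takes two bytes in UTF-8

p utf8.bytes                           # [195, 182]
p transcode(utf8, "ISO-8859-1").bytes  # [246]
p retag(utf8, "ISO-8859-1").bytes      # [195, 182]

# Retagging UTF-8 bytes as ISO-8859-1 turns one character into two,
# which is a typical source of gibberish characters.
p retag(utf8, "ISO-8859-1").length     # 2
```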

&lt;p&gt;A more advanced example:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;# encoding: UTF-8&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# first, tell Ruby that our editor saved the file using the UTF-8 encoding.&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# TextMate does this by default. If you lie to Ruby, very strange things&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# will happen&lt;/span&gt;
&amp;nbsp;
utf8 = &lt;span style="color:#996600;"&gt;&amp;quot;hellö&amp;quot;&lt;/span&gt;
iso_8859_1 = &lt;span style="color:#996600;"&gt;&amp;quot;hellö&amp;quot;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;ISO-8859-1&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Because we specified an encoding for this file, Strings in here default&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# to UTF-8 rather than ASCII. Note that if you didn't specify an encoding&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# characters outside of ASCII will be rejected by the parser.&lt;/span&gt;
&amp;nbsp;
utf8 &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; iso_8859_1
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# This produces an error, because Ruby does not automatically try to&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# transcode Strings from one encoding into another. In practice, this&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# should rarely, if ever happen in applications that can rely on&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# Unicode; you'll see why shortly&lt;/span&gt;
&amp;nbsp;
utf8 &lt;span style="color:#006600; font-weight:bold;"&gt;&amp;lt;&amp;lt;&lt;/span&gt; iso_8859_1.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;UTF-8&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# This works fine, because you first made the two encodings the same&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2&gt;The problems people are really having&lt;/h2&gt;
&lt;p&gt;The problem of dealing with ISO-8859-1 encoded text and UTF-8 text in the same Ruby program is real, and we&amp;#8217;ll see soon how it is handled in Ruby. However, the problems people have been having are not of this variety.&lt;/p&gt;
&lt;p&gt;If you examine virtually all of the bug reports involving incompatible encoding exceptions, you will find that one of the two encodings is ASCII-8BIT. In Ruby, ASCII-8BIT is the name of the BINARY encoding.&lt;/p&gt;
&lt;p&gt;So what is happening is that a library somewhere in the stack is handing back raw bytes rather than encoded bytes. For a long time, the likely perpetrator here was database drivers, which had not been updated to properly encode the data they were getting back from the database.&lt;/p&gt;
&lt;p&gt;There are several other potential sources of binary data, which we will discuss in due course. However, it&amp;#8217;s important to note that a BINARY encoded String in Ruby 1.9 is the equivalent of a &lt;code&gt;byte[]&lt;/code&gt; in Java. It is a type that cannot be reasonably concatenated onto an encoded String. In fact, it is best to think of BINARY encoded Strings as a different class with many methods in common.&lt;/p&gt;
&lt;p&gt;In practice, as Ruby libraries continue to be updated, you should rarely ever see BINARY data inside of your application. If you do, it is because the library that handed it to you genuinely does not know the encoding, and if you want to combine it with a non-BINARY String, you will need to convert it into an encoded String manually (using &lt;code&gt;force_encoding&lt;/code&gt;).&lt;/p&gt;
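&lt;p&gt;Here is a sketch of what that looks like. &lt;code&gt;fetch_from_driver&lt;/code&gt; is a hypothetical stand-in for a library that returns untagged bytes (it does not model any real driver), and &lt;code&gt;concat_error&lt;/code&gt; is a helper I invented to report whether concatenation fails. The example assumes you know, from some other source, that the bytes really are UTF-8.&lt;/p&gt;

```ruby
# Hypothetical stand-in for a driver that hands back raw bytes.
def fetch_from_driver
  "hell\xC3\xB6".b  # UTF-8 bytes, but tagged BINARY (ASCII-8BIT)
end

# Hypothetical helper: returns the error class if concatenation fails,
# or nil if it succeeds.
def concat_error(left, right)
  left + right
  nil
rescue Encoding::CompatibilityError => error
  error.class
end

template = "caf\u00E9: "
data     = fetch_from_driver

p concat_error(template, data)    # Encoding::CompatibilityError

# Assumption: we know out of band that these bytes are UTF-8.
tagged = data.dup.force_encoding("UTF-8")
p concat_error(template, tagged)  # nil
puts template + tagged
```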
&lt;h2&gt;Why this is, in practice, a rare problem&lt;/h2&gt;
&lt;p&gt;The problem of incompatible encodings is likely to happen in Western applications only when combining ISO-8859-* data with Unicode data.&lt;/p&gt;
&lt;p&gt;In practice, most sources of data, without any further work, are already encoded as UTF-8. For instance, the default Rails MySQL connection specifies a UTF-8 client encoding, so even an ISO-8859-1 database will return UTF-8 data.&lt;/p&gt;
&lt;p&gt;Many other data sources, such as MongoDB, only support UTF-8 data internally, so their Ruby 1.9-compatible drivers already return UTF-8 encoded data.&lt;/p&gt;
&lt;p&gt;Your text editor (TextMate) likely defaults to saving your templates as UTF-8, so the characters in the templates are already encoded in UTF-8.&lt;/p&gt;
&lt;p&gt;This is why Ruby 1.8 had the illusion of working. With the exception of some (unfortunately somewhat common) edge-cases, most of your data is already encoded in UTF-8, so simply treating it as BINARY data, and smashing it all together (as Ruby 1.8 does) works fairly reliably.&lt;/p&gt;
&lt;p&gt;The only reason why this came tumbling down in Ruby 1.9 is that drivers that should have returned Strings tagged with UTF-8 were returning Strings tagged with BINARY, which Ruby rightly refused to concatenate with UTF-8 Strings. In other words, the vast majority of encoding problems to date are the result of buggy Ruby libraries.&lt;/p&gt;
&lt;p&gt;Those libraries, almost entirely, have now been updated. This means that if you use UTF-8 data sources, which you were likely doing by accident already, everything will continue to work as it did in Ruby 1.8.&lt;/p&gt;
&lt;h2&gt;Digression: force_encoding&lt;/h2&gt;
&lt;p&gt;When people encounter this problem for the first time, they are often instructed by otherwise well-meaning people to simply call &lt;code&gt;force_encoding("UTF-8")&lt;/code&gt; on the offending String.&lt;/p&gt;
&lt;p&gt;This will work reliably if the original data is stored in UTF-8, which is often true about the person who made the original suggestion. However, it will mysteriously fail to work (resulting in &amp;#8220;�&amp;#8221; characters appearing) if the original data is encoded in ISO-8859-1. This can cause major confusion because some people swear up and down that it&amp;#8217;s working and others can clearly see that it&amp;#8217;s not.&lt;/p&gt;
&lt;p&gt;Additionally, since ISO-8859-1 and UTF-8 are both compatible with ASCII, if the characters being force_encoded are ASCII characters, everything will appear to work until a non-ASCII character is entered one day. This further complicates efforts of members of the community to identify and help resolve issues if they are not fluent in the general issues surrounding encodings.&lt;/p&gt;
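&lt;p&gt;The following sketch shows both failure modes. The helper names &lt;code&gt;blindly_tag_as_utf8&lt;/code&gt; and &lt;code&gt;tag_then_transcode&lt;/code&gt; are mine, and the second one assumes you actually know the source encoding:&lt;/p&gt;

```ruby
# Hypothetical helper: what the well-meaning advice amounts to.
def blindly_tag_as_utf8(bytes)
  bytes.dup.force_encoding("UTF-8")
end

# Hypothetical helper: tag with the real encoding first, then transcode.
def tag_then_transcode(bytes, actual_encoding)
  bytes.dup.force_encoding(actual_encoding).encode("UTF-8")
end

ascii_only = "hello".b
latin1     = "hell\xF6".b  # ISO-8859-1 bytes ending in o with diaeresis

# ASCII-only data looks fine, so the mistake goes unnoticed
p blindly_tag_as_utf8(ascii_only).valid_encoding?  # true

# The first non-ASCII byte produces an invalid UTF-8 String
p blindly_tag_as_utf8(latin1).valid_encoding?      # false

puts tag_then_transcode(latin1, "ISO-8859-1")
```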
&lt;p&gt;I&amp;#8217;d note that this particular issue (BINARY data entering the system that is actually ISO-8859-1) would cause similar problems in Java and Python, which would either silently assume Unicode, or present a &lt;code&gt;byte[]&lt;/code&gt;, forcing you to &lt;code&gt;force_encoding&lt;/code&gt; it into something like UTF-8.&lt;/p&gt;
&lt;h2&gt;Where it doesn&amp;#8217;t work&lt;/h2&gt;
&lt;p&gt;Unfortunately, there are a few sources of data that are common in Rails applications that are not already encoded in UTF-8.&lt;/p&gt;
&lt;p&gt;In order to identify these cases, we will need to identify the boundary between a Rails application and the outside world. Let&amp;#8217;s look at a common web request.&lt;/p&gt;
&lt;p&gt;First, the user goes to a URL. That URL is probably encoded in ASCII, but can also contain Unicode characters. The encoding for this part of the request (the URI) is not provided by the browser, but it appears safe to assume that it&amp;#8217;s UTF-8 (which is a superset of ASCII). I have tested in various versions of Firefox, Safari, Chrome, and Internet Explorer and it seems reliable. I personally thank the Encoding gods for that.&lt;/p&gt;
&lt;p&gt;Next, the request goes through the Rack stack, and makes its way into the Rails application. If all has gone well, the Rails application will see the parameters and other bits of the URI exposed through the request object encoded as UTF-8. At the moment (and after this post, it will probably be true for mere days), Rack actually returns BINARY Strings for these elements.&lt;/p&gt;
&lt;p&gt;At the moment, Ruby allows BINARY Strings that contain only ASCII characters to be concatenated with any ASCII-compatible encoding (such as UTF-8). I believe this is a mistake, because it will make scenarios such as the current state of Rack work in all tested cases, and then mysteriously cause errors when the user enters a UTF-8 character in the URI. I have already reported this issue and it should be fixed in Ruby. Thankfully, this issue only relates to libraries that are mistakenly returning BINARY data, so we can cut this off at the pass by fixing Rack to return UTF-8 data here.&lt;/p&gt;
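&lt;p&gt;A small sketch of the leniency described above, using a hypothetical &lt;code&gt;concat_result&lt;/code&gt; helper and made-up sample strings. As far as I can tell, Ruby versions released since this was written still behave this way, but treat that as an observation rather than a guarantee.&lt;/p&gt;

```ruby
# Hypothetical helper: returns the encoding of the concatenated String,
# or :incompatible if Ruby refuses to combine the two.
def concat_result(left, right)
  (left + right).encoding
rescue Encoding::CompatibilityError
  :incompatible
end

utf8 = "caf\u00E9/"

# BINARY, but every byte is ASCII: Ruby accepts it
p concat_result(utf8, "posts".b)

# BINARY with bytes above 127: Ruby refuses
p concat_result(utf8, "caf\xC3\xA9".b)
```

&lt;p&gt;This is why a path made up of plain English words can pass every test and still blow up the first time a user requests one with an accented character.&lt;/p&gt;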
&lt;p&gt;Next, that data will be used to make a request of the data store. Because we are likely using a UTF-8 encoded data-store, once the Rack issue is resolved, the request will go through without incident. If we were using an ISO-8859-1 data store (possible, but unlikely), this could pose issues. For instance, we could be looking up a story by a non-ASCII identifier that the database would not find because the request String is encoded in UTF-8.&lt;/p&gt;
&lt;p&gt;Next, the data store returns the contents. Again, you are likely using a UTF-8 data store (things like CouchDB and MongoDB return Strings as UTF-8). Your template is likely encoded in UTF-8 (and Rails actually makes the assumption that templates without any encoding specified are UTF-8), so the String from your database should merge with your template without incident.&lt;/p&gt;
&lt;p&gt;However, there is another potential problem here. If your data source does not return UTF-8 data, Ruby will refuse to concatenate the Strings, giving you an incompatible encoding error (which will report UTF-8 as incompatible with, for instance, ISO-8859-1). In all of the encoding-related bug reports I&amp;#8217;ve seen, I&amp;#8217;ve only ever seen reports of BINARY data causing this problem, again, likely because your data source actually is UTF-8.&lt;/p&gt;
&lt;p&gt;Next, you send the data back to the browser. Rails defaults to specifying a UTF-8 character set, so the browser should correctly interpret the String, if it got this far. Note that in Ruby 1.8, if you had received data as ISO-8859-1 and stored it in an ISO-8859-1 database, your users would now see &amp;#8220;�&amp;#8221;, because the browser cannot identify a valid Unicode character for the bytes that came back from the database.&lt;/p&gt;
&lt;p&gt;In Ruby 1.9, in this scenario (but not the much more common scenario where the database returns content as UTF-8, which is common because Rails specifies a UTF-8 client encoding in the default database.yml), you would receive an error rather than sending corrupted data to the client.&lt;/p&gt;
&lt;p&gt;If your page included a form, we now have another potential avenue for problems. This is especially insidious because browsers allow the user to change the &amp;#8220;document&amp;#8217;s character set&amp;#8221;, and users routinely fiddle with that setting to &amp;#8220;fix&amp;#8221; pages that are actually encoded in ISO-8859-1, but are specifying UTF-8 as the character set.&lt;/p&gt;
&lt;p&gt;Unfortunately, while browsers generally use the document&amp;#8217;s character set for POSTed form data, this is not reliable, and the user can change it manually. To add insult to injury, the browsers with the largest problems in this area do not send a Content-Type header with the correct charset to let the server know the character set of the POSTed data.&lt;/p&gt;
&lt;p&gt;Newer standards specify an attribute &lt;code&gt;accept-charset&lt;/code&gt; that page authors can add to forms to tell the client what character set to send the POSTed data as, but again, the browsers with the largest issues here are also the ones with issues in implementing &lt;code&gt;accept-charset&lt;/code&gt; properly.&lt;/p&gt;
&lt;p&gt;The most common scenario where you can see this issue is when the user pastes in content from Microsoft Word, and it makes it into the database and back out again as gibberish.&lt;/p&gt;
&lt;p&gt;After a lot of research, I have discovered several hacks that, together, should completely solve this problem. I am still testing the solution, but I believe we should be able to completely solve this problem in Rails. By Rails 3.0 final, Rails applications should be able to reliably assume that POSTed form data comes in as UTF-8.&lt;/p&gt;
&lt;p&gt;Moving that data to the server presents another potential encoding problem, but again, if we can rely on the database to be using UTF-8 as the client (or internal) encoding, and the solution for POSTed form data pans out, the data should smoothly get into the database as UTF-8.&lt;/p&gt;
&lt;h2&gt;But what if we still do have non-UTF-8 data&lt;/h2&gt;
&lt;p&gt;Even with all of this, it is still possible that some non-BINARY data sneaks over the boundary and into our Rails application from a non-UTF-8 source.&lt;/p&gt;
&lt;p&gt;For this scenario, Ruby 1.9 provides an option called &lt;code&gt;Encoding.default_internal&lt;/code&gt;, which allows the user to specify a preferred encoding for Strings. Ruby itself and Ruby&amp;#8217;s standard libraries respect this option, so even if, for instance, it opens some IO encoded in ISO-8859-1, it will give the data to the Ruby program transcoded to the preferred encoding.&lt;/p&gt;
&lt;p&gt;Libraries, such as database drivers, should also support this option, which means that even if the database is somehow set up to receive UTF-8 String, the driver should convert those String transparently to the preferred encoding before handing it to the program.&lt;/p&gt;
&lt;p&gt;Rails can take advantage of this by setting the &lt;code&gt;default_internal&lt;/code&gt; to UTF-8, which will then ensure that String from non-UTF-8 sources still make their way into Rails encoded as UTF-8.&lt;/p&gt;
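&lt;p&gt;As a rough sketch of how this looks from plain Ruby, the snippet below sets the preferred encoding by hand and then reads a file whose bytes are ISO-8859-1. The file name and contents are made up for illustration, and setting &lt;code&gt;default_internal&lt;/code&gt; manually is an assumption standing in for what Rails would do for you.&lt;/p&gt;

```ruby
require "tmpdir"

# Assumption: we set the preferred encoding ourselves; the post says Rails will do this.
Encoding.default_internal = Encoding::UTF_8

decoded = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")      # hypothetical file name
  File.binwrite(path, "f\xFChrer".b)       # "führer" stored as ISO-8859-1 bytes

  # Only the external encoding is named here; because default_internal is set,
  # Ruby transcodes what it reads into that preferred encoding.
  decoded = File.open(path, "r:ISO-8859-1") { |io| io.read }
end

puts decoded.encoding   # UTF-8
puts decoded
```

&lt;p&gt;The program never asks for UTF-8 when it opens the file; it only states what the source encoding is and lets the preferred encoding do the rest.&lt;/p&gt;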
&lt;p&gt;Since I started asking libraries to honor this option a week ago, do_sqlite, do_mysql, do_postgres, Nokogiri, Psych (the new YAML parser in Ruby 1.9), sqlite3, and the MongoDB driver have all added support for this option. The fix should be applied to the MySQL driver shortly, and I am still waiting on a response from the pg driver maintainer.&lt;/p&gt;
&lt;p&gt;In short, by the time 1.9.2-final ships, I don&amp;#8217;t see any reason why every library in use wouldn&amp;#8217;t honor this setting.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;d also add that MongoDB and Nokogiri already return only UTF-8 data, so supporting this option was primarily a matter of correctness. If a driver already deals entirely in UTF-8, it will work transparently with Rails because Rails deals only in UTF-8.&lt;/p&gt;
&lt;p&gt;That said, we plan to robustly support scenarios where UTF-8 cannot be used in this way (because encodings are in use that cannot be transparently encoded at the boundary without data loss), so proper support for &lt;code&gt;default_internal&lt;/code&gt; will be essential in the long term.&lt;/p&gt;
&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;The vast majority of encoding bugs to date have resulted from outdated drivers that returned BINARY data instead of Strings with proper encoding tags.&lt;/p&gt;
&lt;p&gt;The pipeline that brings Strings in and out of Rails is reasonably well-understood, and simply by using UTF-8 libraries for each part of that pipeline, Ruby 1.9 will transparently work.&lt;/p&gt;
&lt;p&gt;If you accidentally use non-UTF-8 sources in the pipeline, Ruby 1.9 will throw an error, an improvement over the Ruby 1.8 behavior of simply sending corrupted data to the client.&lt;/p&gt;
&lt;p&gt;For this scenario, Ruby 1.9 allows you to specify a preferred encoding, which instructs the non-UTF-8 source to convert Strings in other encodings to UTF-8.&lt;/p&gt;
&lt;p&gt;By default, Rails will set this option to UTF-8, which means that you should not see ISO-8859-1 Strings in your Rails application.&lt;/p&gt;
&lt;p&gt;By the time Ruby 1.9.2 is released in a few months, this should be a reality, and your experience dealing with Ruby 1.9 Strings should be superior to the 1.8 experience: it should generally just work, and libraries will have properly considered encoding issues. This means that serving misencoded data should be basically impossible.&lt;/p&gt;
&lt;h2&gt;TL;DR the TL;DR&lt;/h2&gt;
&lt;p&gt;When using Rails 3.0 with Ruby 1.9.2-final, you will generally not have to care about encodings.&lt;/p&gt;
&lt;h2&gt;Postscript&lt;/h2&gt;
&lt;p&gt;With all that said, there can be scenarios where you receive BINARY data from a source. This can happen in any language that handles encodings more transparently than Ruby, such as Java and Python.&lt;/p&gt;
&lt;p&gt;This is because it is possible for a library to receive BINARY data and not have the necessary metadata to tag it with an encoding.&lt;/p&gt;
&lt;p&gt;In this case, you will either need to determine the encoding yourself or treat it as raw BINARY data, and not a String. The reason this scenario is rare is that if there is a way that you can determine the encoding (such as by looking at metadata provided with the bytes), the original library can do the same.&lt;/p&gt;
&lt;p&gt;If you get into a scenario where you know the encoding, but it is not machine available, you will want to do something like:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;data = API.&lt;span style="color:#9900CC;"&gt;get&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;data&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
&amp;nbsp;
data.&lt;span style="color:#9900CC;"&gt;encoding&lt;/span&gt; &lt;span style="color:#008000; font-style:italic;"&gt;#=&amp;gt; ASCII-8BIT # alias for BINARY&lt;/span&gt;
&amp;nbsp;
data.&lt;span style="color:#9900CC;"&gt;force_encoding&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;Shift_JIS&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;!
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# This first tags the data with the encoding that&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# you know it is, and then re-encodes it to&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# the default_internal encoding, if one was&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# specified&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
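&lt;p&gt;Here is a self-contained variation you can actually run. Since &lt;code&gt;API.get&lt;/code&gt; above is only a placeholder, this sketch fakes the untagged response by encoding a known String to Shift_JIS and stripping its tag. It also passes the target encoding explicitly instead of relying on &lt;code&gt;default_internal&lt;/code&gt;.&lt;/p&gt;

```ruby
# Stand-in for API.get("data"): Shift_JIS bytes that arrive with no encoding tag.
original = "\u65E5\u672C\u8A9E"             # the word for "Japanese language"
data = original.encode("Shift_JIS").b      # .b returns a copy tagged as BINARY

puts data.encoding == Encoding::BINARY     # true

# Tag the bytes with the encoding we know they are in, then convert to UTF-8.
utf8 = data.force_encoding("Shift_JIS").encode("UTF-8")

puts utf8.encoding                         # UTF-8
puts utf8 == original                      # true
```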

</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/05/17/encodings-unabridged/#comments" thr:count="18" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/05/17/encodings-unabridged/feed/atom/" thr:count="18" />
		<thr:total>18</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/05/17/encodings-unabridged/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[My Common Git Workflow]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/wcpot-yeU28/" />
		<id>http://yehudakatz.com/?p=486</id>
		<updated>2010-05-26T17:16:22Z</updated>
		<published>2010-05-13T09:31:32Z</published>
		<category scheme="http://yehudakatz.com" term="Other" />		<summary type="html"><![CDATA[A recent post that was highly ranked on Hacker News complained about common git workflows causing him serious pain. While I won&#8217;t get into the merit of his user experience complaints, I do want to talk about his specific use-case and how I personally work with it in git. Best I can tell, Mike Taylor [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/05/13/common-git-workflows/">&lt;p&gt;A &lt;a href="http://reprog.wordpress.com/2010/05/12/still-hatin-on-git-now-with-added-actual-reasons/"&gt;recent post&lt;/a&gt; that was highly ranked on Hacker News complained about common git workflows causing him serious pain. While I won&amp;#8217;t get into the merit of his user experience complaints, I do want to talk about his specific use-case and how I personally work with it in git.&lt;/p&gt;
&lt;p&gt;Best I can tell, Mike Taylor (the guy in the post) either tried to figure out a standard git workflow on his own, or he followed poor instructions that tried to bootstrap someone from an svn background while intentionally leaving out &lt;strong&gt;important&lt;/strong&gt; information. In any event, I&amp;#8217;ll step through my personal workflow for his scenario, contrasting with subversion as I go.&lt;/p&gt;
&lt;h2&gt;Cloning the Repository&lt;/h2&gt;
&lt;p&gt;The very first step when working with a repository is to clone it. In subversion, this is accomplished via &lt;code&gt;svn checkout svn://url/to/repo/trunk&lt;/code&gt;. This retrieves the most recent revision of the trunk branch of the repository.&lt;/p&gt;
&lt;p&gt;In git, this is accomplished via &lt;code&gt;git clone git://url/to/repo&lt;/code&gt; (the http protocol is also possible). This retrieves the entire repository, including other branches and tags.&lt;/p&gt;
&lt;h2&gt;Making the Change&lt;/h2&gt;
&lt;p&gt;In both git and subversion, you make the change using a normal text editor.&lt;/p&gt;
&lt;h2&gt;After Making the Change&lt;/h2&gt;
&lt;p&gt;In git, you make a local commit, marking the difference between the most recent pulled version (master) and the changes you made. In subversion, the normal workflow does not involve making a local commit, but apparently some people make manual diffs in order to have a local copy of the changes before updating from the remote. Here&amp;#8217;s an example comment from the Hacker News post:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt; I&amp;#8217;ll tell you what happens when I use svn and there&amp;#8217;s been an upstream change: I never update my local tree with local modifications. Instead, I extract all my local changes into a diff, then I update my local tree, and then I merge my diff back into the updated tree and commit.&lt;/p&gt;
&lt;p&gt;When I need three-way merging, which isn&amp;#8217;t often &amp;#8211; usually patch can resync simple things like line offsets &amp;#8211; it&amp;#8217;s handled by a file comparison tool. I have a simple script which handles this&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;My personal process for making the commit in git almost always involves the gitx GUI, which lets me see the changes for each individual file, select the files (or chunks in the files) to commit, and then commit the whole thing. I sometimes break up the changes into several granular commits, if appropriate.&lt;/p&gt;
&lt;h2&gt;Updating from the remote&lt;/h2&gt;
&lt;p&gt;Now that we have our local changes, the next step is to update from the remote. In subversion, &lt;a href="http://svnbook.red-bean.com/en/1.1/ch03s05.html"&gt;you would run svn up&lt;/a&gt;. Here, subversion will apply a merge strategy to attempt to merge the remote changes with the local ones that you made. If the merge is unsuccessful, subversion will tell you that a conflict has occurred. If you did not manually save off a diff file, there is no way to get back to the state from before you ran the update.&lt;/p&gt;
&lt;p&gt;In git, you would run &lt;code&gt;git pull&lt;/code&gt;. By default, git applies the &amp;#8220;recursive&amp;#8221; strategy, which tries to merge your current files with the remote files at the most recent revision. As with subversion, this can result in a conflict. You can also pass the &lt;code&gt;--rebase&lt;/code&gt; flag to pull, which is how I usually work. This tells git to stash away your commits, pull the remote changes, and then reapply your changes on top one at a time.&lt;/p&gt;
&lt;p&gt;If you use &lt;code&gt;--rebase&lt;/code&gt;, you may get a conflict for each of your local commits, which is usually easier to handle than a bunch of conflicts all at once.&lt;/p&gt;
&lt;p&gt;I definitely recommend using &lt;code&gt;--rebase&lt;/code&gt;, which also provides instructions for dealing with conflicts as they arise.&lt;/p&gt;
&lt;p&gt;In either case, in my experience, git&amp;#8217;s merging capabilities are more advanced than subversion&amp;#8217;s. This will result in many fewer cases where conflicts occur.&lt;/p&gt;
&lt;h2&gt;Resolving Conflicts&lt;/h2&gt;
&lt;p&gt;From here on, I am assuming you followed my advice and used &lt;code&gt;git pull --rebase&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If a conflict has occurred, you will find that if you run &lt;code&gt;git status&lt;/code&gt;, all of the non-conflicting files are already listed as &amp;#8220;staged&amp;#8221;, while the conflicting files are listed outside the staging area. This means that the non-conflicting files are already considered &amp;#8220;added&amp;#8221; to the current commit.&lt;/p&gt;
&lt;p&gt;To resolve the conflicts, fix up the files listed outside the staging area and &lt;code&gt;git add&lt;/code&gt; them. Again, I normally use gitx to move the resolved files into the staging area.&lt;/p&gt;
&lt;p&gt;Once you have resolved the conflict, run &lt;code&gt;git rebase --continue&lt;/code&gt;. This tells git to use the fixed up changes you just made instead of the original commit it was trying to put on top of the changes you got from the remote.&lt;/p&gt;
&lt;p&gt;In subversion, if you got a conflict, subversion will create three files for you: &lt;code&gt;file.mine&lt;/code&gt;, &lt;code&gt;file.rOLD&lt;/code&gt;, and &lt;code&gt;file.rNEW&lt;/code&gt;. You are responsible for fixing up the conflicts and getting back a working &lt;code&gt;file&lt;/code&gt;. Once you are done, you run &lt;code&gt;svn resolved&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; If you had not used &lt;code&gt;git pull --rebase&lt;/code&gt;, but instead did raw &lt;code&gt;git pull&lt;/code&gt;, you would fix up the files, add the files using &lt;code&gt;git add&lt;/code&gt; or gitx, and then run &lt;code&gt;git commit&lt;/code&gt; to seal the deal.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;Yikes! Something went wrong!&lt;/h2&gt;
&lt;p&gt;In git, if something goes wrong, you just run &lt;code&gt;git reset --hard&lt;/code&gt;, which will bring you back to your last local commit.&lt;/p&gt;
&lt;p&gt;In subversion, this is not always possible unless you manually stored off a diff before you started.&lt;/p&gt;
&lt;h2&gt;Pushing&lt;/h2&gt;
&lt;p&gt;Now that you&amp;#8217;re in sync with the remote server, you push your changes. In git, you run &lt;code&gt;git push&lt;/code&gt;. In subversion, you run &lt;code&gt;svn commit&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;One Glossed-Over Difference&lt;/h2&gt;
&lt;p&gt;Subversion allows you to commit changes even if you haven&amp;#8217;t &lt;code&gt;svn up&lt;/code&gt;ed and there have been changes to the remote, as long as there are no conflicts between your local files and the remote files.&lt;/p&gt;
&lt;p&gt;Git never allows you to push changes to the remote if there have been remote changes. I personally prefer the git behavior, but I could see why someone might prefer the subversion behavior. However, I glossed over this difference because every subversion reference I&amp;#8217;ve found advises running &lt;code&gt;svn up&lt;/code&gt; before a commit, and I personally always did that in my years using subversion.&lt;/p&gt;
&lt;h2&gt;Comparison&lt;/h2&gt;
&lt;p&gt;Here&amp;#8217;s a workflow comparison between git and subversion:&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;git&lt;/th&gt;
&lt;th&gt;svn&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clone a repository&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git clone git://github.com/rails/rails.git&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;svn checkout http://dev.rubyonrails.org/svn/rails/trunk&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preparing changes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git commit&lt;/code&gt; (using gitx)&lt;/td&gt;
&lt;td&gt;nothing or create a manual diff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update from the remote&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git pull --rebase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;svn up&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resolving conflicts&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git add&lt;/code&gt; then &lt;code&gt;git rebase --continue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;svn resolved&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resolving conflicts without &lt;code&gt;--rebase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git add&lt;/code&gt; then &lt;code&gt;git commit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yikes! Rolling back&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git reset --hard&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;svn up -rOLD&lt;/code&gt; then apply diff (only if you manually made a diff first)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pushing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git push&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;svn commit&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Note that I am not attempting to provide an exhaustive guide to git here; there are many more git features that are quite useful. Additionally, I personally do a lot of local branching, and prefer to be able to think about git in terms of cheap branches, but the original poster explicitly said that he&amp;#8217;d rather not. As a result, I didn&amp;#8217;t address that here.&lt;/p&gt;
&lt;p&gt;I also don&amp;#8217;t believe that thinking of git in terms of subversion is a good idea. That said, the point of this post (and the point of the original poster) is that there is a set of high-level version control operations that you&amp;#8217;d expect git to be able to handle in simple cases without a lot of fuss.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/05/13/common-git-workflows/#comments" thr:count="29" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/05/13/common-git-workflows/feed/atom/" thr:count="29" />
		<thr:total>29</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/05/13/common-git-workflows/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[The How and Why of Bundler Groups]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/hriKV6vww1U/" />
		<id>http://yehudakatz.com/?p=482</id>
		<updated>2010-05-09T19:09:30Z</updated>
		<published>2010-05-09T18:32:58Z</published>
		<category scheme="http://yehudakatz.com" term="Other" />		<summary type="html"><![CDATA[Since version 0.9, Bundler has had a feature called &#8220;groups&#8221;. The purpose of this feature is to allow you to specify groups of dependencies which may be used in certain situations, but not in others. For instance, you may use ActiveMerchant only in production. In this case, you could say: group :production do gem &#34;activemerchant&#34; [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/05/09/the-how-and-why-of-bundler-groups/">&lt;p&gt;Since version 0.9, Bundler has had a feature called &amp;#8220;groups&amp;#8221;. The purpose of this feature is to allow you to specify groups of dependencies which may be used in certain situations, but not in others.&lt;/p&gt;
&lt;p&gt;For instance, you may use ActiveMerchant only in production. In this case, you could say:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;group &lt;span style="color:#ff3333; font-weight:bold;"&gt;:production&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;do&lt;/span&gt;
  gem &lt;span style="color:#996600;"&gt;&amp;quot;activemerchant&amp;quot;&lt;/span&gt;
&lt;span style="color:#9966CC; font-weight:bold;"&gt;end&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Specifying groups allows you to do two things. First, you can install the gems in your Gemfile, minus specific groups. For instance, Rails puts &lt;code&gt;mysql&lt;/code&gt; and &lt;code&gt;pg&lt;/code&gt; in a &lt;code&gt;db&lt;/code&gt; group so that if you&amp;#8217;re just working on &lt;code&gt;ActionPack&lt;/code&gt;, you can &lt;code&gt;bundle install --without db&lt;/code&gt; and run the &lt;code&gt;ActionPack&lt;/code&gt; tests without having to worry about getting those gems installed.&lt;/p&gt;
&lt;p&gt;Second, you can list specific groups to autorequire using &lt;code&gt;Bundler.require&lt;/code&gt;. By default, &lt;code&gt;Bundler.require&lt;/code&gt; requires all the gems in the &lt;code&gt;default&lt;/code&gt; group (which is all the gems that have no explicit group). You can also say &lt;code&gt;Bundler.require(:default, :another_group)&lt;/code&gt; to require specific groups.&lt;/p&gt;
&lt;p&gt;Note the difference between these operations: &lt;code&gt;bundle install&lt;/code&gt; is opt-out, while &lt;code&gt;Bundler.require&lt;/code&gt; is opt-in. This is because the common usage of groups is to specify gems for different environments (such as development, test and production) and you shouldn&amp;#8217;t need to specify that you want the &amp;#8220;development&amp;#8221; and &amp;#8220;test&amp;#8221; gems just to get up and running. On the other hand, you don&amp;#8217;t want your test dependencies loaded in development or production.&lt;/p&gt;
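&lt;p&gt;To make the opt-out and opt-in halves concrete, here is an illustrative Gemfile. The particular gems and group names are assumptions chosen for the example, not anything Rails ships with; the two commands in the comments are the ones described above.&lt;/p&gt;

```ruby
# Gemfile (illustrative sketch; the gem choices are assumptions for this example)
source "https://rubygems.org"

gem "rack"               # no explicit group, so it belongs to :default

group :test do
  gem "rspec"            # installed by default, but only required when asked for
end

group :production do
  gem "activemerchant"
end

# Install time is opt-out:
#   bundle install --without production
#
# Require time is opt-in (called from your application code):
#   Bundler.require(:default, :test)
```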
&lt;p&gt;It is also worth noting that all gems that you installed (i.e. not the ones that you excluded at install time with &lt;code&gt;--without&lt;/code&gt;) will be available to &lt;code&gt;require&lt;/code&gt;. This has no effect unless you actually require them. This means that in development mode, if you explicitly require &lt;code&gt;rspec&lt;/code&gt;, it will work.&lt;/p&gt;
&lt;p&gt;Rails 3 defaults to mapping groups to environment names, and explicitly autorequiring the implicit &lt;code&gt;default&lt;/code&gt; group and the group named the same as the current environment. For example, in development mode, Rails will require the &lt;code&gt;default&lt;/code&gt; group and the &lt;code&gt;development&lt;/code&gt; group. The code that does this is in your application.rb:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;Bundler.&lt;span style="color:#CC0066; font-weight:bold;"&gt;require&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#ff3333; font-weight:bold;"&gt;:default&lt;/span&gt;, Rails.&lt;span style="color:#9900CC;"&gt;env&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;if&lt;/span&gt; &lt;span style="color:#9966CC; font-weight:bold;"&gt;defined&lt;/span&gt;?&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;Bundler&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2&gt;Consistency&lt;/h2&gt;
&lt;p&gt;In order to ensure consistency across all environments, bundler resolves the dependencies of your application using the gems listed in all groups, even if you specify &lt;code&gt;--without&lt;/code&gt;. This means that while you can skip installing the gems listed in the production group by saying &lt;code&gt;--without production&lt;/code&gt;, bundler will still download and examine the gems in order to properly resolve all dependencies.&lt;/p&gt;
&lt;p&gt;As a result, the dependencies you install in development mode and test with will be compatible with the gems in other environments. In essence, this policy ensures that if your tests pass and run in development, your app will not fail to run in production because the dependencies resolved differently.&lt;/p&gt;
&lt;h2&gt;Multiple Inconsistent Configurations&lt;/h2&gt;
&lt;p&gt;Sometimes, especially when developing gems for wider use, you want to test your code against multiple incompatible configurations. At first glance, you might think that you could use groups for this case, but as described above, groups are designed for cases where all of the gems are compatible, but you don&amp;#8217;t always want to have to install them in all situations.&lt;/p&gt;
&lt;p&gt;Instead, use multiple Gemfiles, one for each incompatible configuration. When installing, do &lt;code&gt;bundle install --gemfile Gemfile.rails2&lt;/code&gt;. This will tell Bundler to use &lt;code&gt;Gemfile.rails2&lt;/code&gt; rather than the default &lt;code&gt;Gemfile&lt;/code&gt;. As in all cases in Bundler, you can also specify this option globally with an environment variable (&lt;code&gt;BUNDLE_GEMFILE&lt;/code&gt;).&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/05/09/the-how-and-why-of-bundler-groups/#comments" thr:count="6" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/05/09/the-how-and-why-of-bundler-groups/feed/atom/" thr:count="6" />
		<thr:total>6</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/05/09/the-how-and-why-of-bundler-groups/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>wycats</name>
					</author>
		<title type="html"><![CDATA[Ruby 1.9 Encodings: A Primer and the Solution for Rails]]></title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/KatzGotYourTongue/~3/N0haIfkcoBw/" />
		<id>http://yehudakatz.com/?p=476</id>
		<updated>2010-05-07T07:15:33Z</updated>
		<published>2010-05-05T20:41:27Z</published>
		<category scheme="http://yehudakatz.com" term="Other" />		<summary type="html"><![CDATA[UPDATE: The DataObjects drivers, which are used in DataMapper, are now updated to honor default_internal. Let&#8217;s keep this moving. Since Ruby 1.9 announced support for encodings, there has been a flurry of activity to make existing libraries encoding aware, and a tornado of confusion as users of Ruby and Rails have tried to make sense [...]]]></summary>
		<content type="html" xml:base="http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/">&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; The DataObjects drivers, which are used in DataMapper, are &lt;a href="http://github.com/datamapper/do/commit/b1c2fe5ad4fe44ea7cfd15d2d4231dfe1290b428"&gt;now updated&lt;/a&gt; to honor default_internal. Let&amp;#8217;s keep this moving.&lt;/p&gt;
&lt;p&gt;Since Ruby 1.9 announced support for encodings, there has been a flurry of activity to make existing libraries encoding aware, and a tornado of confusion as users of Ruby and Rails have tried to make sense of it.&lt;/p&gt;
&lt;p&gt;In this post, I will lay out the most common problems people have had, and what we can do as a community to put these issues to bed in time for Ruby 1.9.2 final.&lt;/p&gt;
&lt;h2&gt;A Quick Tour&lt;/h2&gt;
&lt;p&gt;I&amp;#8217;m going to simplify some of this, but the broad strokes are essentially correct.&lt;/p&gt;
&lt;p&gt;Before we begin, many of you are probably wondering what exactly an &amp;#8220;encoding&amp;#8221; is. For me, getting a handle on this was an important part of helping me understand the possible solution space.&lt;/p&gt;
&lt;p&gt;On disk, all Strings are stored as a sequence of bytes. An encoding simply specifies how to take those bytes and convert them into &amp;#8220;codepoints&amp;#8221;. In some languages, such as English, a &amp;#8220;codepoint&amp;#8221; is exactly equivalent to &amp;#8220;a character&amp;#8221;. In most other languages, there is not a one-to-one correspondence. For example, a German codepoint might specify that the previous codepoint should get an ümlaut.&lt;/p&gt;
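&lt;p&gt;A small sketch of the codepoint and character distinction, using one visible character that can be spelled with either one or two codepoints. It assumes a much newer Ruby than 1.9 (2.5 or later), since &lt;code&gt;unicode_normalize&lt;/code&gt; and &lt;code&gt;grapheme_clusters&lt;/code&gt; were added later.&lt;/p&gt;

```ruby
composed   = "\u00FC"     # u with diaeresis as a single codepoint (U+00FC)
decomposed = "u\u0308"    # plain "u" plus U+0308 COMBINING DIAERESIS

puts composed.codepoints.length             # 1
puts decomposed.codepoints.length           # 2
puts decomposed.grapheme_clusters.length    # 1 (one user-visible character)

# Normalizing to NFC folds the two-codepoint spelling into the one-codepoint one.
puts decomposed.unicode_normalize(:nfc) == composed   # true
```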
&lt;p&gt;The English characters represented by the first seven bits of ASCII (characters 0 through 127 in &amp;#8220;ASCII-7&amp;#8221;) have the same representation in many (but not all) encodings. This means that if you only use English characters, the on-disk representation of the characters will often be exactly the same regardless of the source encoding.&lt;/p&gt;
&lt;p&gt;However, once you start to use other characters, the bytes on disk mean different things in different encodings. Have you ever seen a page on the Internet filled with something like &amp;#8220;FÃ¼hrer&amp;#8221;? That is the consequence of the bytes of &amp;#8220;Führer&amp;#8221; stored as UTF-8 being interpreted as Latin-1.&lt;/p&gt;
&lt;p&gt;You can trivially see this problem using Ruby 1.9&amp;#8217;s encoding support by running this program:&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;# encoding: UTF-8&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;force_encoding&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;ISO-8859-1&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;UTF-8&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Output&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# hello Ã¼mlaut&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, we create a String (&amp;#8220;hello ümlaut&amp;#8221;) in the UTF-8 encoding. Next, we tell Ruby that the String is actually Latin-1. It&amp;#8217;s not, so an attempt to read the characters will interpret the raw bytes of the &amp;#8220;ü&amp;#8221; as though they were Latin-1 bytes. We ask Ruby to give us &lt;em&gt;that&lt;/em&gt; interpretation of the data in UTF-8 via &lt;code&gt;encode&lt;/code&gt; and print it out.&lt;/p&gt;
&lt;p&gt;We can see that while the bytes for &amp;#8220;hello &amp;#8221; and &amp;#8220;mlaut&amp;#8221; were identical in both UTF-8 and Latin-1, the bytes for &amp;#8220;ü&amp;#8221; in UTF-8 mean &amp;#8220;Ã¼&amp;#8221; in Latin-1.&lt;/p&gt;
&lt;p&gt;Note that while &lt;code&gt;force_encoding&lt;/code&gt; simply tags the String with a different encoding, &lt;code&gt;encode&lt;/code&gt; converts the bytes of one encoding into the equivalent bytes of the second. As a result, while &lt;code&gt;force_encoding&lt;/code&gt; should almost never be used unless you know for sure that the bytes actually represent the characters you want in the target encoding, &lt;code&gt;encode&lt;/code&gt; is relatively safe to use to convert a String into the encoding you want.&lt;/p&gt;
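&lt;p&gt;The difference is easiest to see by looking at the bytes. In this sketch (the variable names are just for illustration), &lt;code&gt;force_encoding&lt;/code&gt; leaves the bytes alone and only changes the tag, while &lt;code&gt;encode&lt;/code&gt; produces different bytes that mean the same character:&lt;/p&gt;

```ruby
text = "\u00FC"                                  # u with diaeresis, as a UTF-8 String
puts text.bytes.inspect                          # [195, 188]

# force_encoding: same two bytes, now read as two Latin-1 characters.
tagged = text.dup.force_encoding("ISO-8859-1")
puts tagged.bytes.inspect                        # [195, 188]
puts tagged.length                               # 2

# encode: one Latin-1 byte that still means the same single character.
converted = text.encode("ISO-8859-1")
puts converted.bytes.inspect                     # [252]
puts converted.length                            # 1
```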
&lt;p&gt;You&amp;#8217;ve probably also seen the reverse problem, where bytes encoded in Latin-1 ended up inside a page encoded in UTF-8.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;# encoding: ISO-8859-1&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;force_encoding&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;UTF-8&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Output&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# hello ?mlaut&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, the byte that represents an &amp;#8220;ü&amp;#8221; in Latin-1 is not a valid sequence in UTF-8, so it was replaced with a &amp;#8220;?&amp;#8221;. Note that &lt;code&gt;puts&lt;/code&gt; will always simply write out the bytes to your terminal, and the terminal&amp;#8217;s encoding will determine how they are interpreted. The examples in this post were all output to a terminal using UTF-8 encoding.&lt;/p&gt;
&lt;p&gt;As you can imagine, this presents quite the issue when concatenating two Strings of different encodings. Simply smashing together the raw bytes of the two Strings can result in output that is incomprehensible in either encoding. To make matters worse, it&amp;#8217;s not always possible to represent all of the characters in one encoding in another. For instance, the characters of the Emoji encoding cannot be represented in the ISO-8859-1 encoding (or even, in a standardized way, in the UTF-8 encoding).&lt;/p&gt;
&lt;p&gt;As a result, when you attempt to concatenate two Strings of different encodings in Ruby 1.9, Ruby raises an exception.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;# encoding: UTF-8&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;ISO-8859-1&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;+&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Output&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# incompatible character encodings: ISO-8859-1 and UTF-8 (Encoding::CompatibilityError)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because it&amp;#8217;s extremely tricky for Ruby to be sure that it can make a lossless conversion from one encoding to another (Ruby supports almost 100 different encodings), the Ruby core team has decided to raise an exception if two Strings in different encodings are concatenated together.&lt;/p&gt;
&lt;p&gt;There is one exception to this rule. If the bytes in one of the two Strings are all under 128 (and therefore valid characters in ASCII-7), and both encodings are compatible with ASCII-7 (meaning that the bytes of ASCII-7 represent exactly the same characters in the other encoding), Ruby will make the conversion without complaining.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;# encoding: UTF-8&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello umlaut&amp;quot;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;encode&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;ISO-8859-1&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;+&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Output&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# hello umlauthello ümlaut&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since Ruby does not allow characters outside of the ASCII-7 range in source files without a declared encoding, this exception eliminates a large number of potential problems that Ruby&amp;#8217;s strict concatenation rules might have introduced.&lt;/p&gt;
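&lt;p&gt;If you want to check ahead of time whether two Strings fall under this exception, Ruby exposes the same test it uses internally. The following is a small sketch using &lt;code&gt;String#ascii_only?&lt;/code&gt; and &lt;code&gt;Encoding.compatible?&lt;/code&gt;; the variable names are made up for the example.&lt;/p&gt;

```ruby
# encoding: UTF-8

ascii_latin1  = "hello umlaut".encode("ISO-8859-1")        # only ASCII-7 bytes
umlaut_latin1 = "hello \u00FCmlaut".encode("ISO-8859-1")   # contains a byte above 127
utf8          = "hello \u00FCmlaut"                        # UTF-8, with a non-ASCII character

puts ascii_latin1.ascii_only?    # true
puts umlaut_latin1.ascii_only?   # false

# Encoding.compatible? returns the encoding the combined String
# would have, or nil if concatenation would raise.
p Encoding.compatible?(ascii_latin1, utf8)    # the UTF-8 encoding
p Encoding.compatible?(umlaut_latin1, utf8)   # nil
```

&lt;p&gt;A &lt;code&gt;nil&lt;/code&gt; result is the case that raises &lt;code&gt;Encoding::CompatibilityError&lt;/code&gt; when you actually concatenate.&lt;/p&gt;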
&lt;h2&gt;Binary Strings&lt;/h2&gt;
&lt;p&gt;By default, Strings with no encoding in Ruby are tagged with the &lt;code&gt;ASCII-8BIT&lt;/code&gt; encoding, which is also available under the alias &lt;code&gt;BINARY&lt;/code&gt;. Essentially, this is an encoding that simply means &amp;#8220;raw bytes here&amp;#8221;.&lt;/p&gt;
&lt;p&gt;In general, code in Rails applications should not encounter &lt;code&gt;BINARY&lt;/code&gt; strings, except for Strings created in source files without encodings. However, since these Strings will virtually always fall under the ASCII-7 exception, Ruby programmers should never have to deal with incompatible encoding exceptions where one of the two encodings is &lt;code&gt;ASCII-8BIT&lt;/code&gt; (i.e. &lt;code&gt;BINARY&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;That said, almost all of the encoding problems reported by users in the Rails bug tracker involved &lt;code&gt;ASCII-8BIT&lt;/code&gt; Strings. How did this happen?&lt;/p&gt;
&lt;p&gt;There are two reasons for this.&lt;/p&gt;
&lt;p&gt;The first reason is that early on, database drivers generally didn&amp;#8217;t tag Strings they retrieved from the database with the proper encoding. Doing so involves a manual mapping from the database&amp;#8217;s encoding names to Ruby&amp;#8217;s encoding names. As a result, it was extremely common for database drivers to return &lt;code&gt;BINARY&lt;/code&gt; Strings containing bytes outside of the ASCII-7 range (because the original content was encoded in the database as UTF-8 or ISO-8859-1/Latin-1).&lt;/p&gt;
&lt;p&gt;When attempting to concatenate that content onto another UTF-8 string (such as the buffer in an ERB template), Ruby would raise an incompatible encoding exception.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#008000; font-style:italic;"&gt;# encoding: UTF-8&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt; &lt;span style="color:#006600; font-weight:bold;"&gt;+&lt;/span&gt; &lt;span style="color:#996600;"&gt;&amp;quot;hello ümlaut&amp;quot;&lt;/span&gt;.&lt;span style="color:#9900CC;"&gt;force_encoding&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#40;&lt;/span&gt;&lt;span style="color:#996600;"&gt;&amp;quot;BINARY&amp;quot;&lt;/span&gt;&lt;span style="color:#006600; font-weight:bold;"&gt;&amp;#41;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Output&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is essentially identical to the scenario many people encountered. A UTF-8 String was presented to Ruby as a &lt;code&gt;BINARY&lt;/code&gt; String, since the database driver didn&amp;#8217;t tag it. When attempting to concatenate it onto UTF-8, Ruby had no way to do so reliably, so it raised an exception.&lt;/p&gt;
&lt;p&gt;One reason that many people didn&amp;#8217;t encounter this problem was that either the contents of the template or the text from the database were entirely in the ASCII-7 subset of their character set. As a result, Ruby would not complain. This is deceptive, because if they made a small change to their template, or if a user happened to enter non-ASCII-7 data (for instance, they got their first user named José), they would suddenly start seeing an incompatible encoding exception.&lt;/p&gt;
&lt;p&gt;When people see this incompatible encoding exception, one common reaction is to call &lt;code&gt;force_encoding("UTF-8")&lt;/code&gt; on the &lt;code&gt;BINARY&lt;/code&gt; data. This will work great for Strings whose bytes actually are encoded in UTF-8. However, if people whose Strings were encoded in &lt;code&gt;ISO-8859-5&lt;/code&gt; (Russian) followed this instruction, they would end up with scrambled output.&lt;/p&gt;
&lt;p&gt;Additionally, it&amp;#8217;s impossible to simply &lt;code&gt;encode&lt;/code&gt; the data, since Ruby doesn&amp;#8217;t actually know the source encoding. In essence, a crucial piece of information has been lost at the database driver level.&lt;/p&gt;
&lt;p&gt;Unfortunately, this means that well-meaning people who have solved their problem by &lt;code&gt;force_encoding&lt;/code&gt; their Strings to UTF-8 (because the bytes actually &lt;strong&gt;did&lt;/strong&gt; represent UTF-8 characters) become baffled when their solution doesn&amp;#8217;t work for someone working on a Russian website.&lt;/p&gt;
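&lt;p&gt;Here is a rough sketch of that situation. The &lt;code&gt;from_db&lt;/code&gt; String is a stand-in I made up for what an untagged driver might hand back: the ISO-8859-5 bytes for a Russian word, tagged as &lt;code&gt;BINARY&lt;/code&gt;. The conversion only comes out right because the example happens to know the real source encoding, which is exactly the information the driver threw away.&lt;/p&gt;

```ruby
# encoding: UTF-8

# Hypothetical driver result: ISO-8859-5 bytes for "привет", tagged BINARY.
from_db = "\u043F\u0440\u0438\u0432\u0435\u0442".encode("ISO-8859-5").force_encoding("BINARY")

# Guessing UTF-8 only re-tags the bytes, so the result is not valid UTF-8.
guessed = from_db.dup.force_encoding("UTF-8")
puts guessed.valid_encoding?     # false

# Tagging with the real source encoding first makes a correct conversion possible.
converted = from_db.dup.force_encoding("ISO-8859-5").encode("UTF-8")
puts converted.valid_encoding?   # true
puts converted                   # привет
```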
&lt;p&gt;Thankfully, this situation is now mostly solved. There are updates for all database drivers that map the encodings from the database to a Ruby encoding, which means that UTF-8 text from the database will be UTF-8 Strings in Ruby, and Latin-1 text from the database will be ISO-8859-1 Strings in Ruby.&lt;/p&gt;
&lt;p&gt;Unfortunately, there is a second large source of &lt;code&gt;BINARY&lt;/code&gt; Strings in Ruby. Specifically, data received from the web in the form of URL encoded POST bodies often does not specify the charset of the content sent from forms.&lt;/p&gt;
&lt;p&gt;In many cases, browsers send POST bodies in the encoding of the original document, but not always. In addition, some browsers say that they&amp;#8217;re sending content as &lt;code&gt;ISO-8859-1&lt;/code&gt; but actually send it in &lt;code&gt;Windows-1252&lt;/code&gt;. There is a long thread on the Rack tracker about this, but the bottom line is that it&amp;#8217;s extremely difficult to determine the encoding of a POST body sent from the web.&lt;/p&gt;
&lt;p&gt;As a result, Rack handlers send the raw bytes through as &lt;code&gt;BINARY&lt;/code&gt; (which is reasonable, since handlers shouldn&amp;#8217;t be in the business of trying to wade through this problem) and no middleware exists (yet) to properly tag the String with the correct encoding.&lt;/p&gt;
&lt;p&gt;This means that if the stars align, the raw bytes are UTF-8, end up in a UTF-8 database, and end up coming back out again tagged as UTF-8. If the stars do not align, the text might actually be encoded in &lt;code&gt;ISO-8859-1&lt;/code&gt;, get put into a UTF-8 database, and come out tagged as UTF-8 (and we know what happens when ISO-8859-1 data is mistakenly tagged as UTF-8).&lt;/p&gt;
&lt;p&gt;In this case, because the &lt;code&gt;ISO-8859-1&lt;/code&gt; data is improperly tagged as UTF-8, Ruby happily concatenates it with other UTF-8 Strings, and hilarity ensues.&lt;/p&gt;
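&lt;p&gt;A short sketch of what that looks like from Ruby&amp;#8217;s side, assuming a made-up &lt;code&gt;mistagged&lt;/code&gt; String that carries Latin-1 bytes under a UTF-8 tag. The concatenation succeeds, and the only hint that something is wrong is that the result is no longer valid UTF-8.&lt;/p&gt;

```ruby
# encoding: UTF-8

# Latin-1 bytes for "José" that were wrongly tagged as UTF-8 somewhere upstream.
mistagged = "Jos\u00E9".encode("ISO-8859-1").force_encoding("UTF-8")
buffer = "Welcome, "   # a UTF-8 String, like a template buffer

page = buffer + mistagged    # no exception: both sides claim to be UTF-8
puts page.encoding           # UTF-8
puts page.valid_encoding?    # false: the Latin-1 byte for é is not valid UTF-8 here
```

&lt;p&gt;Checking &lt;code&gt;valid_encoding?&lt;/code&gt; at the point where data enters the application is one way to catch this case early, although it cannot tell you what the right encoding would have been.&lt;/p&gt;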
&lt;p&gt;Because English characters have the same byte representation in all commonly used encodings, this problem is not as common as you might imagine. Unfortunately, this simply means that people who &lt;strong&gt;do&lt;/strong&gt; encounter it are baffled and find it hard to get help. Additionally, this problem doesn&amp;#8217;t manifest itself as a hard error. It can go unnoticed and be dismissed as a minor annoyance if the number of non-ASCII-7 characters is low.&lt;/p&gt;
&lt;p&gt;In order to properly solve this problem for Ruby 1.9, we need a very good heuristic for properly determining the encoding of web-sent POST bodies. There are some promising avenues that will get it right 99.9% of the time, and we need to package them up into a middleware that will tag Strings correctly.&lt;/p&gt;
&lt;h2&gt;Incompatible Encodings&lt;/h2&gt;
&lt;p&gt;If you&amp;#8217;ve been paying attention, you&amp;#8217;ve probably noticed that while the database drivers have solved one problem, they actually introduced another one.&lt;/p&gt;
&lt;p&gt;Imagine that you&amp;#8217;re using a MySQL database encoded in ISO-8859-1 (or ISO-8859-5, popular for Russian applications, or any other non-UTF-8 encoding). Now that the String coming back from the database is properly tagged as &lt;code&gt;ISO-8859-1&lt;/code&gt;, Ruby will refuse to concatenate it onto the ERB buffer (which is encoded in UTF-8). Even if we solved this problem for ERB, it could be trivially reintroduced in other parts of the application through regular concatenation (&lt;code&gt;+&lt;/code&gt;, &lt;code&gt;concat&lt;/code&gt;, or even String interpolation).&lt;/p&gt;
&lt;p&gt;This problem is also somewhat mitigated by the ASCII-7 subset exception, which means that as long as one of the two incompatible Strings uses only English characters, users won&amp;#8217;t see any problems. But because this &amp;#8220;solution&amp;#8221; means that the Ruby developer in question still may not understand encodings, it simply defers the problem to some uncertain point in the future when they either add a non-ASCII-7 character to their template or the user submits a non-ASCII-7 String.&lt;/p&gt;
&lt;h2&gt;The Solution&lt;/h2&gt;
&lt;p&gt;If you got this far, you&amp;#8217;re probably thinking &amp;#8220;Holy shit this encoding stuff is crazy. I don&amp;#8217;t want to have to know any of this! I just want to write my web app!&amp;#8221;&lt;/p&gt;
&lt;p&gt;And you&amp;#8217;d be correct.&lt;/p&gt;
&lt;p&gt;Other languages, such as Java and Python, solve this problem by &lt;code&gt;encode&lt;/code&gt;ing every String that enters the language as UTF-8 (or UTF-16). Theoretically, it is possible to represent the characters of every encoding in UTF-8. By doing this, programmers only ever deal with one kind of String, and concatenation happens between UTF-8 Strings.&lt;/p&gt;
&lt;p&gt;However, this solution does not work very well for the Japanese community. For a variety of complicated reasons, Japanese encodings, such as SHIFT-JIS, are not considered to losslessly encode into UTF-8. As a result, Ruby has a policy of not attempting to simply encode any inbound String into UTF-8.&lt;/p&gt;
&lt;p&gt;This decision is debatable, but the fact is that if Ruby transparently transcoded all content into UTF-8, a large portion of the Ruby community would see invisible lossy changes to their content. That part of the community is willing to put up with incompatible encoding exceptions because properly handling the encodings they regularly deal with is a somewhat manual process.&lt;/p&gt;
&lt;p&gt;On the other hand, many Rails applications work mostly with encodings that trivially encode to UTF-8 (such as UTF-8 itself, ASCII, and the ISO-8859-1 family). For this rather large part of the community, having to manually encode Strings to solve incompatible encoding problems feels like a burden that belongs on the machine but has been inappropriately shifted onto Rails application developers.&lt;/p&gt;
&lt;p&gt;But there is a solution.&lt;/p&gt;
&lt;p&gt;By default, Ruby should continue to support Strings of many different encodings, and raise exceptions liberally when a developer attempts to concatenate Strings of different encodings. This would satisfy those with encoding concerns that require manual resolution.&lt;/p&gt;
&lt;p&gt;Additionally, you would be able to set a preferred encoding. This would inform drivers at the boundary (such as database drivers) that you would like them to convert any Strings that they tag with an encoding to your preferred encoding immediately. By default, Rails would set this to UTF-8, so Strings that you get back from the database or other external source would always be in UTF-8. &lt;/p&gt;
&lt;p&gt;If a String at the boundary could not be converted (for instance, if you set ISO-8859-1 as the preferred encoding, this would happen a lot), you would get an exception as soon as that String entered the system.&lt;/p&gt;
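&lt;p&gt;The failure mode in that last case is the one &lt;code&gt;encode&lt;/code&gt; already has today. As a small illustration (not part of any driver), converting Russian text to ISO-8859-1 raises straight away, because Latin-1 has no way to represent those characters.&lt;/p&gt;

```ruby
# encoding: UTF-8

russian = "\u043F\u0440\u0438\u0432\u0435\u0442"   # "привет", a UTF-8 String

begin
  russian.encode("ISO-8859-1")
  outcome = "converted"
rescue Encoding::UndefinedConversionError => e
  # Raised because the first Cyrillic character has no Latin-1 equivalent.
  outcome = "failed: #{e.message}"
end

puts outcome
```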
&lt;p&gt;In practice, almost all usage of this setting would be to specify UTF-8 as a preferred encoding. From your perspective, if you were dealing in UTF-8, ISO-8859-* and ASCII (most Western developers), you would never have to care about encodings.&lt;/p&gt;
&lt;p&gt;Even better, Ruby already has a mechanism that is mostly designed for this purpose. In Ruby 1.9, setting &lt;code&gt;Encoding.default_internal&lt;/code&gt; tells Ruby to encode all Strings crossing the barrier via its IO system into that preferred encoding. All we&amp;#8217;d need, then, is for maintainers of database drivers to honor this convention as well. &lt;/p&gt;
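&lt;p&gt;To give a feel for what honoring the convention could mean, here is a sketch in plain Ruby. The &lt;code&gt;string_from_driver&lt;/code&gt; helper and its parameters are hypothetical and are not taken from any real driver (real drivers do this work in C). It tags the raw bytes with the encoding the database reports and then converts to the preferred encoding, if one is set.&lt;/p&gt;

```ruby
# encoding: UTF-8

# Hypothetical helper: what a driver might do with bytes it has just read.
# source_encoding is whatever the driver knows the database uses;
# internal defaults to the application's preferred encoding (possibly nil).
def string_from_driver(raw_bytes, source_encoding, internal = Encoding.default_internal)
  tagged = raw_bytes.dup.force_encoding(source_encoding)
  internal ? tagged.encode(internal) : tagged
end

# Stand-in for untagged Latin-1 bytes coming off the wire.
latin1_bytes = "Jos\u00E9".encode("ISO-8859-1").force_encoding("BINARY")

as_tagged = string_from_driver(latin1_bytes, "ISO-8859-1", nil)
as_utf8   = string_from_driver(latin1_bytes, "ISO-8859-1", "UTF-8")

puts as_tagged.encoding      # ISO-8859-1
puts as_utf8.encoding        # UTF-8
puts "Welcome, " + as_utf8   # concatenates with UTF-8 Strings without complaint
```

&lt;p&gt;The preferred encoding is passed explicitly here so that the example does not change global state; in an application it would come from &lt;code&gt;Encoding.default_internal&lt;/code&gt;.&lt;/p&gt;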
&lt;p&gt;It doesn&amp;#8217;t require any changes to Ruby itself, and places the burden squarely on the few people who already need to deal with encodings (because taking data from outside of Ruby, via C, always already requires a tagging step). I have spoken with Aaron Patterson, who has been working on the SQLite3 driver, and he feels that this change is simple enough for maintainers of drivers dealing with external Strings to make it a viable option. He has &lt;a href="http://github.com/luislavena/sqlite3-ruby/commit/4e4d1fc411b43ef50586f6a519e2303debbf1216"&gt;already patched SQLite3&lt;/a&gt; to make it respect default_internal.&lt;/p&gt;
&lt;p&gt;However you feel about Ruby&amp;#8217;s solution to expose String encodings directly in the language, you should agree that since we&amp;#8217;re stuck with it for the foreseeable future, this solution shifts the burden of dealing with it from the unwashed masses (most of whom have no idea what an encoding is) to a few maintainers of C extensions and libraries that deal in binary data. Getting this right as soon as possible will substantially ease the transition from Ruby 1.8 to Ruby 1.9.&lt;/p&gt;
&lt;h2&gt;Postscript: What Happened in 1.8!?&lt;/h2&gt;
&lt;p&gt;When people first move to 1.9 and encounter these shenanigans, they often wonder why everything seemed so simple in Ruby 1.8, and yet seemed to work.&lt;/p&gt;
&lt;p&gt;There are a few reasons for this. &lt;/p&gt;
&lt;p&gt;First, keep in mind that in Ruby 1.8, Strings are simple sequences of bytes. Ruby String operations just concatenate those byte sequences together without any kind of check. This means that concatenating two UTF-8 Strings together will just work, since the combined byte sequence is still valid UTF-8. As long as the client for the Ruby code (such as the browser) is told that the bytes are encoded in UTF-8, all is well. Rails does this by setting the default charset for all documents to UTF-8.&lt;/p&gt;
&lt;p&gt;Second, Ruby 1.8 has a &amp;#8220;UTF-8&amp;#8221; mode that makes its regular expression engine treat all Strings as UTF-8. In this mode (which is triggered by setting &lt;code&gt;$KCODE = "UTF-8"&lt;/code&gt;), the regular expression engine correctly matches a complete UTF-8 character for &lt;code&gt;/./&lt;/code&gt;, for instance. Rails sets this global by default, so if you were using Rails, regular expressions respect Unicode characters, not raw bytes.&lt;/p&gt;
&lt;p&gt;Third, very little non-English content in the wild is actually encoded in ISO-8859-1. If you were expecting to deal with content that was not English, you would probably set your MySQL database to use a UTF-8 encoding. Since Rails sets UTF-8 as the charset of outbound documents, most browsers will in fact return UTF-8 encoded data.&lt;/p&gt;
&lt;p&gt;Fourth, the problems caused when an ISO-8859-1 String is accidentally concatenated into a UTF-8 String are not as jarring as the errors produced by Ruby 1.9. Let&amp;#8217;s try a little experiment. First, open up a text editor, create a new file with the following contents, and save it in the ISO-8859-1 encoding.&lt;/p&gt;

&lt;div class="wp_syntax"&gt;&lt;div class="code"&gt;&lt;pre class="ruby" style="font-family:monospace;"&gt;&lt;span style="color:#ff6633; font-weight:bold;"&gt;$KCODE&lt;/span&gt; = &lt;span style="color:#996600;"&gt;&amp;quot;UTF-8&amp;quot;&lt;/span&gt;
&amp;nbsp;
iso_8859_1 = &lt;span style="color:#996600;"&gt;&amp;quot;ümlaut&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# the byte representation of ümlaut in UTF-8&lt;/span&gt;
utf8 = &lt;span style="color:#996600;"&gt;&amp;quot;&lt;span style="color:#000099;"&gt;\x&lt;/span&gt;C3&lt;span style="color:#000099;"&gt;\x&lt;/span&gt;BCmlaut&amp;quot;&lt;/span&gt;
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; iso_8859_1
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; utf8
&amp;nbsp;
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; iso_8859_1 &lt;span style="color:#006600; font-weight:bold;"&gt;+&lt;/span&gt; utf8
&lt;span style="color:#CC0066; font-weight:bold;"&gt;puts&lt;/span&gt; utf8 &lt;span style="color:#006600; font-weight:bold;"&gt;+&lt;/span&gt; iso_8859_1
&amp;nbsp;
&lt;span style="color:#008000; font-style:italic;"&gt;# Output&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# ?mlaut&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# ümlaut&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# ?mlautümlaut&lt;/span&gt;
&lt;span style="color:#008000; font-style:italic;"&gt;# ümlaut?mlaut&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you somehow get ISO-8859-1 encoded content that uses characters outside of the ASCII-7 range, Ruby doesn&amp;#8217;t puke. Instead, it simply replaces the unidentified character with a &amp;#8220;?&amp;#8221;, which can easily go unnoticed in a mostly English site with a few &amp;#8220;José&amp;#8221;s thrown into the mix. It could also easily be dismissed as a &amp;#8220;weird bug that we don&amp;#8217;t have time to figure out right now&amp;#8221;.&lt;/p&gt;
&lt;p&gt;Finally, Rails itself provides a pure-Ruby UTF-8 library that mops up a lot of the remaining issues. Specifically, it provides an alternate String class that can properly handle operations like split, truncate, index, justify and other operations that need to operate on characters, not bytes. It then uses this library internally in helpers like &lt;code&gt;truncate&lt;/code&gt;, transparently avoiding a whole other class of issue.&lt;/p&gt;
&lt;p&gt;In short, if you&amp;#8217;re dealing mostly with English text, and you get unlucky enough to get ISO-8859-1 input from somewhere, the worst case is that you get a &amp;#8220;?&amp;#8221; instead of a &amp;#8220;é&amp;#8221;. If you&amp;#8217;re dealing with a lot of non-English text, you&amp;#8217;re probably not using ISO-8859-1 sources. In either case, English (ASCII) text is compatible with UTF-8, and Rails provides solid enough pure-Ruby UTF-8 support to get you most of the rest of the way.&lt;/p&gt;
&lt;p&gt;That said, anyone dealing with encodings other than UTF-8 and ISO-8859-1 (Japanese and Russian Rubyists, for example) was definitely not in a good place with Ruby 1.8.&lt;/p&gt;
&lt;h2&gt;Thanks&lt;/h2&gt;
&lt;p&gt;I want to personally thank Jay Freeman (aka saurik), who in addition to being a general badass, spent about 15 hours with me patiently explaining these issues and working through the Ruby 1.9 source to help fully understand the tradeoffs available.&lt;/p&gt;
</content>
		<link rel="replies" type="text/html" href="http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/#comments" thr:count="20" />
		<link rel="replies" type="application/atom+xml" href="http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/feed/atom/" thr:count="20" />
		<thr:total>20</thr:total>
	<feedburner:origLink>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/</feedburner:origLink></entry>
	</feed>
