<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
 
  <title>James on Software</title>
  <subtitle>Thoughts on software, testing...</subtitle>
  
  <link href="http://jamesgolick.com/" />
  <updated>2013-05-16T10:11:26-07:00</updated>
  <author>
    <name>James Golick</name>
    <email>jamesgolick@gmail.com</email>
  </author>
  <id>http://jamesgolick.com/</id>
  
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/JamesOnSoftware" /><feedburner:info uri="jamesonsoftware" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
    <title>Memory Allocators 101</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/Yh8_0HkN2kY/memory-allocators-101.html" />
    <id>tag:jamesgolick.com,2013-05-15:1368632222</id>
    <updated>2013-05-15T08:37:02-07:00</updated>
    <content type="html">&lt;p&gt;For the last few weeks, I've been working on a couple of patches to tcmalloc, Google's super high performance memory allocator. I'm going to post about them soon, but first I thought it would be cool to give some background about what a memory allocator actually does. So, if you've ever wondered what happens when you call &lt;code&gt;malloc&lt;/code&gt; or &lt;code&gt;free&lt;/code&gt;, read on.&lt;/p&gt;

&lt;hr /&gt;


&lt;p&gt;A memory allocator's responsibility is to manage free blocks of memory. If you've never read a &lt;code&gt;malloc&lt;/code&gt; implementation, you may have assumed that calling &lt;code&gt;free&lt;/code&gt; simply causes memory to be released to the operating system. But acquiring memory from the OS has a cost, so allocators tend to keep free chunks around for a while for possible re-use before deciding to release them.&lt;/p&gt;

&lt;p&gt;Managing &lt;code&gt;free&lt;/code&gt;d memory is an incredibly interesting and hard problem with two main concerns: performance and reducing heap fragmentation / waste:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do we organize free blocks of memory such that we can quickly locate a sufficiently large block (or determine that we lack one) when someone calls &lt;code&gt;malloc&lt;/code&gt; without making calls to &lt;code&gt;free&lt;/code&gt; prohibitively expensive?&lt;/li&gt;
&lt;li&gt;What can we do to reduce fragmentation and waste in the face of sometimes drastically changing allocation patterns over the lifetime of a (potentially long-running) program? It's worth noting that heap fragmentation can have a substantial impact on CPU cache efficiency.&lt;/li&gt;
&lt;li&gt;As a bonus, there's also the matter of concurrency, but that's probably beyond the scope of this post.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The most fun part of this problem is that our two primary objectives are often in direct opposition. For example, keeping one linked list of free blocks per allocation size (say, rounded up to number of &lt;a href="http://en.wikipedia.org/wiki/Page_(computer_memory)"&gt;pages&lt;/a&gt;) can make calls to &lt;code&gt;malloc&lt;/code&gt; best case constant time, but unless some waste is accepted and chunks are kept around long enough to be reused, the worst case path will be taken more often than not.&lt;/p&gt;
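&lt;p&gt;To make the per-size idea concrete, here's a rough sketch in Ruby. It's a toy model, not how any real allocator is written: addresses are plain integers, and &lt;code&gt;ToySizeClasses&lt;/code&gt;, &lt;code&gt;PAGE_SIZE&lt;/code&gt; and the 4096-byte page are all assumptions I've made up for illustration.&lt;/p&gt;

```ruby
# Toy model of per-size free lists, keyed by size in pages. Nothing here
# touches real memory; the class name and PAGE_SIZE are illustrative only.
class ToySizeClasses
  PAGE_SIZE = 4096

  def initialize
    # One list of free addresses per page count, created on first use.
    @lists = Hash.new { |hash, pages| hash[pages] = [] }
  end

  # Round a byte count up to a whole number of pages.
  def pages_for(bytes)
    (bytes + PAGE_SIZE - 1) / PAGE_SIZE
  end

  # free: push the address onto the list for its size class.
  def free(addr, bytes)
    @lists[pages_for(bytes)].push(addr)
  end

  # malloc: best case is a constant-time pop from the matching list.
  # Returns nil on a miss, which is the slow path (search other sizes
  # or go back to the operating system).
  def malloc(bytes)
    @lists[pages_for(bytes)].pop
  end
end
```

&lt;p&gt;The fast path only pays off if a chunk of the right size is already sitting in its list, which is exactly the waste-versus-speed tension described above.&lt;/p&gt;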

&lt;p&gt;Of course, there are also a multitude of other issues to consider, such as how to decide to release memory to the operating system, and how to avoid becoming the bottleneck in a concurrent program (I'm looking at you, glibc). And the implementation details are interesting, too.&lt;/p&gt;

&lt;h3&gt;Implementation&lt;/h3&gt;

&lt;p&gt;A very basic &lt;code&gt;malloc&lt;/code&gt; implementation might use the Linux system call &lt;a href="http://linux.die.net/man/2/sbrk"&gt;&lt;code&gt;sbrk(2)&lt;/code&gt;&lt;/a&gt; to acquire memory from the operating system and a linked list to store free chunks. That would make calls to &lt;code&gt;free&lt;/code&gt; constant time, but &lt;code&gt;malloc&lt;/code&gt; would be &lt;code&gt;O(n)&lt;/code&gt; in the number of free chunks, since it may have to scan the whole list to find one that fits.&lt;/p&gt;
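&lt;p&gt;Here's a minimal sketch of that trade-off, modelled in Ruby rather than C. It assumes integer addresses and a first-fit search, and it hands out whole chunks without splitting them; &lt;code&gt;ToyFreeList&lt;/code&gt; is a hypothetical name, not anything from a real allocator.&lt;/p&gt;

```ruby
# Toy model of a single free list. Addresses are plain integers and no
# real memory is involved; ToyFreeList is a hypothetical name.
class ToyFreeList
  Chunk = Struct.new(:addr, :size)

  def initialize
    @free_chunks = []
  end

  # free: constant time, just push the chunk onto the list.
  def free(addr, size)
    @free_chunks.push(Chunk.new(addr, size))
  end

  # malloc: first-fit scan, O(n) in the number of free chunks.
  # Returns an address, or nil when no chunk is large enough (a real
  # allocator would ask the OS for more memory at that point).
  def malloc(size)
    index = @free_chunks.index { |chunk| size.between?(1, chunk.size) }
    return nil if index.nil?

    @free_chunks.delete_at(index).addr
  end
end
```

&lt;p&gt;Freeing is a single push, while every allocation may walk the entire list before it finds a fit or gives up.&lt;/p&gt;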

&lt;p&gt;Of course, the allocator needs to store metadata about each chunk it manages, such as its size, free/in-use status, free-list pointer(s), etc. But since you can't exactly call &lt;code&gt;malloc&lt;/code&gt; in an allocator, it's common to store metadata in a "header" that just precedes the address that is handed to the application. So, if the header is 16 bytes in size, then the header would start at &lt;code&gt;ptr - 16&lt;/code&gt;. Pointer arithmetic galore.&lt;/p&gt;

&lt;div style="text-align:center;"&gt;&lt;img src="/images/malloc_header.png" /&gt;&lt;/div&gt;


&lt;br/&gt;




&lt;script src="https://gist.github.com/jamesgolick/5593158.js?file=header.c"&gt;&lt;/script&gt;


&lt;p&gt;In order to reduce fragmentation and promote memory reuse, it's common for &lt;code&gt;malloc&lt;/code&gt; implementations to attempt to coalesce free blocks of memory with adjacent ones, if they happen to also be free. If metadata is being stored in a header, then it's easy to determine the size and status of the chunk to the right: its header starts where the current chunk ends.&lt;/p&gt;

&lt;div style="text-align:center;"&gt;&lt;img src="/images/malloc_right.png" /&gt;&lt;/div&gt;


&lt;br/&gt;




&lt;script src="https://gist.github.com/jamesgolick/5593158.js?file=right.c"&gt;&lt;/script&gt;


&lt;p&gt;But the header doesn't provide any way of determining the size of the chunk to the left, so coalescing &lt;code&gt;malloc&lt;/code&gt;s frequently put the size of each block in a footer, which is typically sized to fit a &lt;code&gt;size_t&lt;/code&gt;. Then finding the chunk to the left would look something like this:&lt;/p&gt;

&lt;div style="text-align:center;"&gt;&lt;img src="/images/malloc_left.png" /&gt;&lt;/div&gt;


&lt;br/&gt;




&lt;script src="https://gist.github.com/jamesgolick/5593158.js?file=left.c"&gt;&lt;/script&gt;


&lt;p&gt;Things get even more complicated because subsequent invocations of system calls like &lt;code&gt;sbrk&lt;/code&gt; or &lt;code&gt;mmap&lt;/code&gt; aren't guaranteed to return contiguous virtual addresses. So, when looking for chunks to coalesce, care has to be taken to make sure that invalid pointers aren't dereferenced by adding to a pointer that's on the edge of what's being managed.&lt;/p&gt;

&lt;p&gt;Typically this means creating and maintaining a separate data structure with which to keep track of the regions of virtual address space that the allocator is managing. Some allocators, such as &lt;code&gt;tcmalloc&lt;/code&gt;, simply store their metadata in that data structure rather than in headers and footers, which avoids a lot of error-prone pointer arithmetic.&lt;/p&gt;

&lt;hr /&gt;


&lt;p&gt;I could probably continue writing about this forever, but this seems like a good place to stop for now. If your interest is piqued and you'd like to learn more about memory allocators, I highly recommend diving into writing your own &lt;code&gt;malloc&lt;/code&gt; implementation. It's a challenging project, but it's fun and it'll give you a lot of insight into an important part of how your computer works.&lt;/p&gt;

&lt;p&gt;Soon I'll follow up on this post with one about the allocator work I've been doing lately. Also, if you're interested in this kind of stuff, check out my new &lt;a href="http://realtalk.io"&gt;podcast&lt;/a&gt; where &lt;a href="http://timetobleed.com"&gt;Joe Damato&lt;/a&gt; and I talk about systems programming. We'll definitely be covering allocators in the next few weeks sometime.&lt;/p&gt;

&lt;p&gt;Some allocator resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://man7.org/linux/man-pages/man3/malloc.3.html"&gt;malloc(3) man page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.cs.cmu.edu/afs/cs/academic/class/15213-f10/www/lectures/17-allocation-basic.pdf"&gt;http://www.cs.cmu.edu/afs/cs/academic/class/15213-f10/www/lectures/17-allocation-basic.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html"&gt;tcmalloc docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jeremie-koenig/glibc/tree/master-beware-rebase/malloc"&gt;glibc malloc source&lt;/a&gt; &amp;mdash; this is some random github repo, so I have no idea how much its been fucked with&lt;/li&gt;
&lt;/ul&gt;

</content>
  <feedburner:origLink>http://jamesgolick.com/2013/5/15/memory-allocators-101.html</feedburner:origLink></entry>
  
  <entry>
    <title>Introducing The Real Talk Podcast</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/K8mZVK2PInw/introducing-the-real-talk-podcast.html" />
    <id>tag:jamesgolick.com,2013-04-28:1367209602</id>
    <updated>2013-04-28T21:26:42-07:00</updated>
    <content type="html">&lt;p&gt;[Joe Damato] and I have released the inaugural episode of our new, highly technical podcast realtalk.io.&lt;/p&gt;

&lt;p&gt;We will be doing frequent technical deep dives and releasing our conversations raw and unedited with all errors, omissions, awkward pauses, and curse words intact.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="http://realtalk.io"&gt;website&lt;/a&gt;, tune in on SoundCloud, and &lt;a href="http://feeds.feedburner.com/realtalkio"&gt;subscribe&lt;/a&gt;.&lt;/p&gt;
</content>
  <feedburner:origLink>http://jamesgolick.com/2013/4/28/introducing-the-real-talk-podcast.html</feedburner:origLink></entry>
  
  <entry>
    <title>MRI's Method Caches</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/YUhhYm647ic/mris-method-caches.html" />
    <id>tag:jamesgolick.com,2013-04-14:1365955274</id>
    <updated>2013-04-14T09:01:14-07:00</updated>
    <content type="html">&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Method resolution is expensive, so method caches are crucial to invocation performance.&lt;/li&gt;
&lt;li&gt;Your Ruby code probably calls methods kind of often, so invocation performance matters.&lt;/li&gt;
&lt;li&gt;MRI's method cache invalidation strategy is quite naive, leading to very low hit rates in most Ruby code.&lt;/li&gt;
&lt;li&gt;I wrote &lt;a href="https://github.com/jamesgolick/ruby/tree/jamesgolick"&gt;some patches&lt;/a&gt; that substantially improve the situation.&lt;/li&gt;
&lt;li&gt;This blog post is surprisingly uninflammatory.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;The Long Version&lt;/h3&gt;

&lt;p&gt;One of MRI's big performance problems is that method cache expiry is global. That is, any time you make a change to any class anywhere, the entire VM's method caches get busted at the same time. This is why you'll frequently hear people saying that "calling &lt;code&gt;Object#extend&lt;/code&gt; is bad".&lt;/p&gt;

&lt;p&gt;Actually, it's not just &lt;code&gt;Object#extend&lt;/code&gt;. &lt;a href="https://twitter.com/charliesome"&gt;Charlie Somerville&lt;/a&gt; put together what I believe to be an &lt;a href="http://charlie.bz/blog/things-that-clear-rubys-method-cache"&gt;exhaustive list&lt;/a&gt; of things that clear MRI's method caches. Method cache busting is so pervasive that it's almost impossible to avoid using somebody's code that does it. Disaster.&lt;/p&gt;

&lt;p&gt;Let's back up for a second, though. What is a method cache, and why is it important?&lt;/p&gt;

&lt;h3&gt;Method Cache Basics&lt;/h3&gt;

&lt;p&gt;Take the following class hierarchy:&lt;/p&gt;

&lt;script src="https://gist.github.com/jamesgolick/5347185.js"&gt;&lt;/script&gt;


&lt;p&gt;Internally, MRI stores methods in a hash table on the &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/include/ruby/ruby.h#L630"&gt;&lt;code&gt;RClass struct&lt;/code&gt;&lt;/a&gt;. When you call an inherited method on a descendant, MRI has to walk up the class hierarchy to find it, checking the method table at each step to see if there's anything there to call.&lt;/p&gt;

&lt;p&gt;So, in our above example, if we wanted to call &lt;code&gt;hello&lt;/code&gt; on an instance of &lt;code&gt;E&lt;/code&gt;, MRI would have to execute method lookups on &lt;code&gt;E&lt;/code&gt;, &lt;code&gt;D&lt;/code&gt;, &lt;code&gt;C&lt;/code&gt;, &lt;code&gt;B&lt;/code&gt;, only to finally find the method in &lt;code&gt;A&lt;/code&gt;'s method table.&lt;/p&gt;

&lt;p&gt;It turns out that method resolution is actually quite expensive, which is why method caches exist. Rather than resolving a method each time we want to call it, we cache a reference to the method somewhere we can get to it cheaply, substantially reducing the cost of subsequent invocations.&lt;/p&gt;
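&lt;p&gt;You can approximate that walk from plain Ruby. The sketch below is only a model of what the VM does in C, and the class definitions are my stand-in for the hierarchy described above (five classes, with &lt;code&gt;hello&lt;/code&gt; defined on &lt;code&gt;A&lt;/code&gt;). &lt;code&gt;slow_lookup&lt;/code&gt;, &lt;code&gt;cached_lookup&lt;/code&gt; and &lt;code&gt;METHOD_CACHE&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```ruby
# Stand-in hierarchy: E inherits from D, D from C, C from B, B from A.
class A
  def hello
    "hello from A"
  end
end
B = Class.new(A)
C = Class.new(B)
D = Class.new(C)
E = Class.new(D)

# Walk the ancestor chain the slow way, recording every class whose
# method table had to be checked. Only public and protected methods
# are considered, which is enough for this illustration.
def slow_lookup(klass, name)
  checked = []
  klass.ancestors.each do |ancestor|
    checked.push(ancestor)
    return [ancestor, checked] if ancestor.instance_methods(false).include?(name)
  end
  [nil, checked]
end

# A cache keyed on [class, method name] skips the walk on later calls.
METHOD_CACHE = {}
def cached_lookup(klass, name)
  METHOD_CACHE[[klass, name]] ||= slow_lookup(klass, name).first
end

owner, checked = slow_lookup(E, :hello)
puts "found in #{owner} after checking #{checked.join(', ')}"
```

&lt;p&gt;The first &lt;code&gt;cached_lookup(E, :hello)&lt;/code&gt; pays for the full walk; every later call is a single hash lookup, at least until something invalidates the cache.&lt;/p&gt;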

&lt;p&gt;But Ruby is dynamic, so those caches can't necessarily live forever. If we call &lt;code&gt;String#gsub&lt;/code&gt;, for example, and then &lt;code&gt;undef&lt;/code&gt; it without expiring the method cache, it'll still be reachable. Cache invalidation is hard, as we know, so MRI takes a somewhat brute force approach.&lt;/p&gt;

&lt;h3&gt;How MRI's Method Caches Work&lt;/h3&gt;

&lt;p&gt;Currently, MRI has two types of method caches. Ruby code is compiled down to MRI's &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/insns.def"&gt;instructions&lt;/a&gt;. Each instruction has some data associated with it. The &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/insns.def#L998"&gt;send&lt;/a&gt; instruction has an &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_core.h#L128"&gt;&lt;code&gt;iseq_inline_cache_entry&lt;/code&gt;&lt;/a&gt;, which acts as an inline method cache.&lt;/p&gt;

&lt;p&gt;You can read the logic &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_insnhelper.c#L1369"&gt;here&lt;/a&gt;. Basically, it works like this: look at the inline cache. &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_insnhelper.c#L1374-L1375"&gt;If it's valid&lt;/a&gt;, &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_insnhelper.c#L1376"&gt;use it&lt;/a&gt;. If not, &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_insnhelper.c#L1379-1382"&gt;go actually look up the method and cache it&lt;/a&gt;. Pretty much exactly what you'd expect.&lt;/p&gt;

&lt;p&gt;In the case of an inline instruction cache miss, there's actually a &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_method.c#L25"&gt;secondary, global method cache&lt;/a&gt;. Oddly, though, the global method cache is limited to &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_method.c#L8"&gt;2048&lt;/a&gt; entries, and its semantics for deciding what to keep and what to dump are &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_method.c#L10"&gt;essentially random&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's not as unlikely as you might hope for two methods in a tight loop to be clobbering each other's entries in the global method cache table.&lt;/p&gt;

&lt;p&gt;Both caches' entries have a field that stores the &lt;a href="https://github.com/jamesgolick/ruby/blob/094f2f438ae79f0b9afe9b4f5966c5bf1a6a3d9c/vm_insnhelper.h#L224"&gt;&lt;code&gt;ruby_vm_global_state_version&lt;/code&gt;&lt;/a&gt; from when they were filled. A cache entry is considered valid if it matches the &lt;code&gt;klass&lt;/code&gt; pointer and the current &lt;code&gt;ruby_vm_global_state_version&lt;/code&gt;. So, incrementing the global state version by &lt;code&gt;1&lt;/code&gt; invalidates all of the inline instruction caches as well as the global method cache.&lt;/p&gt;

&lt;p&gt;This has the effect of making invalidation very cheap, but far reaching. Whenever you make a change to any class, call extend, or do any of the other things detailed in &lt;a href="http://charlie.bz/blog/things-that-clear-rubys-method-cache"&gt;Charlie's article&lt;/a&gt;, all of the method caches that have built up since your program started become invalid and you have to repay the cost of method resolution all over again.&lt;/p&gt;
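&lt;p&gt;Here's a toy Ruby model of that validity check. It is not MRI's code: &lt;code&gt;GlobalVersionCache&lt;/code&gt; and its methods are hypothetical, and each method name gets a single slot, standing in for one call site's inline cache.&lt;/p&gt;

```ruby
# Toy model of a cache guarded by one global version counter.
class GlobalVersionCache
  Entry = Struct.new(:klass, :version, :method_owner)

  attr_reader :misses

  def initialize
    @global_version = 0
    @entries = {}
    @misses = 0
  end

  # Any class modification anywhere bumps the single global counter,
  # which implicitly invalidates every entry at once.
  def invalidate_all
    @global_version += 1
  end

  # An entry is valid only if both the class and the version match.
  # On a miss, the block performs the (expensive) method resolution.
  def fetch(klass, name)
    entry = @entries[name]
    stale = entry.nil? || entry.klass != klass || entry.version != @global_version
    if stale
      @misses += 1
      entry = Entry.new(klass, @global_version, yield)
      @entries[name] = entry
    end
    entry.method_owner
  end
end
```

&lt;p&gt;Invalidation is a single increment, but after it every entry fails the version comparison, no matter which class actually changed.&lt;/p&gt;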

&lt;h3&gt;The Numbers&lt;/h3&gt;

&lt;p&gt;After years of complaining about Ruby's method caching behaviour, I finally decided to instrument it a couple of weeks ago. I found that for our application, the method cache was being invalidated at least 20 times per request, and that around 10% of our request profile was spent performing method resolution. For our application, the cost of Ruby's global method cache invalidation was extremely high.&lt;/p&gt;

&lt;p&gt;The average cost of each method resolution for our production application is around one microsecond, which doesn't sound like a lot but it adds up. We were seeing at least 8000 cache misses per request, totalling 8ms or more.&lt;/p&gt;

&lt;p&gt;As part of my instrumentation patchset, I also created a mechanism that logs a stacktrace each time the method cache is invalidated. I found that the majority of invalidations in our app were from inside of ActiveRecord - &lt;a href="https://github.com/rails/rails/pull/10058"&gt;some&lt;/a&gt; easier to fix than others. Many were also caused by random gems doing things like instantiating &lt;code&gt;OpenStruct&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;At this point, it started seeming somewhat impractical to go and patch rails and all these other gems that I use, so I decided to investigate the amount of effort that would be required to actually solve the problem in MRI.&lt;/p&gt;

&lt;h3&gt;Hierarchical Method Cache Invalidation&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;I'll be using &lt;code&gt;class&lt;/code&gt; to mean &lt;code&gt;class&lt;/code&gt; or &lt;code&gt;module&lt;/code&gt; here, since they have the same backing structure in the VM.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Ruby's inheritance tree is a &lt;a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph"&gt;directed acyclic graph&lt;/a&gt;, and the semantics of method resolution mean that a change to a given class only affects it and its descendants. So, in an ideal scenario, we would only need to invalidate the method caches for those branches of the inheritance tree.&lt;/p&gt;

&lt;p&gt;I've written a patch for MRI that implements such an algorithm, and it's currently serving 100% of our production traffic. We've seen around a 9% reduction in average latency with this patch. Others who've tried it haven't seen such big jumps. Your mileage may vary.&lt;/p&gt;

&lt;p&gt;The algorithm is actually quite simple (credit to Charlie Nutter / JRuby for the idea). We have a 64-bit global (one per VM instance) sequence that is monotonically increasing. Every time we allocate a new &lt;code&gt;RClass&lt;/code&gt;, we increment the sequence and assign the class the new, unique value.&lt;/p&gt;

&lt;p&gt;Method and inline cache entries are tagged with the class's sequence value when they're filled. When a class is modified, we traverse the class hierarchy downwards from the modification point, assigning each class a new sequence number.&lt;/p&gt;

&lt;p&gt;A method cache entry is considered valid if its sequence number matches the current sequence of the class. So, if the class or one of its parents has been modified since the cache entry was created, its class will have a new sequence number, and it will have therefore been invalidated.&lt;/p&gt;
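&lt;p&gt;The scheme can be sketched in a few lines of Ruby. This is a model of the idea rather than the patch itself: &lt;code&gt;ToyVM&lt;/code&gt;, &lt;code&gt;Node&lt;/code&gt; and &lt;code&gt;CacheEntry&lt;/code&gt; are hypothetical names, and for simplicity it only handles a single-parent tree, whereas modules make the real thing a graph.&lt;/p&gt;

```ruby
# Toy model of hierarchical invalidation using per-class sequence numbers.
class ToyVM
  Node = Struct.new(:name, :sequence, :children)

  def initialize
    @sequence = 0
  end

  # The sequence only ever increases, so every value handed out is unique.
  def next_sequence
    @sequence += 1
  end

  # Allocating a class assigns it a fresh sequence value.
  def define_class(name, parent = nil)
    node = Node.new(name, next_sequence, [])
    parent.children.push(node) if parent
    node
  end

  # Modifying a class re-stamps it and, recursively, every descendant.
  def modify(node)
    node.sequence = next_sequence
    node.children.each { |child| modify(child) }
  end
end

# A cache entry remembers the sequence its class had when it was filled.
CacheEntry = Struct.new(:node, :sequence) do
  def valid?
    sequence == node.sequence
  end
end
```

&lt;p&gt;Modifying a class near the root re-stamps its whole subtree, so entries for its descendants go stale while entries for unrelated classes stay valid.&lt;/p&gt;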

&lt;h3&gt;Performance&lt;/h3&gt;

&lt;p&gt;For our application, this cache invalidation strategy substantially reduces the number of method cache misses we see in production and has reduced request latency by ~8-9%, but there &lt;em&gt;are&lt;/em&gt; tradeoffs involved. Since invalidation requires a graph traversal, it's a lot more expensive than the current strategy of merely incrementing an integer.&lt;/p&gt;

&lt;p&gt;If your application makes frequent modifications to classes and modules which have a large number of descendants, the cost of invalidation may outweigh the increase in method cache hit rate. That said, I would imagine that such modifications are relatively uncommon and should be considered a bad practice either way.&lt;/p&gt;

&lt;p&gt;It's also worth noting here that while this patch will likely improve the performance of apps that employ the strategy of extending arbitrary objects to implement &lt;a href="http://en.wikipedia.org/wiki/Data,_context_and_interaction"&gt;DCI&lt;/a&gt;, that pattern is still a performance problem, because it creates tons of one-off metaclasses whose methods wind up being mostly uncacheable.&lt;/p&gt;

&lt;h3&gt;The Code&lt;/h3&gt;

&lt;p&gt;My patchset includes several things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subclass tracking: &lt;code&gt;Class#subclasses&lt;/code&gt;, and &lt;code&gt;Module#included_in&lt;/code&gt;. Rails implements this with an O(n) traversal of &lt;code&gt;ObjectSpace&lt;/code&gt;. With my patches, that's no longer necessary.&lt;/li&gt;
&lt;li&gt;Hierarchical method cache invalidation: the subject of this whole article.&lt;/li&gt;
&lt;li&gt;Method cache instrumentation: &lt;code&gt;RubyVM::MethodCache&lt;/code&gt; has several useful singleton methods you may want to track, including &lt;code&gt;hits&lt;/code&gt;, &lt;code&gt;misses&lt;/code&gt;, &lt;code&gt;miss_time&lt;/code&gt;, &lt;code&gt;invalidation_time&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;You can find the code in &lt;a href="https://github.com/jamesgolick/ruby/tree/jamesgolick"&gt;my branch&lt;/a&gt; or install it with &lt;code&gt;rvm install jamesgolick&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I am planning to submit these patches back upstream, but I have to port them to Ruby 2.0 first, so I guess that's my next project. Huge thanks and credit to &lt;a href="https://twitter.com/tmm1"&gt;Aman Gupta&lt;/a&gt;, &lt;a href="https://twitter.com/charliesome"&gt;Charlie Somerville&lt;/a&gt;, &lt;a href="https://twitter.com/headius"&gt;Charles Nutter&lt;/a&gt;, and &lt;a href="https://github.com/funny-falcon"&gt;funny-falcon&lt;/a&gt; for all their code, help, and testing!&lt;/p&gt;
</content>
  <feedburner:origLink>http://jamesgolick.com/2013/4/14/mris-method-caches.html</feedburner:origLink></entry>
  
  <entry>
    <title>The Cost of Ruby 1.9.3's GC::Profiler</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/XVpbdSZXQ44/the-cost-of-ruby-1.9.3-s-gc-profiler.html" />
    <id>tag:jamesgolick.com,2012-11-19:1353356029</id>
    <updated>2012-11-19T12:13:49-08:00</updated>
    <content type="html">&lt;p&gt;This is a long one, and y'all are busy I'm sure so here's the tl;dr:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you run ruby in production, you need to keep track of GC stats.&lt;/li&gt;
&lt;li&gt;Ruby 1.9.3's &lt;a href="http://www.ruby-doc.org/core-1.9.3/GC/Profiler.html"&gt;&lt;code&gt;GC::Profiler&lt;/code&gt;&lt;/a&gt; does a bunch of really weird shit.

&lt;ul&gt;
&lt;li&gt;It keeps a 104-byte sample of every GC run since it was enabled, forever.&lt;/li&gt;
&lt;li&gt;Calling &lt;a href="http://www.ruby-doc.org/core-1.9.3/GC/Profiler.html#method-c-total_time"&gt;&lt;code&gt;GC::Profiler.total_time&lt;/code&gt;&lt;/a&gt; loops over every sample in memory to calculate the total.&lt;/li&gt;
&lt;li&gt;The space used to keep those samples in memory is &lt;strong&gt;never freed&lt;/strong&gt;. However, it does get reused when you call &lt;a href="http://www.ruby-doc.org/core-1.9.3/GC/Profiler.html#method-c-clear"&gt;&lt;code&gt;GC::Profiler.clear&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Therefore: if you are using &lt;code&gt;GC::Profiler&lt;/code&gt; in production, and you're not calling &lt;code&gt;GC::Profiler.clear&lt;/code&gt; regularly, you're leaking a substantial amount of memory (&amp;gt;1GB / machine for us), slowing down garbage collection somewhat, and the cost of retrieving the stats (&lt;code&gt;GC::Profiler.total_time&lt;/code&gt;) will continue to increase unbounded until the process is restarted&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;I am working on an alternative, low-overhead GC Profiler that is designed to be run in production. It's called &lt;code&gt;GC::BasicProfiler&lt;/code&gt;. You can find the patch &lt;a href="https://github.com/jamesgolick/ruby/commit/576cba1e79842f7c5ee80d3668958e1571da13d7#L0R3992"&gt;here&lt;/a&gt; and follow development &lt;a href="https://github.com/jamesgolick/ruby/commit/576cba1e79842f7c5ee80d3668958e1571da13d7#L0R3992"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Also, you may want to check out &lt;a href="https://github.com/thecodeshop/ruby/commits/tcs-ruby_1_9_3"&gt;this fork&lt;/a&gt; for some backports from ruby 2.0 &amp;mdash; including the COW-friendly garbage collector. Good stuff.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;The Long Version&lt;/h3&gt;

&lt;p&gt;Ruby's GC is a steaming pile of shit &amp;mdash; but that's not news to anybody. If you're running ruby in production, tracking GC behaviour is essential so that you can minimize its effects on perceived performance (hence &lt;a href="http://unicorn.bogomips.org/Unicorn/OobGC.html"&gt;oob_gc&lt;/a&gt;, etc). Fortunately, Ruby 1.9.3 ships with &lt;a href="http://www.ruby-doc.org/core-1.9.3/GC/Profiler.html"&gt;GC::Profiler&lt;/a&gt;, which provides detailed instrumentation on GC runs.&lt;/p&gt;

&lt;p&gt;Over the last month or so, I've been working on some Rails performance tooling. Last night, I noticed that requests with my instrumentation enabled were taking around an order of magnitude longer than those without it. Weird. So, I installed &lt;a href="https://github.com/tmm1/perftools.rb"&gt;perftools.rb&lt;/a&gt; and &lt;a href="https://github.com/bhb/rack-perftools_profiler"&gt;rack-perftools_profiler&lt;/a&gt; and got a really surprising result (irrelevant lines omitted):&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=profile1.txt"&gt;&lt;/script&gt;


&lt;p&gt;Apparently calling &lt;code&gt;GC::Profiler.total_time&lt;/code&gt; is so slow that more than 50% of request time was spent in there? Is that actually possible? My instrumentation calls &lt;code&gt;GC::Profiler.total_time&lt;/code&gt; frequently under the assumption that it's inexpensive, but obviously that was a faulty assumption unless perftools.rb is wrong. Let's take a look at the implementation. (the code in context is &lt;a href="https://github.com/ruby/ruby/blob/ruby_1_9_3/gc.c#L3627-3647"&gt;here&lt;/a&gt;)&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=total_time.c"&gt;&lt;/script&gt;


&lt;p&gt;There's a loop in &lt;code&gt;total_time&lt;/code&gt;? What the fuck is going on here?&lt;/p&gt;

&lt;p&gt;Turns out that if you have &lt;code&gt;GC::Profiler&lt;/code&gt; enabled, the VM records a &lt;a href="https://github.com/ruby/ruby/blob/ruby_1_9_3/gc.c#L106-124"&gt;&lt;code&gt;gc_profile_record&lt;/code&gt;&lt;/a&gt; every time the garbage collector runs.&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=gc_profile_record.c"&gt;&lt;/script&gt;


&lt;p&gt;Then, when you call &lt;code&gt;total_time&lt;/code&gt;, it loops over all of the &lt;code&gt;gc_profile_record&lt;/code&gt;s that have been created in order to sum the total. According to my profile, &lt;code&gt;total_time&lt;/code&gt; was responsible for more than 50% of request time. How many &lt;code&gt;gc_profile_record&lt;/code&gt;s could there actually be?&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=gdb1.txt"&gt;&lt;/script&gt;


&lt;p&gt;Oh. Well I guess that explains that. So &amp;mdash; stupid question, but when do these things get freed? Apparently only in &lt;a href="https://github.com/ruby/ruby/blob/ruby_1_9_3/gc.c#L479-506"&gt;&lt;code&gt;rb_objspace_free&lt;/code&gt;&lt;/a&gt; which only ever gets called &lt;a href="https://github.com/ruby/ruby/blob/ruby_1_9_3/vm.c#L1624"&gt;when the VM terminates&lt;/a&gt;, so the answer is &lt;em&gt;never&lt;/em&gt;. Cool.&lt;/p&gt;

&lt;p&gt;Upon further investigation, it's pretty clear that this whole system was designed with the expectation that you'd call &lt;a href="http://www.ruby-doc.org/core-1.9.3/GC/Profiler.html#method-c-clear"&gt;&lt;code&gt;GC::Profiler.clear&lt;/code&gt;&lt;/a&gt; regularly. The profiler keeps its samples in an array at &lt;code&gt;objspace-&amp;gt;profile.record&lt;/code&gt; that it &lt;a href="https://github.com/ruby/ruby/blob/ruby_1_9_3/gc.c#L168-171"&gt;increases in size by 1000&lt;/a&gt; every time it runs out of space.&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=sample-array-size-increase.c"&gt;&lt;/script&gt;


&lt;p&gt;&lt;strong&gt;If you don't call &lt;code&gt;GC::Profiler.clear&lt;/code&gt;, that array keeps increasing in size forever.&lt;/strong&gt; This is not documented. Obviously.&lt;/p&gt;
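&lt;p&gt;If you're stuck with &lt;code&gt;GC::Profiler&lt;/code&gt;, one workaround is to read the total and clear the records in the same place, keeping your own running sum. A minimal sketch, assuming you control the point where stats are sampled; &lt;code&gt;drain_gc_time&lt;/code&gt; is a hypothetical helper, and the loop body just stands in for handling a request:&lt;/p&gt;

```ruby
# Read the accumulated GC time, then clear the profiler's records so its
# internal array is reused instead of growing without bound.
def drain_gc_time
  elapsed = GC::Profiler.total_time
  GC::Profiler.clear
  elapsed
end

GC::Profiler.enable
total_gc_seconds = 0.0

3.times do
  Array.new(100_000) { |i| i.to_s } # stand-in for handling a request
  GC.start
  total_gc_seconds += drain_gc_time
end

GC::Profiler.disable
puts "GC time across requests: #{total_gc_seconds}s"
```

&lt;p&gt;Because the records are cleared on every read, &lt;code&gt;total_time&lt;/code&gt; only ever has to loop over the samples from a single interval.&lt;/p&gt;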

&lt;p&gt;On our production systems, unicorn workers that have been running for a few hours had an &lt;code&gt;objspace-&amp;gt;profile.size&lt;/code&gt; of around 350000. On x86_64, &lt;code&gt;sizeof(struct gc_profile_record)&lt;/code&gt; == 104, so around 35MB of overhead per process multiplied by 25 processes per machine for a total of nearly 1GB per machine &amp;mdash; after only 3 hours. That will grow forever until the processes are restarted.&lt;/p&gt;
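&lt;p&gt;For the record, here's the arithmetic behind those figures, using the numbers quoted above:&lt;/p&gt;

```ruby
records_per_process = 350_000   # objspace profile size after a few hours
bytes_per_record = 104          # sizeof(struct gc_profile_record) on x86_64
processes_per_machine = 25      # unicorn workers per machine

bytes_per_process = records_per_process * bytes_per_record
bytes_per_machine = bytes_per_process * processes_per_machine

puts "per process: #{(bytes_per_process / 1_000_000.0).round(1)} MB"
puts "per machine: #{(bytes_per_machine / 1_000_000.0).round(1)} MB"
```

&lt;p&gt;That works out to 36.4 MB per process and 910 MB per machine, which is where "around 35MB" and "nearly 1GB" come from.&lt;/p&gt;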

&lt;p&gt;That's the bad news.&lt;/p&gt;

&lt;h3&gt;The good news: GC::BasicProfiler&lt;/h3&gt;

&lt;p&gt;Ultimately, &lt;code&gt;GC::Profiler&lt;/code&gt; was designed to provide detailed information about every GC run &amp;mdash; probably for the VM implementers to use when tuning the GC (haha yeah right). But seriously, somebody probably wants that, but it isn't me. For those of us who simply want to keep track of GC stats on our production applications, we need a less expensive implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/jamesgolick/ruby/commit/576cba1e79842f7c5ee80d3668958e1571da13d7"&gt;&lt;code&gt;GC::BasicProfiler&lt;/code&gt;&lt;/a&gt; is a first step towards something like that. It has a very simple, low-overhead implementation, and &lt;a href="https://github.com/jamesgolick/ruby/blob/tcs-ruby_1_9_3/gc.c#L3915-3921"&gt;&lt;code&gt;GC::BasicProfiler.total_time&lt;/code&gt;&lt;/a&gt; works the way you might expect.&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=gc_basic_profile_total_time.c"&gt;&lt;/script&gt;


&lt;p&gt;Enabling and disabling &lt;code&gt;BasicProfiler&lt;/code&gt; works exactly the same as &lt;code&gt;Profiler&lt;/code&gt; but you don't need to call &lt;code&gt;clear&lt;/code&gt; to avoid leaking memory. In fact, there's no &lt;code&gt;clear&lt;/code&gt; method at all.&lt;/p&gt;

&lt;p&gt;If you're interested in following the development of this patch, it'll be &lt;a href="https://github.com/jamesgolick/ruby/blob/tcs-ruby_1_9_3"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;One more lol for the road&lt;/h3&gt;

&lt;script src="https://gist.github.com/4113649.js?file=lol.txt"&gt;&lt;/script&gt;


&lt;p&gt;It's the little things. Don't worry, though &amp;mdash; fixed in &lt;code&gt;BasicProfiler&lt;/code&gt;.&lt;/p&gt;

&lt;script src="https://gist.github.com/4113649.js?file=trulth.txt"&gt;&lt;/script&gt;

</content>
  <feedburner:origLink>http://jamesgolick.com/2012/11/19/the-cost-of-ruby-1.9.3-s-gc-profiler.html</feedburner:origLink></entry>
  
  <entry>
    <title>Moving On</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/y7mVuVbQrhY/moving-on.html" />
    <id>tag:jamesgolick.com,2012-09-05:1346869989</id>
    <updated>2012-09-05T11:33:09-07:00</updated>
    <content type="html">&lt;p&gt;Almost four years ago, I was speaking at a software engineering conference in Montreal. At the speakers lunches, I met up with one of the founders of the conference, and we immediately hit it off. He told me about his growing company, and a month later, the consulting firm I'd been running was closed, our office vacant, and I had joined BitLove (the company that runs FetLife &amp;mdash; which was then known as Protose) as CTO. It's bittersweet to announce that as of a few weeks ago, I've decided to move on.&lt;/p&gt;

&lt;p&gt;Over the last four years, I played a huge role in every part of running FetLife. In addition to being responsible for our technology, I made business and product decisions, helped design features, wrote copy, communicated with the community, worked on support stuff, and more. I've always had an interest in &amp;mdash; and read about &amp;mdash; all this stuff, but actually having the opportunity to participate in it, make real mistakes, and have real successes was incredible.&lt;/p&gt;

&lt;p&gt;As a technologist, it's hard to imagine somewhere I could've grown more quickly. When I joined, I knew a few things about writing Rails apps. While I was there, I got the opportunity to do everything with every part of the stack. I learned how to make it all run in production for a big user base and a lot of traffic, with a tiny team.&lt;/p&gt;

&lt;p&gt;When I joined, we had ~100k users (I got user ID 129315) and Rails was serving around 50 million requests a month. Since then, our user base grew to over 1.5 million users, and our traffic grew to almost 500 million pageviews a month and over 1 billion Rails requests (not to mention requests to other services like chat). We did it with an engineering team that hovered around 2 people (including me).&lt;/p&gt;

&lt;p&gt;I'm really proud of the engineering work I did at BitLove. We operated an extremely high-throughput MySQL installation, and I was able to &lt;a href="/2012/7/18/innodb-kernel-mutex-contention-and-memory-allocators.html"&gt;solve&lt;/a&gt; various InnoDB scalability limitations that we encountered. I implemented a web-based IM system (similar to Facebook Chat) that hundreds of thousands of people use to send tens of millions of messages every month. I built everything from the presence and routing implementation in Erlang to the UI in JavaScript. I also designed and built an extremely stable and fast activity stream architecture, almost single-handedly ran operations for the ~40 machine cluster in around 2 hours a week, and perhaps most importantly, I &lt;a href="/2012/7/7/how-to-lose-100-pounds.html"&gt;dramatically improved my health&lt;/a&gt; in the process.&lt;/p&gt;

&lt;p&gt;The work I did at BitLove certainly represents the biggest challenges and accomplishments of my life and career to date. It was a wild ride with many ups and downs &amp;mdash; everything they promised a startup would be. So it was incredibly difficult to leave a growing and successful company that I had a big hand in building. But it's time for new challenges.&lt;/p&gt;

&lt;h3&gt;So what's next?&lt;/h3&gt;

&lt;p&gt;I've got a couple of really amazing opportunities on the table right now that I'm super excited about. Because I get so heavily invested in my work and like to stick around companies for many years, this is a very big decision, and I'm not taking it lightly. I'm certainly open to hearing about any opportunities you think I might be a fit for, so do get in touch!&lt;/p&gt;

&lt;h3&gt;Consulting&lt;/h3&gt;

&lt;p&gt;In the meantime, I'm available for consulting work. If your company needs help with any of the kinds of things I discussed above &amp;mdash; especially performance and scalability stuff, we should chat. &lt;a href="mailto:jamesgolick@gmail.com"&gt;Email me&lt;/a&gt;.&lt;/p&gt;
</content>
  <feedburner:origLink>http://jamesgolick.com/2012/9/5/moving-on.html</feedburner:origLink></entry>
  
  <entry>
    <title>InnoDB kernel_mutex Contention and Memory Allocators</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/roj_zkJ1QCc/innodb-kernel-mutex-contention-and-memory-allocators.html" />
    <id>tag:jamesgolick.com,2012-07-18:1342651301</id>
    <updated>2012-07-18T15:41:41-07:00</updated>
    <content type="html">&lt;p&gt;&lt;em&gt;tl;dr: We found that in our case, contention for InnoDB's &lt;code&gt;kernel_mutex&lt;/code&gt; was caused by contention for a malloc arena lock. We fixed it by moving to tcmalloc. Instructions on how to do that &lt;a href="#preload-instructions"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We recently doubled the IO throughput capacity of our near-capacity MySQL master by adding a second RAID controller, and striping the two together. As we were climbing up to a record throughput peak the following weekend, there was a major db latency spike (&gt;3x).&lt;/p&gt;

&lt;p&gt;A look at SHOW ENGINE INNODB STATUS indicated quite a bit of contention for InnoDB's &lt;code&gt;kernel_mutex&lt;/code&gt;.&lt;/p&gt;

&lt;script src="https://gist.github.com/3146976.js?file=innodb-status"&gt;&lt;/script&gt;


&lt;p&gt;&lt;em&gt;Note: the contention I observed was actually considerably worse than what I pasted above, but I didn't save the output, so this is all I have to show.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;kernel_mutex&lt;/code&gt; has &lt;a href="http://blogs.innodb.com/wp/2011/04/mysql-5-6-innodb-scalability-fix-kernel-mutex-removed/"&gt;been removed&lt;/a&gt; in MySQL 5.6, but that's unfortunately not ready for production. As a workaround, the Percona guys &lt;a href="http://www.mysqlperformanceblog.com/2011/12/02/kernel_mutex-problem-or-double-throughput-with-single-variable/"&gt;suggest&lt;/a&gt; modifying &lt;code&gt;innodb_sync_spin_loops&lt;/code&gt;, which had absolutely no effect for our workload. They also &lt;a href="http://www.mysqlperformanceblog.com/2011/12/02/kernel_mutex-problem-cont-or-triple-your-throughput/"&gt;suggest&lt;/a&gt; lowering &lt;code&gt;innodb_thread_concurrency&lt;/code&gt;, which did reduce contention, but it also reduced concurrency, which left us right back where we started.&lt;/p&gt;

&lt;p&gt;I pulled out my &lt;a href="http://poormansprofiler.org/"&gt;poor man's profiler&lt;/a&gt; to see if I could figure out exactly what was holding the lock and what it was doing with it. Here are the stacks I got.&lt;/p&gt;

&lt;script src="https://gist.github.com/3146976.js?file=gistfile1.txt"&gt;&lt;/script&gt;


&lt;p&gt;Immediately, we can see that lots of stuff is waiting on locks inside of malloc/free-related functions. After reading through the MySQL sources, it was clear that this thread was holding the &lt;code&gt;kernel_mutex&lt;/code&gt;.&lt;/p&gt;

&lt;script src="https://gist.github.com/3146976.js?file=lock-holder"&gt;&lt;/script&gt;


&lt;p&gt;&lt;em&gt;Note: all links to glibc code below are specifically to the version that we are using.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Reading through &lt;a href="https://github.com/cdepillabout/glibc/blob/e28c88707ef0529593fccedf1a94c3fce3df0ef3/malloc/malloc.c#L4763"&gt;&lt;code&gt;_int_free&lt;/code&gt;&lt;/a&gt; in the glibc sources seemed to indicate that there was only &lt;a href="https://github.com/cdepillabout/glibc/blob/e28c88707ef0529593fccedf1a94c3fce3df0ef3/malloc/malloc.c#L2362"&gt;one lock&lt;/a&gt; (&lt;code&gt;malloc_state-&amp;gt;mutex&lt;/code&gt;) in there.&lt;/p&gt;

&lt;p&gt;Our glibc &lt;em&gt;was&lt;/em&gt; built with &lt;a href="https://github.com/cdepillabout/glibc/blob/e28c88707ef0529593fccedf1a94c3fce3df0ef3/malloc/Makefile#L128"&gt;&lt;code&gt;--enable-experimental-malloc&lt;/code&gt;&lt;/a&gt;, which is supposed to &lt;a href="https://github.com/cdepillabout/glibc/blob/e28c88707ef0529593fccedf1a94c3fce3df0ef3/malloc/arena.c"&gt;reduce contention&lt;/a&gt; by dividing the heap into multiple arenas, each with its own lock (at least as far as I understand it, and I'm far from an expert).&lt;/p&gt;

&lt;p&gt;&lt;a href="http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html"&gt;tcmalloc&lt;/a&gt; is a malloc implementation from &lt;a href="http://google-perftools.googlecode.com/"&gt;google-perftools&lt;/a&gt; that satisfies small malloc requests without locks by using a per-thread cache. Using tcmalloc should mean that the allocations inside the &lt;code&gt;kernel_mutex&lt;/code&gt; are (at least mostly) lockless.&lt;/p&gt;

&lt;p&gt;&lt;a name="preload-instructions"&gt;&lt;/a&gt;
Here's how to &lt;code&gt;LD_PRELOAD&lt;/code&gt; tcmalloc.&lt;/p&gt;

&lt;p&gt;Put this in &lt;code&gt;/usr/local/bin/mysqld_wrapper&lt;/code&gt;:&lt;/p&gt;

&lt;script src="https://gist.github.com/3146976.js?file=mysqld_wrapper.sh"&gt;&lt;/script&gt;


&lt;p&gt;Put this fragment in my.cnf:&lt;/p&gt;

&lt;script src="https://gist.github.com/3146976.js?file=my.cnf"&gt;&lt;/script&gt;


&lt;p&gt;Since we moved to tcmalloc, all of the contention for the &lt;code&gt;kernel_mutex&lt;/code&gt; has completely disappeared. We're also seeing better performance overall and using ~15% less memory in total. This fix probably isn't applicable in all cases, but if you're seeing &lt;code&gt;kernel_mutex&lt;/code&gt; contention, it's worth using your poor man's profiler to see whether swapping allocators might help.&lt;/p&gt;
</content>
  <feedburner:origLink>http://jamesgolick.com/2012/7/18/innodb-kernel-mutex-contention-and-memory-allocators.html</feedburner:origLink></entry>
  
  <entry>
    <title>How to Lose 100 Pounds</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/kYHno-QlxeM/how-to-lose-100-pounds.html" />
    <id>tag:jamesgolick.com,2012-07-07:1341683833</id>
    <updated>2012-07-07T10:57:13-07:00</updated>
    <content type="html">&lt;p&gt;I've struggled with my weight for nearly my entire life. I went from being chubby in elementary school to overweight in high school to obese in university. At my biggest, I was almost 280 pounds (I'm 5'6"). Finally, around 5 years ago, I got a spark of inspiration that ultimately led to me dropping a total of 110 pounds (and counting). Here's how I did it.&lt;/p&gt;

&lt;p&gt;But first, the obligatory before and after shots:&lt;/p&gt;

&lt;p&gt;
  &lt;img src="http://farm2.staticflickr.com/1410/1476468549_ed43518cbc_z.jpg?zz=1" width="350" /&gt;
  &lt;img src="https://flpics0.a.ssl.fastly.net/129/129315/0004b750-74d0-e92b-397e-2b8ad87ed1f7_720.jpg" width="350" /&gt;
&lt;/p&gt;


&lt;h3&gt;Motivation&lt;/h3&gt;

&lt;p&gt;Losing weight requires an enormous amount of motivation. You're going to have to change your lifestyle and make real sacrifices. It's going to be hard. Motivation will help you continue to justify the changes you've made, and prevent you from slipping back into old habits.&lt;/p&gt;

&lt;p&gt;Funnily enough, I actually got my first seed of motivation from pneumonia. I was 278 pounds at the time. After three horrible, bed-ridden weeks, I was down to 258. It was painful, but it taught me the most important weight loss lesson of all: it's possible.&lt;/p&gt;

&lt;p&gt;Like a lot of other kids from my generation, I grew up overweight. When you can't remember a time when you weren't, being fat is a part of your identity. So, silly as it sounds, I think there was a part of me that believed that weight loss was impossible on some level &amp;mdash; or at least that the amount of weight I needed to lose was insurmountable.&lt;/p&gt;

&lt;p&gt;If you only take one thing away from this article, let it be that. You can lose weight. No matter how fucked up your metabolism (more on that later), no matter how long you've been overweight, it &lt;i&gt;is&lt;/i&gt; possible.&lt;/p&gt;

&lt;h3&gt;Strategies&lt;/h3&gt;

&lt;p&gt;I'm going to talk about a few of the strategies, diets, and other random things that I have tried because I think people will find them interesting. But I'll give you an easy way out of reading the rest of this article just in case you're already bored. Ready? Here it is.&lt;/p&gt;

&lt;p&gt;STOP EATING PROCESSED FOOD. THAT INCLUDES SUGAR, WHEAT PRODUCTS, SUGAR REPLACEMENTS LIKE SUCRALOSE, ASPARTAME, ETC, AND EVERYTHING ELSE YOU'RE THINKING OF THAT MIGHT BE AN EXCEPTION. EXCEPT STEVIA. YOU CAN HAVE STEVIA.&lt;/p&gt;

&lt;p&gt;Ok, so with that yelling out of the way, here's a bit about my journey.&lt;/p&gt;

&lt;h3&gt;Briefly On Exercise&lt;/h3&gt;

&lt;p&gt;I'm going to keep this short. Exercise has never helped me lose weight. For much of the time that I was grossly overweight, I was also extremely physically active, often whitewater kayaking or downhill skiing for several hours 4 or 5 days a week, and I still kept putting on fat. Despite conventional wisdom to the contrary, exercise isn't an effective weight loss strategy &lt;i&gt;for me&lt;/i&gt;.&lt;/p&gt;

&lt;h3&gt;Portion Control&lt;/h3&gt;

&lt;p&gt;After I lost the pneumonia weight, I was literally terrified that I might put it back on. So I decided to try eating less. I ate all the same things, but avoided going back for seconds. I ate pasta, pizza, and dessert until I was full, but not stuffed. I lost 20 more pounds over a few months. Then, it leveled off.&lt;/p&gt;

&lt;p&gt;That, really, is the story of my weight loss effort: strategies and diets that work for a while and then plateau. Sometimes, it's possible to break through a plateau, but other times, you need to up your game with better eating.&lt;/p&gt;

&lt;p&gt;I tried for another six or so months to break through the portion control plateau. It never happened. I was actually feeling pretty good about where I was, though, so I didn't really make much of an effort to progress for a few more months.&lt;/p&gt;

&lt;h3&gt;Lower Carb Diet&lt;/h3&gt;

&lt;p&gt;Shortly after moving from Montreal to Vancouver, I started seeing a personal trainer, hoping to accelerate my progress on the scale and in the gym. She had me keep a food journal, and immediately picked up on the amount of carbs that I was eating back then. I was vegetarian at the time, and I was eating tons of breads and pastas. She told me to eat more vegetables, and tofu, and watch my carb intake. I lost about 20 pounds before plateauing &lt;i&gt;hard&lt;/i&gt;.&lt;/p&gt;

&lt;p&gt;On this diet, I was still eating bread, pasta, and sugar, just less. And after a while, I found it impossible to continue losing weight. So I started looking for other solutions.&lt;/p&gt;

&lt;h3&gt;Eat to Live&lt;/h3&gt;

&lt;p&gt;Eat to Live is an all-vegan diet designed by Dr. Joel Fuhrman. Only fruits, vegetables, legumes, nuts, and seeds are allowed; no oils, dairy, sugar, or even juices are permitted (that's a bit of an exaggeration, but for our purposes it's accurate enough). I had very mixed results on Eat to Live. I did lose about 15 pounds, but I had a very difficult time keeping it off, and found it very difficult to eat enough food to feel full for more than an hour at a time. I found that I was constantly eating, and still often feeling starvingly hungry.&lt;/p&gt;

&lt;p&gt;With that said, I actually know a lot of people who've had great success on ETL including my ex-girlfriend, who I was living with at the time (so we were eating nearly identically, though me significantly more than her), and my good friend &lt;a href="http://gilesbowkett.blogspot.com"&gt;Giles&lt;/a&gt;, who actually introduced me to the book. Which brings me to another one of my weight loss conclusions.&lt;/p&gt;

&lt;p&gt;Everybody's body is different. Some people have amazing success on a diet, while others are incapable of losing weight. I no longer believe that there's one perfect diet out there that suits everybody. Your mileage will vary with every approach.&lt;/p&gt;

&lt;p&gt;The only consistent thing I've been able to identify across all my friends and family who've lost weight is avoiding processed foods.&lt;/p&gt;

&lt;h3&gt;Psoriasis and acne&lt;/h3&gt;

&lt;p&gt;An interesting aside here is that ETL led me to discover that it's possible to control psoriasis with diet. The medical community doesn't seem to be aware of this, but I am completely psoriasis free after years of being covered in it.&lt;/p&gt;

&lt;p&gt;At first, I thought that it was the greens that caused my skin to clear up, but since then, I've realized that it's a balance of factors. Greens &lt;i&gt;do&lt;/i&gt; help, but merely avoiding processed foods is enough to keep me completely psoriasis free. That being said, I started drinking coffee again a little while ago, and noticed that a small amount of psoriasis came back. Upping my intake of greens seems to make it clear up. So, it's a bit of a balancing act.&lt;/p&gt;

&lt;p&gt;Oh also, I'm extremely prone to acne, but I've found that avoiding high &lt;a href="http://www.glycemicindex.com/"&gt;glycemic index&lt;/a&gt; foods keeps my face and body completely clear of pimples.&lt;/p&gt;

&lt;h3&gt;On Vegetarianism&lt;/h3&gt;

&lt;p&gt;I'm definitely going to get hate mail for this, but here goes anyway. I was vegetarian for most of my weight loss journey. My conclusion was ultimately that vegetarianism made it significantly more difficult to lose weight. Here's why.&lt;/p&gt;

&lt;p&gt;At home, cooking my own meals from my own groceries, vegetarianism was perfectly fine. But, every time I ate in a restaurant, on the street, or even at a friend's place, my options were nearly invariably some combination of pasta, bread, and sugar. I probably have the shittiest metabolism in the world, but when I eat that stuff, I gain weight. Lots of it.&lt;/p&gt;

&lt;p&gt;I really enjoy eating in restaurants, which made the whole thing all the more difficult. During the whole time that I was on Eat to Live, I would painstakingly lose 7 or 8 pounds by religiously sticking to the diet for a month, then travel to a conference for a week and gain 15. It was frustrating to say the least, which led me to the very difficult conclusion that I needed to at least try breaking my nearly ten years of vegetarianism.&lt;/p&gt;

&lt;h3&gt;My Current Diet&lt;/h3&gt;

&lt;p&gt;My current diet is really simple: no processed carbs (that includes 'carbless' sugar replacements except stevia). I go through periods where I eat a ton of fruits and vegetables, but lately, I've mostly been eating meat and fish.&lt;/p&gt;

&lt;p&gt;Do I miss chocolate and ice cream? Definitely. But I eat guilt-free bacon or chicken wings whenever I feel like it, and seeing results makes the sacrifice more than worthwhile.&lt;/p&gt;

&lt;p&gt;This diet means that when I go out to eat (which I do regularly), I can have a steak without feeling guilty. I tell people that I'm allergic to sugar and flour, which gives me a reasonable excuse for being the pain in the ass guy who has to ask the waiter about the ingredients in every dish on the menu. I'd encourage you to tell similar lies if they help you stick to a diet.&lt;/p&gt;

&lt;p&gt;There've been a few periods over the last year and a half where I've started eating bread again and gained back a bunch of weight. In April of this year, though, I finally committed to this diet as a more permanent lifestyle, and I've stuck with it ever since. I've dropped around 40 pounds since then, and I'm not stopping until I can see my abs.&lt;/p&gt;

&lt;h3&gt;Conclusions&lt;/h3&gt;

&lt;p&gt;Everybody's body is different. Your friends may have had success with diet X, but you may not. Don't let that discourage you. You'll find something that works.&lt;/p&gt;

&lt;p&gt;The best diet is the one that you can stick to, even if the weight loss is slower. If a diet fights against your lifestyle, it's going to be that much harder to maintain. That was my problem with ETL. I love to eat out and I travel a lot, so I couldn't stick to it. And at the end of the day, I didn't lose weight. The less you have to change your lifestyle to accomplish your goals, the better your chances of success.&lt;/p&gt;

&lt;p&gt;You can lose weight, still enjoy the food you eat, and even go out to restaurants while you do it. Obviously, you won't be able to eat everything you're eating now, because if you could, you'd already be thin. But, it'll be a sacrifice worth making. The best thing you've ever done.&lt;/p&gt;
</content>
  <feedburner:origLink>http://jamesgolick.com/2012/7/7/how-to-lose-100-pounds.html</feedburner:origLink></entry>
  
  <entry>
    <title>Objectify: A Better Way to Build Rails Applications</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/Eef6LYJ28D8/objectify-a-better-way-to-build-rails-applications.html" />
    <id>tag:jamesgolick.com,2012-05-22:1337711809</id>
    <updated>2012-05-22T11:36:49-07:00</updated>
    <content type="html">&lt;p&gt;
  For almost 6 years, the dominant "best practice" for building rails applications has been &lt;a href="http://weblog.jamisbuck.org/2006/10/18/skinny-controller-fat-model"&gt;skinny controller, fat model&lt;/a&gt;. In other words, put all of your business logic into your models and keep it out of your controllers. The result is typically a small number of bloated objects that are impossible to reason about or test in isolation &lt;a href="#footnote1"&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
  That property is important. To understand why, let's take a quick and highly selective look at the origins of object oriented programming.
&lt;/p&gt;

&lt;h3&gt;On Encapsulation&lt;/h3&gt;

&lt;p&gt;
  One of the early papers that emphasized the importance of encapsulation in software development was James H. Morris Jr.'s &lt;a href="http://www.erights.org/history/morris73.pdf"&gt;Protection in Programming Languages&lt;/a&gt;. He argued that if we were going to be able to write correct software, &amp;ldquo;programs&amp;rdquo; (probably most analogous to objects in the OOP world) needed &amp;ldquo;protection&amp;rdquo; from each other. 
&lt;/p&gt;


&lt;blockquote&gt;
  ...hostility is not a necessary precondition for catastrophic interference between programs. An undebugged program, coexisting with other programs, might as well be regarded as having been written by a malicious enemy&amp;mdash;even if all the programs have the same author!
&lt;/blockquote&gt;

&lt;blockquote&gt;
  We offer the following as a desideratum for a programming system: A programmer should be able to prove that his programs have various properties and do not malfunction, solely on the basis of what he can see from his private bailiwick.
&lt;/blockquote&gt;

&lt;p&gt;
  The idea is that if we can prove that components work in isolation, then we have a better chance of having a functioning system when we assemble them into a larger whole.
&lt;/p&gt;
  
&lt;p&gt;
  Aside from being central to the thesis in a paper that heavily influenced the development of object oriented programming itself, it doesn't seem like a stretch to argue that components that are provably correct in isolation would make a good building block from which to build working systems &lt;sup&gt;&lt;a href="#footnote2"&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
&lt;/p&gt;
  
&lt;p&gt;
  We can derive a lot of good object oriented practices from this idea. Two that are relevant here:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;
      &lt;b&gt;Single responsibility principle (SRP)&lt;/b&gt;: To be able to prove that an object works correctly in isolation, that object's behaviour has to be clearly defined. The more responsibilities an object has, the more complex its behaviour becomes, and the more difficult it is to prove and reason about.
    &lt;/p&gt;
    &lt;p&gt;
      For example, if we put all of our business logic in ActiveRecord::Base subclasses, the burden of proving that &lt;i&gt;our&lt;/i&gt; code is correct becomes immense because we expose ourselves to mistaken interactions with all of the behaviour we've inherited. ActiveRecord::Base already has a responsibility: persistence.&lt;br/&gt;
    &lt;/p&gt;
  &lt;/li&gt;

  &lt;li&gt;&lt;b&gt;Dependency injection (DI)&lt;/b&gt;: Many objects have dependencies &amp;mdash; other objects that they collaborate with to accomplish their goals. If those dependencies aren't configurable in some way, then we can't test the behaviour of an object without implicitly testing its collaborators as well. By injecting our dependencies, we can easily replace them with alternative implementations. In our tests, we can inject &lt;a href="http://jamesgolick.com/2010/3/10/on-mocks-and-mockist-testing.html"&gt;fake objects&lt;/a&gt; in order to isolate the object in question.&lt;/li&gt;
&lt;/ol&gt;
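
&lt;p&gt;
  As a minimal sketch of the dependency injection point, here's a plain Ruby object whose only collaborator is handed to it through the constructor, so a test can swap in a fake. &lt;code&gt;CommentNotifier&lt;/code&gt;, &lt;code&gt;FakeMailer&lt;/code&gt; and their methods are hypothetical names invented for this example; they don't come from rails or from any library.
&lt;/p&gt;

&lt;pre&gt;
```ruby
# Hypothetical object under test. The mailer is injected, never hard-coded.
class CommentNotifier
  def initialize(mailer)
    @mailer = mailer
  end

  def notify(author_email, comment_body)
    @mailer.deliver(to: author_email, body: "New comment: #{comment_body}")
  end
end

# Hand-rolled fake: records what it was asked to send, does no real work.
class FakeMailer
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def deliver(to:, body:)
    @deliveries.push({ to: to, body: body })
  end
end

# In a test, inject the fake and assert on what it recorded.
mailer = FakeMailer.new
CommentNotifier.new(mailer).notify("author@example.com", "Nice post")

puts mailer.deliveries.length      # prints 1
puts mailer.deliveries.first[:to]  # prints author@example.com
```
&lt;/pre&gt;

&lt;p&gt;
  In production the same class would be built with a real mailer that responds to &lt;code&gt;deliver&lt;/code&gt;; nothing in &lt;code&gt;CommentNotifier&lt;/code&gt; has to change.
&lt;/p&gt;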

&lt;h3&gt;In Search of Something Better&lt;/h3&gt;

&lt;p&gt;
  So, how should we organize our applications? There are a few approaches floating around.
&lt;/p&gt;

&lt;p&gt;
  One in particular is something I've been &lt;a href="http://jamesgolick.com/2010/3/14/crazy-heretical-and-awesome-the-way-i-write-rails-apps.html"&gt;thinking about&lt;/a&gt; and refining for a while now &lt;a href="#footnote3"&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/a&gt;. In this approach, persistence objects remain extremely thin, and business logic is encapsulated in lots of very simple objects known as &amp;ldquo;services&amp;rdquo; and &amp;ldquo;policies&amp;rdquo;. Not all objects in this methodology will fit into one of those two categories, but they are two of the most important concepts.
&lt;/p&gt;

&lt;p&gt;
  Service objects are responsible for retrieving and/or manipulating data &amp;mdash; essentially any work that needs to be done that you might have put in a controller or model object before. They typically only have one method, which I like to define as &amp;ldquo;#call&amp;rdquo; (they're usually named something like PictureCreationService, so naming the method #create would seem redundant).
&lt;/p&gt;

&lt;p&gt;
  Policy objects are responsible for enforcing access control policies. We use them in before filters to guard controller actions, reuse them in views to conditionally display pieces of UI (example: a delete button that requires administrative privileges), and anywhere else policies need enforcing, like background jobs. Policy objects only ever have one public method: &amp;ldquo;#allowed?&amp;rdquo;.
&lt;/p&gt;
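
&lt;p&gt;
  To illustrate the shape of these two kinds of objects, here's a rough plain Ruby sketch (no rails, no objectify). The class names, the hash-based users and pictures, and the ownership rule are all assumptions made up for the example; the only parts taken from the description above are the single &amp;ldquo;#call&amp;rdquo; and &amp;ldquo;#allowed?&amp;rdquo; methods.
&lt;/p&gt;

&lt;pre&gt;
```ruby
# Hypothetical policy: exactly one public method, allowed?.
class PictureDeletionPolicy
  def allowed?(user, picture)
    user[:admin] || picture[:owner_id] == user[:id]
  end
end

# Hypothetical service: exactly one public method, call.
class PictureDeletionService
  def initialize(pictures)
    @pictures = pictures  # injected collection standing in for persistence
  end

  def call(picture)
    @pictures.delete(picture)
  end
end

pictures = [{ id: 1, owner_id: 7 }]
policy   = PictureDeletionPolicy.new
service  = PictureDeletionService.new(pictures)

admin   = { id: 1, admin: true }
visitor = { id: 2, admin: false }
picture = pictures.first

# The policy guards the service, the way a before filter guards an action.
service.call(picture) if policy.allowed?(visitor, picture)
puts pictures.length  # prints 1, the visitor was refused

service.call(picture) if policy.allowed?(admin, picture)
puts pictures.length  # prints 0, the admin was allowed
```
&lt;/pre&gt;

&lt;p&gt;
  The same &lt;code&gt;policy.allowed?&lt;/code&gt; check could decide whether a view renders the delete button at all.
&lt;/p&gt;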

&lt;h3&gt;Composability&lt;/h3&gt;

&lt;p&gt;
  Decomposing the behaviour of our rails application makes it extremely simple for us to prove that our objects work in isolation because we're adhering to SRP. As a bonus, since all of our unit tests inject test doubles &amp;mdash; that don't actually do any real work &amp;mdash; in place of real collaborators, our tests are &lt;a href="http://pyvideo.org/video/631/fast-test-slow-test"&gt;extremely fast&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
  There's one other &lt;i&gt;major&lt;/i&gt; benefit to this approach: that which has been decomposed can also be recomposed. In other words, because our behaviours are factored into many focused objects instead of a small number of bloated ones, we can (and do) recompose them to create other things. If you set out to build a User object, all you'll ever have is a User object; if you build the pieces of a User object, then you get a User object &lt;b&gt;and&lt;/b&gt; any number of other things you can build from its pieces.
&lt;/p&gt;

&lt;h3&gt;Objectify&lt;/h3&gt;

&lt;p&gt;
  We've been following and refining this approach with a real app of nontrivial complexity in production for a few years now, and it's been very successful. However, certain things have always felt somewhat ad hoc, and since rails isn't designed to support this kind of structure, it's easy for people (myself included) to get confused about exactly how functionality should be structured.
&lt;/p&gt;

&lt;p&gt;
  So, I spent the last few weeks building a framework that sits on top of rails and exposes these abstractions (and a few others) as first-class citizens. It's called &lt;a href="https://github.com/bitlove/objectify"&gt;objectify&lt;/a&gt;. It's very far from being perfect or finished, but it's a start, and we're already using it to clean up sections of our code with great success. You can read more about objectify in the &lt;a href="https://github.com/bitlove/objectify#how-it-works"&gt;README&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
  If you're interested in the future of objectify and better OOP practices for rails apps (and elsewhere), fork &lt;a href="https://github.com/bitlove/objectify"&gt;the code&lt;/a&gt;, join &lt;a href="https://groups.google.com/forum/#!forum/objectify"&gt;the mailing list&lt;/a&gt;, come hang out in #objectify on freenode, or just hit me up &lt;a href="https://twitter.com/jamesgolick"&gt;on twitter&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
  P.S. If you're interested in working on this kind of stuff and/or on a rails app that has been committed to these practices literally for &lt;i&gt;years&lt;/i&gt;, &lt;a href="mailto:james@bitlove.co"&gt;get in touch&lt;/a&gt; because we're hiring!
&lt;/p&gt;

&lt;ul style="font-size:10px; color: #999"&gt;
  &lt;li&gt;
    &lt;a name="footnote1"&gt;&lt;/a&gt;
    [1] Note that the skinny controller, fat model article doesn't necessarily say that business logic belongs in ActiveRecord::Base subclasses, and it even mentions presenter objects. The article wasn't necessarily wrong, but people's implementation of it has been.
  &lt;/li&gt;
  &lt;li&gt;
    &lt;a name="footnote2"&gt;&lt;/a&gt;
    [2] &lt;a href="http://en.wikipedia.org/wiki/Barbara_Jane_Liskov"&gt;Barbara Liskov&lt;/a&gt; &lt;a href="http://www.youtube.com/watch?v=GDVAHA0oyJU"&gt;credits&lt;/a&gt; Morris's paper as being one of the primary influences on her &lt;a href="http://en.wikipedia.org/wiki/Turing_Award"&gt;Turing Award&lt;/a&gt;-winning research on &lt;a href="http://dl.acm.org/citation.cfm?id=800233.807045"&gt;abstract data types&lt;/a&gt; (a predecessor to what we now think of as classes).
  &lt;/li&gt;
  &lt;li&gt;
    &lt;a name="footnote3"&gt;&lt;/a&gt;
    [3] Another option is to implement all of our functionality as a series of modules, which we then include into our model classes (or extend our model objects with). It &lt;i&gt;is&lt;/i&gt; possible to test these modules in isolation by creating fake objects to include them on. But a lot of care still needs to be taken to make sure that modules being included together don't interfere with each other, which means that these modules aren't encapsulated. Without encapsulation, the benefits of isolated testing are mostly negated. This approach is actually equivalent to putting everything in the same class, except for the fact that it's spread out over multiple source files.
  &lt;/li&gt;
&lt;/ul&gt;
</content>
  <feedburner:origLink>http://jamesgolick.com/2012/5/22/objectify-a-better-way-to-build-rails-applications.html</feedburner:origLink></entry>
  
  <entry>
    <title>Why RubyGems Needs Loren Segal</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/DS0rqYYjwwY/why-rubygems-needs-loren-segal.html" />
    <id>tag:jamesgolick.com,2011-06-01:1306995687</id>
    <updated>2011-06-01T23:21:27-07:00</updated>
    <content type="html">&lt;p&gt;
  &lt;i&gt;Full disclosure before I get started here. Loren and I are friends. I'd like to think that this blog post is mostly unbiased, but I'll let you come to your own conclusions.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
  Maintaining a piece of core infrastructure for a growing community is hard. Even if the code isn't especially complex, getting the release management issues right and keeping everybody happy is incredibly challenging.
&lt;/p&gt;

&lt;p&gt;
  Sometimes, that means making concessions, like continuing to maintain an API you don't like &amp;mdash; or code that gets in the way of a refactoring you want to do to make your own life easier. But that's the challenge of maintaining software that tens of thousands of people depend on every day.
&lt;/p&gt;

&lt;h3&gt;On Legacy Code&lt;/h3&gt;

&lt;p&gt;
  Did you know that in the Linux kernel project, every commit absolutely must build cleanly and run the tests successfully to be accepted? Kernel hackers go to great lengths to make every patch fit these requirements. It's an enormous pain in the ass for the developers, but it means that when something breaks, git-bisect will find it for them.
&lt;/p&gt;

&lt;p&gt;
  Going to that much trouble just to be able to use git-bisect is a huge headache for developers. But it's worth it because the Linux kernel is an important project that gets used by millions of people.
&lt;/p&gt;

&lt;p&gt;
  The problem with RubyGems isn't that the tests don't pass on every commit, but that APIs have been disappearing too quickly. It's an important enough project, used by enough people, that deprecations should be measured in &lt;i&gt;years&lt;/i&gt;, not months.
&lt;/p&gt;

&lt;p&gt;
  The current maintainers of RubyGems don't want to live in that kind of world. They want to move quickly to refactor the codebase, deprecating and removing APIs where it suits them. &lt;b&gt;And who could blame them? I wouldn't want to maintain a huge pile of legacy code either.&lt;/b&gt; Most programmers wouldn't.
&lt;/p&gt;

&lt;h3&gt;You'd have to be crazy...&lt;/h3&gt;

&lt;p&gt;
  Actually, Loren is just about the &lt;i&gt;only&lt;/i&gt; guy I know who isn't bothered by this sort of thing in the slightest. In fact, he seems to enjoy maintaining legacy code. It's a sick pleasure, and I know a little something about sick pleasures...
&lt;/p&gt;

&lt;p&gt;
  In all seriousness, a guy who cares about release management as deeply as Loren is one in a million. The fact that he's also an extremely talented engineer makes him one in a hundred million.  When he told me a couple of months ago that he'd be willing to maintain RubyGems, I gave him an "are you serious?" look. Turns out, he was.
&lt;/p&gt;

&lt;p&gt;
  So, here we've got a ridiculously talented programmer who wants to make all of our lives easier by living with a whole bunch of legacy code for us. It really is a no-brainer.
&lt;/p&gt;

&lt;p&gt;
  The RubyGems team should bring Loren on board to run the project. He's more than willing to put in the time and effort, and he's the best person I can imagine for the job. The proof is in the pudding: Loren (and team) have put together a fork of RubyGems (1.3.7) that maintains backwards compatibility and backports all of the performance improvements made since then.
&lt;/p&gt;

&lt;h3&gt;SlimGems&lt;/h3&gt;

&lt;p&gt;
  &lt;a href="http://slimgems.github.com"&gt;SlimGems&lt;/a&gt; is a really great project with a really horrible name. It's an effort to make a RubyGems with a stable API (the one from 1.3.7), a better code base, and faster gem installs. And there are a lot more exciting plans for the future. Check out &lt;a href="http://gnuu.org/2011/06/01/slimgems-a-drop-in-replacement-for-rubygems/"&gt;Loren's blog post&lt;/a&gt; for more info.
&lt;/p&gt;

&lt;p&gt;
  More important than any of that, though, is how helpful and friendly Loren is when it comes to bug reports and pull requests. He did it with &lt;a href="http://github.com/lsegal/yard"&gt;YARD&lt;/a&gt;, and now he's doing it with SlimGems. I've never heard anybody give him anything but rave reviews.
&lt;/p&gt;

&lt;p&gt;
  Until Loren is an official RubyGems maintainer, I'll be running SlimGems. I moved our 30 or so servers at work over, too. If that sounds interesting to you, 'gem install slimgems' is all it takes. Oh, and if you want to revert to your original RG install, just 'gem uninstall slimgems'. That's just how we roll.
&lt;/p&gt;

&lt;p&gt;
  If you have any trouble or feedback, jump into #slimgems on freenode, or open a &lt;a href="https://github.com/slimgems/slimgems/issues"&gt;GitHub issue&lt;/a&gt;, and we'll be happy to help you out.
&lt;/p&gt;

&lt;h3&gt;The Future&lt;/h3&gt;

&lt;p&gt;
  Forks are good for communities. They're a great place for new ideas to be proven (think rails/merb). SlimGems has already demonstrated that its goals are possible. They've already achieved many of them. I'm ready to see the code and teams merged. Until then, do yourself a favour and 'gem install slimgems'.
&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/JamesOnSoftware/~4/DS0rqYYjwwY" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://jamesgolick.com/2011/6/1/why-rubygems-needs-loren-segal.html</feedburner:origLink></entry>
  
  <entry>
    <title>VERIFY_NONE</title>
    <link href="http://feedproxy.google.com/~r/JamesOnSoftware/~3/CHnjGTYQmC0/verify-none..html" />
    <id>tag:jamesgolick.com,2011-02-15:1297828321</id>
    <updated>2011-02-15T19:52:01-08:00</updated>
    <content type="html">&lt;p&gt;
  A while back, it came to my attention that ruby's net/https implementation had an insecure default: not verifying TLS certificates (OpenSSL::SSL::VERIFY_NONE). I wrote &lt;a href="http://www.rubyinside.com/how-to-cure-nethttps-risky-default-https-behavior-4010.html"&gt;an article about it&lt;/a&gt; for RubyInside, and helped &lt;a href="http://twitter.com/geemus"&gt;@geemus&lt;/a&gt; &lt;a href="https://github.com/geemus/excon/commit/f84cbd8fd15fb3da13453d13c2a0164d62bef50b"&gt;fix&lt;/a&gt; the issue in his excon gem. Despite this being an incredibly serious security issue, nobody really seemed to care. Oh well.
&lt;/p&gt;

&lt;p&gt;
  Then today, one of the biggest names in the ruby community, &lt;a href="http://twitter.com/tenderlove"&gt;Aaron Patterson (aka @tenderlove)&lt;/a&gt;, posted a gist of a little campfire bot he wrote that forced net/https into this insecure mode.
&lt;/p&gt; 

&lt;p&gt;
  &lt;img src="/images/gisttweet.jpg" /&gt;
&lt;/p&gt;

&lt;p&gt;
  Yes, a campfire bot is relatively unimportant security-wise (except that if there's a man-in-the-middle, he now has credentials to access your campfire room, which may or may not contain company secrets &amp;mdash; but I digress). Eventually I remarked that despite the relative unimportance of a campfire bot, tenderlove is a leader in the ruby community, and leaders should set good examples.
&lt;/p&gt;

&lt;p&gt;
  &lt;img src="/images/communityleader.jpg" /&gt;
&lt;/p&gt;

&lt;p&gt;
  A few other people also responded. @joedamato posted an &lt;a href="http://twitter.com/#!/joedamato/status/37653616445620225"&gt;admittedly less constructive response&lt;/a&gt;. And Ben Black posted a &lt;a href="http://twitter.com/#!/b6n/status/37665737057247233"&gt;somewhat snarky, but not particularly harsh suggestion&lt;/a&gt;. That's when the hate started pouring in.
&lt;/p&gt;

&lt;p&gt;
  Here's the thing: this is a very serious security issue, and nearly every rubygem that uses net/http is guilty of it (yes, even active_merchant, the thing that everybody uses to interact with payment gateways). Why? Because of the prevalence of copy and paste coding. Yes, I do it, and so do you.
&lt;/p&gt;

&lt;p&gt;
  And nearly every net/https example uses VERIFY_NONE. It's so common in example code that in the related links on &lt;b&gt;the RubyInside article about the perils of VERIFY_NONE&lt;/b&gt;, there's &lt;a href="http://www.rubyinside.com/nethttp-cheat-sheet-2940.html"&gt;a link&lt;/a&gt; to example code that uses it (lol?).
&lt;/p&gt;
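
&lt;p&gt;
  For contrast, here is a minimal sketch of what a copy-and-paste-friendly example could look like with verification turned on. The helper name &lt;code&gt;verified_http&lt;/code&gt; and the host are made up for illustration, and it assumes the system's default CA bundle is the trust store you want; on older rubies you may need to require net/https instead of net/http.
&lt;/p&gt;

```ruby
require 'net/http'
require 'openssl'

# Hypothetical helper (not from any gem): builds a Net::HTTP client
# that verifies the server's certificate instead of using VERIFY_NONE.
def verified_http(host, port = 443)
  http = Net::HTTP.new(host, port)
  http.use_ssl = true

  # Verify the server's certificate chain against trusted CAs.
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER

  # Assumption: the system's default CA paths are the right trust store.
  store = OpenSSL::X509::Store.new
  store.set_default_paths
  http.cert_store = store

  http
end

# No connection is opened until a request is made.
http = verified_http('example.com')
# A request such as http.get('/') would raise OpenSSL::SSL::SSLError
# if the certificate fails verification.
```

&lt;p&gt;
  It's only a few lines longer than the VERIFY_NONE version, which is sort of the point.
&lt;/p&gt;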

&lt;p&gt;
  &lt;img src="http://img.skitch.com/20101209-xd9ax1baxn3p4a2g679t3ak5w1.jpg" width="725" height="313"/&gt;
&lt;/p&gt;

&lt;p&gt;
  Aaron is one of a small group of people in the ruby community who actually have the power to do something about this problem. If he sets the right example, people will copy and paste good code instead of bad code. That's more useful than a million tweets or blog posts.
&lt;/p&gt;

&lt;p&gt;
  Yes, this may all seem trivial to you. It's just a hack, after all. Obviously, Aaron wasn't being deliberately insecure. He was just hacking, which is perfectly fine. But, we all know that hacks have a way of ending up in production.
&lt;/p&gt;

&lt;p&gt;
  &lt;img src="/images/broken.jpg" /&gt;
&lt;/p&gt;

&lt;p&gt;
  It probably won't be tenderlove's app; it might be some noob who found and modified his code. But sometimes one man's hack winds up (however indirectly) being another man's business.
&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/JamesOnSoftware/~4/CHnjGTYQmC0" height="1" width="1"/&gt;</content>
  <feedburner:origLink>http://jamesgolick.com/2011/2/15/verify-none..html</feedburner:origLink></entry>
  
</feed>
