<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
  <channel>
    <title>asemanfar</title>
    <description>a blog about programming</description>
    <link>http://asemanfar.com/posts.rss</link>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/Asemanfar" type="application/rss+xml" /><feedburner:browserFriendly></feedburner:browserFriendly><item>
      <title>Detecting User Automation</title>
      <description>&lt;p&gt;Over the past several months at &lt;a href="http://seriousbusiness.com"&gt;Serious Business&lt;/a&gt;, we've had several incidents of users botting our games with everything from Greasemonkey scripts to automated binaries that log in and level you up without even launching a browser. At one point it got so bad that bots doubled our requests per second. How do we fight this?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;War is Bad&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first occurrence of this was when people were automating the collection of randomly appearing coins on user profile pages. We quickly thwarted this bot by placing decoy coins that humans wouldn't be able to see and using the click as an indicator of automation. We punished these users ultimately by taking away what they got by cheating, and of course they were upset. We quickly learned that this wasn't going to work against all bots and that targeted punishment pisses people off; it was a war we didn't want to fight and quickly abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New Perspective&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The next logical step was to approach the problem from a context-free perspective. Ignore the individual bots and what they do: what do all bots have in common? They can't pass &lt;a href="http://en.wikipedia.org/wiki/CAPTCHA"&gt;CAPTCHAs&lt;/a&gt;. But we can't show all 4 million of our users a CAPTCHA when they log in; it'd get annoying and be bad for business. So we have to distinguish between those we suspect are botting and those who clearly are not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Heuristics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In order to pick out the &lt;strike&gt;cylons&lt;/strike&gt; bots from the humans, we needed to identify some heuristics. Among the many different things we could choose, we chose requests per second and session length for our first implementation.&lt;/p&gt;
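
&lt;p&gt;As a rough illustration of how those two heuristics could be combined, here is a minimal sketch. The thresholds and the method name are assumptions made up for this example; the real values aren't stated here.&lt;/p&gt;

```ruby
# Hypothetical thresholds, chosen only for illustration.
MAX_REQUESTS_PER_SECOND = 2.0
MAX_SESSION_SECONDS = 6 * 60 * 60

# timestamps: request times (in seconds) for one user's current session.
def suspected_bot?(timestamps)
  return false unless timestamps.size > 1

  session_length = timestamps.max - timestamps.min
  return true if session_length > MAX_SESSION_SECONDS

  # Treat very short sessions as one second long to avoid dividing by zero.
  rate = timestamps.size.to_f / [session_length, 1.0].max
  rate > MAX_REQUESTS_PER_SECOND
end
```

&lt;p&gt;A user who trips either check would only be shown a CAPTCHA, not punished outright.&lt;/p&gt;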

&lt;p&gt;Enter the fun part...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We have several million users, hundreds of requests per second, 11 app servers, and hundreds of mongrels. We need to collect user data, analyze it, and feed it to the mongrels to act on flagged users. The main concerns are that it must not affect response times, it mustn't take up much space, and it should be fast. The solution we came up with has two parts: a &lt;a href="http://github.com/arya/pandemic"&gt;pandemic&lt;/a&gt; daemon and a mongrel handler (using pre-rack Rails).&lt;/p&gt;

&lt;p&gt;The mongrel handler sends an asynchronous request to a cluster of daemons. Each node in the cluster is responsible for tracking and analyzing a subset of the users' activity, which lets us scale out easily by adding more nodes. Each node periodically analyzes the new data and publishes a list of misbehaving users, which each mongrel pulls down. All user activity is stored only in memory and thrown out once the user becomes inactive.&lt;/p&gt;
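
&lt;p&gt;To make the shape of that design concrete, here is a simplified sketch. It is not the actual daemon: the class and method names are invented, users are assigned to nodes by a plain modulo, and counters are simply reset after each analysis instead of being kept until a user goes inactive.&lt;/p&gt;

```ruby
# Hypothetical sketch; none of these names come from Pandemic or the real system.
class TrackerNode
  def initialize(max_requests)
    @max_requests = max_requests
    @activity = Hash.new(0) # user id => request count since the last analysis
  end

  # Called for every request the mongrel handler reports.
  def record(user_id)
    @activity[user_id] += 1
  end

  # Return the users over the limit, then drop the in-memory data.
  def analyze!
    flagged = @activity.select { |_, count| count > @max_requests }.keys
    @activity.clear
    flagged
  end
end

# Route each user to a fixed node so one node sees all of that user's activity.
def node_for(user_id, nodes)
  nodes[user_id % nodes.size]
end
```

&lt;p&gt;Because a given user always lands on the same node, each node can analyze its users without talking to the others.&lt;/p&gt;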

&lt;p&gt;Asynchronously collecting data, analyzing it offline and in memory, and periodically syncing the results back to the mongrels turned out to be efficient, with virtually zero impact on response times and server load.&lt;/p&gt;</description>
      <pubDate>Tue, 14 Jul 2009 23:30:52 -0700</pubDate>
      <link>http://feedproxy.google.com/~r/UnboundImagination/~3/XymaYz8xCwM/Detecting-User-Automation</link>
      <guid isPermaLink="false">http://asemanfar.com/Detecting-User-Automation</guid>
    <feedburner:origLink>http://asemanfar.com/Detecting-User-Automation</feedburner:origLink></item>
    <item>
      <title>Pandemic -- because information needs to be contagious</title>
      <description>&lt;p&gt;Pandemic is a map-reduce framework. You give it the map, process, and reduce methods and it handles the rest. It's designed to serve requests in real-time, but can also be used for offline tasks.&lt;/p&gt;

&lt;p&gt;It's different from the typical map-reduce framework in that it doesn't have a master-worker structure. Every node can map, process, and reduce. It also doesn't have the concept of jobs; everything is a request.&lt;/p&gt;

&lt;p&gt;The framework is designed to be as flexible as possible. There is no rigid request format or API; you can specify it however you want. You can send it http-style headers and a body, you can send it JSON, or you can even just send it a single line and have it do whatever you want. The only requirement is that you write your handler to appropriately act on the request and return the response.&lt;/p&gt;

&lt;p&gt;Here is how you use it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server&lt;/strong&gt;&lt;/p&gt;

&lt;code lang="ruby"&gt;
require 'rubygems'
require 'pandemic'

class Handler &amp;lt; Pandemic::ServerSide::Handler
  def process(body)
    body.reverse
  end
end

pandemic_server = epidemic!
pandemic_server.handler = Handler.new
pandemic_server.start.join
&lt;/code&gt;

&lt;p&gt;In this example, the handler doesn't define the map or reduce methods, so the defaults are used. The default for each method is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;map: Send the full request body to every connected node&lt;/li&gt;
&lt;li&gt;process: Return the body (do nothing)&lt;/li&gt;
&lt;li&gt;reduce: Concatenate all the responses&lt;/li&gt;
&lt;/ul&gt;
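
&lt;p&gt;To show what those defaults amount to, here is a plain-Ruby sketch of the three steps. It only mimics the behavior described in the list above and is not Pandemic's actual code; the method names are made up for the example.&lt;/p&gt;

```ruby
# Plain-Ruby illustration of the documented defaults; not Pandemic's real code.
def default_map(body, nodes)
  # Every connected node receives the full request body.
  nodes.each_with_object({}) { |node, requests| requests[node] = body }
end

def default_process(body)
  body # do nothing
end

def default_reduce(responses)
  responses.join # concatenate all the responses
end

requests = default_map("abc", ["host1:4000", "host2:4000"])
responses = requests.values.map { |body| default_process(body) }
result = default_reduce(responses)
```

&lt;p&gt;In this sketch, two nodes and a request body of "abc" produce "abcabc", since each node echoes the body and the responses are concatenated.&lt;/p&gt;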


&lt;p&gt;&lt;strong&gt;Client&lt;/strong&gt;&lt;/p&gt;

&lt;code lang="ruby"&gt;
require 'rubygems'
require 'pandemic'

class TextFlipper
  include Pandemize
  def flip(str)
    pandemic.request(str)
  end
end
&lt;/code&gt;

&lt;p&gt;&lt;strong&gt;Config&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both the server and client have config files:&lt;/p&gt;

&lt;code lang="yaml"&gt;
# pandemic_server.yml
servers:
  - host1:4000
  - host2:4000
response_timeout: 0.5
&lt;/code&gt;

&lt;p&gt;Each value in the servers list is the &lt;em&gt;host:port&lt;/em&gt; that a node can bind to. The servers value can also be a hash or an array of hashes, but I'll get to that later. The response timeout is how long to wait for responses from nodes before returning to the client.&lt;/p&gt;

&lt;code lang="yaml"&gt;
# pandemic_client.yml
servers:
  - host1:4000
  - host2:4000
max_connections_per_server: 10
min_connections_per_server: 1
response_timeout: 1
&lt;/code&gt;

&lt;p&gt;The min/max connections settings control how many connections the client keeps open to each node. If you're using the client in Rails, just use 1 for both min and max since it's single-threaded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More Config&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are three ways to start a server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ruby server.rb -i 0&lt;/li&gt;
&lt;li&gt;ruby server.rb -i machine1hostname&lt;/li&gt;
&lt;li&gt;ruby server.rb -a localhost:4000&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The first refers to the index in the servers array:
&lt;code lang="yaml"&gt;
servers:
  - host1:4000 # started with ruby server.rb -i 0
  - host2:4000 # started with ruby server.rb -i 1
&lt;/code&gt;    &lt;br/&gt;
The second refers to the key in the servers &lt;em&gt;hash&lt;/em&gt;. This can be particularly useful if you use the hostname as the key.
&lt;code lang="yaml"&gt;
servers:
  machine1: host1:4000 # started with ruby server.rb -i machine1
  machine2: host2:4000 # started with ruby server.rb -i machine2
&lt;/code&gt;    &lt;br/&gt;
The third is to specify the host and port explicitly. Make sure the host and port you specify are actually in the config; otherwise the other nodes won't be able to communicate with it.&lt;/p&gt;
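
&lt;p&gt;The three start-up styles boil down to one lookup, which can be sketched roughly as follows. The helper name and its keyword arguments are hypothetical and not part of Pandemic, and the check that an explicit address appears in the config is an assumption added for the example.&lt;/p&gt;

```ruby
# Hypothetical helper, not part of Pandemic: pick the address a node binds to.
def bind_address(servers, index: nil, address: nil)
  known = servers.is_a?(Hash) ? servers.values : servers
  if address
    # -a host:port given explicitly; it still has to appear in the config.
    raise ArgumentError, "#{address} is not in the config" unless known.include?(address)
    return address
  end
  # -i is a key for a hash and a position for an array.
  servers.is_a?(Hash) ? servers.fetch(index) : servers.fetch(Integer(index))
end

array_servers = ["host1:4000", "host2:4000"]
hash_servers = { "machine1" => "host1:4000", "machine2" => "host2:4000" }

by_position = bind_address(array_servers, index: "1")
by_key = bind_address(hash_servers, index: "machine1")
explicit = bind_address(array_servers, address: "host1:4000")
```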

&lt;p&gt;You can also set node-specific configuration options.&lt;/p&gt;

&lt;code lang="yaml"&gt;
servers:
  - host1:4000:
      database: pandemic_node_1
      host: localhost
      username: foobar
      password: f00bar
  - host2:4000:
      database: pandemic_node_2
      host: localhost
      username: fizzbuzz
      password: f1zzbuzz
&lt;/code&gt;

&lt;p&gt;And you can access these additional options using &lt;em&gt;config.get(keys)&lt;/em&gt; in your handler:&lt;/p&gt;

&lt;code lang="ruby"&gt;
class Handler &amp;lt; Pandemic::ServerSide::Handler
  def initialize
    @dbh = Mysql.real_connect(*config.get('host', 'username',
                                          'password', 'database'))
  end
end
&lt;/code&gt;


&lt;p&gt;&lt;strong&gt;Code: &lt;/strong&gt;
&lt;a href="http://github.com/arya/pandemic/tree/master"&gt;github repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;br/&gt;
sudo gem sources -a http://gems.github.com&lt;br/&gt;
sudo gem install arya-pandemic&lt;/p&gt;</description>
      <pubDate>Fri, 10 Apr 2009 23:03:50 -0700</pubDate>
      <link>http://feedproxy.google.com/~r/UnboundImagination/~3/6mW3UpGd11A/Pandemic----because-information-needs-to-be-contagious</link>
      <guid isPermaLink="false">http://asemanfar.com/Pandemic----because-information-needs-to-be-contagious</guid>
    <feedburner:origLink>http://asemanfar.com/Pandemic----because-information-needs-to-be-contagious</feedburner:origLink></item>
    <item>
      <title>Request Queue via Mongrel Proctitle</title>
      <description>&lt;p&gt;In our cluster, we run many mongrels on several app servers all behind a load balancer. In order to get an idea of how each app server and its mongrels are doing, we use &lt;a href="http://github.com/rtomayko/mongrel_proctitle/tree"&gt;rtomayko's mongrel_proctitle&lt;/a&gt; gem. This gem sets up a mongrel handler that extracts request information from the client's request and sets it as the process title so when you do a &lt;em&gt;ps aux&lt;/em&gt; or run &lt;em&gt;top&lt;/em&gt; you can see each mongrel's queue length and request info:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
mongrel_rails [8000/1/4]: handling 127.0.0.1: GET /users
# that's [port/queue length/requests handled]: handling client ip: request
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Last week, we were having some misbehaving mongrels that would lock up and essentially stop serving requests. We'd look at the process titles and see something like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
mongrel_rails [8021/14/6123]: handling 127.0.0.1: GET /status
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We have an action designated as a lightweight health check to let the load balancer know what's up. According to this process title, that action was locking up the mongrel and letting the queue grow to a length of around 14. This didn't make sense: why would the lightest-weight action lock up the mongrel?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With a little digging, I found that the mongrel_proctitle gem is broken. More precisely, it is only accurate when there is a single request being handled and none queued. The handler was setting the process title as soon as it received the request and then passing that client's thread on to the next handler, which happens to be the synchronized Rails handler. So the request the process title shows is the most recently received request, not the request currently being handled.&lt;/p&gt;

&lt;p&gt;With a &lt;a href="http://github.com/arya/mongrel_proctitle/commit/f553dea913f8a425d9a712fc328835abf57e5305"&gt;few small modifications&lt;/a&gt;, we now have a more accurate process title that also shows the rest of the queue (up to a character limit) as a comma-separated list:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
mongrel_rails [8021/2/6123]: handling 127.0.0.1: GET /users, 127.0.0.1: GET /status
&lt;/code&gt;&lt;/p&gt;
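
&lt;p&gt;For a feel of how such a title can be put together, here is a small sketch. It is not the code from the linked commit; the method name, the 80-character default, and the exact truncation rule are assumptions for illustration.&lt;/p&gt;

```ruby
# Hypothetical sketch; the real logic lives in the linked commit.
# requests: queued request descriptions, oldest (currently handled) first.
def queue_title(port, handled, requests, limit = 80)
  shown = []
  requests.each do |request|
    candidate = (shown + [request]).join(", ")
    break if candidate.length > limit # stop once the list would get too long
    shown.push(request)
  end
  "mongrel_rails [#{port}/#{requests.size}/#{handled}]: handling #{shown.join(', ')}"
end
```

&lt;p&gt;The queue length in the brackets still counts every queued request, even when the list itself is cut short.&lt;/p&gt;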

&lt;p&gt;The updated version is &lt;a href="http://github.com/arya/mongrel_proctitle/tree/"&gt;available&lt;/a&gt; on &lt;a href="http://github.com"&gt;github&lt;/a&gt; and also as a gem at gems.github.com as arya-mongrel_proctitle:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
gem sources -a http://gems.github.com
sudo gem install arya-mongrel_proctitle
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Note that, as described in the comments in the code, it's still not exact. There is at least one race condition in which the list won't be ordered accurately, but it's unlikely to occur, and the title still gives a good idea of what's going on.&lt;/p&gt;</description>
      <pubDate>Wed, 11 Feb 2009 23:04:57 -0800</pubDate>
      <link>http://feedproxy.google.com/~r/UnboundImagination/~3/LXhBj7UMGzg/Request-Queue-via-Mongrel-Proctitle</link>
      <guid isPermaLink="false">http://asemanfar.com/Request-Queue-via-Mongrel-Proctitle</guid>
    <feedburner:origLink>http://asemanfar.com/Request-Queue-via-Mongrel-Proctitle</feedburner:origLink></item>
    <item>
      <title>I Went Unobtrusive</title>
      <description>&lt;p&gt;So I've been cleaning up my blog's code base and decided to switch to unobtrusive javascript. I've heard many reasons advocating its use but for some reason, they never stuck. Until a couple of weeks ago when I worked on a &lt;a href="http://railsrumble.com"&gt;Railsrumble&lt;/a&gt; project with a couple of buddies of mine, one of whom was a big advocate of unobtrusive javascript using &lt;a href="http://www.danwebb.net/2006/9/3/low-pro-unobtrusive-scripting-for-prototype"&gt;LowPro&lt;/a&gt;. Actually using unobtrusive javascript turned out to be much more gratifying than I anticipated. It satisfied some inner OCD tendencies of mine that a lot of programmers share and it made javascript'd UIs much easier to develop and &lt;em&gt;much much&lt;/em&gt; easier to maintain.&lt;/p&gt;

&lt;p&gt;Looking back, I don't know why I chose to use the Rails javascript helpers or RJS; I like writing javascript, especially when there are great frameworks out there like &lt;a href="http://www.danwebb.net/2006/9/3/low-pro-unobtrusive-scripting-for-prototype"&gt;LowPro&lt;/a&gt; and &lt;a href="http://www.prototypejs.org/"&gt;Prototype&lt;/a&gt;. The fact that the Rails javascript helpers generate inline javascript, which sometimes turns out to be longer than a simple one-liner, is kind of gross. As for RJS, the fact that it generates code that contains both the data and the logic makes OCD baby Jesus cry.&lt;/p&gt;

&lt;p&gt;The separation of HTML (markup), CSS (presentation), and Javascript (interactivity) is almost as awesome as the separation of church and state...oh wait, crap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you haven't already switched, do it now!&lt;/strong&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 08 Nov 2008 10:44:24 -0800</pubDate>
      <link>http://feedproxy.google.com/~r/UnboundImagination/~3/O0523oygMGw/I-Went-Unobtrusive</link>
      <guid isPermaLink="false">http://asemanfar.com/I-Went-Unobtrusive</guid>
    <feedburner:origLink>http://asemanfar.com/I-Went-Unobtrusive</feedburner:origLink></item>
  </channel>
</rss>
