nutrun

Earl - webified strings

2011-03-02T00:00:00Z

Earl makes Ruby strings do web stuff like extracting urls or finding redirect destinations.

text = "Blah blah http://tinyurl.com/4bnjzbu blah http://tinyurl.com/4tefu9f"

text.urls # => ["http://tinyurl.com/4bnjzbu", "http://tinyurl.com/4tefu9f"]
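For a rough idea of what the urls call above involves, here is a minimal sketch. It is not Earl's implementation: the extract_urls helper and the URL_PATTERN regular expression are hypothetical names made up for illustration, and the pattern only covers plain http and https URLs.

```ruby
# Hypothetical helper, not Earl's code: pull http(s) URLs out of a string.
# The pattern stops at whitespace, angle brackets and double quotes.
URL_PATTERN = %r{https?://[^\s<>"]+}

def extract_urls(text)
  # String#scan returns every non-overlapping match as an array of strings.
  text.scan(URL_PATTERN)
end

text = "Blah blah http://tinyurl.com/4bnjzbu blah http://tinyurl.com/4tefu9f"
urls = extract_urls(text)
puts urls
```

A pattern this simple will also capture trailing punctuation, such as a full stop that directly follows a URL, so real text would need extra trimming.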

"http://tinyurl.com/4tefu9f".location # => "http://nutrun.com/"

Supercharged ruby console output

2010-11-17T00:00:00Z

https://gist.github.com/703943

Sinatra reloader

2010-06-24T00:00:00Z

When I first started using it to write web apps a couple of years ago, Sinatra supported code reloading in development mode. That feature was dropped from the core of Sinatra at some point and we just got used to restarting the app every time we made a change whilst developing. It's not that huge an overhead, especially considering Sinatra's fast start-up.

I recently had to work on a Rails codebase for a while, which reminded me that code reloading without restarting in dev mode is functionality I don't mind spoiling myself with. At the time, Abs pointed me to sinatra-reloader which I installed and used in a couple of apps and it works well. As I'm writing this, I'm also looking at Rack::Reloader, which I've never used and seems somewhat different with its own set of interesting features. Shotgun is out of the question for me, because it feels like manually restarting the app is faster than the time Shotgun takes to load everything per request.

RVM has prompted me to switch between Ruby versions more often than in the past, resulting in installing gems more frequently than I used to, which in turn brings out an OCD side of me when it comes to gems that download other gems as dependencies. That's the one thing that bugs me about sinatra-reloader and since I found myself with a bit of time on my hands, I wrote my own Sinatra reloader which I've put in this gist in case someone else finds it useful.

It works by reloading all source files and routes when it detects a change. This is less efficient than selectively reloading only code from files that have changed, although I tried it in a few of my projects without noticeable penalties. A thing to watch out for is that once a constant has been loaded, it will still be around after you delete the code that declares it. Restarting is required for such changes to take effect. I've also noticed a similar issue with classes that extend Sequel::Model - if I run a migration and don't restart, database field mappings don't get updated, because Sequel makes those mappings at the time Sequel::Model is subclassed.
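The reload-everything-on-change approach described above can be sketched roughly as follows. This is not the code from the gist: the NaiveReloader class and its reload_if_changed method are hypothetical names, and the sketch assumes that comparing file modification times is a good enough way to detect a change.

```ruby
# Hypothetical sketch, not the gist's code: reload every tracked file
# whenever any of them has a different modification time than last seen.
class NaiveReloader
  def initialize(paths)
    @paths = paths
    @mtimes = {}
  end

  # Returns true when a change was detected and all files were reloaded,
  # false when nothing changed since the previous call.
  def reload_if_changed
    current = @paths.to_h { |path| [path, File.mtime(path)] }
    return false if current == @mtimes

    @mtimes = current
    # Kernel#load re-executes each file, redefining whatever it declares.
    @paths.each { |path| load path }
    true
  end
end
```

In an app, something like NaiveReloader.new(Dir['lib/**/*.rb']).reload_if_changed could be called before each request in development. Because load only re-executes files, a constant or method whose definition has been deleted stays defined until the process restarts, which is the limitation mentioned above.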

In summary, if you don't mind installing a bunch of gems you're likely to never use, I recommend sinatra-reloader. If you're after code reloading which you might want to customise with a couple of lines of code on the spot to suit your particular project's needs, this can be a starting point.

Incremental deployment

2009-12-22T00:00:00Z

I've recently had a chance to look at a high availability system designed and built by Forward colleagues Andy Kent and Paul Ingles. It is a critical web service with a very high impact of failure. Essentially, it must stay up at all times.

The service is hosted on Amazon EC2. It makes use of EC2's geographically distributed regions and different availability zones within each region, fronted by AWS Elastic Load Balancing and additional global DNS fail over outside of EC2/AWS.

A part of the project that struck me as particularly interesting is the deployment strategy Paul and Andy settled on. Regardless of how much trust we have in our builds and QA process, deployments become a whole different, much more stressful activity when critical systems like the one under discussion are involved. Andy mentioned it is important to find the balance between what to automate and bits that should require manual input.

# deploy.rb

task :us_1b do
  set :region, 'us-east-1'
  set :servers, us_1b
  # More US 1b specific setup...
end

task :eu_1a do
  set :region, 'eu-west-1'
  set :servers, eu_1a
  # More EU 1a specific setup...
end

This service is incrementally deployed one availability zone at a time, e.g. cap us_1b deploy. Each deployment step is manual - it requires someone to push the button. This means that if something goes wrong, only part of the system will be affected. Even if a failure is severe enough to bring the newly deployed code down, only one availability zone in one region will fail, and the load balancers will make sure that this failure is transparent to end users and does not affect the system as a whole.

Deployment setup automation

2009-11-10T00:00:00Z

Part of my work these days has to do with building and deploying numerous experimental applications with varying life cycles. Many of these applications get built and put on a server in less than a day only to be shut down and never looked at again a couple of days later, others get turned off and revisited after some time, while others graduate to larger, wider scope systems.

This means that I get to deploy applications for the first time more frequently than usual. Also, because we deploy to virtualised infrastructures (including an internal cloud, Slicehost and Amazon EC2), slice instances (servers) tend to get rebuilt more often than they would in the absence of virtualisation. First time deployments are generally more involved than subsequent ones because there is setup to be done and software to be installed before the host servers can accommodate the application.

One way to treat first time deployment woes is to create and maintain images of the system in the state required to host the application. I find this to work well when dealing with moderate numbers of applications and servers, whereas creating and keeping images up to date has a tendency to become tedious and inflexible as the number of applications and images increases.

As an alternative, we can move prerequisite system setup and installations responsibility closer to the application code, in the form of an after hook to the deploy:setup task that we call the first time we deploy an application with Capistrano. Here's some Capistrano code that performs one time setup tasks.

namespace :util do
  task :install_libraries do
    sudo 'apt-get install libxml2 libxml2-dev libmysqlclient15-dev -y'
  end  
end

after 'deploy:setup', 'util:install_libraries'

With this approach, the application knows how to set up the system the way it needs it to be the next time it gets deployed for the first time. As an added benefit, the Capistrano code serves as documentation for the application's system requirements.

VCS practices over features

2009-08-29T00:00:00Z

I've often heard people I know and respect say that git is leaps and bounds better than Subversion. I've been a relatively early adopter of git and it's been my VCS of choice for almost two years now. Even though I find it superior to most of the competition, I struggle to justify the "leaps and bounds" claim and would rather more modestly call it "a step forward".

This is probably due to the practices we find benefit our development process. Git puts great emphasis on branching, something we generally tend to avoid (to clarify, I'm not referring to local branching). We concentrate on feedback based on the usage of our applications. This means that we strive to commit as often as possible and, most importantly, deploy to production at a constant rate. Grossly simplified, the process is: identify a small coherent feature, build it, commit it to the master branch and deploy. No part of the codebase is owned by a subdivision of the team, everyone works on everything.

By far the most popular git commands we issue are git pull, git add and git push, not that different to svn update and svn commit.

When I first started using git I was wondering if I had developed a fear of branching because of Subversion's inefficiencies in that area. In reality, I think that an environment where every developer constantly has an up to date understanding of the codebase and especially a current grasp of the design and overall vision will always be more efficient than working remotely and having merge checkpoints, no matter how cleverly the VCS handles branching. This is why I think a faster, distributed, superior at merging VCS is not something more dramatic than a desirable step forward.

Hello world nginx module

2009-08-15T00:00:00Z

Several times over the past few months I made short-lived attempts at delving into the mechanics of nginx modules. Although an invaluable resource to anyone seriously interested in the subject, Emiller's Guide To Nginx Module Development doesn't, at the time of this writing, include a quick-start example I could hack together and see in action. Getting something to run as quickly as possible is my preferred way of starting the study of new things, and every time I caught myself searching the web for a "Hello world nginx module".

I will not go into any details, since Emiller's Guide does an excellent job at that. I'm only going to mention the steps I believe are absolutely necessary to write, compile and run an nginx handler module that responds to every request with the string "Hello world".

A minimum of two files is required for writing an nginx module. The first should be called config and looks something like this:

ngx_addon_name=ngx_http_hello_world_module
HTTP_MODULES="$HTTP_MODULES ngx_http_hello_world_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_hello_world_module.c"

The second is the module's implementation in C and nginx convention suggests a name like ngx_http_modulename_module.c, in this case ngx_http_hello_world_module.c.

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

static char *ngx_http_hello_world(ngx_conf_t *cf, ngx_command_t *cmd, void *conf);

static ngx_command_t  ngx_http_hello_world_commands[] = {

  { ngx_string("hello_world"),
    NGX_HTTP_LOC_CONF|NGX_CONF_NOARGS,
    ngx_http_hello_world,
    0,
    0,
    NULL },

    ngx_null_command
};


static u_char  ngx_hello_world[] = "hello world";

static ngx_http_module_t  ngx_http_hello_world_module_ctx = {
  NULL,                          /* preconfiguration */
  NULL,                          /* postconfiguration */

  NULL,                          /* create main configuration */
  NULL,                          /* init main configuration */

  NULL,                          /* create server configuration */
  NULL,                          /* merge server configuration */

  NULL,                          /* create location configuration */
  NULL                           /* merge location configuration */
};


ngx_module_t ngx_http_hello_world_module = {
  NGX_MODULE_V1,
  &ngx_http_hello_world_module_ctx, /* module context */
  ngx_http_hello_world_commands,   /* module directives */
  NGX_HTTP_MODULE,               /* module type */
  NULL,                          /* init master */
  NULL,                          /* init module */
  NULL,                          /* init process */
  NULL,                          /* init thread */
  NULL,                          /* exit thread */
  NULL,                          /* exit process */
  NULL,                          /* exit master */
  NGX_MODULE_V1_PADDING
};


static ngx_int_t ngx_http_hello_world_handler(ngx_http_request_t *r)
{
  ngx_buf_t    *b;
  ngx_chain_t   out;

  r->headers_out.content_type.len = sizeof("text/plain") - 1;
  r->headers_out.content_type.data = (u_char *) "text/plain";

  b = ngx_pcalloc(r->pool, sizeof(ngx_buf_t));
  if (b == NULL) {
    return NGX_HTTP_INTERNAL_SERVER_ERROR;
  }

  out.buf = b;
  out.next = NULL;

  /* sizeof - 1 so the string's trailing NUL byte is not sent in the body */
  b->pos = ngx_hello_world;
  b->last = ngx_hello_world + sizeof(ngx_hello_world) - 1;
  b->memory = 1;
  b->last_buf = 1;

  r->headers_out.status = NGX_HTTP_OK;
  r->headers_out.content_length_n = sizeof(ngx_hello_world) - 1;
  ngx_http_send_header(r);

  return ngx_http_output_filter(r, &out);
}


static char *ngx_http_hello_world(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
  ngx_http_core_loc_conf_t  *clcf;

  clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module);
  clcf->handler = ngx_http_hello_world_handler;

  return NGX_CONF_OK;
}

Both config and ngx_http_hello_world_module.c should be placed in the same directory, let's say /etc/ngxhelloworld. Modules are compiled into the nginx binary. To do so, download the nginx source, uncompress, and in the nginx source directory run:

./configure --add-module=/etc/ngxhelloworld
make
sudo make install

Finally, add a module directive to nginx's configuration (default is /usr/local/nginx/conf/nginx.conf) to enable the module for a location.

location = /hello {
  hello_world;
}

At this point, we can start nginx, and navigating to http://localhost/hello will yield the result of all this labor.

Alongside Emiller's Guide, I also found reading nginx third party module code helpful.

Asynchronous session content injection

2009-08-06T00:00:00Z

Applying a clear distinction between stateless and stateful content when designing a web application is tricky but worth tackling early so that content not specific to user sessions can benefit from web caching. The technique we are trying out for scramble.com reminds me of what I described in State separation and was introduced to me by Mike Jones who was inspired by the Dynamically Update Cached Pages chapter in Advanced Rails Recipes.

The idea involves serving non-session-specific resources independently of personalized content and using AJAX calls to inject session-specific content into the page.

require 'rubygems'
require 'sinatra'
require 'json'

configure do
  enable :sessions
end

get '/' do
  headers['Cache-Control'] = 'max-age=60, must-revalidate'
  erb :index
end

get '/userinfo' do
  if session[:user]
    JSON.dump(:user => session[:user])
  else
    halt 401
  end
end

get '/login' do
  session[:user] = 'rock'
  redirect '/'
end

get '/logout' do
  session.clear
  redirect '/'
end

Notice some of the headers for '/':

$ curl -I http://localhost:4567/
Cache-Control: max-age=60, must-revalidate
Set-Cookie: rack.session=BAh7AA%3D%3D%0A; path=/

The Cache-Control policy instructs a web cache to keep this version of the resource for 60 seconds before requesting a fresh one. Set-Cookie however will usually cause a web cache to never store the response and always query its back end.
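To make that interaction concrete, here is a deliberately simplified model of the decision a shared cache makes. It is an illustration only, not Varnish's actual logic: the cache_ttl function is a hypothetical name, and it assumes the cache looks at nothing but Set-Cookie and the max-age directive.

```ruby
# Simplified model, not Varnish's real behaviour: a response carrying
# Set-Cookie is never stored; otherwise the max-age value from
# Cache-Control gives the time to live in seconds.
def cache_ttl(headers)
  return 0 if headers.key?('Set-Cookie')

  # String#[] with a regexp and a capture index returns the captured
  # digits, or nil when the directive is absent.
  max_age = headers.fetch('Cache-Control', '')[/max-age=(\d+)/, 1]
  max_age ? max_age.to_i : 0
end

with_cookie = {
  'Cache-Control' => 'max-age=60, must-revalidate',
  'Set-Cookie' => 'rack.session=BAh7AA%3D%3D%0A; path=/'
}
without_cookie = { 'Cache-Control' => 'max-age=60, must-revalidate' }

puts cache_ttl(with_cookie)    # prints 0
puts cache_ttl(without_cookie) # prints 60
```

The same response becomes cacheable for 60 seconds as soon as the cookie is out of the picture, which is why the cookie has to go for stateless resources.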

The following configuration tells Varnish to throw away the cookie from any request/response that doesn't match one of the URLs that require authorization, thus causing it to react to response cache policies.

sub vcl_recv {
  if (req.url !~ "^(/login|/logout|/userinfo)") {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  if (req.url !~ "^(/login|/logout|/userinfo)") {
    unset obj.http.set-cookie;
  }
}

A snippet from the HTML response for '/':

<h1>Hi</h1>
<div id="nav">
  <a href="/login" class="login-control">Login</a>
</div>

... and the javascript for asynchronously injecting session data to the page:

$(function() {
  $.getJSON('/userinfo', function(data) {
    $('h1').text('Hi ' + data.user);
    $('#nav .login-control').attr('href', '/logout').html('logout');
  })
})

In summary, it is likely that a website will have significant amounts of content that is intended for everyone without the need for personalization. The performance of serving that content can benefit from web caching, but that becomes difficult as many websites' user experience depends on the presence of user sessions. Separating stateless from session specific content at the resource level and using a combination of HTTP and AJAX to merge the results of requests for both types of resources will make stateless content cacheable by decoupling it from the unnecessary cookie dependency.

Runnable code example: http://pastie.org/573878

Rack::CacheHeaders code

2009-05-18T00:00:00Z

A few months ago I wrote about a possible method for centrally configuring HTTP cache headers in Rack based web applications which I called Rack::CacheHeaders. This is useful if your application's architecture involves tools like Squid or Varnish, or if you are generally interested in harvesting the numerous advantages of HTTP caching for your web application.

The code has evolved a bit since and proven useful in a number of production systems. I created a gist of Rack::CacheHeaders in case someone else finds it handy. The tool is not exhaustive in terms of policies as found in the HTTP specs, it's a collection of the ones we needed in the projects it's been used so far. Consider adding ones you need to the gist to make the code more complete and widely useful.

Rack::CacheHeaders allows configuring HTTP cache policy response headers based on request URI patterns. For example, to set the Cache-Control: max-age header for a /guitars/:id resource to one hour:

Rack::CacheHeaders.configure do |cache|
  cache.max_age(/^\/guitars\/\d+$/, 3600)
end
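For readers who want to see the mechanism behind a configuration like this, the following is a minimal sketch of a Rack-style middleware that sets Cache-Control by path pattern. It is not the Rack::CacheHeaders code from the gist: the MaxAgeByPath class and its rules hash are hypothetical, and it only handles max-age.

```ruby
# Hypothetical sketch, not the Rack::CacheHeaders gist: add
# Cache-Control: max-age to responses whose path matches a pattern.
class MaxAgeByPath
  # rules maps a Regexp to a max-age in seconds,
  # e.g. { %r{\A/guitars/\d+\z} => 3600 }
  def initialize(app, rules)
    @app = app
    @rules = rules
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Hash#find yields [pattern, seconds] pairs and returns the first match.
    _pattern, seconds = @rules.find { |pattern, _| pattern.match?(env['PATH_INFO']) }
    headers = headers.merge('Cache-Control' => "max-age=#{seconds}") if seconds
    [status, headers, body]
  end
end

# A stand-in downstream app that always answers 200.
app = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
stack = MaxAgeByPath.new(app, { %r{\A/guitars/\d+\z} => 3600 })
```

Calling stack.call('PATH_INFO' => '/guitars/42') returns the downstream response with a Cache-Control: max-age=3600 header added, while paths that match no rule pass through untouched.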

Download/develop Rack::CacheHeaders

97 Things Every Software Architect Should Know

2009-02-28T00:00:00Z

A few months ago I wrote one of the axioms for a community effort called 97 Things Every Software Architect Should Know which was driven and edited by Richard Monson-Haefel. This collection of principles, as contributed by an impressive range of software architects around the world, was recently released as a book by O'Reilly Media and is well worth a look if you're interested in pragmatic advice based on how some of our colleagues approach technology projects.