<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>nutrun » Software</title>
	
	<link>http://nutrun.com</link>
	<description>nutrun</description>
	<lastBuildDate>Mon, 18 May 2009 13:22:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/nutrun/feed" type="application/rss+xml" /><item>
		<title>Rack::CacheHeaders code</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/Xw1hobGSM3E/</link>
		<comments>http://nutrun.com/weblog/rackcacheheaders-code/#comments</comments>
		<pubDate>Mon, 18 May 2009 13:22:34 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=261</guid>
		<description><![CDATA[A few months ago I wrote about a possible method for centrally configuring HTTP cache headers in Rack based web applications which I called Rack::CacheHeaders. This is useful if your application&#8217;s architecture involves tools like Squid or Varnish, or if you are generally interested in harvesting the numerous advantages of HTTP caching for your web [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago I <a href="http://nutrun.com/weblog/rack-cache-headers/" title="nutrun  &raquo; Blog Archive   &raquo; Rack cache headers">wrote</a> about a possible method for centrally configuring HTTP cache headers in <a href="http://rack.rubyforge.org/" title="Rack: a Ruby Webserver Interface">Rack</a> based web applications which I called <code>Rack::CacheHeaders</code>. This is useful if your application&#8217;s architecture involves tools like <a href="http://www.squid-cache.org/" title="squid : Optimising Web Delivery">Squid</a> or <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a>, or if you are generally interested in harvesting the numerous advantages of HTTP caching for your web application.</p>
<p>The code has evolved a bit since and proven useful in a number of production systems. I created a <a href="http://gist.github.com/113441" title="gist: 113441 - GitHub">gist</a> of <code>Rack::CacheHeaders</code> in case someone else finds it handy. The tool is not exhaustive in terms of the policies found in the HTTP <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html" title="HTTP/1.1: Caching in HTTP">specs</a>; it&#8217;s a collection of the ones we needed in the projects it&#8217;s been used in so far. Consider adding the ones you need to the gist to make the code more complete and widely useful.</p>
<p><code>Rack::CacheHeaders</code> allows configuring HTTP cache policy response headers based on request URI patterns. For example, to set the <code>Cache-Control: max-age</code> header for a <code>/guitars/:id</code> resource to one hour:</p>
<pre>
Rack::CacheHeaders.configure do |cache|
  cache.max_age(/^\/guitars\/\d+$/, 3600)
end
</pre>
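<p>To illustrate the idea of choosing a header from a request URI pattern, here is a minimal standalone sketch. It is not the gist&#8217;s implementation: <code>PATTERNS</code> and <code>cache_control_for</code> are hypothetical names used only for this example.</p>

```ruby
# Hypothetical sketch: map URI patterns to max-age values (in seconds).
# PATTERNS and cache_control_for are illustrative names, not the gist's API.
PATTERNS = {
  /^\/guitars\/\d+$/ => 3600
}

# Return the Cache-Control value of the first pattern matching the path,
# or nil when no pattern matches.
def cache_control_for(path)
  PATTERNS.each do |pattern, seconds|
    return "max-age=#{seconds}" if pattern =~ path
  end
  nil
end

puts cache_control_for("/guitars/42").inspect  # "max-age=3600"
puts cache_control_for("/guitars/sg").inspect  # nil
```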
<p><a href="http://gist.github.com/113441" title="gist: 113441 - GitHub">Download/develop Rack::CacheHeaders</a></p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/rackcacheheaders-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/rackcacheheaders-code/</feedburner:origLink></item>
		<item>
		<title>97 Things Every Software Architect Should Know</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/lrEjngk4Ris/</link>
		<comments>http://nutrun.com/weblog/97-things-every-software-architect-should-know/#comments</comments>
		<pubDate>Sat, 28 Feb 2009 13:26:27 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=258</guid>
		<description><![CDATA[A few months ago I wrote one of the axioms for a community effort called 97 Things Every Software Architect Should Know which was driven and edited by Richard Monson-Haefel. This collection of principles, as contributed by an impressive range of software architects around the world, was recently released as a book by O&#8217;Reilly Media [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago I wrote one of the axioms for a community effort called <a href="http://97-things.near-time.net/wiki" title="Home Page for 97 Things 		 [97 Things] : Near-Time">97 Things Every Software Architect Should Know</a> which was driven and edited by <a href="http://www.monson-haefel.com/" title="Monson-Haefel's Web Site">Richard Monson-Haefel</a>. This collection of principles, as contributed by an impressive range of software architects around the world, was recently released as a <a href="http://oreilly.com/catalog/9780596522698/index.html" title="97 Things Every Software Architect Should Know | O'Reilly Media">book</a> by <a href="http://oreilly.com/" title="O'Reilly Media - Spreading the knowledge of technology innovators">O&#8217;Reilly Media</a> and is well worth a look if you&#8217;re interested in pragmatic advice based on how some of our colleagues approach technology projects.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/97-things-every-software-architect-should-know/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/97-things-every-software-architect-should-know/</feedburner:origLink></item>
		<item>
		<title>Caching proxy fronted web consumer</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/dhGygk-Fs1Y/</link>
		<comments>http://nutrun.com/weblog/caching-proxy-fronted-web-consumer/#comments</comments>
		<pubDate>Sat, 14 Feb 2009 14:31:16 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=242</guid>
		<description><![CDATA[Consider an application which as part of its functionality queries a product search web service.

WEB_SERVICE_ADDRESS = 'http://www.example.com'

url = URI.parse(WEB_SERVICE_ADDRESS)

Net::HTTP.start(url.host, url.port) do &#124;http&#124;
  http.get('/product-search?q=guitar')
end

Inspecting the response headers, we notice the web service instructs consumers that the results of the query will remain the same for one hour.

curl -I "http://www.example.com/product-search?q=guitar"

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: [...]]]></description>
			<content:encoded><![CDATA[<p>Consider an application which as part of its functionality queries a product search web service.</p>
<pre>
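require 'net/http'
require 'uri'

WEB_SERVICE_ADDRESS = 'http://www.example.com'

url = URI.parse(WEB_SERVICE_ADDRESS)

# The query string is part of the request path; the second argument
# to Net::HTTP#get is a hash of request headers, not query parameters.
Net::HTTP.start(url.host, url.port) do |http|
  http.get('/product-search?q=guitar')
end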
</pre>
<p>Inspecting the response headers, we notice the web service instructs consumers that the results of the query will remain the same for one hour.</p>
<pre>
curl -I "http://www.example.com/product-search?q=guitar"

HTTP/1.1 200 OK
Content-Type: text/html
<strong>Cache-Control: max-age=3600, must-revalidate</strong>
Content-Length: 32650
Date: Sat, 14 Feb 2009 13:53:31 GMT
Age: 0
Connection: keep-alive
</pre>
<p>At this point we can choose to ignore the cache control header and keep on querying the service for this specific resource regardless of whether the response is going to be the same. This is suboptimal for the consumer, which will suffer unnecessary latency penalties, the service, which will have to respond to inessential requests, and the network which will be subject to unnecessary bandwidth usage. Another option involves making the web consumer aware of the service&#8217;s caching policies so that it only queries for data that it doesn&#8217;t have or data that&#8217;s become stale. This option remedies the above problems but introduces additional complexity to the consumer.</p>
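<p>To give a sense of the complexity the second option adds to the consumer, here is a rough sketch of a consumer-side cache that honors <code>max-age</code>. <code>MaxAgeCache</code> is a hypothetical name; the sketch keeps entries in memory and ignores revalidation and every other cache directive, all of which a real implementation would have to handle.</p>

```ruby
# Hypothetical consumer-side cache; MaxAgeCache is an illustrative name.
class MaxAgeCache
  Entry = Struct.new(:body, :expires_at)

  def initialize
    @entries = {}
  end

  # The block performs the real request and returns [body, cache_control].
  # It is only called when there is no fresh entry for the key.
  def fetch(key, now = Time.now)
    entry = @entries[key]
    fresh = entry ? entry.expires_at > now : false
    return entry.body if fresh

    body, cache_control = yield
    # Extract the max-age value in seconds; 0 when the directive is absent.
    max_age = cache_control.to_s[/max-age=(\d+)/, 1].to_i
    @entries[key] = Entry.new(body, now + max_age) if max_age > 0
    body
  end
end

cache = MaxAgeCache.new
calls = 0
2.times do
  cache.fetch("/product-search?q=guitar") do
    calls += 1 # stands in for the HTTP request to the service
    ["results", "max-age=3600, must-revalidate"]
  end
end
puts calls # 1, the second fetch is served from the cache
```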
<p>A third option involves introducing a caching proxy to the web consumer&#8217;s stack responsible for mediating the service/consumer interactions solely based on the content&#8217;s caching characteristics.</p>
<p><a href="http://www.flickr.com/photos/nutrun/3278914298/" title="caching-proxy-fronted-web-consumer by nutrunflickr, on Flickr"><img src="http://farm4.static.flickr.com/3054/3278914298_f039f380ff_o.png" width="422" height="149" alt="caching-proxy-fronted-web-consumer" /></a></p>
<p>Benefits of this approach include: The consumer never has to deal with any caching logic; No effort is required in re-implementing cache handling code; It is likely that the caching engine will perform better than custom caching code in the consumer because it&#8217;s been built and optimized for this purpose; The caching proxy can be re-used by more than one type of consumer or more than one instance of the same consumer in the stack. As a possible downside, the caching proxy is an additional layer in the consumer stack, and this can add network latency (within the consumer&#8217;s LAN).</p>
<p>Here&#8217;s the configuration needed in order to use <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> as a caching web consumer proxy for the above example.</p>
<pre>
<strong># varnish.conf</strong>

backend default {
  .host = "www.example.com";
  .port = "http";
}
</pre>
<p>The only thing that changes in the consumer is the address it directs its requests to.</p>
<pre>
WEB_SERVICE_ADDRESS = <strong>'http://service-proxy'</strong>

url = URI.parse(WEB_SERVICE_ADDRESS)

Net::HTTP.start(url.host, url.port) do |http|
  http.get('/product-search?q=guitar')
end
</pre>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/caching-proxy-fronted-web-consumer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/caching-proxy-fronted-web-consumer/</feedburner:origLink></item>
		<item>
		<title>Distributed key-value store indexing</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/k_Lx-PfL0PM/</link>
		<comments>http://nutrun.com/weblog/distributed-key-value-store-indexing/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 13:57:36 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=233</guid>
		<description><![CDATA[Distributed key-value stores present an interesting alternative to some of the functionality relational databases are commonly employed for. Advantages include improved performance, easy replication, horizontal scaling and redundancy.
By nature, key value stores offer one way of retrieving data, by some sort of primary key which uniquely identifies each entry. But what about queries that require [...]]]></description>
			<content:encoded><![CDATA[<p>Distributed key-value stores present an interesting alternative to some of the functionality relational databases are commonly employed for. Advantages include improved performance, easy replication, horizontal scaling and redundancy.</p>
<p>By nature, key-value stores offer one way of retrieving data, by some sort of primary key which uniquely identifies each entry. But what about queries that require more elaborate input in order to collect relevant entries? Full text search engines like <a href="http://www.sphinxsearch.com/" title="Sphinx - Free open-source SQL full-text search engine" rel="nofollow">Sphinx</a> and <a href="http://lucene.apache.org/java/docs/" title="Apache Lucene - Overview" rel="nofollow">Lucene</a> do exactly this and, when used in conjunction with a database, will query their indexes and return a collection of ids which are then used to retrieve the results from the database. Full text search engines support indexing data sources other than RDBMSs, so there&#8217;s no reason why one couldn&#8217;t index a distributed key-value store.</p>
<p><a href="http://www.flickr.com/photos/nutrun/3244315588/" title="distributed-key-value-store-index by nutrunflickr, on Flickr"><img src="http://farm4.static.flickr.com/3515/3244315588_b9e2f08356_o.png" width="413" height="390" alt="distributed-key-value-store-index" /></a></p>
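<p>The two-step lookup described above (the index returns ids, the store returns the entries) can be sketched with plain Ruby hashes. This is a toy stand-in only: <code>STORE</code>, <code>INDEX</code> and <code>search</code> are hypothetical names, and a real full text search engine does far more than split on whitespace.</p>

```ruby
# Toy stand-ins: STORE plays the key-value store, INDEX the search engine.
STORE = { "1" => "red guitar", "2" => "blue bass", "3" => "red bass" }

# Build an inverted index: word => ids of the entries containing it.
INDEX = Hash.new { |hash, word| hash[word] = [] }
STORE.each do |id, text|
  text.split.each { |word| INDEX[word].push(id) }
end

# Step 1: ask the index for matching ids.
# Step 2: fetch the matching entries from the store by primary key.
def search(word)
  ids = INDEX.fetch(word, [])
  ids.map { |id| STORE[id] }
end

puts search("red").inspect
```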
<p>Here, we&#8217;ll look at how we can integrate Sphinx with <a href="http://memcachedb.org/" title="MemcacheDB: A distributed key-value storage system designed for persistent">MemcacheDB</a>, a distributed key-value store which conforms to the <a href="http://www.danga.com/memcached/" rel="nofollow" title="memcached: a distributed memory object caching system">memcached</a> protocol and uses Berkeley DB as its storage back-end.</p>
<p>Sphinx comes with an <a href="http://www.sphinxsearch.com/docs/current.html#xmlpipe2" title="Sphinx - Free open-source SQL full-text search engine" rel="nofollow">xmlpipe2 datasource</a>, a generic XML interface aimed at simplifying custom integration. What this means is that our application can transform content from MemcacheDB into this format and feed it to Sphinx for indexing. The highlighted lines from the following Sphinx configuration instruct Sphinx to use the <code>xmlpipe2</code> source type and invoke the <code>ruby /app/lib/sphinxpipe.rb</code> script in order to retrieve the data to index.</p>
<pre>
<strong># sphinx.conf</strong>

source products_src
{
  <strong>type = xmlpipe2</strong>
  <strong>xmlpipe_command = ruby /app/lib/sphinxpipe.rb</strong>
}

index products
{
  source = products_src
  path = /app/sphinx/data/products
  docinfo = extern
  mlock = 0
  morphology = stem_en
  min_word_len = 1
  charset_type = utf-8
  enable_star = 1
  html_strip = 0
}

indexer
{
  mem_limit = 256M
}

searchd
{
  port = 3312
  log = /app/sphinx/log/searchd.log
  query_log = /app/sphinx/log/query.log
  read_timeout = 5
  max_children = 30
  pid_file = /app/sphinx/searchd.pid
  max_matches = 10000
  seamless_rotate = 1
  preopen_indexes = 0
  unlink_old = 1
}
</pre>
<p>Following is a Product class. Each product instance can present itself as <code>xmlpipe2</code> data. The class itself renders the entire product catalog as an <code>xmlpipe2</code> data source. It also has a <code>search</code> method used for querying Sphinx and retrieving matched products from MemcacheDB. Finally, there&#8217;s a <code>bootstrap</code> method for populating the store with some example data.</p>
<pre>
<strong># product.rb</strong>

require "rubygems"
require "xml/libxml"
require "memcached"
require "riddle"

class Product
  attr_reader :id
  MEM = Memcached.new('localhost:21201')

  def initialize(id, title)
    @id, @title = id, title
  end

  def to_sphinx_doc
    sphinx_document = XML::Node.new('sphinx:document')
    sphinx_document['id'] = @id
    sphinx_document &lt;&lt; title = XML::Node.new('title')
    title &lt;&lt; @title
    sphinx_document
  end

  <strong># Query sphinx and load products with matched ids from MemcacheDB</strong>
  def self.search(query)
    client = Riddle::Client.new
    client.match_mode = :any
    client.max_matches = 10_000
    results = client.query(query, 'products')
    ids = results[:matches].map {|m| m[:doc].to_s}
    MEM.get(ids) if ids.any?
  end

  <strong># Load all products from MemcacheDB and convert them to xmlpipe2 data</strong>
  def self.sphinx_datasource
    docset = XML::Document.new.root = XML::Node.new("sphinx:docset")
    docset &lt;&lt; sphinx_schema = XML::Node.new("sphinx:schema")
    sphinx_schema &lt;&lt; sphinx_field = XML::Node.new('sphinx:field')
    sphinx_field['name'] = 'title'

    keys = MEM.get('product_keys')
    products = MEM.get(keys)
    products.each { |id, product| docset &lt;&lt; product.to_sphinx_doc }

    %(&lt;?xml version="1.0" encoding="utf-8"?&gt;\n#{docset})
  end

  <strong># Create some products and store them in MemcacheDB</strong>
  def self.bootstrap
    product_ids = ('1'..'5').to_a.inject([]) do |ids, id|
      product = Product.new(id, "product #{id}")
      MEM.set(product.id, product)
      ids &lt;&lt; id
    end
    MEM.set('product_keys', product_ids)
  end
end
</pre>
<p>The <code>sphinxpipe.rb</code> script looks like this.</p>
<pre>
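<strong># sphinxpipe.rb</strong>
# Assumes product.rb lives in the same directory as this script
require File.join(File.dirname(__FILE__), 'product')
Product.bootstrap
puts Product.sphinx_datasource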
</pre>
<p>With MemcacheDB (or even memcached for the purpose of this example) running, we can tell Sphinx to create an index of products by invoking <code>indexer --all -c sphinx.conf</code> and then start the search daemon &#8211; <code>searchd -c sphinx.conf</code>. Now we&#8217;re ready to start querying the index and retrieving results from the distributed store.</p>
<pre>
puts Product.search('product 1').inspect
</pre>
<p>It is not uncommon for the database to become a performance hotspot. The integration of a fast, distributed key-value store with an efficient search engine can be an interesting substitute for high throughput data retrieval operations.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/distributed-key-value-store-indexing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/distributed-key-value-store-indexing/</feedburner:origLink></item>
		<item>
		<title>State separation</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/yelXLmV9BTU/</link>
		<comments>http://nutrun.com/weblog/state-separation/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 03:06:14 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=230</guid>
		<description><![CDATA[It is usual for web applications to deal with serving content specific to a user&#8217;s session. This makes web caching harder to implement as we don&#8217;t want content that is meant to be viewed by a particular user being cached and accidentally offered to others. Some HTTP accelerators like Varnish choose to by default completely [...]]]></description>
			<content:encoded><![CDATA[<p>It is usual for web applications to deal with serving content specific to a user&#8217;s session. This makes web caching harder to implement as we don&#8217;t want content that is meant to be viewed by a particular user being cached and accidentally offered to others. Some HTTP accelerators like <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> by default choose to completely ignore responses that contain cookies. However, not all content is always tied to a user&#8217;s session, and if that content doesn&#8217;t change in real time, it makes sense to cache the parts that are common to all users in order to improve efficiency. With this in mind, one logical split could be made between parts of the system that are globally cache friendly and ones that aren&#8217;t.</p>
<p>Consider online retailer websites which usually operate in two modes, one for visitors and one for logged in users. Logged in users are presented with a customized, session specific experience, yet data like the product catalog is essentially the same regardless of whether one is logged in or not and it makes sense for everyone to be accessing the same cached copy of a common resource.</p>
<p>A possible solution involves creating two separate web applications, one entirely dedicated to stateless interactions and one meant for pages that are rendered as part of a user&#8217;s session. This might seem like overkill, but it clearly enforces the divide between what can and what can&#8217;t be cached. It also promotes reuse of the system&#8217;s web caching layer, which now serves content to site &#8220;visitors&#8221; as well as to the stateful components. The stateful application can delegate requests for potentially cached content to its stateless counterpart via the caching layer and decorate the responses with session specific data.</p>
<p><a href="http://www.flickr.com/photos/nutrun/3242283753/" title="split_by_state by nutrunflickr, on Flickr"><img src="http://farm4.static.flickr.com/3128/3242283753_9731c494c0_o.png" width="378" height="271" alt="split_by_state" /></a></p>
<p>Web caching presents but one way to cache data that remains static for predefined periods of time. Apart from harnessing proven existing tools, this form of caching comes with the advantage that its policies are universally understood and can significantly improve a website&#8217;s efficiency in ways beyond the maintainer&#8217;s control. Retrofitting web caching into an application that hasn&#8217;t been designed with it in mind can be difficult, so it is worth logically separating cacheable and non-cacheable resources early on.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/state-separation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/state-separation/</feedburner:origLink></item>
		<item>
		<title>Live component rotation</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/3Fc-sXBg4Wo/</link>
		<comments>http://nutrun.com/weblog/live-component-rotation/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 01:42:22 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=226</guid>
		<description><![CDATA[Many applications consist of a number of components, the majority of which are shared by others in the system. Different parts of the system exercise their collaborators in a variety of ways, think of a website where data is periodically processed by jobs and stored in a database while presentation modules handle rendering the data [...]]]></description>
			<content:encoded><![CDATA[<p>Many applications consist of a number of components, the majority of which are shared by others in the system. Different parts of the system exercise their collaborators in a variety of ways. Think of a website where data is periodically processed by jobs and stored in a database while presentation modules handle rendering the data in ways meaningful to end users. Shared resources can yield the unwanted side effect of performance degradation when a given component is being pushed too hard to perform part of its tasks, affecting each piece of the system that depends on it. In the shared database website example, the website might suffer slow response times while processing jobs that are potentially heavy on the database are running.</p>
<p>One way of getting around this problem involves creating more than one instance of the shared resource. One instance is considered &#8220;live&#8221;, the one the system&#8217;s clients interact with, while expensive operations are performed on a copy which will itself become live the moment these operations conclude. This solution does not apply to every situation but can be useful in scenarios where real-time data is not a concern. In the example website&#8217;s case, we can create a copy of the database on which we run the processing jobs. The front end components run off the &#8220;stale&#8221;, live database copy whose performance is not affected by the jobs. Once the jobs complete we can switch databases and repeat the live component rotation process as needed. Live component rotation also lends itself nicely to distribution, as component copies can exist on different physical hosts.</p>
<p>Virtualization and cloud computing make this method all the more interesting. Imagine hosting a database server on Amazon EC2 with its static data stored on an EBS volume. We can snapshot the EBS volume, fire up a new EC2 instance, attach a volume created from the snapshot to it, run the jobs and rotate live database instances once the jobs are complete, with most parts of the system never having to worry about the costly operations taking place.</p>
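<p>The rotation itself boils down to very little code. The following is a minimal sketch under simplifying assumptions: <code>Rotation</code> and the database names are hypothetical, and refreshing the standby copy from the live one before the jobs run is left out.</p>

```ruby
# Hypothetical sketch; Rotation and the database names are illustrative.
class Rotation
  attr_reader :live, :standby

  def initialize(live, standby)
    @live, @standby = live, standby
  end

  # Run the expensive work against the standby copy, then promote it.
  # Clients keep reading from the live copy until the swap happens.
  def rotate
    yield @standby
    @live, @standby = @standby, @live
    @live
  end
end

rotation = Rotation.new("db_a", "db_b")
rotation.rotate { |copy| puts "running jobs on #{copy}" }
puts rotation.live # db_b
```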
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/live-component-rotation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/live-component-rotation/</feedburner:origLink></item>
		<item>
		<title>Code on demand</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/l6CEX9_BFT8/</link>
		<comments>http://nutrun.com/weblog/code-on-demand/#comments</comments>
		<pubDate>Sat, 10 Jan 2009 22:23:25 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=211</guid>
		<description><![CDATA[Code-on-demand on the web is commonly encountered in the form of JavaScript or applets. As we examine the web as a platform for services spanning beyond the typical server/browser interaction, it&#8217;d be interesting to further explore the code-on-demand constraint from a service integration perspective.
One of the advantages of offering executable code alongside a service&#8217;s data [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm#sec_5_1_7" title="Fielding Dissertation: CHAPTER 5: Representational State Transfer (REST)">Code-on-demand</a> on the web is commonly encountered in the form of JavaScript or applets. As we examine the web as a platform for services spanning beyond the typical server/browser interaction, it&#8217;d be interesting to further explore the code-on-demand constraint from a service integration perspective.</p>
<p>One of the advantages of offering executable code alongside a service&#8217;s data is client simplification by code reuse. For example, we can distribute a library that&#8217;s specific to the data on offer, so interested clients can make use of that functionality and avoid having to re-implement it. Another advantage is distributing computational load, which would otherwise have to be handled by the server, to clients.</p>
<p>To put things into perspective, consider a simplistic web API call that lists guitar models. Much like a JavaScript include, the response to <code>http://example.com/guitars</code> contains a line which advertises a guitar model Ruby library available at <code>/libguit.rb</code>.</p>
<pre>
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"&gt;
&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Guitars&lt;/title&gt;
    &lt;script type="text/ruby" charset="utf-8" src="/libguit.rb"&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;ul id="guitars"&gt;
      &lt;li&gt;SG&lt;/li&gt;
      &lt;li&gt;Les Paul&lt;/li&gt;
      &lt;li&gt;Tele&lt;/li&gt;
      &lt;li&gt;Strat&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
<p>The <code>libguit</code> library has one method for iterating over an alphabetically sorted list of guitars.</p>
<pre>
module LibGuit
  class List
    def initialize(guitars)
      @guitars = guitars
    end

    def each_guitar_alphabetically(&amp;block)
      @guitars.sort.each(&amp;block)
    end
  end
end
</pre>
<p>Interested clients can load and use the library together with the retrieved data. Code-on-demand is an optional constraint, so clients that cannot interpret the code, Ruby in this case, or are not interested in using the library can safely ignore it without side effects.</p>
<pre>
require "rubygems"
require "hpricot"
require "open-uri"

doc = Hpricot(open("http://example.com/guitars"))

libguit_address = (doc / 'script[@type="text/ruby"]')[0][:src]
libguit_src = open("http://example.com#{libguit_address}").read
eval(libguit_src)

guitars = (doc / "#guitars li").map { |e| e.html }
LibGuit::List.new(guitars).each_guitar_alphabetically { |g| puts g }
</pre>
<p>This is a superficial example, but imagine a service which advertises an e-commerce website&#8217;s daily updated catalog of products. Instead of clients making queries like <code>/products.xml?category=sports&amp;sort=price</code>, they could once a day download a zipped version of the day&#8217;s entire catalog and a library to manipulate its entries, relieving the service of any further requests and at the same time avoiding maintenance costs in case the data&#8217;s structure changes, as long as this is well abstracted by the on-demand library.</p>
<p>At this point many would voice well-founded objections based on the security implications. Although one could propose a security system reminiscent of that of applets, I would opt for a controlled environment where trust is granted, such as inter-department service offering and consumption inside a company. Also, in an Internet where many of us store our private email on Gmail or trust Amazon&#8217;s S3 with mission critical data, I wouldn&#8217;t have a problem dynamically loading code provided by, say, Amazon. It&#8217;s not very difficult to put basic safeguards in place to avoid catastrophic effects and, in any case, every option is viable as long as the benefits outweigh the costs.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/code-on-demand/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/code-on-demand/</feedburner:origLink></item>
		<item>
		<title>Rack cache headers</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/HR9638yHReo/</link>
		<comments>http://nutrun.com/weblog/rack-cache-headers/#comments</comments>
		<pubDate>Sat, 08 Nov 2008 01:17:38 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=204</guid>
		<description><![CDATA[Rack is an interface between web servers and Ruby web frameworks. The HTTP protocol, amongst other things, defines requirements on HTTP caches in terms of header fields that control cache behavior. The purpose of this article is to demonstrate a possible implementation of a piece of Rack Middleware which enables web application developers to configure [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://rack.rubyforge.org/" title="Rack: a Ruby Webserver Interface">Rack</a> is an interface between web servers and Ruby web frameworks. The <a href="http://www.w3.org/Protocols/" title="HTTP - Hypertext Transfer Protocol Overview">HTTP</a> protocol, amongst other things, defines requirements on <a href="http://www.ietf.org/internet-drafts/draft-ietf-httpbis-p6-cache-04.txt" title="">HTTP caches</a> in terms of header fields that control cache behavior. The purpose of this article is to demonstrate a possible implementation of a piece of Rack Middleware which enables web application developers to configure a web application&#8217;s cache-related resource headers in an unobtrusive, centralized manner.</p>
<p>Rack supports the notion of Middleware, pieces of code that sit between the HTTP request and response life cycle. Rack::Lint, for example, validates an application&#8217;s requests and responses according to the Rack specification.</p>
<pre>
Rack::Handler::Mongrel.run(
  Rack::Lint.new(app), :Host => "0.0.0.0", :Port => 9999
)
</pre>
<p>Similarly, if we were to implement a cache header producing layer on top of Rack we&#8217;d end up with a construct similar to the following.</p>
<pre>
Rack::Handler::Mongrel.run(
  Rack::Lint.new(
    Rack::CacheHeaders.new(app)
  ), :Host => "0.0.0.0", :Port => 9999
)
</pre>
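<p>Wrapping works because a middleware is just an object that holds the next application and responds to <code>call</code>. The following is a minimal sketch of that contract, not part of Rack itself: <code>TagResponse</code> and the <code>X-Tagged-By</code> header are hypothetical names used only for illustration, and the stack is invoked directly with a bare environment hash instead of through a web server.</p>
<pre>
# Hypothetical middleware, for illustration only: adds one header to every response.
class TagResponse
  def initialize(app, value)
    @app = app
    @value = value
  end

  def call(env)
    # Delegate to the wrapped application, then decorate its response.
    status, headers, body = @app.call(env)
    [status, headers.merge("X-Tagged-By" => value_for(env)), body]
  end

  private

  def value_for(env)
    @value
  end
end

app = proc { |env| [200, {"Content-Type" => "text/plain"}, ["hello"]] }

# The outermost middleware sees the response last, so its header value wins.
stack = TagResponse.new(TagResponse.new(app, "inner"), "outer")
status, headers, body = stack.call("PATH_INFO" => "/rock")
headers["X-Tagged-By"] # "outer"
</pre>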
<p>Here&#8217;s a possible way of configuring how an application provides HTTP caching headers based on URL path patterns.</p>
<pre>
Rack::CacheHeaders.configure do |cache|
  cache.max_age("/rock", 3600)
  cache.expires("/metal", "16:00")
end
</pre>
<p>Following is a potential implementation for the above.</p>
<pre>
require "time" # provides Time.parse and Time#httpdate, used by Expires below

module Rack
  class CacheHeaders
    def initialize(app)
      @app = app
    end

    def call(env)
      result = @app.call(env)
      # Paths without a configured rule pass through untouched.
      rule = Configuration[env['PATH_INFO']]
      if rule
        header = rule.to_header
        result[1][header.key] = header.value
      end
      result
    end

    def self.configure(&amp;block)
      yield Configuration
    end

    class Configuration
      def self.max_age(path, duration)
        paths[path] = MaxAge.new(duration)
      end

      def self.expires(path, date)
        paths[path] = Expires.new(date)
      end

      def self.[](key)
        paths[key]
      end

      def self.paths
        @paths ||= {}
      end
    end

    class MaxAge
      def initialize(duration)
        @duration = duration
      end

      def to_header
        Header.new("Cache-Control", "max-age=#{@duration}, must-revalidate")
      end
    end

    class Expires
      def initialize(date)
        @date = date
      end

      def to_header
        Header.new("Expires", Time.parse(@date).httpdate)
      end
    end

    class Header &lt; Struct.new(:key, :value);end
  end
end
</pre>
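<p>The implementation above looks paths up by exact match. If rules should apply to URL path patterns, one option is to keep an ordered list of rules and let the first match win. The following is a sketch under that assumption, not the implementation above: <code>PatternRules</code> and <code>header_for</code> are hypothetical names, and a rule may be either a string (exact match) or a regular expression.</p>
<pre>
# Hypothetical variant of Configuration: ordered [pattern, header] pairs, first match wins.
class PatternRules
  def initialize
    @rules = []
  end

  def max_age(pattern, seconds)
    @rules.push([pattern, ["Cache-Control", "max-age=#{seconds}, must-revalidate"]])
  end

  # Returns a [name, value] pair, or nil when no rule applies to the path.
  def header_for(path)
    # Regexp#=== tests for a match, String#=== tests for equality.
    rule = @rules.find { |pattern, _header| pattern === path }
    rule ? rule.last : nil
  end
end

rules = PatternRules.new
rules.max_age(%r{\A/rock(/|\z)}, 3600) # /rock and everything below it
rules.max_age("/metal", 60)            # exactly /metal

rules.header_for("/rock/classic") # ["Cache-Control", "max-age=3600, must-revalidate"]
rules.header_for("/rockabilly")   # nil
</pre>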
<p>The code below is a minimal Rack based application.</p>
<pre>
require "rubygems"
require "rack"

app = proc {|env| [200, {"Content-Type" => "text/plain"}, ["hello"]]}

Rack::Handler::Mongrel.run(
  Rack::Lint.new(
    Rack::CacheHeaders.new(app)
  ), :Host => "0.0.0.0", :Port => 9999
)
</pre>
<p>To observe the caching headers added to the application&#8217;s responses we can use <code>curl</code> or something similar, e.g. <code>curl -I http://0.0.0.0:9999/rock</code> or <code>curl -I http://0.0.0.0:9999/metal</code>. Output should look something like the following.</p>
<pre>
air:~ gmalamid$ curl -I http://0.0.0.0:9999/rock
HTTP/1.1 200 OK
Connection: close
Date: Sat, 08 Nov 2008 00:51:23 GMT
Cache-Control: max-age=3600, must-revalidate
Content-Type: text/plain
Content-Length: 5

air:~ gmalamid$ curl -I http://0.0.0.0:9999/metal
HTTP/1.1 200 OK
Connection: close
Date: Sat, 08 Nov 2008 00:51:16 GMT
Content-Type: text/plain
Expires: Sat, 08 Nov 2008 16:00:00 GMT
Content-Length: 5
</pre>
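<p>A cache reading the first response has to split the <code>Cache-Control</code> value into its directives. The following sketch shows roughly what that involves. <code>parse_cache_control</code> is a hypothetical helper, and it is deliberately simplified: it does not handle quoted directive values that contain commas.</p>
<pre>
# Hypothetical helper: splits a Cache-Control value into a directive hash.
def parse_cache_control(value)
  value.split(",").each_with_object({}) do |directive, parsed|
    name, argument = directive.strip.split("=", 2)
    # Directives without an argument, like must-revalidate, are stored as true.
    parsed[name.downcase] = argument || true
  end
end

parse_cache_control("max-age=3600, must-revalidate")
# {"max-age"=>"3600", "must-revalidate"=>true}
</pre>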
<p>Understanding and employing HTTP cache configuration not only lets us harness the power of tools like <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> or <a href="http://www.squid-cache.org/" title="squid : Optimising Web Delivery">Squid</a>, it also makes our applications good citizens in a diverse ecosystem of HTTP-aware browsers and caches outside an application&#8217;s knowledge or control.</p>
<img src="http://feeds.feedburner.com/~r/nutrun/feed/~4/HR9638yHReo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/rack-cache-headers/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/rack-cache-headers/</feedburner:origLink></item>
		<item>
		<title>HTTP accelerator cache purging</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/DLbieYCkobk/</link>
		<comments>http://nutrun.com/weblog/http-accelerator-cache-purging/#comments</comments>
		<pubDate>Sun, 02 Nov 2008 14:47:37 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=198</guid>
		<description><![CDATA[The use of an HTTP accelerator such as Varnish or Squid in reverse proxy/accelerator mode can drastically improve a web application&#8217;s content delivery capabilities. Successfully implementing caching comes with numerous challenges but the fundamental goal is straightforward: A stack&#8217;s dynamic content generating layer should ideally not have to generate the same content more than once.

require [...]]]></description>
			<content:encoded><![CDATA[<p>The use of an HTTP accelerator such as <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> or <a href="http://www.squid-cache.org/" title="squid : Optimising Web Delivery">Squid</a> in reverse proxy/accelerator mode can drastically improve a web application&#8217;s content delivery capabilities. Successfully implementing caching comes with numerous challenges but the fundamental goal is straightforward: A stack&#8217;s dynamic content generating layer should ideally not have to generate the same content more than once.</p>
<pre>
require "rubygems"
require "sinatra"

def guitars
  @@guitars ||= ['Les Paul', 'SG']
end

get "/guitars" do
  guitars * ', '
end
</pre>
<p>This application exposes a <code>/guitars</code> resource. With no caching in place, every request for it hits the application server. This can prove suboptimal for a high traffic website, especially if generating the content is system resource intensive. Luckily this problem has been solved before. A running instance of Varnish, for example, only requires the following configuration to enable caching of all resources the application serves.</p>
<pre>
backend default {
  .host = "127.0.0.1";
  .port = "4567";
}
</pre>
<p>One of the challenges associated with caching has to do with the cached content&#8217;s freshness. We want to relieve server stress as much as possible, but we also need our application&#8217;s consumers to receive correct data at all times. Let&#8217;s assume that the application contacts guitar manufacturers&#8217; websites once a day to refresh its inventory and we have scheduled this operation to complete at 16:00 every day. This suggests that the cached resource should be refreshed every day at four o&#8217;clock in the afternoon to reflect the latest list of available guitar models. One of the ways of achieving this in HTTP is by making use of the <code>Expires</code> header, whose semantics are understood by (hopefully) any caching aware HTTP component.</p>
<pre>
require "time"

get "/guitars" do
  headers "Expires" => Time.parse("16:00").httpdate
  guitars * ', '
end
</pre>
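<p>One caveat with the snippet above: <code>Time.parse("16:00")</code> resolves to 16:00 on the current day in local time, so any response generated after that hour carries an <code>Expires</code> value that is already in the past. The following sketch picks the next occurrence instead. <code>next_refresh</code> is a hypothetical helper, and it assumes the refresh schedule is expressed in UTC.</p>
<pre>
require "time"

# Hypothetical helper: the next occurrence of hour:00 UTC strictly after `now`.
def next_refresh(now, hour)
  now = now.getutc
  candidate = Time.utc(now.year, now.month, now.day, hour)
  # Already at or past today's slot, so move to tomorrow's.
  candidate += 24 * 60 * 60 unless candidate > now
  candidate
end

next_refresh(Time.utc(2008, 11, 8, 0, 51), 16).httpdate # "Sat, 08 Nov 2008 16:00:00 GMT"
next_refresh(Time.utc(2008, 11, 8, 17, 0), 16).httpdate # "Sun, 09 Nov 2008 16:00:00 GMT"
</pre>
<p>In the Sinatra route this would read <code>headers "Expires" => next_refresh(Time.now, 16).httpdate</code>.</p>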
<p>Things aren&#8217;t always as straightforward. In many cases we cannot fully control the exact time or frequency a resource&#8217;s content changes. The example application also comes with an admin interface, allowing the guitar list administrators to manually enter new guitar models.</p>
<pre>
post "/guitars" do
  guitars &lt;&lt; params["guitar"]
  redirect("/guitars")
end
</pre>
<p>It is clear that a means for arbitrary expiration of cached content needs to be available in order to maintain content freshness. With Varnish, this capability comes in two flavors, one of which involves the use of a <code>PURGE</code> HTTP call. The following configuration enables this functionality.</p>
<pre>
acl purge {
  "localhost";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    lookup;
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }
}
</pre>
<p>To natively make use of this in Ruby, we need to extend the <code>Net::HTTP</code> library to support the <code>PURGE</code> method.</p>
<pre>
require "net/http"
require "uri"

module Net
  class HTTP
    class Purge &lt; HTTPRequest
      METHOD = "PURGE"
      REQUEST_HAS_BODY = false
      RESPONSE_HAS_BODY = false
    end

    def purge(path, initheader=nil)
      request(Purge.new(path, initheader))
    end
  end
end

def purge_cache(u)
  uri = URI.parse(u)
  query = "?#{uri.query}" if uri.query
  Net::HTTP.new(uri.host, uri.port).start {|h| h.purge("#{uri.path}#{query}")}
end
</pre>
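<p>If reopening <code>Net::HTTP</code> is undesirable, a similar request can be built from <code>Net::HTTPGenericRequest</code>, the class the standard request types derive from. This is an alternative sketch rather than the approach above, and <code>build_purge_request</code> is a hypothetical helper name. It also uses <code>URI#request_uri</code>, which returns the path together with the query string, if any.</p>
<pre>
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a PURGE request for a URL.
def build_purge_request(url)
  uri = URI.parse(url)
  # Arguments: method name, request has body?, response has body?, path.
  Net::HTTPGenericRequest.new("PURGE", false, false, uri.request_uri)
end

request = build_purge_request("http://localhost/guitars?brand=gibson")
request.method # "PURGE"
request.path   # "/guitars?brand=gibson"
</pre>
<p>Sending it is then a matter of <code>Net::HTTP.start(uri.host, uri.port) {|http| http.request(request)}</code>.</p>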
<p>Now we can expire the cached <code>/guitars</code> resource every time the list is amended.</p>
<pre>
post "/guitars" do
  guitars &lt;&lt; params["guitar"]
  purge_cache("http://localhost/guitars")
  redirect("/guitars")
end
</pre>
<p>Although this method is effective, there can be cases where the bidirectional coupling between the application and caching layers might be undesirable. With the fundamental functional pieces in place, however, it is not hard to implement a more elaborate strategy such as the one described in <a href="http://www.mnot.net/cache_channels/" title="HTTP Cache Channels">Cache Channels</a> in order to reduce the application layer&#8217;s knowledge of the caching infrastructure.</p>
<img src="http://feeds.feedburner.com/~r/nutrun/feed/~4/DLbieYCkobk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/http-accelerator-cache-purging/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/http-accelerator-cache-purging/</feedburner:origLink></item>
		<item>
		<title>Parallelize by process</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/Hc8lpxN1OX4/</link>
		<comments>http://nutrun.com/weblog/parallelize-by-process/#comments</comments>
		<pubDate>Sun, 26 Oct 2008 02:57:12 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=189</guid>
		<description><![CDATA[Performing computations in parallel is a popular technique for improving application performance and can be achieved in a number of ways, most commonly by employing threads or by splitting workload in a number of concurrent processes.
Memory usage is often a headache with large dataset computations. While memory optimization is something to be sought after, tracking [...]]]></description>
			<content:encoded><![CDATA[<p>Performing computations in parallel is a popular technique for improving application performance and can be achieved in a number of ways, most commonly by employing threads or by splitting the workload into a number of concurrent processes.</p>
<p>Memory usage is often a headache with large dataset computations. While memory optimization is something to be sought after, tracking down memory leaks can become tedious and time consuming. We can decrease the chances of a heavy job running a system&#8217;s memory dry by coming up with a strategy for fragmenting the job into a number of shorter running processes. By doing so, any memory used by a worker process will be released the moment the process completes. Additionally, we can run job fragments in parallel, allow ourselves to harness the operating system&#8217;s multi-core capabilities and potentially distribute worker processes over a number of physical hosts and scale out when the need arises. Smaller processes also dictate more manageable chunks of code which are easier to maintain, optimize and test.</p>
<p>Let&#8217;s look at an example where a job involves fetching a large number of categorized products from various sources and processing them for use by our own application.</p>
<pre>
class Job
  def perform
    ADDRESSES.each do |address|
      category = load_category(address)
      category.products.each { |product| process(product) }
    end
  end

  def process(product)
    #some intensive computation
  end

  def load_category(address)
    #load an addressable category dataset
  end
end
</pre>
<p>Let&#8217;s assume that the <code>ADDRESSES</code> constant in the example is a list consisting of entries such as <code>example.com/toys</code>, <code>example.com/phones</code>, <code>example.org/guitars</code>, etc. The job fetches each category&#8217;s product dataset by its address, iterates over the products and performs a long processing operation on each. Supposing that after every possible optimization the job takes three hours to complete, we can at best run the job eight times a day. What happens if the product categories are updated more often than eight times a day and our application needs to deal with fresh data all the time in order to be successful?</p>
<p>One natural split can involve creating a worker process for each address entry. We can do so by extracting the majority of the code from the <code>Job</code> class into a <code>Worker</code> class meant to run as a standalone process.</p>
<pre>
class Worker
  def self.process_category(address)
    category = load_category(address)
    category.products.each { |product| process(product) }
  end

  def self.process(product)
    #some intensive computation
  end

  def self.load_category(address)
    #load an addressable category dataset
  end
end

Worker.process_category(ARGV[0]) if ARGV.size == 1
</pre>
<p>Each worker will operate on a significantly smaller dataset and will complete much faster than the initial long running job. Any memory used by each worker will be immediately released the moment the process finishes execution.</p>
<p>After the latest change, <code>Job</code> can take on the role of orchestrating the worker processes. We start by only allowing an arbitrary maximum number of concurrent workers, three in this case.</p>
<pre>
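require "thread"

class Job
  def initialize
    @worker_count, @mutex = 3, Mutex.new
  end

  def perform
    threads = ADDRESSES.map do |address|
      # Wait for a free slot; all access to the counter goes through the mutex.
      sleep 0.1 until @mutex.synchronize {@worker_count > 0}
      @mutex.synchronize {@worker_count -= 1}
      Thread.new do
        system("ruby worker.rb #{address}")
        @mutex.synchronize {@worker_count += 1}
      end
    end
    # Keep the job alive until every worker process has finished.
    threads.each {|thread| thread.join}
  end
end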
</pre>
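<p>The polling counter can also be replaced by a fixed pool of threads that pull addresses from a queue. The following is a sketch of that alternative, not the approach above: <code>run_in_pool</code> is a hypothetical helper that takes the per-item work as a block, so it can be tried without spawning any processes.</p>
<pre>
require "thread"

# Hypothetical helper: runs the block once per item on at most `size` threads.
def run_in_pool(items, size)
  pending = Queue.new
  items.each { |item| pending.push(item) }
  finished = Queue.new

  workers = Array.new(size) do
    Thread.new do
      loop do
        begin
          item = pending.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        finished.push(yield(item))
      end
    end
  end

  workers.each { |worker| worker.join }
  # Results arrive in completion order, not in input order.
  Array.new(finished.size) { finished.pop }
end

run_in_pool(%w[toys phones guitars], 3) { |name| name.upcase }.sort
# ["GUITARS", "PHONES", "TOYS"]
</pre>
<p>For the job in this article the block would be <code>{ |address| system("ruby", "worker.rb", address) }</code>, with the pool size playing the role of <code>@worker_count</code>.</p>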
<p>At this point it is a good idea to run the job and monitor the time it takes for it to complete while also measuring system resource usage. This way we can determine the optimal number of concurrent worker processes based on the system&#8217;s specs. Once available resources have been exhausted and both <code>Job</code> and <code>Worker</code> have been sufficiently optimized, we can start thinking about running workers on separate physical nodes.</p>
<img src="http://feeds.feedburner.com/~r/nutrun/feed/~4/Hc8lpxN1OX4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/parallelize-by-process/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/parallelize-by-process/</feedburner:origLink></item>
	</channel>
</rss>
