<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>Nicholas Piël</title> <link>http://nichol.as</link> <description /> <lastBuildDate>Wed, 21 Jul 2010 16:56:36 +0000</lastBuildDate> <generator>http://wordpress.org/?v=2.9.2</generator> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Nichol4s" /><feedburner:info uri="nichol4s" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>Nichol4s</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item><title>ZeroMQ an introduction</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/S2DARW0sr2o/zeromq-an-introduction</link> <comments>http://nichol.as/zeromq-an-introduction#comments</comments> <pubDate>Wed, 23 Jun 2010 08:50:59 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[scalability]]></category> <category><![CDATA[zeromq]]></category><guid isPermaLink="false">http://nichol.as/?p=606</guid> <description><![CDATA[
ZeroMQ is a messaging library that allows you to design a complex communication system without much effort. It has been wrestling with how to effectively describe itself in recent years. In the beginning it was introduced as &#8216;messaging middleware&#8217;, later it moved to &#8216;TCP on steroids&#8217; and right now it is a &#8216;new layer [...]]]></description> <content:encoded><![CDATA[<p><a href="http://www.zeromq.org/">ZeroMQ</a> is a messaging library that allows you to design a complex communication system without much effort. It has been wrestling with how to effectively describe itself in recent years. In the beginning it was introduced as &#8216;messaging middleware&#8217;, later it moved to &#8216;TCP on steroids&#8217; and right now it is a &#8216;new layer on the networking stack&#8217;.</p><p><img class="size-medium wp-image-618 alignright" title="zeromq" src="http://nichol.as/wp-content/uploads/2010/06/zeromq1-300x115.png" alt="zeromq1 300x115 ZeroMQ an introduction" width="126" height="48" /></p><p>I had some trouble understanding ZeroMQ at first and really had to reset my brain. First of all, it is not a complete messaging system such as <a href="http://www.rabbitmq.com/">RabbitMQ</a> or <a href="http://activemq.apache.org/">ActiveMQ</a>. I know the people at Linden Research <a href="http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes">compared them</a>, but that is comparing apples and oranges. 
A full-fledged messaging system gives you an out-of-the-box experience. Unwrap it, configure it, start it up and you&#8217;re good to go once you have figured out all its complexities.</p><p>ZeroMQ is not such a system at all; it is a simple messaging library to be used programmatically. It basically gives you a pimped socket interface allowing you to quickly build your own messaging system.</p><h2>Float like a butterfly, sting like a bee</h2><p>But why use ZeroMQ and not just use the low level Berkeley socket interface or a high level messaging system? I think the answer is balance. You probably want the flexibility and performance of the low level while still having the ease of implementation of the high level. However, maintaining raw sockets is difficult and cumbersome when you want to implement a scalable system. A high level system often works perfectly if you use it for the situation it was designed for, but it can be difficult to change core elements of the system and its ease of use often comes with a cost in performance. This problem is not limited to messaging systems. The same dilemma shows up in web frameworks; it could very well be exactly the reason why &#8216;Micro Frameworks&#8217; are gaining in popularity.</p><p>I believe that ZeroMQ perfectly fits this gap between the high and the low level, so what are its features?</p><h3>Performance</h3><p>ZeroMQ is blazing fast.  
It is orders of magnitude faster than most AMQP messaging systems and it obtains this performance through the following techniques:</p><ul><li>It does not have the overhead of an over-engineered protocol such as AMQP</li><li>It can make use of efficient transports such as <a title="Pragmatic General Multicast" href="http://en.wikipedia.org/wiki/Pragmatic_General_Multicast">reliable Multicast</a> or the <a href="http://www.zeromq.org/whitepapers:y-suite">Y-suite IPC transport</a></li><li>It makes use of <a href="http://www.zeromq.org/whitepapers:design-v01#toc10">intelligent message batching</a>. This allows 0MQ to efficiently utilize a TCP/IP connection by minimizing not only protocol overhead but also system calls.</li></ul><h3>Simplicity</h3><p>The API is deceptively simple, and it makes sending messages much easier than with a raw socket implementation where you have to continuously &#8216;feed&#8217; the socket buffer. In ZeroMQ you can just fire off an async send call; it will queue the message in a separate thread and do all the work for you. Because of this async nature, your application does not have to waste time waiting until the message has been flushed. The async nature of 0MQ makes it a perfect companion for an event-based framework.</p><p>ZeroMQ&#8217;s simple wire protocol fits perfectly in the current landscape, where we have lots of different transport protocols. With AMQP it always felt a bit weird to use an extra protocol layer on top. 0MQ gives you complete freedom on how you encode your message, as it will just treat it as an opaque blob. 
So you can send simple <a href="http://www.json.org/">JSON</a> messages or go the binary route with, for example, <a href="http://bsonspec.org/">BSON</a>, <a href="http://code.google.com/p/protobuf/">Protocol Buffers</a> or <a href="http://incubator.apache.org/thrift/">Thrift</a>, all without feeling <a href="http://nichol.as/wp-content/uploads/2010/06/guilty-puppy.jpg">guilty</a>.</p><h3>Scalability</h3><p>While ZeroMQ sockets look low level they provide lots of features. A single ZeroMQ socket can for example connect to multiple end points and automatically load balance messages over them. Or it can work as some sort of Fan-In, collecting messages from multiple sources through a single socket.</p><p>ZeroMQ follows a <a href="http://www.zeromq.org/whitepapers:brokerless">brokerless design</a> so that there is no single point of failure. Combine this with its simplicity and performance and you get something that you can use to make your application distributed.</p><h2>Implementing a messaging layer with ZeroMQ</h2><p>In the next section I will show how to design and implement a messaging layer with ZeroMQ. For the code example I will use Brian Granger&#8217;s <a href="http://github.com/zeromq/pyzmq">PyZMQ</a>, which is the excellent Python binding to ZeroMQ.</p><p>Implementing a ZeroMQ messaging layer is a three-step approach:</p><ol><li>Choose a transport</li><li>Set up the infrastructure</li><li>Select a messaging pattern</li></ol><h3>Choosing a transport</h3><p>The first step is to choose a transport. 
ZeroMQ provides 4 different transports:</p><ol><li><em><a href="http://api.zeromq.org/zmq_inproc.html">INPROC</a></em> an In-Process communication model</li><li><em><a href="http://api.zeromq.org/zmq_ipc.html">IPC</a></em> an Inter-Process communication model</li><li><em><a href="http://api.zeromq.org/zmq_pgm.html">MULTICAST</a></em> multicast via PGM, possibly encapsulated in UDP</li><li><em><a href="http://api.zeromq.org/zmq_tcp.html">TCP</a></em> a network based transport</li></ol><p>The <em>TCP</em> transport is often the best choice; it is very performant and robust. However, when there is no need to cross the machine border it can be interesting to look at the <em>IPC</em> or <em>INPROC</em> transport to lower the latency even more. The <em>MULTICAST</em> transport can be interesting in special cases. But personally, I am a bit careful with applying multicast, as it is difficult to understand how it will behave when scaling up. Think of issues such as figuring out how many multicast groups you can create with this or that hardware and how much stress it is going to put on the different switches in your network. If you want to be sure that your code runs across platforms it is probably best to go with <em>TCP</em>, as the other transports are not guaranteed to be available on every platform.</p><h3>Setting up the infrastructure</h3><p>When you have decided upon your transport you will have to think about how the different components are connected to each other. It is simply answering the question: &#8220;Who connects to whom?&#8221; You probably want the most stable part of the network to <em><a href="http://api.zeromq.org/zmq_bind.html">BIND</a></em> on a specific port and have the more dynamic parts <em><a href="http://api.zeromq.org/zmq_connect.html">CONNECT</a></em> to that. 
In the image below we have depicted how a server binds to a certain port and how a client connects to it.</p><p style="text-align: center;"><a href="http://nichol.as/wp-content/uploads/2010/06/cs.png"><img class="size-full wp-image-682  aligncenter" title="cs" src="http://nichol.as/wp-content/uploads/2010/06/cs.png" alt="cs ZeroMQ an introduction" width="256" height="64" /></a></p><p>It is possible that both ends of the network are relatively dynamic so that it is difficult to have a single stable connection point. If this is the case, you could make use of the forwarding devices that ZeroMQ provides. These devices can bind to 2 different ports and forward messages from one end to the other. By doing so, the forwarding device can become the stable point in your network to which each component can connect. ZeroMQ provides three kinds of devices:</p><ol><li><em><a href="http://api.zeromq.org/zmq_queue.html">QUEUE</a></em>, a forwarder for the request/response messaging pattern</li><li><em><a href="http://api.zeromq.org/zmq_forwarder.html">FORWARDER</a>, </em>a forwarder for the publish/subscribe messaging pattern</li><li><em><a href="http://api.zeromq.org/zmq_streamer.html">STREAMER</a>, </em>a forwarder for the pipelined messaging pattern</li></ol><p>In the image below we can see such a device being used. In this situation both the client and the server initialize a connection to the forwarder, which binds to two different ports. Using such a device removes the need for extra application logic, as you will not need to maintain a list of connected peers.</p><p style="text-align: center;"><a href="http://nichol.as/wp-content/uploads/2010/06/cfs.png"><img class="size-full wp-image-683  aligncenter" title="cfs" src="http://nichol.as/wp-content/uploads/2010/06/cfs.png" alt="cfs ZeroMQ an introduction" width="364" height="64" /></a></p><h3>Selecting a message pattern</h3><p>The previous steps build the infrastructure but did not specify the message flow. 
The next step is to think carefully about the message pattern each component should follow. The patterns that 0MQ supports are:</p><ol><li><em><a href="http://api.zeromq.org/zmq_socket.html#_request_reply_pattern">REQUEST/REPLY</a>, </em>bidirectional, load balanced and state based</li><li><em><a href="http://api.zeromq.org/zmq_socket.html#_publish_subscribe_pattern">PUBLISH/SUBSCRIBE</a>, </em>publish to multiple recipients at once</li><li><em><a href="http://api.zeromq.org/zmq_socket.html#_pipeline_pattern">UPSTREAM / DOWNSTREAM</a>, </em>distribute data to nodes arranged in a pipeline</li><li><em><a href="http://api.zeromq.org/zmq_socket.html#_exclusive_pair_pattern">PAIR</a>, </em>communication exclusively between peers</li></ol><p style="text-align: left;">I will explain them a bit more below.</p><p><br/><br/></p><h4>Request Reply</h4><p><a href="http://nichol.as/wp-content/uploads/2010/06/reqrep1.png"><img class="alignleft size-full wp-image-691" title="reqrep" src="http://nichol.as/wp-content/uploads/2010/06/reqrep1.png" alt="reqrep1 ZeroMQ an introduction" width="235" height="184" /></a>The request reply paradigm is very common and can be found in most types of servers, for example HTTP, POP or IMAP. This pattern has a certain state associated with it, as a request has to be followed by a reply. The client uses a socket of type <em>REQ</em> as it will initiate the request by performing a <strong>.send()</strong> on the socket. The server uses a socket of type <em>REP</em>, and it will start by performing a <strong>.recv()</strong> to read the incoming request, after which it can send its reply.</p><p>ZeroMQ greatly simplifies this pattern by allowing you to have a single socket connect to multiple end points. ZeroMQ will automatically balance requests over the different peers.</p><p>The Python code below will create an echo server that listens on port 5000 with a <em>REP</em> socket. 
It will then loop, alternating between a <strong>.recv()</strong> for an incoming request and a <strong>.send()</strong> of the reply.</p><pre class="brush: python;">
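import zmq

# Create a REP (reply) socket and listen on port 5000
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind(&quot;tcp://127.0.0.1:5000&quot;)

while True:
    # Wait for a request, then echo it back as the reply
    msg = socket.recv()
    print &quot;Got&quot;, msg
    socket.send(msg)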
</pre><p>When you have multiple clients connected to this server the ZMQ socket will fair queue between all incoming requests. Now, if you want your client to be able to connect to multiple servers as well, you can take the above code, change port 5000 to 6000 and use it to run an extra server. The following client code will then be able to use both of the servers:</p><pre class="brush: python;">
import zmq

# A single REQ socket can connect to several servers
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect(&quot;tcp://127.0.0.1:5000&quot;)
socket.connect(&quot;tcp://127.0.0.1:6000&quot;)

for i in range(10):
    msg = &quot;msg %s&quot; % i
    socket.send(msg)
    print &quot;Sending&quot;, msg
    # A REQ socket has to read the reply before it can send again
    msg_in = socket.recv()
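
# --- Illustrative sketch, not part of the original client ---
# A pure-Python model of how the ten requests above end up on the two
# servers. It assumes the REQ socket simply takes turns between its
# connected endpoints; the names below are hypothetical and no ZeroMQ
# call is made here.
from itertools import cycle

def spread_requests(messages, endpoints):
    # Hand each message to the next endpoint in turn (round robin).
    assignment = dict((endpoint, []) for endpoint in endpoints)
    turn = cycle(endpoints)
    for message in messages:
        assignment[next(turn)].append(message)
    return assignment

plan = spread_requests(['msg %s' % i for i in range(10)], [5000, 6000])
# plan[5000] holds msg 0, 2, 4, 6, 8 and plan[6000] holds msg 1, 3, 5, 7, 9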
</pre><p>The above sends 10 requests in total but since we are connected to 2 different servers, each server only has to handle 5 requests. Isn&#8217;t that great? With only a few lines of code we were able to create a distributed client/server model.</p><p>Now, if we want to add an extra server to handle our requests we will have to adjust our code. This can be cumbersome as we need to do this for all our clients, so that they know they can balance the requests over the extra server.</p><p><img class="alignright size-full wp-image-684" title="queue" src="http://nichol.as/wp-content/uploads/2010/06/queue.png" alt="queue ZeroMQ an introduction" width="235" height="206" /></p><p>This is exactly where the ZeroMQ devices fit in. Instead of having the clients connect directly to multiple servers, they can connect to a single forwarding device. The forwarding device will then reroute all messages to the connected servers.</p><p>Example client output:</p><blockquote><p>Sending msg 0<br /> Sending msg 1<br /> Sending msg 2<br /> Sending msg 3<br /> Sending msg 4<br /> Sending msg 5<br /> Sending msg 6<br /> Sending msg 7<br /> Sending msg 8<br /> Sending msg 9</p></blockquote><p>Example output server 1 at port 5000:</p><blockquote><p>Got msg 0<br /> Got msg 2<br /> Got msg 4<br /> Got msg 6<br /> Got msg 8</p></blockquote><p>Example output server 2 at port 6000:</p><blockquote><p>Got msg 1<br /> Got msg 3<br /> Got msg 5<br /> Got msg 7<br /> Got msg 9</p></blockquote><p><br/><br/></p><h4>Publish Subscribe</h4><p><a href="http://nichol.as/wp-content/uploads/2010/06/broadcast.png"><img class="size-full wp-image-673 alignleft" title="broadcast" src="http://nichol.as/wp-content/uploads/2010/06/broadcast.png" alt="broadcast ZeroMQ an introduction" width="233" height="180" /></a>The Pub/Sub paradigm has gained lots of interest the last few years. You can think of things such as message pushing, XMPP or webhooks. In a pub/sub pattern the components are loosely coupled. 
This will greatly help you to scale out as there is no need to worry about the subscribers. However, this loose coupling can also lead to unexpected behavior when not fully understood. A nice metaphor for the Pub/Sub paradigm is to think of it as a radio station. When you publish messages you send something over a certain frequency; only listeners that have subscribed to that frequency will receive the signal. But also, just as with a radio, if you tune in to the station after the broadcast you will miss the show.</p><p>It is good to stress that the various message patterns have no coupling with the infrastructure. It is thus possible to bind to a port and publish to the peers that connect to it. But it is also possible to do it the other way around: connect to multiple peers and broadcast to them. The first example resembles the radio metaphor (everybody can tune in), while the second one more resembles yelling at your peers through a megaphone (a selected group). In both situations your peers can decide not to listen to your messages by not subscribing to them.</p><p>The following code shows how you could create a broadcasting server for live soccer events:</p><pre class="brush: python;">
import zmq
from random import choice

# Publish on port 5000; subscribers connect to this address
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind(&quot;tcp://127.0.0.1:5000&quot;)

countries = ['netherlands','brazil','germany','portugal']
events = ['yellow card', 'red card', 'goal', 'corner', 'foul']

while True:
    # The country comes first so subscribers can filter on it as a prefix
    msg = choice(countries) + &quot; &quot; + choice(events)
    print &quot;-&gt;&quot;, msg
    socket.send(msg)
</pre><p>The server will generate an endless stream of events for the different countries and push them over a socket of type <em>PUB</em>. Below you can find some example output:</p><blockquote><p>-&gt; portugal corner<br /> -&gt; portugal yellow card<br /> -&gt; portugal goal<br /> -&gt; netherlands yellow card<br /> -&gt; germany yellow card<br /> -&gt; brazil yellow card<br /> -&gt; portugal goal<br /> -&gt; germany corner<br /> &#8230;</p></blockquote><p>Now if we are only interested in events concerning The Netherlands and Germany we can create a client that subscribes to those specific messages:</p><pre class="brush: python;">
import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect(&quot;tcp://127.0.0.1:5000&quot;)
socket.setsockopt(zmq.SUBSCRIBE, &quot;netherlands&quot;)
socket.setsockopt(zmq.SUBSCRIBE, &quot;germany&quot;)
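
# --- Illustrative sketch, not part of the original client ---
# The SUB socket keeps a message when it starts with one of the
# subscribed prefixes. The hypothetical helper below models that rule in
# plain Python; the actual filtering is done inside ZeroMQ.
def is_wanted(message, prefixes):
    # True when the message begins with at least one subscribed prefix.
    return any(message.startswith(prefix) for prefix in prefixes)

wanted = ['netherlands', 'germany']
keep_germany = is_wanted('germany foul', wanted)       # True
keep_brazil = is_wanted('brazil yellow card', wanted)  # False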

while True:
    # Only messages matching one of the subscribed prefixes arrive here
    print socket.recv()
</pre><p>The client will create a <em>SUB</em> socket, connect to our broadcast server at port 5000 and subscribe to messages starting with &#8216;netherlands&#8217; or &#8216;germany&#8217;. The output will look something like this:</p><blockquote><p>netherlands red card<br /> netherlands goal<br /> netherlands red card<br /> germany foul<br /> netherlands yellow card<br /> germany foul<br /> netherlands goal<br /> netherlands corner<br /> germany foul<br /> netherlands corner<br /> &#8230;</p></blockquote><p><br/><br/></p><h4>Pipelining</h4><p><a href="http://nichol.as/wp-content/uploads/2010/06/pipeline2.png"><img class="size-full wp-image-686 alignleft" title="pipeline2" src="http://nichol.as/wp-content/uploads/2010/06/pipeline2.png" alt="pipeline2 ZeroMQ an introduction" width="233" height="283" /></a>The pipeline pattern looks remarkably similar to the Req/Rep pattern; the difference is that instead of requiring a reply to be sent back to the requester, the result can be pushed down the pipe. This is a paradigm commonly seen when there is a need to process data in parallel. For example, let&#8217;s say we have some sort of system that does face recognition. We have a job server that pushes the images to one of the workers, which will then process it. Once finished, the worker pushes the result down the stream again towards some sort of collector.</p><p>In the design at the left we can see that a worker receives its messages from an <em>UPSTREAM</em> socket and, once they are processed, sends them <em>DOWNSTREAM</em>. It routes messages between two different socket types.</p><p>The jobserver can just keep pushing tasks <em>DOWNSTREAM </em>through a single socket but with multiple endpoints. ZeroMQ and recently also PyZMQ can send the messages in a zero-copy manner. 
This is great if you need to push large messages around and you don&#8217;t want to waste IO cycles.</p><p><br/><br/></p><h4>Paired sockets</h4><p><a href="http://nichol.as/wp-content/uploads/2010/06/paired.png"><img class="size-full wp-image-668 alignleft" title="paired" src="http://nichol.as/wp-content/uploads/2010/06/paired.png" alt="paired ZeroMQ an introduction" width="166" height="189" /></a>Paired sockets are very similar to regular sockets: the communication is bidirectional, no specific state is stored within the socket and there can only be one connected peer. Most real life problems can be captured in one of the previously explained patterns and I recommend that you look at them first before applying this one, as it will simplify your problem.</p><p>The figure at the left depicts the infrastructure of a paired socket: the server listens on a certain port and a client connects to it. The red lines indicate the flow of messages. In this pattern both endpoints use a socket of type <em>PAIR</em> and, as you can see, the messages can flow in both directions.</p><p>The following code shows how to implement such a thing. We will bind to a port on one end:</p><pre class="brush: python;">
import zmq
context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.bind(&quot;tcp://127.0.0.1:5555&quot;)
</pre><p>And on the other end we connect to it:</p><pre class="brush: python;">
import zmq
context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.connect(&quot;tcp://127.0.0.1:5555&quot;)
</pre><p><br/></p><h3>ZeroMQ and the future</h3><p>In this post I have given a short introduction to ZeroMQ, and I hope that by now you share my view of what a great little library it is. But while the library may feel small it has a grand vision of being <em>the new messaging layer</em>. And really, it is not that weird when you come to think of it. Scalability issues are mostly just communication and portability issues; ZeroMQ can solve these problems for you.</p><p>Let&#8217;s say you want to create some new sort of database because Redis, Cassandra, TokyoTyrant, Postgres, MongoDB, DabbleDB, CouchDB, HBase, etc. just don&#8217;t serve your needs that well. You create an amazing in memory tree representation for your data and have a blazing fast indexer. Now all you need is some sort of messaging layer such that different clients can talk to your server, preferably implemented in different programming languages and with clustering capabilities. You could of course create such a messaging framework all by yourself, but that is a lot of hard work.</p><p>A simple solution is to just implement your database as a ZeroMQ server and pick a message protocol (e.g. JSON). As you have seen by now, implementing such functionality with ZeroMQ is really easy and on top of this you will get almost instant scalability because of the way ZeroMQ can route messages. It will also make it incredibly easy to implement different clients that will communicate with your server. Basically all you need to do is pick one of the 15 available language bindings, use the same message protocol and you&#8217;re done. Currently the following languages have a ZeroMQ binding: Ada, C, C++, Common Lisp, Erlang, Go, Haskell, Java, Lua, .NET, OOC, Perl, PHP, Python and Ruby.</p><p><a href="http://www.zeromq.org/">ZeroMQ</a> could very well be the new way we connect our components. 
A good example of someone who understands the possibilities of ZeroMQ is Zed Shaw, as can be seen with his recent project <a href="http://mongrel2.org/index">Mongrel2</a>. You can use Mongrel2 to bridge the gap between a regular HTTP client and a ZeroMQ component. If you don&#8217;t immediately see how awesome this is you probably have never worked with websockets, comet or flash based sockets. Another way to look at the great possibilities of such an implementation is to think of Facebook&#8217;s <a href="http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919">BigPipe</a>, where each Pagelet can transparently be generated by a different component connected with 0MQ.</p>]]></content:encoded> <wfw:commentRss>http://nichol.as/zeromq-an-introduction/feed</wfw:commentRss> <slash:comments>29</slash:comments> <feedburner:origLink>http://nichol.as/zeromq-an-introduction</feedburner:origLink></item> <item><title>Benchmark of Python WSGI Servers</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/XLwrzSjnOa8/benchmark-of-python-web-servers</link> <comments>http://nichol.as/benchmark-of-python-web-servers#comments</comments> <pubDate>Mon, 15 Mar 2010 14:29:32 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[async]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[wsgi]]></category><guid isPermaLink="false">http://nichol.as/?p=432</guid> <description><![CDATA[
It has been a while since the Socket Benchmark of Asynchronous servers. That benchmark looked specifically at the raw socket performance of various frameworks, which was being benchmarked by doing a regular HTTP request against the TCP server.  The server itself was dumb and did not actually  understand the headers being sent to [...]]]></description> <content:encoded><![CDATA[<p>It has been a while since the Socket Benchmark of Asynchronous servers. That benchmark looked specifically at the raw socket performance of various frameworks, which was being benchmarked by doing a regular HTTP request against the TCP server. The server itself was dumb and did not actually understand the headers being sent to it. In this benchmark I will be looking at how different <a href="http://wsgi.org/wsgi/WsgiStart">WSGI</a> servers perform at exactly that task: the handling of a full HTTP request.</p><p>I should immediately start with a word of caution. I tried my best to present an objective benchmark of the different WSGI servers. And I truly believe that a benchmark is one of the best methods to present an unbiased comparison. However, a benchmark measures the performance on a very specific domain and it could very well be that this domain is slanted towards certain frameworks. But, if we keep that in mind we can actually put some measurements behind all those &#8216;faster than&#8217; or &#8216;lighter than&#8217; claims you will find everywhere. 
It is my opinion that such comparison claims without any detailed description of how they are measured are worse than a biased but detailed benchmark. The specific domain of this benchmark is, yet again, the PingPong benchmark as used earlier in my <a href="http://nichol.as/asynchronous-servers-in-python">Async Socket Benchmark</a>. However, there are some differences:</p><ul><li>We will fire multiple requests over a single connection, when possible, by using an HTTP 1.1 keepalive connection</li><li>It is a distributed benchmark with multiple clients</li><li>We will use an identical WSGI application for all servers instead of specially crafted code to return the reply</li><li>We expect the server to understand our HTTP request and reply with the correct error codes</li></ul><p>This benchmark is a conceptually simple one and you could claim that it is not representative of most common web applications, which rely heavily on blocking database connections. I agree with that to some extent, as this is mostly the case. However, the push towards HTML5&#8217;s websockets and highly interactive web applications will require servers that are capable of serving lots of concurrent connections with low latency.</p><h3>The benchmark</h3><p>We will run the following WSGI application &#8216;pong.py&#8217; on all servers.</p><pre class="brush: python;">
def application(environ, start_response):
    # Minimal WSGI app: always answer 200 with a fixed plain-text body
    status = '200 OK'
    output = 'Pong!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
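
# --- Illustrative sketch, not part of pong.py as benchmarked ---
# A hypothetical helper that calls a WSGI application once without any
# server, using only the standard library. It can serve as a quick local
# check of the status, headers and body before a real server is started.
from wsgiref.util import setup_testing_defaults

def call_once(app):
    # Build a minimal GET / environ and record what the app hands back.
    environ = {}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status
        seen['headers'] = headers

    body = list(app(environ, start_response))
    return seen['status'], seen['headers'], body

# For example, call_once(application) returns the status line, the header
# list and the body chunks of the app defined above.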
</pre><p>We will also tune both client and server by running the following commands. This basically enables the server to open LOTS of concurrent connections.</p><blockquote><p>echo &quot;10152 65535&quot; &gt; /proc/sys/net/ipv4/ip_local_port_range<br /> sysctl -w fs.file-max=128000<br /> sysctl -w net.ipv4.tcp_keepalive_time=300<br /> sysctl -w net.core.somaxconn=250000<br /> sysctl -w net.ipv4.tcp_max_syn_backlog=2500<br /> sysctl -w net.core.netdev_max_backlog=2500<br /> ulimit -n 10240</p></blockquote><p>The server is a virtual machine with only one assigned processor. I have explicitly limited the number of available processors to make sure that it is a fair comparison. Whether or not the server scales over multiple processors is an interesting and useful feature but this is not something I will measure in this benchmark. The reason for this is that it isn&#8217;t that difficult to scale up your application to multiple processors by using a reverse proxy and multiple server processes (this can even be managed for you by special applications such as <a href="http://pypi.python.org/pypi/Spawning/0.9.3rc2">Spawning</a> or <a href="http://github.com/benoitc/grainbows">Grainbows</a>). The server and clients run Debian Lenny with Python 2.6.4 on the amd64 architecture. I made sure that all WSGI servers have a backlog of at least 500 and that (connection/error) logging is disabled; when this was not directly possible from the callable I modified the library. The server and the clients have 1GB of RAM.</p><p>I benchmarked the HTTP/1.0 request rate of all servers and the HTTP/1.1 request rate on the subset of servers that support pipelining multiple requests over a single connection. While the lack of HTTP 1.1 keepalive support is most likely a non-issue in current deployment situations, I expect it to become an important feature in applications that depend heavily on low latency connections. 
Think of comet-style web applications or applications that use HTML5 websockets.</p><p>I categorize a server as HTTP/1.1 capable by its behaviour, not by its specs. For example, the Paster server says that it has some support for HTTP/1.1 keep-alives but I was unable to pipeline multiple requests. <a href="http://trac.pythonpaste.org/pythonpaste/ticket/392">This reported bug</a> might be relevant to this situation and might apply to some of the other &#8220;HTTP 1.0 Servers&#8221;.</p><p>The benchmark will be performed by running a <a href="http://gom-jabbar.org/articles/2009/02/04/httperf-and-file-descriptors">recompiled</a> <a href="http://www.hpl.hp.com/research/linux/httperf/docs.php">httperf</a> (which bypasses the statically compiled file descriptor limit in the Debian package) on 3 different, specially set up client machines. To step through the different request rates and aggregate the results I will use a tool called <a href="http://www.xenoclast.org/autobench/">autobench</a>. Note: this is not ApacheBench (ab).</p><p>The command to benchmark HTTP/1.0 WSGI servers is:</p><blockquote><p>httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<strong>&lt;RATE&gt;</strong> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 <strong>--num-calls=1</strong></p></blockquote><p>And the command for HTTP/1.1 WSGI servers is:</p><blockquote><p>httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<strong>&lt;RATE&gt;</strong> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 <strong>--num-calls=10</strong></p></blockquote><h3>The Contestants</h3><p>Python is really rich in WSGI servers. I have made a selection of different servers, which are listed below.</p><table id="table" class="sortable" border="0"><thead><tr><th><span title="Name">Name</span></th><th><span 
title="Version">Version</span></th><th><span title="http 1.1 keepalive">http 1.1</span></th><th>Flavour</th><th><span title="Repository">Repo.</span></th><th><span title="Blog">Blog</span></th><th><span title="Community">Community</span></th></tr></thead><tbody><tr><td><a href="http://gunicorn.org/installation.html">Gunicorn</a></td><td><a href="http://pypi.python.org/pypi/gunicorn/0.6.4">0.6.4</a></td><td>No</td><td>processor/thread</td><td><a href="http://github.com/benoitc/gunicorn">GIT</a></td><td>?</td><td><a href="http://webchat.freenode.net/?channels=gunicorn">#gunicorn</a></td></tr><tr><td><a href="http://projects.unbit.it/uwsgi/">uWSGI</a></td><td><a href="http://projects.unbit.it/uwsgi/browser?rev=253">Trunk (253)</a></td><td>Yes</td><td>processor/thread</td><td>repo</td><td>?</td><td><a href="http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi">Mailing List</a></td></tr><tr><td><a href="http://www.fapws.org/">FAPWS3</a></td><td>0.3.1</td><td>No</td><td>processor/thread</td><td><a href="http://github.com/william-os4y/fapws3">GIT</a></td><td><a href="http://william-os4y.livejournal.com/">William Os4y</a></td><td><a href="http://groups.google.com/group/fapws">Google Groups</a></td></tr><tr><td><a href="http://www.zetadev.com/software/aspen/">Aspen</a></td><td><a href="http://pypi.python.org/pypi/aspen/0.8">0.8</a></td><td>No</td><td>processor/thread</td><td><a href="http://aspen.googlecode.com/svn/tags/0.8/">SVN</a></td><td><a href="http://blag.whit537.org/">Chad Whitacre</a></td><td><a href="http://groups-beta.google.com/group/aspen-users/?pli=1">Google Groups</a></td></tr><tr><td><a href="http://code.google.com/p/modwsgi/">Mod_WSGI</a></td><td><a href="http://modwsgi.googlecode.com/files/mod_wsgi-3.1.tar.gz">3.1</a></td><td>Yes</td><td>processor/thread</td><td><a href="http://code.google.com/p/modwsgi/source/checkout">SVN</a></td><td><a href="http://blog.dscpl.com.au/">Graham Dumpleton</a></td><td><a 
href="http://groups.google.com/group/modwsgi?pli=1">Google  Groups</a></td></tr><tr><td><a href="http://docs.python.org/library/wsgiref.html">wsgiref</a></td><td>Py 2.6.4</td><td>No</td><td>processor/thread</td><td><a href="http://svn.python.org/view/python/trunk/Lib/wsgiref/">SVN</a></td><td>None</td><td><a href="http://mail.python.org/mailman/listinfo/web-sig">Mailing  List</a></td></tr><tr><td><a href="http://www.cherrypy.org/">CherryPy</a></td><td><a href="http://download.cherrypy.org/cherrypy/3.1.2/">3.1.2</a></td><td>Yes</td><td>processor/thread</td><td><a href="http://www.cherrypy.org/browser/trunk">SVN</a></td><td><a href="http://planet.cherrypy.org/">Planet CherryPy</a></td><td><a href="http://planet.cherrypy.org/">Planet, IRC</a></td></tr><tr><td><a href="http://code.google.com/p/magnum-py/">Magnum  Py</a></td><td><a href="http://magnum-py.googlecode.com/files/magnum-0.2.tar.gz">0.2</a></td><td>No</td><td>processor/thread</td><td><a href="http://code.google.com/p/magnum-py/source/checkout">SVN</a></td><td><a href="http://mattgattis.com/blog/2009/10/18/introducing-magnum/">Matt Gattis</a></td><td><a href="http://groups.google.com/group/magnum-py">Google Groups</a></td></tr><tr><td><a href="http://twistedmatrix.com/trac/">Twisted </a></td><td><a href="http://tmrc.mit.edu/mirror/twisted/Twisted/10.0/Twisted-10.0.0.tar.bz2">10.0.0</a></td><td>Yes</td><td>processor/thread</td><td><a href="http://twistedmatrix.com/trac/browser">SVN</a></td><td><a href="http://planet.twistedmatrix.com/">Planet Twisted </a></td><td><a href="http://twistedmatrix.com/trac/wiki/TwistedCommunity">Community</a></td></tr><tr><td><a href="http://code.google.com/p/cogen/">Cogen </a></td><td><a href="http://cogen.googlecode.com/files/cogen-0.2.1.zip">0.2.1</a></td><td>Yes</td><td>callback/generator</td><td><a href="http://code.google.com/p/cogen/source/checkout">SVN</a></td><td><a href="http://ionelmc.wordpress.com/"> Maries Ionel </a></td><td><a 
href="http://groups.google.com/group/cogen">Google Groups</a></td></tr><tr><td><a href="http://www.gevent.org/">GEvent </a></td><td><a href="http://pypi.python.org/pypi/gevent/0.12.2">0.12.2</a></td><td>Yes</td><td>lightweight threads</td><td><a href="http://bitbucket.org/denis/gevent/src/tip/examples/">Mercurial</a></td><td><a href="http://blog.gevent.org/">Denis Bilenko</a></td><td><a href="http://groups.google.com/group/gevent">Google Groups</a></td></tr><tr><td><a href="http://www.tornadoweb.org/">Tornado</a></td><td><a href="http://www.tornadoweb.org/static/tornado-0.2.tar.gz">0.2</a></td><td>Yes</td><td>callback/generator</td><td><a href="http://github.com/facebook/tornado">GIT</a></td><td><a href="http://www.facebook.com/tornadoweb">Facebook</a></td><td><a href="http://groups.google.com/group/python-tornado">Google Groups</a></td></tr><tr><td><a href="http://eventlet.net/">Eventlet</a></td><td><a href="http://pypi.python.org/packages/source/e/eventlet/eventlet-0.9.6.tar.gz">0.9.6</a></td><td>Yes</td><td>lightweight threads</td><td><a href="http://bitbucket.org/which_linden/eventlet/src/">Mercurial</a></td><td><a href="http://blog.eventlet.net/">Eventlet</a></td><td><a href="https://lists.secondlife.com/cgi-bin/mailman/listinfo/eventletdev">Mailinglist</a></td></tr><tr><td><a href="http://opensource.hyves.org/concurrence/">Concurrence</a></td><td><a href="http://github.com/concurrence/concurrence/tree/master">tip</a></td><td>Yes</td><td>lightweight threads</td><td><a href="http://github.com/concurrence/concurrence">GIT</a></td><td>None</td><td><a href="http://groups.google.com/group/concurrence-framework">Google Groups</a></td></tr></tbody></table><p>Most of the information in this table should be rather straightforward, I specify the version benchmarked and whether or not the server has been found capable of HTTP 1.1. 
The flavour of the server specifies the concurrency model the server uses and I identify 3 different flavours:</p><p><strong>Processor / Thread model</strong></p><p>The p/t model is the most common flavour. Every request runs in its own cleanly separated thread. A blocking request (such as a synchronous database call or a function call in a C extension) will not influence other requests. This is convenient as you do not need to worry about how everything is implemented, but it does come at a price. The maximum number of concurrent connections is limited by your number of workers or threads and this is known to scale badly when you need to serve lots of concurrent users.</p><p><strong>Callback / Generator model</strong></p><p>The callback/generator model handles multiple concurrent connections in a single thread, thereby removing the thread barrier. A single blocking call will, however, block the whole event loop and has to be prevented. The servers that have this flavour usually provide a threadpool to integrate blocking calls in their async framework or provide alternative non-blocking database connectors. In order to provide flow control this flavour uses callbacks or generators. Some think that this is a beautiful way to create a form of event driven programming; others think that it is a snake pit that quickly changes your clean code into an entangled mess of callbacks or yield statements.</p><p><strong>Lightweight Threads</strong></p><p>The lightweight flavour uses greenlets to provide concurrency. This also works by providing concurrency from a single thread but in a less obtrusive way than the callback or generator approach. But of course one has to be careful with blocking connections as these will stop the event loop. To prevent this from happening, Eventlet and Gevent can monkeypatch the socket module to stop it from blocking, so when you are using a pure Python database connector this should never block the loop. 
Concurrence provides an asynchronous database adapter.</p><h3>Implementation specifics for each WSGI server</h3><h4>Aspen</h4><p>Ruby might be full of all kinds of rockstar programmers (whatever that might mean) but if I had to nominate just one Python programmer for some sort of &#8216;rockstar award&#8217; I would definitely nominate Chad Whitacre. It&#8217;s not only the great tools he created (Testosterone, Aspen, Stephane) but mostly how he promotes them with <a href="http://www.youtube.com/watch?v=CAJi3XQsOqI">the</a> <a href="http://www.youtube.com/user/whit537#p/u/1/Slk95WWL138">most</a> <a href="http://www.zetadev.com/software/testosterone/screencast.html">awesome</a> <a href="http://blag.whit537.org/2006/11/aspen-screencast.html">screencasts</a> I have ever seen.</p><p>Anyway, Aspen is a neat little web server which is also able to serve WSGI applications. It can be easily installed with &#8216;pip install aspen&#8217; and uses a special directory structure for configuration. If you want more information, I will point you to his screencasts.</p><h4>CherryPy</h4><p><a href="http://www.cherrypy.org/"><img class="alignright size-full wp-image-526" title="cherrypy" src="http://nichol.as/wp-content/uploads/2010/03/cherrypy.png" alt="cherrypy Benchmark of Python WSGI Servers" width="160" height="54" /></a>CherryPy is actually an object-oriented Python framework but features an excellent WSGI server. Installation can be done with a simple &#8216;pip install cherrypy&#8217;. I ran the following script to test out the performance of the WSGI server:</p><pre class="brush: python;">
from cherrypy import wsgiserver
from pong import application

# Here we set our application to the script_name '/'
wsgi_apps = [('/', application)]

server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8070), wsgi_apps,
                                       request_queue_size=500,
                                       server_name='localhost')

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()
</pre><h4>Cogen</h4><p>The code to have Cogen run a WSGI application is as follows:</p><pre class="brush: python;">
from cogen.web import wsgi
from cogen.common import *
from pong import application

m = Scheduler(default_priority=priority.LAST, default_timeout=15)
server = wsgi.WSGIServer(
            ('0.0.0.0', 8070),
            application,
            m,
            server_name='pongserver')
m.add(server.serve)
try:
    m.run()
except (KeyboardInterrupt, SystemExit):
    pass
</pre><h4>Concurrence</h4><p><a href="http://opensource.hyves.org/concurrence/">Concurrence</a> is an asynchronous framework under development by Hyves (you might call it the Dutch Facebook) built upon Libevent (I used the latest stable version <a href="http://www.monkey.org/~provos/libevent-1.4.13-stable.tar.gz">1.4.13</a>). I fired up the pong application as follows:</p><pre class="brush: python;">
from concurrence import dispatch
from concurrence.http import WSGIServer
from pong import application
server = WSGIServer(application)
# Concurrence has a default backlog of 512
dispatch(server.serve(('0.0.0.0', 8080)))
</pre><h4>Eventlet</h4><p><a href="http://blog.eventlet.net/">Eventlet</a> is a full-featured asynchronous framework which also provides WSGI server functionality. It is in development by Linden Lab (makers of Second Life). To run the application I used the following code:</p><pre class="brush: python;">
import eventlet
from eventlet import wsgi
from pong import application
wsgi.server(eventlet.listen(('', 8090), backlog=500), application, max_size=8000)
</pre><h4>FAPWS3</h4><p><a href="http://github.com/william-os4y/fapws3#readme">FAPWS3</a> is a WSGI server built around the <a href="http://software.schmorp.de/pkg/libev.html">LibEV</a> library (I used version <a href="http://packages.debian.org/lenny/libev3">3.43-1.1</a>). Once LibEV has been installed, FAPWS3 can be easily installed with pip. The philosophy behind FAPWS3 is to remain the simplest and fastest web server. The script I used to start up the WSGI application is as follows:</p><pre class="brush: python;">
import fapws._evwsgi as evwsgi
from fapws import base
from pong import application

def start():
    evwsgi.start(&quot;0.0.0.0&quot;, 8080)
    evwsgi.set_base_module(base)

    evwsgi.wsgi_cb((&quot;/&quot;, application))

    evwsgi.set_debug(0)
    evwsgi.run()

if __name__==&quot;__main__&quot;:
    start()
</pre><h4>Gevent</h4><p><a href="http://www.gevent.org/">Gevent</a> was one of the best performing async frameworks in my previous socket benchmark. Gevent extends Libevent and uses its HTTP server functionality extensively. To install Gevent you need Libevent installed, after which you can pull in Gevent with pip.</p><pre class="brush: python;">
from gevent import wsgi
from pong import application
wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()
</pre><p>The above code will run the pong application without spawning a Greenlet on every request. If you leave out the argument &#8216;spawn=None&#8217; Gevent will spawn a Greenlet for every new request.</p><h4>Gunicorn</h4><p><a href="http://gunicorn.org/"><img class="alignright size-full wp-image-524" title="gunicorn" src="http://nichol.as/wp-content/uploads/2010/03/gunicorn1.png" alt="gunicorn1 Benchmark of Python WSGI Servers" width="122" height="82" /></a><a href="http://gunicorn.org/">Gunicorn</a> stands for &#8216;Green Unicorn&#8217;. Everybody knows that a unicorn is a mix of <a href="http://nichol.as/files/narwhals.swf">the awesome narwhal</a> and <a href="http://djangopony.com/">the magnificent pony</a>. The green, however, has nothing to do with the great <a href="http://codespeak.net/svn/greenlet/trunk/doc/greenlet.txt">greenlets</a>, as Gunicorn really has a threaded flavour. Installation is easy and can be done with a simple &#8216;pip install gunicorn&#8217;. Gunicorn provides you with a simple command to run WSGI applications; all I had to do was:</p><blockquote><p>gunicorn -b :8000 -w 1 pong:application</p></blockquote><p><span style="color: #800000;">Update: </span>I had some suggestions in the comment section that using a single worker and having a client connect to the naked server is not the correct way to work with Gunicorn. So I took their suggestions, moved Gunicorn behind NGINX and increased the worker count to the suggested number of workers, 2*N+1, where N is the number of processors. With N = 1 that makes 3. 
The result of this is depicted in the graphs as gunicorn-3w.</p><p>Running Gunicorn with more workers can be done as follows:</p><blockquote><p>gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application</p></blockquote><h4>MagnumPy</h4><p><a href="http://www.youtube.com/watch?v=3CquMO3vJvo&amp;feature=fvsr"><img class="alignright size-full wp-image-516" title="magnum_pi_tom_selleck-241x300" src="http://nichol.as/wp-content/uploads/2010/03/magnum_pi_tom_selleck-241x300.jpg" alt="magnum pi tom selleck 241x300 Benchmark of Python WSGI Servers" width="84" height="104" /></a><a href="http://code.google.com/p/magnum-py/">MagnumPy</a> has to be the server with the most awesome name. This is still a very young project but its homepage is making some strong statements about its performance, so it is worth testing out. It does not feel as polished as the other contestants. Installing it basically means putting the &#8216;magnum&#8217; directory on your PYTHONPATH and editing &#8216;./magnum/config.py&#8217;, after which you can start the server by running &#8216;./magnum/serve.py start&#8217;.</p><pre class="brush: plain;">
#config.py
import magnum
import magnum.http
import magnum.http.wsgi
from pong import application

WORKER_PROCESSES = 1
WORKER_THREADS_PER_PROCESS = 1000
HOST = ('', 8050)
HANDLER_CLASS = magnum.http.wsgi.WSGIWrapper(application)
DEBUG = False
PID_FILE = '/tmp/magnum.pid'
</pre><h4>Mod_WSGI</h4><p><a href="http://code.google.com/p/modwsgi/">Mod_WSGI</a> is the successor of Mod_Python. It allows you to easily integrate Python code with the Apache server. My first Python web app experience was with mod_python and PSP templates; WSGI and cool frameworks such as Pylons have really made life a lot easier.</p><p>Mod_WSGI is a great way to get your application deployed quickly. Installing Mod_WSGI is really easy on most Linux distributions. For example:</p><blockquote><p>aptitude install libapache2-mod-wsgi</p></blockquote><p>That is all you need to do on a pristine Debian distro to get a working Apache (MPM-Worker) server with Mod_WSGI enabled. To point Apache to your WSGI app just add a single line to &#8216;/etc/apache2/httpd.conf&#8217;:</p><blockquote><p>WSGIScriptAlias / /home/nicholas/benchmark/wsgibench/pong.py</p></blockquote><p>The problem is that most people already have Apache installed and that they are using it for *shudder* serving PHP. PHP is not thread safe, meaning that you are forced to use a pre-forking Apache server. In this benchmark I am using the threaded Apache version and use mod_wsgi in embedded mode (as it gave me the best performance).</p><p>I disabled all unnecessary modules, configured Apache to provide me with a single worker and lots of threads, and disabled logging (note: I tried various settings):</p><pre class="brush: plain;">
&lt;IfModule mpm_worker_module&gt;
    ServerLimit         1
    ThreadLimit         1000
    StartServers          1
    MaxClients          1000
    MinSpareThreads     25
    MaxSpareThreads     75
    ThreadsPerChild     1000
    MaxRequestsPerChild   0
&lt;/IfModule&gt;
CustomLog /dev/null combined
ErrorLog /dev/null
</pre><h4>Paster</h4><p>The Paster web server is the web server provided with <a href="http://pythonpaste.org/">Python Paste</a>; it is the default web server of Pylons. You can run a WSGI application as follows:</p><pre class="brush: python;">
from pong import application
from paste import httpserver
httpserver.serve(application, '0.0.0.0', request_queue_size=500)
</pre><h4>Tornado</h4><p><a href="http://www.tornadoweb.org/"><img class="alignright size-full wp-image-528" title="tornado" src="http://nichol.as/wp-content/uploads/2010/03/tornado.png" alt="tornado Benchmark of Python WSGI Servers" width="65" height="72" /></a><a href="http://www.tornadoweb.org/">Tornado</a> is the non-blocking webserver that powers FriendFeed. It provides some WSGI server functionality which can be used as described below. In the previous benchmark I have shown that it provides excellent raw-socket performance.</p><pre class="brush: python;">
import sys
import tornado.httpserver
import tornado.ioloop
import tornado.wsgi

# Extend the path first so that the pong module can be found on import
sys.path.append('/home/nicholas/benchmark/wsgibench/')
from pong import application

def main():
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()
if __name__ == &quot;__main__&quot;:
    main()
</pre><h4>Twisted</h4><p><a href="http://twistedmatrix.com/trac/"><img class="alignright size-full wp-image-527" title="TwistedLogo" src="http://nichol.as/wp-content/uploads/2010/03/TwistedLogo.png" alt="TwistedLogo Benchmark of Python WSGI Servers" width="87" height="87" /></a>After installing Twisted with pip you get a tool &#8216;twistd&#8217; which allows you to easily serve WSGI applications, for example:</p><blockquote><p>twistd --pidfile=/tmp/twisted.pid -no web --wsgi=pong.application --logfile=/dev/null</p></blockquote><p>But you can also run a WSGI application as follows:</p><pre class="brush: python;">
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.internet import reactor
from pong import application

resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8000, Site(resource))
reactor.run()
</pre><h4>uWSGI</h4><p><a href="http://projects.unbit.it/uwsgi/"><img class="alignright size-full wp-image-525" title="uwsgi" src="http://nichol.as/wp-content/uploads/2010/03/uwsgi.png" alt="uwsgi Benchmark of Python WSGI Servers" width="139" height="46" /></a><a href="http://projects.unbit.it/uwsgi/">uWSGI</a> is a server written in C. It is not meant to run stand-alone but has to be placed behind a web server. It provides modules for Apache, NGINX, Cherokee and Lighttpd. I have placed it behind NGINX, which I configured as follows:</p><pre class="brush: plain;">
worker_processes  1;

events {
    worker_connections  30000;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    keepalive_timeout  65;

    upstream pingpong {
        ip_hash;
        server unix:/var/nginx/uwsgi.sock;
    }

    server {
        listen       9090;
        server_name  localhost;

        location / {
            uwsgi_pass  pingpong;
            include     uwsgi_params;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

}
</pre><p>This makes NGINX listen on port 9090 and pass every request on to a unix socket. Now all I needed to do was have uWSGI serve on that same unix socket, which I did with the following command:</p><blockquote><p>./uwsgi -s /var/nginx/uwsgi.sock -i -H /home/nicholas/benchmark/wsgibench/ -M -p 1 -w pong -z 30 -l 500 -L</p></blockquote><h4>WsgiRef</h4><p>WsgiRef is the default WSGI server included with Python since version 2.5. To have this server run my application I use the following code, which disables logging and increases the backlog.</p><pre class="brush: python;">
from pong import application
from wsgiref import simple_server

class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500

class PimpedHandler(simple_server.WSGIRequestHandler):
    # to disable logging
    def log_message(self, *args):
        pass

httpd = PimpedWSGIServer(('',8000), PimpedHandler)
httpd.set_app(application)
httpd.serve_forever()
</pre><h2>Results</h2><p>Below you will find the results as plotted with Highcharts. A line will thicken when hovered over and you can easily enable or disable plotted results by clicking on the legend.</p><h3>HTTP 1.0 Server results</h3><p><div id="connectionrate-7150536638659318000" style="width: 550px; height: 400px"></div></p><p><strong>Disqualified servers</strong></p><p>From the above graph it should be clear that some of the web servers are missing. The reason is that I was unable to have them completely benchmarked as they stopped replying when the request rate passed a certain critical value. The servers that are missing are:</p><ul><li>MagnumPy: I was able to obtain a reply rate of 500 RPS, but when the request rate passed the 700 RPS mark, MagnumPy crashed.</li><li>Concurrence: I was able to obtain a successful reply rate of 700 RPS, but it stopped replying when we fired more than 800 requests a second at the server. However, since Concurrence does support HTTP/1.1 keep-alive connections and behaves correctly when benchmarked under a lower connection rate but higher request rate, you can find its results in the HTTP/1.1 benchmark.</li><li>Cogen: it was able to obtain a reply rate of 800 per second but stopped replying when the request rate was above 1500 per second. It does have a complete benchmark under the HTTP/1.1 test though.</li><li>WSGIRef: I obtained a reply rate of 352 RPS but it stopped reacting when we passed the 1900 RPS mark.</li><li>Paster: it obtained a reply rate of 500 RPS but it failed when we passed the 2000 RPS mark.</li></ul><p><strong>Interpretation</strong></p><p>From the servers that passed the benchmark we can see that they all have an admirable performance. At the bottom we have Twisted and Gunicorn. The performance of Twisted is somewhat expected as it isn&#8217;t really tuned for WSGI performance. 
I find the performance of Gunicorn somewhat disappointing, also because, for example, Aspen, which is a pure Python server from a few years back, shows significantly better performance. We can see, however, that increasing the worker count does in fact improve the performance as it is then able to obtain a reply rate competitive with Aspen.</p><p>The other pure Python servers, CherryPy and Tornado, seem to be performing on par with ModWSGI. It looks like CherryPy has a slight performance edge over Tornado. So, if you are thinking of changing from ModWSGI or CherryPy to Tornado because of increased performance, you should think again. Not only does this benchmark show that there isn&#8217;t that much to gain, but you will also abandon the process/thread model, meaning that you should be cautious with code blocking your interpreter.</p><p>The top performers are clearly FAPWS3, uWSGI and Gevent. FAPWS3 has been designed to be fast and lives up to the expectations. This has been noted by others as well, as it looks like it is being <a href="http://groups.google.com/group/fapws/browse_thread/thread/f94f06df88684e77">used in production at Ebay</a>. uWSGI is used successfully in production at (and in development by) the Italian <a href="http://unbit.it/">ISP Unbit</a>. Gevent is a relatively young project but already very successful. Not only did it perform great in the previous async server benchmark but its reliance on the Libevent HTTP server gives it a performance beyond the other asynchronous frameworks.</p><p>I should note that the difference between these top 3 is too small to declare a clear winner of the &#8216;reply rate contest&#8217;. However, I want to stress that with almost all servers I had to be careful to keep the number of concurrent connections low since threaded servers aren&#8217;t that fond of lots of concurrent connections. The async servers (Gevent, Eventlet, and Tornado) were happy to work on whatever was being thrown at them. 
This really gives a great feeling of stability as you do not have to worry about settings such as pool size, worker count, etc.</p><p><div id="responsetime-7150536638659318000" style="width: 550px; height: 400px"></div></p><p>Most of the servers have an acceptable response time. Twisted and Eventlet are somewhat on the slow side but Gunicorn shows, unfortunately, a dramatic increase in latency when the request rate passes the 1000 RPS mark. Increasing the Gunicorn worker count lowers this latency by a lot but it is still on the high side compared with, for example, Aspen or CherryPy.</p><p><div id="errors-7150536638659318000" style="width: 550px; height: 400px"></div></p><p>The low error rates for CherryPy, ModWSGI, Tornado and uWSGI should give everybody confidence in their suitability for a production environment.</p><h3>HTTP 1.1 Server results</h3><p>In the HTTP/1.1 benchmark we have a different list of contestants as not all servers were able to pipeline multiple requests over a single connection. In this test the connection rate is relatively low: for example, a request rate of 8000 per second is about 800 connections per second with 10 requests per connection. This means that some servers that were not able to complete the HTTP/1.0 benchmark (with connection rates up to 5000 per second) are able to complete the HTTP/1.1 benchmark (Cogen and Concurrence for example).</p><p><div id="containerperf" style="width: 550px; height: 400px"></div></p><p>This graph shows the achieved request rate of the servers and we can clearly see that the achieved request rate is higher than in the HTTP/1.0 test. We could increase the total request rate even more by increasing the number of pipelined requests but this would then lower the connection rate. 
I think that 10 pipelined requests is a reasonable approximation of a web browser opening an average page.</p><p>The graph shows a huge gap in performance. With the fastest server, Gevent, we are able to obtain about 9000 replies per second; with Twisted, Concurrence and Cogen we get about 1000. In the middle we have CherryPy and ModWSGI, with which we are able to obtain a reply rate of around 4000. It is interesting that Tornado, while being close to CherryPy and ModWSGI, seems to have an edge in this benchmark, whereas CherryPy had the edge in the HTTP/1.0 benchmark. This is along the lines of our expectations as pipelined requests in Tornado are cheaper (since it is async) than in ModWSGI or CherryPy. We expect this gap to widen if we increase the number of pipelined requests. However, it remains to be seen how much of a performance boost this would provide in a deployment setup as Tornado and CherryPy will then probably be sitting behind a reverse proxy, for example NGINX. In such a setting the connection type between the upstream and the proxy is usually limited to HTTP/1.0; NGINX, for example, does not even support HTTP/1.1 keep-alive connections to its upstreams.</p><p>The best performers are clearly uWSGI and Gevent. I benchmarked Gevent with the &#8216;spawn=None&#8217; option to prevent Gevent from spawning a Greenlet, which seems fair in a benchmark like this. However, when you want to do something interesting with lots of concurrent connections you want each request to have its own Greenlet as this allows you to have thread-like flow control. 
Thus I also benchmarked that version, which can be seen in the graph under the name &#8216;Gevent-Spawn&#8217;. From its results we can see that the performance penalty is small.</p><p><div id="containertime" style="width: 550px; height: 400px"></div></p><p>Cogen shows a high latency after about 2000 requests per second; Eventlet and Twisted show an increased latency fairly early as well.</p><p><div id="containererror" style="width: 550px; height: 400px"></div></p><p>The error rate shows that Twisted, Concurrence and Cogen have some trouble keeping up. I think all other error rates are acceptable.</p><h3>Memory Usage</h3><p>I also monitored the memory usage of the different frameworks during the benchmark. The figure reported below is the peak memory usage accumulated over all processes. As this benchmark does not really benefit from additional processes (as there is only one available processor) I limited the number of workers when possible.</p><p><div id="memusage" style="width: 550px; height: 400px"></div></p><p>From these results there is one thing that really stands out and that is the very low memory usage of uWSGI, Gevent and FAPWS3, especially if we take their performance into account. It looks like Cogen is leaking memory, but I haven&#8217;t really looked into that. Gunicorn-3w shows a relatively high memory usage compared with Gunicorn. But it should be noted that this is mainly caused by the switch from the naked deployment to the deployment behind NGINX, as we now also have to add the memory usage of NGINX. 
A single Gunicorn worker only takes about 7.5 MB of memory.</p><h2>Let&#8217;s Kick it up a notch</h2><p><a href="http://nichol.as/wp-content/uploads/2010/03/610x1.jpg"><img class="alignright size-medium wp-image-535" title="OLYMPICS-TABLETENNIS/" src="http://nichol.as/wp-content/uploads/2010/03/610x1-300x201.jpg" alt="610x1 300x201 Benchmark of Python WSGI Servers" width="300" height="201" /></a>The first part of this post focussed purely on the RPS performance of the different frameworks under a high load. When the WSGI server was working hard enough it could simply answer all requests from a certain user and move on to the next user. This keeps the number of concurrent connections relatively low, making such a benchmark suitable for threaded web servers.</p><p>However, if we increase the number of concurrent connections we will quickly run into system limits, as explained in the introduction. This is commonly known as the C10K problem. Asynchronous servers use a single thread to handle multiple connections and, when efficiently implemented with for example EPoll or KQueue, are perfectly able to handle a large number of concurrent connections.</p><p>So that is what we are going to do: we are going to take the top-3 performing WSGI servers, namely Tornado, Gevent and uWSGI (FAPWS3&#8217;s lack of HTTP/1.1 support made it unsuitable for this benchmark), and give them 5 minutes of ping-pong mayhem.</p><p>You see, ping-pong is a simple game, and it isn&#8217;t really the complexity that makes it interesting; it is the speed and the reaction of the players. Now, what is 5 minutes of ping-pong mayhem? Imagine that for 5 minutes an Airbus loaded with ping-pong players (500 clients) lands every second, and each of those players is going to slam you exactly 12 balls (with a 5 second interval). 
This would mean that after 5 seconds you would already have to return the volleys of 2000 different players at once.</p><h2>Tsung Benchmark Setup</h2><p>To perform this benchmark I am going to use <a href="http://tsung.erlang-projects.org/">Tsung</a>, which is a multi-protocol distributed load testing tool written in Erlang. I will then have 3 different machines simulating the ping-pong rampage. I used the following Tsung script.</p><pre class="brush: plain;">
&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE tsung SYSTEM &quot;/usr/share/tsung/tsung-1.0.dtd&quot; []&gt;
&lt;tsung loglevel=&quot;warning&quot;&gt;

    &lt;clients&gt;
        &lt;client host=&quot;tsung2&quot; use_controller_vm=&quot;false&quot; maxusers=&quot;800&quot;/&gt;
        &lt;client host=&quot;tsung3&quot; use_controller_vm=&quot;false&quot; maxusers=&quot;800&quot;/&gt;
        &lt;client host=&quot;bastet&quot; use_controller_vm=&quot;false&quot; maxusers=&quot;800&quot;/&gt;
    &lt;/clients&gt;
    &lt;servers&gt;
        &lt;server host=&quot;tsung1&quot; port=&quot;8000&quot; type=&quot;tcp&quot;/&gt;
    &lt;/servers&gt;
    &lt;monitoring&gt;
        &lt;monitor host=&quot;tsung1&quot; type=&quot;erlang&quot;/&gt;
    &lt;/monitoring&gt;

    &lt;load&gt;
        &lt;arrivalphase phase=&quot;1&quot; duration=&quot;5&quot; unit=&quot;minute&quot;&gt;
            &lt;users interarrival=&quot;0.002&quot; unit=&quot;second&quot;/&gt;
        &lt;/arrivalphase&gt;
    &lt;/load&gt;

    &lt;sessions&gt;
        &lt;session name='wsgitest' probability='100'  type='ts_http'&gt;
            &lt;for from=&quot;0&quot; to=&quot;12&quot; incr=&quot;1&quot; var=&quot;counter&quot;&gt;
                &lt;request&gt;
                    &lt;http url='http://tsung1:8000/' version='1.1' method='GET'/&gt;
                &lt;/request&gt;
                &lt;thinktime random='false' value='5'/&gt;
            &lt;/for&gt;
        &lt;/session&gt;
    &lt;/sessions&gt;

&lt;/tsung&gt;
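</pre><p>As a quick sanity check, the load this script generates can be worked out by hand. The sketch below only restates numbers from the script and the text (an interarrival time of 0.002 seconds, a 5 minute arrival phase, 12 requests per player); the variable names are mine. It comes to 500 new users per second and 1.8 million requests in total.</p><pre class="brush: python;">
# Numbers taken from the Tsung script and the text above
INTERARRIVAL = 0.002     # seconds between two new users
DURATION = 5 * 60        # length of the arrival phase in seconds
REQUESTS_PER_USER = 12   # balls per player, as described in the text

# Average arrival rate; Tsung treats the interarrival time as a mean
users_per_second = int(round(1 / INTERARRIVAL))
total_users = users_per_second * DURATION
total_requests = total_users * REQUESTS_PER_USER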
</pre><h3>Tsung Benchmark Results</h3><p><div id="tsungconnected6153452576349074374" style="width: 550px; height: 400px;"></div></p><p><div style="height:200px; width:550px"><div id="tsungload6153452576349074374" style="width: 170px; height: 200px; float: left" ></div><div id="tsungcpu6153452576349074374" style="width: 170px; height: 200px; float: left"></div><div id="tsungmem6153452576349074374" style="width: 180px; height: 200px; float: left"></div></div></p><p>Let me first state that all three frameworks are perfectly capable of handling this kind of load; none of them dropped connections or ignored requests. That is already quite an achievement, I must say, considering that they had to handle about 2 million requests each.</p><p>Below the concurrent connection graph we can see the system load, the CPU usage and the free memory on the system during the benchmark. The CPU and load graphs clearly show that Gevent put less strain on the system. In the memory graph we can see that all frameworks used a consistent amount of memory.</p><p>Readers who are still paying close attention should note that the memory graph displays 4 lines instead of 3. The fourth line is Gevent compiled against <a href="http://www.provos.org/index.php?/archives/81-Libevent-2.0.4-alpha-released.html">Libevent 2.0.4a</a>; the new release of Libevent is said to <a href="http://www.provos.org/index.php?/archives/61-Small-Libevent-2.0-Performance-Test.html">show considerable performance improvements in its HTTP server</a>. But it is still an alpha version, and the memory graph shows that this version is leaking memory. 
Not something you want on your production site.</p><p><div id="tsunglatency6153452576349074374" style="width: 550px; height: 400px"></div></p><p>The final graph shows the latency of the 3 frameworks. We can see a clear difference between Tornado and its competitors: Tornado&#8217;s response time hovers around 100ms, uWSGI&#8217;s around 5ms and Gevent&#8217;s around 3ms. This is quite a difference and I am really amazed by the low latency of both Gevent and uWSGI during this onslaught.</p><h2>Summary and Remarks</h2><p>The above results show that as Python web developers we have lots of different ways to deploy our applications. Some of these seem to perform better than others, but by focussing only on server performance I do not do justice to most of the tested servers, as they differ greatly in functionality. Also, if you are going to take some stock web framework and won&#8217;t do any optimizations or caching, the performance of your web server is not going to matter, as it will not be the bottleneck. If there is one thing this benchmark made clear, it is that most Python web servers offer great performance, and if you feel things are slow the first thing to look at is really your own application.</p><p>When you are just interested in quickly hosting your threaded application you really can&#8217;t go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements, there is a lot going for it in terms of functionality. For example, protecting part of your website by using an LDAP server is as easy as enabling a module. 
Standalone CherryPy also shows great performance and functionality and is really a viable (fully Python) alternative which can lower memory requirements.</p><p>When you are a little more adventurous you can look at uWSGI and FAPWS3; they are relatively new compared to CherryPy and ModWSGI, but they show a significant performance increase and have lower memory requirements.</p><p>Concerning Tornado and performance, I do not think Tornado is an alternative to CherryPy or even ModWSGI. Not only does it hardly show any increase in performance, it also requires you to rethink your code. But Tornado can be a great option if you do not have any code using blocking connections or just want to look at something new.</p><p>And then there is <a href="http://www.gevent.org/">Gevent</a>. It really showed amazing performance at a low memory footprint. It might need some adjustments to your legacy code, but then again the monkey patching of the socket module could help, and I really love the cleanness of Greenlets. There have already been <a href="http://groups.google.com/group/gevent/browse_thread/thread/4de9703e5dca8271">some reports of deploying Gevent successfully</a>, even with SQLAlchemy.</p><p>And if you want to dive into high-performance websockets with lots of concurrent connections you really have to go with an asynchronous framework. 
Gevent seems like the perfect companion for that, at least that is what we are going to use.</p> <img src="http://feeds.feedburner.com/~r/Nichol4s/~4/XLwrzSjnOa8" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://nichol.as/benchmark-of-python-web-servers/feed</wfw:commentRss> <slash:comments>62</slash:comments> <feedburner:origLink>http://nichol.as/benchmark-of-python-web-servers</feedburner:origLink></item> <item><title>Asynchronous Servers in Python</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/xLGzXmXO1u8/asynchronous-servers-in-python</link> <comments>http://nichol.as/asynchronous-servers-in-python#comments</comments> <pubDate>Tue, 22 Dec 2009 08:33:57 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[async]]></category> <category><![CDATA[comet]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[Python]]></category><guid isPermaLink="false">http://nichol.as/?p=337</guid> <description><![CDATA[
A lot has already been written on the C10K problem, and it is known that the only viable option to handle LOTS of concurrent connections is to handle them asynchronously. This also shows that for massively concurrent problems, such as lots of parallel comet connections, the GIL in Python is a non-issue as we [...]]]></description> <content:encoded><![CDATA[<p>A lot has already been written on the <a href="http://www.kegel.com/c10k.html">C10K problem</a>, and it is known that the only viable option to handle LOTS of concurrent connections is to handle them asynchronously. 
This also shows that for massively concurrent problems, such as lots of parallel comet connections, the GIL in Python is a non-issue, as we handle the concurrent connections in a single thread.</p><p>In this post I am going to look at a selection of asynchronous servers implemented in Python.</p><h3>Asynchronous Server Specs</h3><p>Since Python is really rich in (asynchronous) frameworks, I collected a few and looked at the following features:</p><ul><li>What license does the framework have?</li><li>Does it provide documentation?</li><li>Does the documentation contain examples?</li><li>Is it used in production somewhere?</li><li>Does it have some sort of community (mailing list, IRC, etc.)?</li><li>Is there any recent activity?</li><li>Does it have a blog (from the owner)?</li><li>Does it have a Twitter account?</li><li>Where can I find the repository?</li><li>Does it have a thread pool?</li><li>Does it provide access to a TCP socket?</li><li>Does it have any Comet features?</li><li>Is it using EPOLL?</li><li>What kind of server is it? 
(greenlets, callbacks, generators etc..)</li></ul><p>This gave me the following table.</p><table id="table" class="sortable" border="0"><thead><tr><th><span title="Name">Name</span></th><th><span title="License">Lic.</span></th><th><span title="Documentation">Doc</span></th><th><span title="Documentation contains examples">Ex.</span></th><th><span title="Used in production environment">Prod.</span></th><th><span title="Community support or forum">Com.</span></th><th><span title="Project shows signs of activity">Act.</span></th><th><span title="Project featured on blog">Blog</span></th><th><span title="Twitter account of project or main author">Twt</span></th><th><span title="Repository">Rep.</span></th><th><span title="Featuring a thread pool implementation">Pool</span></th><th><span title="Out of the box support for WSGI">Wsgi</span></th><th><span title="RAW socket access">Scket</span></th><th><span title="Featuring COMET support">Cmet</span></th><th><span title="Has out of the box support for EPOLL">Epoll</span></th><th><span title="Has test coverage">Test</span></th><th><span title="Type of concurrency">Style</span></th></tr></thead><tbody><tr><td><a href="http://twistedmatrix.com/trac/">Twisted</a></td><td>MIT</td><td>Yes</td><td>Yes</td><td>Yes</td><td><a href="http://twistedmatrix.com/trac/wiki/TwistedCommunity">Huge</a></td><td>Yes</td><td><a href="http://planet.twistedmatrix.com/">Lots</a></td><td>No</td><td><a href="http://twistedmatrix.com/trac/browser">Trac</a></td><td>Yes</td><td>Yes</td><td>Yes</td><td>No</td><td>Yes</td><td>Yes</td><td>Callback</td></tr><tr><td><a href="http://www.tornadoweb.org/">Tornado</a></td><td>Apache</td><td><a href="http://www.tornadoweb.org/documentation">Yes</a></td><td>Yes</td><td>F.Feed</td><td><a href="http://groups.google.com/group/python-tornado">Yes</a></td><td>Yes</td><td><a href="http://www.facebook.com/tornadoweb">FB</a></td><td><a href="http://www.facebook.com/tornadoweb">Yes</a></td><td><a 
href="http://github.com/facebook/tornado">GHub</a></td><td>No</td><td>Lim.</td><td><a href="http://github.com/facebook/tornado/blob/master/tornado/iostream.py"> Yes</a></td><td>No</td><td>Yes</td><td>No</td><td>Async</td></tr><tr><td><a href="http://orbited.org/">Orbited</a></td><td>MIT</td><td><a href="http://orbited.org/wiki/Documentation">Yes</a></td><td>Yes</td><td><a href="http://orbited.org/wiki/Sites">Yes</a></td><td><a href="http://orbited.org/wiki/Community">Yes</a></td><td>Yes</td><td><a href="http://orbited.org/blog/2008/10/announcing-orbited-070/">Yes</a></td><td>No</td><td><a href="http://orbited.org/wiki/Development">Trac</a></td><td>No</td><td>No</td><td>Yes</td><td>Yes</td><td>Yes</td><td>Yes</td><td>Callback</td></tr><tr><td><a href="http://dieselweb.org/lib/">DieselWeb</a></td><td>BSD</td><td><a href="http://dieselweb.org/lib/docs">Yes</a></td><td>Yes</td><td><a href="http://shoptalkapp.com/">STalk</a></td><td><a href="http://groups.google.com/group/diesel-users/">Yes</a></td><td>Yes</td><td><a href="http://shoptalkapp.com/blog/2009/10/20/beautiful-coroutines"> Yes</a></td><td><a href="http://twitter.com/boomplex">Yes</a></td><td><a href="http://bitbucket.org/boomplex/diesel/">BitB.</a></td><td>No</td><td>Lim.</td><td>Yes</td><td>Yes</td><td>Yes</td><td>No</td><td>Generator</td></tr><tr><td><a href="http://code.google.com/p/python-multitask/">MultiTask</a></td><td>MIT</td><td>Some</td><td>No</td><td>No</td><td>No</td><td>No</td><td><a href="http://pseudogreen.org/blog/erlang_vs_stackless_vs_multitask.html">Yes</a></td><td>No</td><td><a href="http://pseudogreen.org/bzr/multitask/">Bzr</a></td><td>No</td><td>No</td><td>No</td><td>No</td><td>No</td><td>No</td><td>Generator</td></tr><tr><td><a href="http://chiral.j4cbo.com/trac">Chiral</a></td><td>GPL2</td><td><a href="http://chiral.j4cbo.com/chiral-doc/moduleIndex.html">API</a></td><td>No</td><td>No</td><td>IRC</td><td>No</td><td>No</td><td>No</td><td><a 
href="http://chiral.j4cbo.com/trac/browser">Trac</a></td><td>No</td><td>Yes</td><td>Yes</td><td>Yes</td><td>Yes</td><td>Yes</td><td><a href="http://chiral.j4cbo.com/chiral-doc/chiral.core.coroutine.html"> Coroutine</a></td></tr><tr><td><a href="http://eventlet.net/">Eventlet</a></td><td>MIT</td><td><a href="http://eventlet.net/doc/">Yes</a></td><td>Yes</td><td><a href="http://wiki.secondlife.com/wiki/Eventlet">S. Life</a></td><td><a href="https://lists.secondlife.com/cgi-bin/mailman/listinfo/eventletdev"> Yes</a></td><td>Yes</td><td><a href="https://blogs.secondlife.com/community/features/blog/2007/08/25/more-open-source-our-web-services-libraries">Yes</a></td><td>No</td><td><a href="http://bitbucket.org/which_linden/eventlet/">BitB.</a></td><td><a href="http://eventlet.net/doc/modules/tpool.html">Yes</a></td><td><a href="http://eventlet.net/doc/modules/wsgi.html">Yes</a></td><td><a href="http://eventlet.net/doc/basic_usage.html#socket-functions"> Yes</a></td><td>No</td><td>Yes</td><td>Yes</td><td>Greenlet</td></tr><tr><td><a href="http://code.google.com/p/friendlyflow/">FriendlyFlow</a></td><td>GPL2</td><td><a href="http://code.google.com/p/friendlyflow/wiki/Home">Some</a></td><td><a href="http://code.google.com/p/friendlyflow/source/browse/trunk/examples/webserver.py"> One</a></td><td>No</td><td>No</td><td>No</td><td>No</td><td>Yes</td><td>Ggle</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td><td>Yes</td><td>Generator</td></tr><tr><td><a href="http://weightless.io/weightless">Weightless</a></td><td>GPL2</td><td>Yes</td><td>No</td><td><a href="http://cq2.nl/">Yes</a></td><td>No</td><td>No</td><td>No</td><td><a href="http://twitter.com/ejgroene">Yes</a></td><td><a href="http://weightless.svn.sourceforge.net/">SF</a></td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td><td>Yes</td><td>Generator</td></tr><tr><td><a href="http://code.google.com/p/fibra/">Fibra</a></td><td>MIT</td><td>No</td><td>No</td><td>No</td><td>No</td><td>No</td><td><a 
href="http://entitycrisis.blogspot.com/">Yes</a></td><td>No</td><td><a href="http://code.google.com/p/fibra/source/checkout">Ggle</a></td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td><td>No</td><td>Generator</td></tr><tr><td><a href="http://opensource.hyves.org/concurrence/">Concurrence</a></td><td>MIT</td><td>Yes</td><td><a href="http://opensource.hyves.org/concurrence/examples.html">Yes</a></td><td><a href="http://www.hyves.nl">hyves</a></td><td><a href="http://groups.google.com/group/concurrence-framework">Yes</a></td><td>Yes</td><td>No</td><td>No</td><td><a href="http://github.com/concurrence/concurrence">GHub</a></td><td>No</td><td>Yes</td><td>Yes</td><td>No</td><td>Yes</td><td>Yes</td><td><a href="http://opensource.hyves.org/concurrence/concurrence.core.html#module-concurrence.core"> Tasklet</a></td></tr><tr><td><a href="http://trac.softcircuit.com.au/circuits/">Circuits</a></td><td>MIT</td><td><a href="http://trac.softcircuit.com.au/circuits/wiki/docs">Yes</a></td><td><a href="http://trac.softcircuit.com.au/circuits/browser/examples">Yes</a></td><td><a href="http://trac.softcircuit.com.au/circuits/wiki/docs/Users">Yes</a></td><td><a href="http://groups.google.com.au/group/circuits-users">Yes</a></td><td>Yes</td><td><a href="http://trac.softcircuit.com.au/circuits/blog">Yes</a></td><td><a href="http://twitter.com/therealprologic">Yes</a></td><td><a href="http://trac.softcircuit.com.au/circuits/browser">Trac</a></td><td>No</td><td><a href="http://trac.softcircuit.com.au/circuits/browser/circuits/web/wsgi.py"> Yes</a></td><td>Yes</td><td>No</td><td>No</td><td>Yes</td><td>Async</td></tr><tr><td><a href="http://gevent.org/">Gevent</a></td><td>MIT</td><td><a href="http://gevent.org/contents.html">Yes</a></td><td><a href="http://bitbucket.org/denis/gevent/src/tip/examples/">Yes</a></td><td>No</td><td><a style="color: #14568a !important;" href="http://groups.google.com/group/gevent">Yes</a></td><td>Yes</td><td><a 
href="http://blog.gevent.org/post/277359401/gevent-0-11-2-is-released">Yes</a></td><td><a href="http://twitter.com/gevent">Yes</a></td><td><a href="http://bitbucket.org/denis/gevent/src/">BitB.</a></td><td>No</td><td>Yes</td><td>Yes</td><td>No</td><td>Yes</td><td>Yes</td><td>Greenlet</td></tr><tr><td><a href="http://code.google.com/p/cogen/">Cogen</a></td><td>MIT</td><td><a href="http://cogen.googlecode.com/svn/trunk/docs/build/index.html">Yes</a></td><td><a href="http://code.google.com/p/cogen/source/browse/#svn/trunk/examples">Yes</a></td><td><a href="http://groups.google.com/group/cogen/">Yes</a></td><td>No</td><td>Yes</td><td><a href="http://ionelmc.wordpress.com/2009/01/">Yes</a></td><td><a href="http://twitter.com/ionel_mc">Yes</a></td><td><a href="http://code.google.com/p/cogen/source/checkout">Ggle</a></td><td>No</td><td>Yes</td><td>Yes</td><td>No</td><td>Yes</td><td>Yes</td><td>Generator</td></tr></tbody></table><p>This is quite a list and I probably still missed a few. The main reason for using a framework instead of implementing something yourself is that you hope to accelerate your own development process by standing on the shoulders of other developers. I therefore think it is important that there is documentation, some sort of developer community (a mailing list, for example) and that the project is still active. If we take this as a requirement we are left with the following solutions:</p><ul><li>Orbited / Twisted (callbacks)</li><li>Tornado (async)</li><li>Dieselweb (generator)</li><li>Eventlet (greenlet)</li><li>Concurrence (stackless)</li><li>Circuits (async)</li><li>Gevent (greenlet)</li><li>Cogen (generator)</li></ul><p>To quickly summarize this list: Twisted has been the de facto standard for async programming with Python. It has an immense community and a wealth of tools, protocols and features. It has grown big and some say <span style="text-decoration: line-through;"><a title="Twisted vs. 
Tornado: You're Both Idiots" href="http://teddziuba.com/2009/09/twisted-vs-tornado-youre-both.html">it reminds them of shirtless men drinking Jager-bombs</a></span> complex. This is also one of the biggest reasons why people are looking elsewhere. Recently Facebook released the code of its async approach, called Tornado, which also uses callbacks, and recent benchmarks show that <a href="http://www.apparatusproject.org/blog/2009/09/twisted-web-vs-tornado-performance-test/">it</a> <a href="http://antoniocangiano.com/2009/09/13/benchmarking-tornado-vs-twisted-web-vs-tornado-on-twisted/">outperforms</a> Twisted.</p><p>A commonly heard argument against programming with callbacks is that it can get overly complex. A programmatically cleaner approach (imho) is to use light-weight threads. This can be achieved by using a different Python implementation, <a href="http://www.stackless.com/">Stackless</a> (as Concurrence does), or an extension module for regular Python, <a href="http://pypi.python.org/pypi/greenlet">Greenlet</a> (as Eventlet and Gevent do). Another approach is to simulate these light-weight threads with Python generators, as Dieselweb and Cogen do.</p><p>This should already show that while all these frameworks provide asynchronous concurrency, they each do so in their own way. I want to invite you to look at these frameworks, as they all have their own code gems. For example, Concurrence has a non-blocking interface to MySQL. 
Eventlet has a neat thread pool, Tornado can pre-fork over CPUs, Gevent offloads HTTP header parsing and DNS lookups to Libevent, Cogen has sendfile support and Twisted probably already has a factory doing exactly what you are planning to do next.</p><h3>The Ping Pong Benchmark</h3><p><a href="http://www.penny-arcade.com/comic/2008/7/2/"><img class="alignright size-full wp-image-302" style="margin-left: 10px; margin-right: 0px;" title="The Goddess of Ping Pong, Biba Golić" src="http://nichol.as/wp-content/uploads/2009/12/biba_golic_11.jpg" alt="biba golic 11 Asynchronous Servers in Python" width="194" height="300" /></a>In this benchmark I am going to focus on how well each framework listens on a socket and writes to incoming connections. The client pings the socket by opening it; the server responds with <em>&#8216;Pong!&#8217;</em> and closes the socket. This should be really simple, but it is a pain to create something that does this in an asynchronous and non-blocking way from scratch, and that is exactly the reason why we are looking at these frameworks. It is all about making our lives easier.</p><p>For this benchmark I am going to use <a href="http://www.hpl.hp.com/research/linux/httperf/">httperf</a>, a high-performance tool that understands the HTTP protocol. If we want httperf to play along in our ping-pong benchmark we have to make it understand the &#8216;Pong!&#8217; response. We can do this by mimicking an HTTP server and having our server respond with:</p><blockquote><p><span style="font-style: normal;">HTTP/1.0 200 OK<br /> Content-Length: 5</span></p><p><span style="font-style: normal;">Pong!</span></p></blockquote><p><span style="font-style: normal;"> instead of just &#8216;Pong!&#8217;. 
Also, since most default server configurations are not set up to handle a large number of concurrent requests, we need to make a few adjustments:</span></p><ul><li><span style="font-style: normal;">Raise the per-process file limit by <a href="http://gom-jabbar.org/articles/2009/02/04/httperf-and-file-descriptors">compiling httperf after some adjustments.</a></span></li><li>Raise the per-user file limit: set &#8216;<em>ulimit -n 10000</em>&#8217; on both server and client.</li><li>Raise the kernel limit on file handles: &#8216;<em>echo 128000 &gt; /proc/sys/fs/file-max</em>&#8217;.</li><li>Increase the connection backlog: &#8216;<em>sysctl -w net.core.netdev_max_backlog=2500</em>&#8217;.</li><li>Raise the maximum connections with &#8216;<em>sysctl -w net.core.somaxconn=250000</em>&#8217;.</li></ul><p>With these settings my Debian Lenny system was ready to hammer the different servers at rates far beyond the capacity of the frameworks. I used the following command:</p><blockquote><p>httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=4<strong>00</strong> --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1</p></blockquote><p>I increased the rate in steps of 100 from 400 up to 9000 requests per second, for a total of 40,000 requests at each step.</p><h3>Code</h3><p>What follows is the implementation of the server side in the different frameworks. It should show the different approaches the frameworks take.</p><h4>Twisted</h4><p>Gentlemen, start your reactor!</p><pre class="brush: python;">
# Install the epoll reactor before the default reactor is imported
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor

class Pong(Protocol):
    def connectionMade(self):
        self.transport.write(&quot;HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n&quot;)
        self.transport.loseConnection()

# Start the reactor
factory = Factory()
factory.protocol = Pong
reactor.listenTCP(8000, factory)
reactor.run()
</pre><h4>Tornado</h4><p>Tornado does not hide the raw socket interface, which makes this example lengthier than the others.</p><pre class="brush: python;">

import errno
import functools
import socket
from tornado import ioloop, iostream

def connection_ready(sock, fd, events):
    while True:
        try:
            connection, address = sock.accept()
        except socket.error, e:
            if e[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        connection.setblocking(0)
        stream = iostream.IOStream(connection)
        stream.write(&quot;HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n&quot;, stream.close)

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind((&quot;&quot;, 8010))
    sock.listen(5000)

    io_loop = ioloop.IOLoop.instance()
    callback = functools.partial(connection_ready, sock)
    io_loop.add_handler(sock.fileno(), callback, io_loop.READ)
    try:
        io_loop.start()
    except KeyboardInterrupt:
        io_loop.stop()
        print &quot;exited cleanly&quot;
</pre><h4>Dieselweb</h4><p>While this example is beautifully small, I do not really enjoy the generator approach, which sprinkles &#8216;yield&#8217; all over the place.</p><pre class="brush: python;">
from diesel import Application, Service

def server_pong(addr):
    yield &quot;HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n&quot;

app = Application()
app.add_service(Service(server_pong, 8020))
app.run()
</pre><h4>Circuits</h4><p>I think the Circuits code is the most beautiful of them all, very elegant.</p><pre class="brush: python;">
from circuits.net.sockets import TCPServer

class PongServer(TCPServer):

    def connect(self, sock, host, port):
        self.write(sock, 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n')
        self.close(sock)

PongServer(('localhost', 8050)).run()
</pre><h4>Eventlet</h4><p>Eventlet uses the Greenlet approach.</p><pre class="brush: python;">
from eventlet import api

def handle_socket(sock):
    sock.makefile('w').write(&quot;HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n&quot;)
    sock.close()

server = api.tcp_listener(('localhost', 8030))
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)
</pre><h4>Gevent</h4><p>Gevent is presented as a rewrite of eventlet focussing on performance.</p><pre class="brush: python;">
import gevent
from gevent import socket

def handle_socket(sock):
    sock.sendall(&quot;HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n&quot;)
    sock.close()

server = socket.socket()
server.bind(('localhost', 8070))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    gevent.spawn(handle_socket, new_sock)
</pre><h4>Concurrence</h4><p>Concurrence uses the Tasklet approach; it can be run under Greenlet and under Stackless Python. In this benchmark there was not really any performance difference between the two engines.</p><pre class="brush: python;">
from concurrence import dispatch, Tasklet
from concurrence.io import BufferedStream, Socket

def handler(client_socket):
    stream = BufferedStream(client_socket)
    writer = stream.writer
    writer.write_bytes(&quot;HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n&quot;)
    writer.flush()
    stream.close()

def server():
    server_socket = Socket.new()
    server_socket.bind(('localhost', 8040))
    server_socket.listen()

    while True:
        client_socket = server_socket.accept()
        Tasklet.new(handler)(client_socket)

if __name__ == '__main__':
    dispatch(server)
</pre><h4>Cogen</h4><p>Cogen uses the generator approach as well.</p><pre class="brush: python;">
import sys

from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine

@coroutine
def server():
    srv = sockets.Socket()
    adr = ('0.0.0.0', len(sys.argv)&gt;1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(500)
    while 1:
        conn, addr = yield srv.accept()
        fh = conn.makefile()
        yield fh.write(&quot;HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nHello World!\r\n&quot;)
        yield fh.flush()
        conn.close()

m = schedulers.Scheduler()
m.add(server)
m.run()
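</pre><p>To check by hand that a server really answers the way httperf expects, a small client helps. The sketch below is not part of the benchmark: <em>parse_pong</em> and <em>ping</em> are names I made up for illustration, and it assumes the server writes its response as soon as the connection opens, as the listings above do. Calling ping('localhost', 8070) against the Gevent listing, for example, should return the status line and the body.</p><pre class="brush: python;">
import socket

# The exact bytes most of the servers above write to every connection
EXPECTED = b'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n'

def parse_pong(raw):
    # Hypothetical helper: split a response into (status line, body).
    # The body is cut to Content-Length, so the trailing CRLF is dropped.
    head, _, rest = raw.partition(b'\r\n\r\n')
    lines = head.split(b'\r\n')
    length = 0
    for line in lines[1:]:
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip())
    return lines[0], rest[:length]

def ping(host, port, timeout=5.0):
    # Hypothetical helper: open a connection, send nothing and read
    # until the server closes the socket.
    sock = socket.create_connection((host, port), timeout)
    try:
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return parse_pong(b''.join(chunks))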
</pre><h3>Results</h3><p><div id="containerperf2" style="width: 550px; height: 400px"></div></p><p>The first graph clearly shows at which connection rate (on the horizontal axis) the successful connection rate starts to degrade. It shows a huge difference between the best performer, Tornado with 7400 requests per second, and the worst, Circuits with 1400 requests per second (which doesn&#8217;t use EPOLL). This connection rate was sustained for at least 40,000 requests. We can see that when the hammering of the server continues beyond rates the server can handle, the performance drops. This is caused by connection errors or timeouts.</p><p><div id="containertime2" style="width: 550px; height: 400px"></div></p><p>This graph shows the response time. It is clearly visible that once the maximum connection rate has been reached, the overall response time starts to increase.</p><p><div id="containererr2" style="width: 550px; height: 400px"></div></p><p>The last graph shows the number of errors, i.e. cases where httperf did not detect a <em>200</em> response. We can see a correlation between the performance of the server and the returned errors at a given request rate. The better-performing servers return fewer errors overall. There is, however, one exception. Cogen was able to return ALL its requests successfully no matter how hard it was hammered. It is therefore not visible in this graph. This is interesting: at 9000 requests per second it was still able to answer all requests. However, the average connection time (from socket open till socket close) was about 7 seconds, meaning that Cogen was serving about 28000 concurrent connections at somewhat reduced performance, but not dropping them.</p><h3>Notes</h3><p>This post should make it clear that Python has a rich set of options for asynchronous programming. All tested frameworks show great performance. I mean, even Circuits&#8217; result of 1300 requests per second isn&#8217;t too bad. 
Tornado really blew me away with its performance at 7400 requests per second. But if I had to choose a favorite I would probably go with Gevent; I am really digging its greenlet style.</p><p>The clean Greenlet / Stackless style is really cool, especially since Stackless Python is keeping up nowadays with CPython. There was some talk on a mailing list about Gevent running on Stackless. The concurrence framework already runs on Stackless and can thus be a great option if you are looking for specific features of Stackless Python such as <a href="http://www.stackless.com/wiki/Pickling">tasklet-pickling</a>.</p><p>I want to make clear that this test only shows how these frameworks perform at a relatively simple task. It could be that when more stuff is going on in the background the results will change. However, I feel that this benchmark is a great indicator of how each framework handles a socket connection.</p><p>In the coming days I plan to investigate this some more. I will also check out how these Python frameworks stack up against their equivalents in different languages, e.g. Ape, CometD, NodeJS. 
Stay tuned!</p> <img src="http://feeds.feedburner.com/~r/Nichol4s/~4/xLGzXmXO1u8" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://nichol.as/asynchronous-servers-in-python/feed</wfw:commentRss> <slash:comments>67</slash:comments> <feedburner:origLink>http://nichol.as/asynchronous-servers-in-python</feedburner:origLink></item> <item><title>Person Recognition (with Python)</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/WTZs_4awMhA/person-recognition-with-python</link> <comments>http://nichol.as/person-recognition-with-python#comments</comments> <pubDate>Mon, 21 Dec 2009 22:17:31 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[ai]]></category> <category><![CDATA[computer vision]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[Python]]></category><guid isPermaLink="false">http://nichol.as/?p=359</guid> <description><![CDATA[
For my MSc thesis I have developed a system built in Python which does person recognition and have shown that it is possible to obtain a better recognition rate with this system than by using Google&#8217;s Picasa. I have put the source code online and hereby announce that I will try my best to [...]]]></description> <content:encoded><![CDATA[<p>For my <a href="http://nichol.as/papers/thesis.pdf">MSc thesis</a> I have developed a system built in Python which does person recognition and have shown that it is possible to <strong>obtain a better recognition rate with this system than by using Google&#8217;s Picasa</strong>. I have put <a href="http://bitbucket.org/nicholas/projects/src/tip/PersonDetection/perdet/">the source code</a> online and hereby announce that I will try my best to spend some time explaining how to do person and face recognition with Python. I hope that a public announcement such as this will instantly create some public debt, forcing me to actually complete this task. We shall see. 
<img src='http://nichol.as/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' title="Person Recognition (with Python)" /></p><p><img class="size-full wp-image-383 alignleft" style="margin-left: 10px; margin-right: 10px;" title="Bert and Ernie" src="http://nichol.as/wp-content/uploads/2009/12/images.jpeg" alt="Bert and Ernie" width="96" height="116" /></p><p>The approach described in my thesis uses a combination of pictorial cues (Eigenfaces, SIFT points, color histogram) and contextual cues (co-occurrence with other persons or background). The idea behind this is really simple: in most cases we don&#8217;t even need to see the face of a specific person in order to recognize them, as long as we have more contextual information about the setting in which the picture was taken. For example, let&#8217;s say you are looking at some pictures from &#8216;Sesame Street&#8217;. When you detect Ernie, there is a high probability that the other person in that same photo will be Bert. Even more so if we detect that the main component of that other person&#8217;s color histogram is yellow.</p><p>The approach can be divided into three subtasks:</p><ol><li>Detecting the person and segmenting their specific region</li><li>Extracting the features</li><li>Clustering over these features</li></ol><p>In the first task, we will detect the different people by using a face detection technique based on Haar cascades. I plan on showing how to use OpenCV with Python and how you can improve its performance by combining the results of multiple Haar cascades.</p><p>With the detected face we only have a certain square region within a picture which is very likely to contain a face. In order to detect the rest of the body I used a graph-based segmentation technique and highly optimized the segmentation algorithm by using implementations in NumPy, Fortran and finally PyCUDA.</p><p>From this segmented region we will then extract pictorial features such as a color histogram and SIFT features. 
With this information we can then try to extract and use our contextual information.</p><p style="text-align: center;"><a href="http://nichol.as/wp-content/uploads/2009/12/overview.png"><img class="aligncenter size-full wp-image-379" title="Global Overview of the Person Recognition System" src="http://nichol.as/wp-content/uploads/2009/12/overview.png" alt="Global Overview of the Person Recognition System" width="495" height="170" /></a></p><p>In the schematic overview we can see the different steps: we start with some images and segment them into regions of interest. From these regions we will then extract features to build our person models, over which we can then cluster by using pictorial features (SIFT points and color histograms) and contextual features (i.e. co-occurrence with detected background or other persons). The code for this can already be found on <a href="http://bitbucket.org/nicholas/projects/src/tip/PersonDetection/">BitBucket</a>. It is a bit rough, but as already said, I promised to do some explaining. So keep an eye on this blog if you&#8217;re interested.</p><p>For now, I present my collected list of references (also in <a href="http://nichol.as/papers/references.bib">BibTeX format</a>) regarding person recognition. It could be a nice starting point for anyone interested in this domain.</p><h3>References</h3><p><div class="content"><dl><dt class="Key" id="davis2006rbp">davis2006rbp<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Davis/The%20relationship%20between%20precision-recall%20and%20ROC%20curves.pdf">The relationship between precision-recall and ROC curves</a></span><br /> <span class="Author">J. Davis and M. Goadrich</span><br /> <span class="Pages">233--240</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="raghavan1989cir">raghavan1989cir<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Raghavan/A%20critical%20investigation%20of%20recall%20and%20precision%20as%20measures.pdf">A critical investigation of recall and precision as measures of retrieval system performance</a></span><br /> <span class="Author">V. Raghavan and P. Bollmann and G. S. Jung</span><br /> <span class="Journal">ACM Transactions on Information Systems (TOIS)</span>&nbsp; <span class="Volume">7</span>&nbsp; <span class="Pages">205--229</span>&nbsp;
(<span class="Date">1989</span>)<br /><dd /><dt class="Key" id="wilson2006ffd">wilson2006ffd<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Wilson/Facial%20feature%20detection%20using%20Haar.pdf">Facial feature detection using Haar classifiers</a></span><br /> <span class="Author">P. I. Wilson and J. Fernandez</span><br /> <span class="Journal">Journal of Computing Sciences in Colleges</span>&nbsp; <span class="Volume">21</span>&nbsp; <span class="Pages">127--133</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="freund1997dtg">freund1997dtg<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Freund/A%20Decision-Theoretic%20Generalization%20of%20On-Line%20Learning%20and%20an%20Application.pdf">A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting</a></span><br /> <span class="Author">Y. Freund and R. E. Schapire</span><br /> <span class="Journal">Journal of Computer and System Sciences</span>&nbsp; <span class="Volume">55</span>&nbsp; <span class="Pages">119--139</span>&nbsp;
(<span class="Date">1997</span>)<br /><dd /><dt class="Key" id="graham2002tep">graham2002tep<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Graham/Time%20as%20essence%20for%20photo%20browsing%20through.pdf">Time as essence for photo browsing through personal digital libraries</a></span><br /> <span class="Author">A. Graham and H. Garcia-Molina and A. Paepcke and T. Winograd</span><br /> <span class="Pages">326--335</span>&nbsp;
(<span class="Date">2002</span>)<br /><dd /><dt class="Key" id="cooper2005tec">cooper2005tec<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Cooper/Temporal%20event%20clustering%20for%20digital%20photo.pdf">Temporal event clustering for digital photo collections</a></span><br /> <span class="Author">M. Cooper and J. Foote and A. Girgensohn and L. Wilcox</span><br /> <span class="Journal">ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)</span>&nbsp; <span class="Volume">1</span>&nbsp; <span class="Pages">269--288</span>&nbsp;
(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="moon2001cap">moon2001cap<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Moon/Computational%20and%20performance%20aspects%20of%20PCA-based%20face-recognition.pdf">Computational and performance aspects of PCA-based face-recognition algorithms</a></span><br /> <span class="Author">H. Moon and P. J. Phillips</span><br /> <span class="Journal">Perception-London</span>&nbsp; <span class="Volume">30</span>&nbsp; <span class="Pages">303--322</span>&nbsp;
(<span class="Date">2001</span>)<br /><dd /><dt class="Key" id="otoole2005fra">otoole2005fra<dt /><dd class="Pub"> <span class="Title"><a href="/papers/O%E2%80%99Toole/Face%20Recognition%20Algorithms%20Surpass%20Humans.pdf">Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination</a></span><br /> <span class="Author">A. O'Toole and P. J. Phillips and F. Jiang and J. Ayyad and N. Pénard and H. Abdi</span><br /> <span class="Journal">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>&nbsp; <span class="Pages">1642--1646</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="liu2006cdi">liu2006cdi<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Liu/Capitalize%20on%20Dimensionality%20Increasing%20Techniques%20for%20Improving.pdf">Capitalize on Dimensionality Increasing Techniques for Improving Face Recognition Grand Challenge Performance</a></span><br /> <span class="Author">C. Liu</span><br /> <span class="Journal">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>&nbsp; <span class="Pages">725--737</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="beis1997siu">beis1997siu<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Beis/Shape%20Indexing%20Using%20Approximate%20Nearest-Neighbour.pdf">Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces</a></span><br /> <span class="Author">J. Beis and D. Lowe</span><br /> <span class="Pages">1000--1006</span>&nbsp;
(<span class="Date">1997</span>)<br /><dd /><dt class="Key" id="bay2006ssu">bay2006ssu<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Bay/SURF:%20Speeded%20Up%20Robust%20Features0.pdf">SURF: Speeded Up Robust Features</a></span><br /> <span class="Author">H. Bay and T. Tuytelaars and L. Van Gool</span><br /> <span class="Journal">Lecture Notes in Computer Science</span>&nbsp; <span class="Volume">3951</span>&nbsp; <span class="Pages">404</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="ke2004psm">ke2004psm<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Ke/PCA-SIFT:%20A%20More%20Distinctive%20Representation%20for%20Local.pdf">PCA-SIFT: A More Distinctive Representation for Local Image Descriptors</a></span><br /> <span class="Author">Y. Ke and R. Sukthankar</span><br /> <span class="Volume">2</span>&nbsp;	(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="kryszczuk:ccf">kryszczuk:ccf<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Kryszczuk/Color%20correction%20for%20face%20detection%20based.pdf">Color correction for face detection based on human visual perception metaphor</a></span><br /> <span class="Author">K. Kryszczuk and A. Drygajlo</span><br /> <span class="Pages">138--143</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="gevers1999cbo">gevers1999cbo<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Gevers/Color-based%20object%20recognition.pdf">Color-based object recognition</a></span><br /> <span class="Author">T. Gevers and A. W. M. Smeulders</span><br /> <span class="Journal">Pattern Recognition</span>&nbsp; <span class="Volume">32</span>&nbsp; <span class="Pages">453--464</span>&nbsp;
(<span class="Date">1999</span>)<br /><dd /><dt class="Key" id="swain1991ci">swain1991ci<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Swain/Color%20indexing.pdf">Color indexing</a></span><br /> <span class="Author">M. J. Swain and D. H. Ballard</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">7</span>&nbsp; <span class="Pages">11--32</span>&nbsp;
(<span class="Date">1991</span>)<br /><dd /><dt class="Key" id="schmid2000eip">schmid2000eip<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Schmid/Evaluation%20of%20Interest%20Point%20Detectors.pdf">Evaluation of Interest Point Detectors</a></span><br /> <span class="Author">C. Schmid and R. Mohr and C. Bauckhage</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">37</span>&nbsp; <span class="Pages">151--172</span>&nbsp;
(<span class="Date">2000</span>)<br /><dd /><dt class="Key" id="lowe2004dif">lowe2004dif<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Lowe/Distinctive%20Image%20Features%20from%20Scale-Invariant.pdf">Distinctive Image Features from Scale-Invariant Keypoints</a></span><br /> <span class="Author">D. G. Lowe</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">60</span>&nbsp; <span class="Pages">91--110</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="mikolajczyk2004sai">mikolajczyk2004sai<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Mikolajczyk/Scale%20%5C&%20Affine%20Invariant%20Interest%20Point.pdf">Scale & Affine Invariant Interest Point Detectors</a></span><br /> <span class="Author">K. Mikolajczyk and C. Schmid</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">60</span>&nbsp; <span class="Pages">63--86</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="mikolajczyk2005pel">mikolajczyk2005pel<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Mikolajczyk/A%20Performance%20Evaluation%20of%20Local%20Descriptors.pdf">A Performance Evaluation of Local Descriptors</a></span><br /> <span class="Author">K. Mikolajczyk and C. Schmid</span><br /> <span class="Journal">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>&nbsp; <span class="Pages">1615--1630</span>&nbsp;
(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="comaniciu2002msr">comaniciu2002msr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Comaniciu/Mean%20Shift:%20A%20Robust%20Approach%20Toward.pdf">Mean Shift: A Robust Approach Toward Feature Space Analysis</a></span><br /> <span class="Author">D. Comaniciu and P. Meer</span><br /> <span class="Journal">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>&nbsp; <span class="Pages">603--619</span>&nbsp;
(<span class="Date">2002</span>)<br /><dd /><dt class="Key" id="grabner2006fas">grabner2006fas<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Grabner/Fast%20Approximated%20SIFT.pdf">Fast Approximated SIFT</a></span><br /> <span class="Author">M. Grabner and H. Grabner and H. Bischof</span><br /> <span class="Journal">Lecture Notes in Computer Science</span>&nbsp; <span class="Volume">3851</span>&nbsp; <span class="Pages">918</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="castrillonsantana2008faf">castrillonsantana2008faf<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Castrillon/Face%20and%20Facial%20Feature%20Detection%20Evaluation.pdf">Face and Facial Feature Detection Evaluation</a></span><br /> <span class="Author">M. Castrillón-Santana and L. Déniz-Suárez and L. Antón-Canalís and J. Lorenzo-Navarro</span><br /> <span class="Volume">7</span>&nbsp; (<span class="Date">2008</span>)<br /><dd /><dt class="Key" id="lienhart2002esh">lienhart2002esh<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Lienhart/An%20Extended%20Set%20of%20Haar-like%20Features%20for%20Rapid%20Object.pdf">An Extended Set of Haar-like Features for Rapid Object Detection</a></span><br /> <span class="Author">R. Lienhart and J. Maydt</span><br /> <span class="Volume">1</span>&nbsp; <span class="Pages">900--903</span>&nbsp;
(<span class="Date">2002</span>)<br /><dd /><dt class="Key" id="turk1991er">turk1991er<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Turk/Eigenfaces%20for%20Recognition.pdf">Eigenfaces for Recognition</a></span><br /> <span class="Author">M. Turk and A. Pentland</span><br /> <span class="Journal">Journal of Cognitive Neuroscience</span>&nbsp; <span class="Volume">3</span>&nbsp; <span class="Pages">71--86</span>&nbsp;
(<span class="Date">1991</span>)<br /><dd /><dt class="Key" id="felzenszwalb2004egb">felzenszwalb2004egb<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Felzenszwalb/Efficient%20Graph-Based%20Image%20Segmentation.pdf">Efficient Graph-Based Image Segmentation</a></span><br /> <span class="Author">P. F. Felzenszwalb and D. P. Huttenlocher</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">59</span>&nbsp; <span class="Pages">167--181</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="Hae-sang:2006rz">Hae-sang:2006rz<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Hae-sang/A%20K-means-like%20Algorithm%20for%20K-medoids%20Clustering%20and%20Its%20Performance.pdf">A K-means-like Algorithm for K-medoids Clustering and Its Performance</a></span><br /> <span class="Author">P. Hae-sang and L. Jong-seok and J. Chi-hyuck</span><br /> <span class="Pages">1222-1231</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="cui2007eip">cui2007eip<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Cui/EasyAlbum:%20an%20interactive%20photo%20annotation%20system.pdf">EasyAlbum: an interactive photo annotation system based on face clustering and re-ranking</a></span><br /> <span class="Author">J. Cui and F. Wen and R. Xiao and Y. Tian and X. Tang</span><br /> <span class="Pages">367--376</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="elgammal2001pfs">elgammal2001pfs<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Elgammal/Probabilistic%20framework%20for%20segmenting%20people%20under.pdf">Probabilistic framework for segmenting people under occlusion</a></span><br /> <span class="Author">A. Elgammal and L. Davis</span><br /> <span class="Volume">2</span>&nbsp;	(<span class="Date">2001</span>)<br /><dd /><dt class="Key" id="2008_garcia_cvgpu">2008_garcia_cvgpu<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Garcia/Fast%20k%20nearest%20neighbor%20search%20using.pdf">Fast k nearest neighbor search using GPU</a></span><br /> <span class="Author">V. Garcia and E. Debreuve and M. Barlaud</span><br /> (<span class="Date">2008</span>)<br /><dd /><dt class="Key" id="VanDeSandeCVPR2008">VanDeSandeCVPR2008<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Sande/Evaluation%20of%20Color%20Descriptors%20for%20Object%20and%20Scene.pdf">Evaluation of Color Descriptors for Object and Scene Recognition</a></span><br /> <span class="Author">K. E. A. van de Sande and T. Gevers and C. G. M. Snoek</span><br /> (<span class="Date">2008</span>)<br /><div class="Abstract"> Image category recognition is important to access visual
information on the level of objects and scene types. So far,
intensity-based descriptors have been widely used. To increase
illumination invariance and discriminative power,
color descriptors have been proposed only recently. As
many descriptors exist, a structured overview of color invariant
descriptors in the context of image category recognition
is required.
Therefore, this paper studies the invariance properties
and the distinctiveness of color descriptors in a structured
way. The invariance properties of color descriptors are
shown analytically using a taxonomy based on invariance
properties with respect to photometric transformations. The
distinctiveness of color descriptors is assessed experimentally
using two benchmarks from the image domain and the
video domain.
From the theoretical and experimental results, it can be
derived that invariance to light intensity changes and light
color changes affects category recognition. The results reveal
further that, for light intensity changes, the usefulness
of invariance is category-specific.</div><dd /><dt class="Key" id="wagstaff2001ckm">wagstaff2001ckm<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Wagstaff/Constrained%20k-means%20clustering%20with%20background.pdf">Constrained k-means clustering with background knowledge</a></span><br /> <span class="Author">K. Wagstaff and C. Cardie and S. Rogers and S. Schroedl</span><br /> <span class="Pages">577--584</span>&nbsp;
(<span class="Date">2001</span>)<br /><dd /><dt class="Key" id="gallagher_cvpr_08_clothing">gallagher_cvpr_08_clothing<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Gallagher/Clothing%20Cosegmentation%20for%20Recognizing%20People.pdf">Clothing Cosegmentation for Recognizing People</a></span><br /> <span class="Author">A. Gallagher and T. Chen</span><br /> (<span class="Date">2008</span>)<br /><dd /><dt class="Key" id="Lepetit:cr">Lepetit:cr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Lepetit/Keypoint%20Recognition%20using%20Randomized%20Trees.pdf">Keypoint Recognition using Randomized Trees</a></span><br /> <span class="Author">V. Lepetit and P. Fua</span><br /> <span class="Journal">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="Indyk:1999dq">Indyk:1999dq<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Indyk/Approximate%20Nearest%20Neighbors:%20Towards%20Removing.pdf">Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality</a></span><br /> <span class="Author">P. Indyk and R. Motwani</span><br /> <span class="Journal">Proceedings of the thirtieth annual ACM symposium on Theory of computing</span>&nbsp; <span class="Pages">605--613</span>&nbsp;
(<span class="Date">1998</span>)<br /><dd /><dt class="Key" id="Gionis:1999bh">Gionis:1999bh<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Gionis/Similarity%20Search%20in%20High%20Dimensions%20via%20Hashing.pdf">Similarity Search in High Dimensions via Hashing</a></span><br /> <span class="Author">A. Gionis and P. Indyk and R. Motwani</span><br /> <span class="Journal">???</span>&nbsp; <span class="Pages">518-529</span>&nbsp;
(<span class="Date">1999</span>)<br /><dd /><dt class="Key" id="Sivic:2004qf">Sivic:2004qf<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Sivic/Efficient%20Visual%20Content%20Retrieval%20and%20Mining.pdf">Efficient Visual Content Retrieval and Mining in Videos</a></span><br /> <span class="Author">J. Sivic and A. Zisserman</span><br /> <span class="Journal">???</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="Clayton:2007vn">Clayton:2007vn<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Clayton/A%20learning%20framework%20for%20nearest%20neighbor%20search.pdf">A learning framework for nearest neighbor search</a></span><br /> <span class="Author">L. Clayton and S. Dasgupta</span><br /> <span class="Journal">Advances in Neural Information Processing Systems 20</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="ozuysal2007fkr">ozuysal2007fkr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Ozuysal/Fast%20keypoint%20recognition%20in%20ten%20lines%20of%20code.pdf">Fast keypoint recognition in ten lines of code</a></span><br /> <span class="Author">M. Ozuysal and P. Fua and V. Lepetit</span><br /> <span class="Journal">Proc. IEEE Conference on Computing Vision and Pattern Recognition</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="shi2000nca">shi2000nca<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Shi/Normalized%20cuts%20and%20image%20segmentation.pdf">Normalized cuts and image segmentation</a></span><br /> <span class="Author">J. Shi and J. Malik</span><br /> <span class="Journal">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>&nbsp; <span class="Volume">22</span>&nbsp; <span class="Pages">888--905</span>&nbsp;
(<span class="Date">2000</span>)<br /><dd /><dt class="Key" id="marfil2006psa">marfil2006psa<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Marfil/Pyramid%20segmentation%20algorithms%20revisited.pdf">Pyramid segmentation algorithms revisited</a></span><br /> <span class="Author">R. Marfil and L. Molina-Tanco and A. Bandera and J. Rodríguez and F. Sandoval</span><br /> <span class="Journal">Pattern Recognition</span>&nbsp; <span class="Volume">39</span>&nbsp; <span class="Pages">1430--1451</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="Zhang:2007pd">Zhang:2007pd<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Zhang/Local%20features%20and%20kernels%20for%20classi%EF%AC%81cation%20of%20texture.pdf">Local features and kernels for classification of texture and object categories</a></span><br /> <span class="Author">J. Zhang and M. Marszalek and S. Lazebnik and C. Schmid</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">73</span>&nbsp; <span class="Pages">213-238</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="lowe1999orl">lowe1999orl<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Lowe/Object%20recognition%20from%20local%20scale-invariant.pdf">Object recognition from local scale-invariant features</a></span><br /> <span class="Author">D. G. Lowe</span><br /> <span class="Journal">International Conference on Computer Vision</span>&nbsp; <span class="Volume">2</span>&nbsp; <span class="Pages">1150--1157</span>&nbsp;
(<span class="Date">1999</span>)<br /><dd /><dt class="Key" id="SandeMSC07">SandeMSC07<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Sande/Coloring%20Concept%20Detection%20in%20Video%20using.pdf">Coloring Concept Detection in Video using Interest Regions</a></span><br /> <span class="Author">K. E. A. v. d. Sande</span><br /> (<span class="Date">2007</span>)<br /><div class="Abstract">Video concept detection aims to detect high-level semantic information present in video. State-of-the-art systems are based on visual features and use machine learning to build concept detectors from annotated examples. The choice of features and machine learning algorithms is of great influence on the accuracy of the concept detector. So far, intensity-based SIFT features based on interest regions have been applied with great success in image retrieval. Features based on interest regions, also known as local features, consist of an interest region detector and a region descriptor. In contrast to using intensity information only, we will extend both interest region detection and region description with color information in this thesis. We hypothesize that automated concept detection using interest region features benefits from the addition of color information. Our experiments, using the Mediamill Challenge benchmark, show that the combination of intensity features with color features improves significantly over intensity features alone.</div><dd /><dt class="Key" id="ramanan2003fat">ramanan2003fat<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Ramanan/Finding%20and%20tracking%20people%20from%20the%20bottom.pdf">Finding and tracking people from the bottom up</a></span><br /> <span class="Author">D. Ramanan and D. Forsyth</span><br /> <span class="Journal">Computer Vision and Pattern Recognition, 2003. Proceedings. 
2003 IEEE Computer Society Conference on</span>&nbsp; <span class="Volume">2</span>&nbsp;	(<span class="Date">2003</span>)<br /><dd /><dt class="Key" id="felzenswalb">felzenswalb<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Felzenszwalb/Efficient%20Matching%20of%20Pictorial%20Structures.pdf">Efficient Matching of Pictorial Structures</a></span><br /> <span class="Author">P. Felzenszwalb and D. Huttenlocher</span><br /> <span class="Pages">66-73</span>&nbsp;
(<span class="Date">2000</span>)<br /><dd /><dt class="Key" id="girgensohn2004lfr">girgensohn2004lfr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Girgensohn/Leveraging%20face%20recognition%20technology%20to%20find.pdf">Leveraging face recognition technology to find and organize photos</a></span><br /> <span class="Author">A. Girgensohn and J. Adcock and L. Wilcox</span><br /> <span class="Journal">Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval</span>&nbsp; <span class="Pages">99--106</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="naaman2005lcr">naaman2005lcr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Naaman/Leveraging%20context%20to%20resolve%20identity%20in%20photo.pdf">Leveraging context to resolve identity in photo albums</a></span><br /> <span class="Author">M. Naaman and R. B. Yeh and H. Garcia-Molina and A. Paepcke</span><br /> <span class="Journal">Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries</span>&nbsp; <span class="Pages">178--187</span>&nbsp;
(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="berg2007naf">berg2007naf<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Berg/Names%20and%20Faces.pdf">Names and Faces</a></span><br /> <span class="Author">T. L. Berg and A. C. Berg and J. Edwards and M. Maire and R. White and Y. W. Teh and E. Learned-Miller and D. Forsyth</span><br /> <span class="Journal">University of California Berkeley. Technical report</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="tian2007faf">tian2007faf<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Tian/A%20Face%20Annotation%20Framework%20with%20Partial.pdf">A Face Annotation Framework with Partial Clustering and Interactive Labeling</a></span><br /> <span class="Author">Y. Tian and W. Liu and R. Xiao and F. Wen and X. Tang</span><br /> <span class="Journal">Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on</span>&nbsp; <span class="Pages">1--8</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="zhao2006apa">zhao2006apa<dt /><dd class="Pub"> <span class="Title"><a href="/papers/ZHAO/Automatic%20Person%20Annotation%20of%20Family%20Photo.pdf">Automatic Person Annotation of Family Photo Album</a></span><br /> <span class="Author">M. Zhao and Y. Teo and S. Liu and T. Chua and R. Jain</span><br /> <span class="Journal">International Conference on Image and Video Retrieval</span>&nbsp; <span class="Pages">163--172</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="zhang2005rfa">zhang2005rfa<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Zhang/Robust%20Face%20Alignment%20Based%20on%20Local.pdf">Robust Face Alignment Based on Local Texture Classifiers</a></span><br /> <span class="Author">L. Zhang and H. Ai and S. Xin and C. Huang and S. Tsukiji and S. Lao</span><br /> <span class="Journal">The IEEE International Conference on Image Processing</span>&nbsp; <span class="Pages">354--357</span>&nbsp;
(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="arandjelovic2006acl">arandjelovic2006acl<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Arandjelovic/Automatic%20Cast%20Listing%20in%20Feature-Length%20Films.pdf">Automatic Cast Listing in Feature-Length Films with Anisotropic Manifold Space</a></span><br /> <span class="Author">O. Arandjelovic and R. Cipolla</span><br /> <span class="Journal">2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</span>&nbsp; <span class="Volume">2</span>&nbsp; <span class="Pages">1513--1520</span><dd /><dt class="Key" id="jaffre11ipl">jaffre11ipl<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Jaffre/Improvement%20of%20a%20person%20labelling%20method%20using.pdf">Improvement of a person labelling method using extracted knowledge on costume</a></span><br /> <span class="Author">G. Jaffre and P. Joly</span><br /><dd /><dt class="Key" id="jaffre:cnf">jaffre:cnf<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Jaffre/Costume:%20A%20New%20Feature%20for%20Automatic%20Video%20Content.pdf">Costume: A New Feature for Automatic Video Content Indexing</a></span><br /> <span class="Author">G. Jaffre and P. Joly</span><br /> <span class="Journal">Coupling approaches, coupling media and coupling languages for information retrieval (RIAO)</span>&nbsp; <span class="Pages">314--325</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="anguelov2007cir">anguelov2007cir<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Anguelov/Contextual%20Identity%20Recognition%20in%20Personal%20Photo.pdf">Contextual Identity Recognition in Personal Photo Albums</a></span><br /> <span class="Author">D. Anguelov and K. Lee and S. B. Gokturk and B. Sumengen</span><br /> <span class="Journal">Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on</span>&nbsp; <span class="Pages">1--7</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="yacoob2005daa">yacoob2005daa<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Yacoob/Detection,%20Analysis%20and%20Matching%20of%20Hair.pdf">Detection, Analysis and Matching of Hair</a></span><br /> <span class="Author">Y. Yacoob and L. Davis</span><br /> <span class="Journal">Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on</span>&nbsp; <span class="Volume">1</span>&nbsp;	(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="song2006cah">song2006cah<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Song/Context-Aided%20Human%20Recognition--Clustering.pdf">Context-Aided Human Recognition--Clustering</a></span><br /> <span class="Author">Y. Song and T. Leung</span><br /> <span class="Journal">European Conference on Computer Vision</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="sivic2006fpr">sivic2006fpr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Sivic/Finding%20people%20in%20repeated%20shots%20of%20the%20same.pdf">Finding people in repeated shots of the same scene</a></span><br /> <span class="Author">J. Sivic and C. L. Zitnick and R. Szeliski</span><br /> <span class="Journal">British Machine Vision Conference</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="yilmaz2006ots">yilmaz2006ots<dt /><dd class="Pub"> <span class="Title"><a href="/papers/YILMAZ/Object%20tracking:%20A%20survey.pdf">Object tracking: A survey</a></span><br /> <span class="Author">A. Yilmaz and O. Javed and M. Shah</span><br /> <span class="Journal">ACM Computing Surveys</span>&nbsp; <span class="Volume">38</span>&nbsp; <span class="Pages">1--45</span>&nbsp;
(<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="dalal:hdu">dalal:hdu<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Dalal/Human%20Detection%20Using%20Oriented%20Histograms.pdf">Human Detection Using Oriented Histograms of Flow and Appearance</a></span><br /> <span class="Author">N. Dalal and B. Triggs and C. Schmid</span><br /><dd /><dt class="Key" id="kpalma:oap">kpalma:oap<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Kpalma/An%20Overview%20of%20Advances%20of%20Pattern%20Recognition%20Systems.pdf">An Overview of Advances of Pattern Recognition Systems in Computer Vision</a></span><br /> <span class="Author">K. Kpalma and J. Ronsin</span><br /><dd /><dt class="Key" id="gavrila1999vah">gavrila1999vah<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Gavrila/Visual%20analysis%20of%20human%20movement:%20A%20survey.pdf">Visual analysis of human movement: A survey</a></span><br /> <span class="Author">D. M. Gavrila</span><br /> <span class="Journal">Computer Vision and Image Understanding</span>&nbsp; <span class="Volume">73</span>&nbsp; <span class="Pages">82--98</span>&nbsp;
(<span class="Date">1999</span>)<br /><div class="Abstract"> The ability to recognize humans and their activities by vision
is key for a machine to interact intelligently and effortlessly with a
human-inhabited environment. Because of many potentially important
applications, &#8220;looking at people&#8221; is currently one of the most
active application domains in computer vision. This survey identifies
a number of promising applications and provides an overview
of recent developments in this domain. The scope of this survey is
limited to work on whole-body or hand motion; it does not include
work on human faces. The emphasis is on discussing the various
methodologies; they are grouped in 2-D approaches with or without
explicit shape models and 3-D approaches. Where appropriate, systems
are reviewed. We conclude with some thoughts about future
directions.</div><dd /><dt class="Key" id="jones2002scm">jones2002scm<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Jones/Statistical%20Color%20Models%20with%20Application.pdf">Statistical Color Models with Application to Skin Detection</a></span><br /> <span class="Author">M. J. Jones and J. M. Rehg</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">46</span>&nbsp; <span class="Pages">81--96</span>&nbsp;
(<span class="Date">2002</span>)<br /><div class="Abstract">The existence of large image datasets such as the set of photos
on the World Wide Web make it possible to build
powerful generic models for low-level image attributes
like color using simple histogram learning techniques. We describe
the construction of color models for skin and non-skin classes
from a dataset of nearly 1 billion labelled pixels. These
classes exhibit a surprising degree of separability which we exploit
by building a skin pixel detector achieving a detection
rate of 80% with 8.5% false positives. We compare the performance
of histogram and mixture models in skin detection
and find histogram models to be superior in accuracy and
computational cost. Using aggregate features computed from
the skin pixel detector we build a surprisingly effective detector for
naked people. Our results suggest that color can be a more
powerful cue for detecting people in unconstrained imagery
than was previously suspected. We believe this work is the
most comprehensive and detailed exploration of skin color
models to date.</div><dd /><dt class="Key" id="diplaros2004sdu">diplaros2004sdu<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Diplaros/Skin%20detection%20using%20the%20EM%20algorithm%20with.pdf">Skin detection using the EM algorithm with spatial constraints</a></span><br /> <span class="Author">A. Diplaros and T. Gevers and N. Vlassis</span><br /> <span class="Journal">Systems, Man and Cybernetics, 2004 IEEE International Conference on</span>&nbsp; <span class="Volume">4</span>&nbsp;	(<span class="Date">2004</span>)<br /><div class="Abstract"> In this paper, we propose a color-based method
for skin detection and segmentation, which also takes into
account the spatial coherence of the skin pixels. We treat the
problem of skin detection as an inference problem. We assume
that each pixel in an image has a hidden binary label
associated with it, that specifies if it is skin or not. In order
to solve the inference problem, we use a variational EM algorithm
which incorporates the spatial constraints with just
a small computational overhead in the E-step. Finally, we
show that our method provides better results than the standard
EM algorithm and a state-of-the-art skin-detection method
from the literature [9].</div><dd /><dt class="Key" id="gavrila07eb">gavrila07eb<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Gavrila/A%20Bayesian,%20Exemplar-Based%20Approach%20to%20Hierarchical%20Shape.pdf">A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching</a></span><br /> <span class="Author">D. M. Gavrila</span><br /> <span class="Volume">29</span>&nbsp;	(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="gavrila2007mcp">gavrila2007mcp<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Gavrila/Multi-cue%20Pedestrian%20Detection%20and%20Tracking%20from.pdf">Multi-cue Pedestrian Detection and Tracking from a Moving Vehicle</a></span><br /> <span class="Author">D. M. Gavrila and S. Munder</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">73</span>&nbsp; <span class="Pages">41--59</span>&nbsp;
(<span class="Date">2007</span>)<br /><div class="Abstract">This paper presents a multi-cue vision system for the real-time detection and tracking of pedestrians from a moving vehicle. The detection component involves a cascade of modules, each utilizing complementary visual criteria to successively narrow down the image search space, balancing robustness and efficiency considerations. Novel is the tight integration of the consecutive modules: (sparse) stereo-based ROI generation, shape-based detection, texture-based classification and (dense) stereo-based verification. For example, shape-based detection activates a weighted combination of texture-based classifiers, each attuned to a particular body pose. Performance of individual modules and their interaction is analyzed by means of Receiver Operator Characteristics (ROCs). A sequential optimization technique allows the successive combination of individual ROCs, providing optimized system parameter settings in a systematic fashion, avoiding ad-hoc parameter tuning. Application-dependent processing constraints can be incorporated in the optimization procedure. Results from extensive field tests in difficult urban traffic conditions suggest system performance is at the leading edge.</div><dd /><dt class="Key" id="bowyer2006saa">bowyer2006saa<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Bowyer/A%20survey%20of%20approaches%20and%20challenges%20in%203D%20and%20multi-modal%203D+%202D%20face.pdf">A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition</a></span><br /> <span class="Author">K. W. Bowyer and K. Chang and P. Flynn</span><br /> <span class="Journal">Computer Vision and Image Understanding</span>&nbsp; <span class="Volume">101</span>&nbsp; <span class="Pages">1--15</span>&nbsp;
(<span class="Date">2006</span>)<br /><div class="Abstract">This survey focuses on recognition performed by matching models of the three-dimensional shape of the face, either alone or in combination with matching corresponding two-dimensional intensity images. Research trends to date are summarized, and challenges confronting the development of more accurate three-dimensional face recognition are identified. These challenges include the need for better sensors, improved recognition algorithms, and more rigorous experimental methodology.</div><dd /><dt class="Key" id="phillips2005ofr">phillips2005ofr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Phillips/Overview%20of%20the%20face%20recognition%20grand%20challenge.pdf">Overview of the face recognition grand challenge</a></span><br /> <span class="Author">P. J. Phillips and P. J. Flynn and T. Scruggs and K. W. Bowyer and J. Chang and K. Hoffman and J. Marques and J. Min and W. Worek</span><br /> <span class="Journal">Proceedings of IEEE Conference on Computer Vision and Pattern Recognition</span>&nbsp; <span class="Volume">1</span>&nbsp; <span class="Pages">947--954</span>&nbsp;
(<span class="Date">2005</span>)<br /><div class="Abstract">Over the last couple of years, face recognition researchers have been developing new techniques. These developments are being fueled by advances in computer vision techniques, computer design, sensor design, and interest in fielding face recognition systems. Such advances hold the promise of reducing the error rate in face recognition systems by an order of magnitude over Face Recognition Vendor Test (FRVT) 2002 results. The Face Recognition Grand Challenge (FRGC) is designed to achieve this performance goal by presenting to researchers a six-experiment challenge problem along with data corpus of 50,000 images. The data consists of 3D scans and high resolution still imagery taken under controlled and uncontrolled conditions. This paper describes the challenge problem, data corpus, and presents baseline performance and preliminary results on natural statistics of facial imagery.</div><dd /><dt class="Key" id="zhu2006fhd">zhu2006fhd<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Zhu/Fast%20Human%20Detection%20Using%20a%20Cascade.pdf">Fast Human Detection Using a Cascade of Histograms of Oriented Gradients</a></span><br /> <span class="Author">Q. Zhu and S. Avidan and M. C. Yeh and K. T. Cheng</span><br /> <span class="Journal">Computer Vision and Pattern Recognition</span>&nbsp; <span class="Volume">1</span>&nbsp; <span class="Pages">4</span>&nbsp;
(<span class="Date">2006</span>)<br /><div class="Abstract">We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.</div><dd /><dt class="Key" id="dalai2005hog">dalai2005hog<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Dalai/Histograms%20of%20oriented%20gradients%20for%20human%20detection.pdf">Histograms of oriented gradients for human detection</a></span><br /> <span class="Author">N. Dalal and B. Triggs</span><br /> <span class="Journal">Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on</span>&nbsp; <span class="Volume">1</span>&nbsp;	(<span class="Date">2005</span>)<br /><div class="Abstract">We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. 
The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.</div><dd /><dt class="Key" id="schneiderman2004odu">schneiderman2004odu<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Schneiderman/Object%20Detection%20Using%20the%20Statistics%20of%20Parts.pdf">Object Detection Using the Statistics of Parts</a></span><br /> <span class="Author">H. Schneiderman and T. Kanade</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">56</span>&nbsp; <span class="Pages">151--177</span>&nbsp;
(<span class="Date">2004</span>)<br /><div class="Abstract">In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively.
Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using Adaboost with Confidence-Weighted Predictions (Shapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and makes a decision about whether to cease evaluation---labeling the input as non-object---or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.</div><dd /><dt class="Key" id="zhao2003frl">zhao2003frl<dt /><dd class="Pub"> <span class="Title"><a href="/papers/ZHAO/Face%20Recognition:%20A%20Literature%20Survey.pdf">Face Recognition: A Literature Survey</a></span><br /> <span class="Author">W. Zhao and R. Chellappa and P. Phillips and A. Rosenfeld</span><br /> <span class="Journal">ACM Computing Surveys</span>&nbsp; <span class="Volume">35</span>&nbsp; <span class="Pages">399--458</span>&nbsp;
(<span class="Date">2003</span>)<br /><div class="Abstract">As one of the most successful applications of image analysis and understanding, face
recognition has recently received significant attention, especially during the past
several years. At least two reasons account for this trend: the first is the wide range of
commercial and law enforcement applications, and the second is the availability of
feasible technologies after 30 years of research. Even though current machine
recognition systems have reached a certain level of maturity, their success is limited by
the conditions imposed by many real applications. For example, recognition of face
images acquired in an outdoor environment with changes in illumination and/or pose
remains a largely unsolved problem. In other words, current systems are still far away
from the capability of the human perception system.
This paper provides an up-to-date critical survey of still- and video-based face
recognition research. There are two underlying motivations for us to write this survey
paper: the first is to provide an up-to-date review of the existing literature, and the
second is to offer some insights into the studies of machine recognition of faces. To
provide a comprehensive survey, we not only categorize existing recognition techniques
but also present detailed descriptions of representative methods within each category.
In addition, relevant topics such as psychophysical studies, system evaluation, and
issues of illumination and pose variation are covered.</div><dd /><dt class="Key" id="viola2001rod">viola2001rod<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Viola/Rapid%20object%20detection%20using%20a%20boosted.pdf">Rapid object detection using a boosted cascade of simple features</a></span><br /> <span class="Author">P. Viola and M. Jones</span><br /> <span class="Journal">Proc. CVPR</span>&nbsp; <span class="Volume">1</span>&nbsp; <span class="Pages">511--518</span>&nbsp;
(<span class="Date">2001</span>)<br /><dd /><dt class="Key" id="yang2002dfi">yang2002dfi<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Yang/Detecting%20faces%20in%20images:%20a%20survey.pdf">Detecting faces in images: a survey</a></span><br /> <span class="Author">M. H. Yang and D. Kriegman and N. Ahuja</span><br /> <span class="Journal">Pattern Analysis and Machine Intelligence, IEEE Transactions on</span>&nbsp; <span class="Volume">24</span>&nbsp; <span class="Pages">34--58</span>&nbsp;
(<span class="Date">2002</span>)<br /><dd /><dt class="Key" id="osuna1997tsv">osuna1997tsv<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Osuna/Training%20support%20vector%20machines:%20an%20application.pdf">Training support vector machines: an application to face detection</a></span><br /> <span class="Author">E. Osuna and R. Freund and F. Girosi and others</span><br /> <span class="Journal">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</span>&nbsp; <span class="Volume">24</span>&nbsp;	(<span class="Date">1997</span>)<br /><dd /><dt class="Key" id="rowley1998nnb">rowley1998nnb<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Rowley/Neural%20network-based%20face%20detection.pdf">Neural network-based face detection</a></span><br /> <span class="Author">H. Rowley and S. Baluja and T. Kanade</span><br /> <span class="Journal">Pattern Analysis and Machine Intelligence, IEEE Transactions on</span>&nbsp; <span class="Volume">20</span>&nbsp; <span class="Pages">23--38</span>&nbsp;
(<span class="Date">1998</span>)<br /><dd /><dt class="Key" id="vezhnevets2003spb">vezhnevets2003spb<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Vezhnevets/A%20survey%20on%20pixel-based%20skin%20color%20detection.pdf">A survey on pixel-based skin color detection techniques</a></span><br /> <span class="Author">V. Vezhnevets and V. Sazonov and A. Andreeva</span><br /> <span class="Journal">Proc. Graphicon</span>&nbsp; <span class="Pages">85--92</span>&nbsp;
(<span class="Date">2003</span>)<br /><div class="Abstract"> Skin color has proven to be a useful and robust cue for face detection,
localization and tracking. Image content filtering, content-aware
video compression and image color balancing applications
can also benefit from automatic detection of skin in images. Numerous
techniques for skin color modelling and recognition have been
proposed during several past years. A few papers comparing different
approaches have been published [Zarit et al. 1999], [Terrillon
et al. 2000], [Brand and Mason 2000]. However, a comprehensive
survey on the topic is still missing. We try to fill this vacuum by
reviewing most widely used methods and techniques and collecting
their numerical evaluation results.</div><dd /><dt class="Key" id="terrillon1998adh">terrillon1998adh<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Terrillon/Automatic%20detection%20of%20human%20faces%20in%20natural.pdf">Automatic detection of human faces in natural scene images by use of a skin color model and of invariant moments</a></span><br /> <span class="Author">J. C. Terrillon and M. David and S. Akamatsu</span><br /> <span class="Journal">Proc. of the Third International Conference on Automatic Face and Gesture Recognition</span>&nbsp; <span class="Pages">112--117</span>&nbsp;
(<span class="Date">1998</span>)<br /><dd /><dt class="Key" id="yang1997scm">yang1997scm<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Yang/Skin-color%20Modeling%20and%20Adaptation.pdf">Skin-color Modeling and Adaptation</a></span><br /> <span class="Author">J. Yang and W. Lu and A. Waibel</span><br /> (<span class="Date">1997</span>)<br /><dd /><dt class="Key" id="sigal2000eap">sigal2000eap<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Sigal/Estimation%20and%20prediction%20of%20evolving%20color%20distributions.pdf">Estimation and prediction of evolving color distributions for skin segmentation under varying illumination</a></span><br /> <span class="Author">L. Sigal and S. Sclaroff and V. Athitsos</span><br /> <span class="Journal">Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</span>&nbsp; <span class="Volume">2</span>&nbsp; <span class="Pages">152--159</span>&nbsp;
(<span class="Date">2000</span>)<br /><div class="Abstract"> A novel approach for real-time skin segmentation in video
sequences is described. The approach enables reliable skin
segmentation despite wide variation in illumination during
tracking. An explicit second order Markov model is used
to predict evolution of the skin color (HSV) histogram over
time. Histograms are dynamically updated based on feedback
from the current segmentation and based on predictions
of the Markov model. The evolution of the skin color
distribution at each frame is parameterized by translation,
scaling and rotation in color space. Consequent changes
in geometric parameterization of the distribution are propagated
by warping and re-sampling the histogram. The
parameters of the discrete-time dynamic Markov model are
estimated using Maximum Likelihood Estimation, and also
evolve over time. Quantitative evaluation of the method
was conducted on labeled ground-truth video sequences
taken from popular movies.</div><dd /><dt class="Key" id="raja1998tas">raja1998tas<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Raja/Tracking%20and%20segmenting%20people%20in%20varying%20lighting.pdf">Tracking and segmenting people in varying lighting conditions using colour</a></span><br /> <span class="Author">Y. Raja and S. J. McKenna and S. Gong</span><br /> <span class="Journal">Third International Conference on Automatic Face and Gesture Recognition, Nara, Japan, IEEE Computer Society Press</span>&nbsp; <span class="Pages">228--233</span>&nbsp;
(<span class="Date">1998</span>)<br /><div class="Abstract"> Colour cues were used to obtain robust detection and
tracking of people in relatively unconstrained dynamic
scenes. Gaussian mixture models were used to estimate
probability densities of colour for skin, clothing and background.
These models were used to detect, track and segment
people, faces and hands. A technique for dynamically
updating the models to accommodate changes in apparent
colour due to varying lighting conditions was used. Two
applications are highlighted: (1) actor segmentation for virtual
studios, and (2) focus of attention for face and gesture
recognition systems. A system implemented on a 200MHz
PC tracks multiple objects in real-time.</div><dd /><dt class="Key" id="drew1998iic">drew1998iic<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Drew/Illumination-invariant%20color%20object%20recognition%20via%20compressedchromaticity.pdf">Illumination-invariant color object recognition via compressed chromaticity histograms of color-channel-normalized images</a></span><br /> <span class="Author">M. Drew and J. Wei and Z. N. Li</span><br /> <span class="Journal">Computer Vision, 1998. Sixth International Conference on</span>&nbsp; <span class="Pages">533--540</span>&nbsp;
(<span class="Date">1998</span>)<br /><dd /><dt class="Key" id="chang1996cts">chang1996cts<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Chang/Color%20texture%20segmentation%20for%20clothing%20in%20a%20computer-aided.pdf">Color texture segmentation for clothing in a computer-aided fashion design system</a></span><br /> <span class="Author">C. C. Chang and L. L. Wang</span><br /> <span class="Journal">Image and Vision Computing</span>&nbsp; <span class="Volume">14</span>&nbsp; <span class="Pages">685--702</span>&nbsp;
(<span class="Date">1996</span>)<br /><div class="Abstract">A traditional fashion designer has to draw a large number of drafts in order to accomplish an ideal style. Better performance can be achieved if these operations are done on computers, because the designer can easily make changes for various patterns and colors. To develop a computer-aided fashion design system, one of the most difficult tasks is to automatically separate the clothing from the background so that a new item can be `put on'. One difficulty of the segmentation work arises from the diverse patterns on the clothing, especially with folds or shadows. In this study, circular histograms are first utilized to quantize color and to reduce shadow/highlight effects. Then a color co-occurrence matrix and a color occurrence vector are proposed to characterize the color spatial dependence and color occurrence frequency of the clothing's texture. Next, based on the two color features blocks on the clothing are found by a region growing method. Finally, post-processing is applied to obtain a smooth clothing boundary. Experimental results are presented to show the feasibility of the proposed approach.</div><dd /><dt class="Key" id="darrell2000ipt">darrell2000ipt<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Darrell/Integrated%20Person%20Tracking%20Using%20Stereo,.pdf">Integrated Person Tracking Using Stereo, Color, and Pattern Detection</a></span><br /> <span class="Author">T. Darrell and G. Gordon and M. Harville and J. Woodfill</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">37</span>&nbsp; <span class="Pages">175--185</span>&nbsp;
(<span class="Date">2000</span>)<br /><div class="Abstract">We present an approach to real-time person tracking in crowded and/or unknown environments using
integration of multiple visual modalities. We combine stereo, color, and face detection modules into a single robust
system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing
is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks
likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within
the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays
within the field of view), medium-term (user exits/reenters within minutes), and long term (user returns after hours
or days). Short-term tracking is performed using simple region position and size correspondences, while medium
and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual
module, describe our integration method, and report results with the complete system in trials with thousands of users.</div><dd /><dt class="Key" id="arandjelovic2005afr">arandjelovic2005afr<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Arandjelovic/Automatic%20face%20recognition%20for%20film%20character.pdf">Automatic face recognition for film character retrieval in feature-length films</a></span><br /> <span class="Author">O. Arandjelovic and A. Zisserman</span><br /> <span class="Journal">Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on</span>&nbsp; <span class="Volume">1</span>&nbsp;	(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="everingham2005iiv">everingham2005iiv<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Everingham/Identifying%20individuals%20in%20video%20by%20combining%20generative.pdf">Identifying individuals in video by combining generative and discriminative head models</a></span><br /> <span class="Author">M. Everingham and A. Zisserman</span><br /> <span class="Journal">Proc. ICCV</span>&nbsp; <span class="Pages">1103--1110</span>&nbsp;
(<span class="Date">2005</span>)<br /><dd /><dt class="Key" id="zhang2003aah">zhang2003aah<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Zhang/Automated%20annotation%20of%20human%20faces%20in%20family.pdf">Automated annotation of human faces in family albums</a></span><br /> <span class="Author">L. Zhang and L. Chen and M. Li and H. Zhang</span><br /> <span class="Journal">Proceedings of the eleventh ACM international conference on Multimedia</span>&nbsp; <span class="Pages">355--358</span>&nbsp;
(<span class="Date">2003</span>)<br /><div class="Abstract"> Automatic annotation of photographs is one of the most desirable
needs in family photograph management systems.  In this paper, we
present a learning framework to automate the face annotation in
family photograph albums.  Firstly, methodologies of content-based
image retrieval and face recognition are seamlessly integrated to
achieve automated annotation.  Secondly, face annotation is
formulated in a Bayesian framework, in which the face similarity
measure is defined as maximum a posteriori (MAP) estimation.
Thirdly, to deal with the missing features, marginal probability is
used so that samples which have missing features are compared with
those having the full feature set to ensure a non-biased decision.
The experimental evaluation has been conducted within a family
album of few thousands of photographs and the results show that the
proposed approach is effective and efficient in automated face
annotation in family albums.</div><dd /><dt class="Key" id="apostof07">apostof07<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Apostoloff/Who%20Are%20You%3F%20realtime%20person%20identification.pdf">Who Are You? realtime person identification</a></span><br /> <span class="Author">N. Apostoloff and A. Zisserman</span><br /> (<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="Everingham06a">Everingham06a<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Everingham/Hello!%20My%20name%20is...%20Buffy%20--%20Automatic.pdf">Hello! My name is... Buffy -- Automatic Naming of Characters in TV Video</a></span><br /> <span class="Author">M. Everingham and J. Sivic and A. Zisserman</span><br /> (<span class="Date">2006</span>)<br /><dd /><dt class="Key" id="kruppa">kruppa<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Kruppa/Fast%20and%20Robust%20Face%20Finding%20via%20Local.pdf">Fast and Robust Face Finding via Local Context</a></span><br /> <span class="Author">H. Kruppa and M. Castrillón-Santana and B. Schiele</span><br /> <span class="Journal">Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance</span>&nbsp;
(<span class="Date">2003</span>)<br /><dd /><dt class="Key" id="santana">santana<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Castrillon/ENCARA2:%20Real-time%20Detection%20of%20Multiple%20Faces.pdf">ENCARA2: Real-time Detection of Multiple Faces at Different Resolutions in Video Streams</a></span><br /> <span class="Author">M. Castrillón Santana and O. Déniz Suárez and M. Hernández Tejera and C. Guerra Artal</span><br /> <span class="Journal">Journal of Visual Communication and Image Representation</span>&nbsp; <span class="Pages">130-140</span>&nbsp;
(<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="sanne">sanne<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Korzec/Classifying%20the%20Head-shoulder%20region%20and%20orientation%20in%20pedestrians.pdf">Classifying the Head-shoulder region and orientation in pedestrians</a></span><br /> <span class="Author">S. Korzec</span><br /> (<span class="Date">2007</span>)<br /><dd /><dt class="Key" id="mori">mori<dt /><dd class="Pub"> <span class="Title"><a href="/papers/xren_cvpr04_people.pdf">Recovering Human body configurations: Combining segmentation and recognition</a></span><br /> <span class="Author">G. Mori and X. Ren and A. A. Efros and J. Malik</span><br /> <span class="Journal">IEEE Computer Vision and Pattern Recognition</span>&nbsp; <span class="Pages">326-333</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /><dt class="Key" id="partassembly">partassembly<dt /><dd class="Pub"> <span class="Title"><a href="/papers/Micilotta/Detection%20and%20tracking%20of%20Humans%20by%20Probalistic%20Body.pdf">Detection and tracking of Humans by Probabilistic Body Part Assembly</a></span><br /> <span class="Author">A. Micilotta</span><br /><div class="Abstract">This paper presents a probabilistic framework of assembling detected
human body parts into a full 2D human configuration. The face, torso, legs
and hands are detected in cluttered scenes using boosted body part detectors
trained by AdaBoost. Body configurations are assembled from the detected
parts using RANSAC, and a coarse heuristic is applied to eliminate obvious
outliers. An a priori mixture model of upper-body configurations is used to
provide a pose likelihood for each configuration. A joint-likelihood model
is then determined by combining the pose, part detector and corresponding
skin model likelihoods. The assembly with the highest likelihood is selected
by RANSAC, and the elbow positions are inferred. This paper also illustrates
the combination of skin colour likelihood and detection likelihood to further
reduce false hand and face detections.</div><dd /><dt class="Key" id="viola">viola<dt /><dd class="Pub"> <span class="Title"><a href="/papers/viola2003_iccv_ped_cascade_class.pdf">Robust real-time face detection</a></span><br /> <span class="Author">P. Viola and M. J. Jones</span><br /> <span class="Journal">International Journal of Computer Vision</span>&nbsp; <span class="Volume">57</span>&nbsp; <span class="Pages">137-154</span>&nbsp;
(<span class="Date">2004</span>)<br /><dd /></dl></div></p> <img src="http://feeds.feedburner.com/~r/Nichol4s/~4/WTZs_4awMhA" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://nichol.as/person-recognition-with-python/feed</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://nichol.as/person-recognition-with-python</feedburner:origLink></item> <item><title>Climategate battle — start sharing data</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/sMgBCM4_Qyo/climate-gate-boxing-match-start-sharing-data</link> <comments>http://nichol.as/climate-gate-boxing-match-start-sharing-data#comments</comments> <pubDate>Fri, 11 Dec 2009 14:56:30 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[rant]]></category><guid isPermaLink="false">http://nichol.as/?p=156</guid> <description><![CDATA[
Now that the dust has somewhat settled after climategate, the consensus seems to be that it has been overblown. If you look at the timeline of events this isn&#8217;t surprising. Between the public appearance of the report and the first damning articles on the 20th there was less than a single day.  It is not that [...]]]></description> <content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fnichol.as%2Fclimate-gate-boxing-match-start-sharing-data"><br /> <img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fnichol.as%2Fclimate-gate-boxing-match-start-sharing-data&amp;source=nichol4s&amp;style=normal&amp;service=bit.ly&amp;service_api=R_067b6a3b78b750bb3ff5fdfa4e005c82" height="61" width="50" title="Climategate battle    start sharing data" alt=" Climategate battle    start sharing data" /><br /> </a></div><p>Now that the dust has somewhat settled after climategate, the consensus seems to be that it <a title="Time: Climategate overblown?" href="http://www.time.com/time/health/article/0,8599,1946082-1,00.html">has been overblown</a>. If you look at the timeline of events this isn&#8217;t surprising. Between the <a title="160mb of boring stuff" href="http://www.megaupload.com/?d=U44FST89">public appearance of the report</a> and the <a href="http://blogs.news.com.au/heraldsun/andrewbolt/index.php/heraldsun/comments/hadley_hacked/">first</a> damning articles on the 20th there was less than a single day.  It is not that difficult to question how thorough the review of 160 MB of data was.  It simply wasn&#8217;t.</p><p>It was as if some people thought they had struck gold and were aggressively searching the leaked emails for that one specific quote which would make them instantly famous. But all in all it was a bit disappointing if you were hoping to find exciting revelations. 
The one thing that could be distilled from the e-mails is that most researchers have strong opinions and big egos, but this shouldn&#8217;t really be a surprise.</p><p>It is naive to think that scientists are unbiased; they simply aren&#8217;t. However, they are expected to back up their views with unbiased facts. The main argument that is left, if we ignore all the personal slander, seems to center on a quote in one of the emails concerning the <a title="World Meteorological Organisation - 1999 Global Climate Update" href="http://nichol.as/papers/wmo913.pdf">WMO Statement of the status of the global climate in 1999</a>. The front page of this report shows the picture below and indicates that 1990-1999 was the hottest decade on record. So yes, <strong>it is </strong><strong>an argument about a 10-year-old report. </strong>It might be worth noting that a few days ago (8 Dec 2009), the World Meteorological Organization issued a new <a title="2000-2009, warmest decade on records" href="http://www.wmo.int/pages/mediacentre/press_releases/pr_869_en.html">press release stating that our current decade is the warmest on record</a>. 
That information got probably lost in the heated debate.</p><p style="text-align: center;"><img class="size-full wp-image-175  aligncenter" style="border: 1px solid grey;" title="The disputed graph" src="http://nichol.as/wp-content/uploads/2009/12/1009061939.jpg" alt="1009061939 Climategate battle    start sharing data" width="381" height="250" /></p><p>From the leaked emails conservative news sources state that the following quote is a clear sign of <a style="color: #14568a !important;" href="http://blogs.telegraph.co.uk/news/jamesdelingpole/100017393/climategate-the-final-nail-in-the-coffin-of-anthropogenic-global-warming/">manipulation of evidence</a>:</p><blockquote><p style="padding-left: 30px;">&#8220;I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith’s to hide the decline.&#8221;</p></blockquote><p><span style="color: #232323; font-family: Arial, Verdana, Lucida, serif, sans; font-style: normal; font-size: 12px;">But is it? In <a title="CRU2" href="http://www.uea.ac.uk/mac/comm/media/press/2009/nov/CRUupdate">a rebuttal by the Climate Research Unit</a> they state the following:</span></p><blockquote><p style="padding-left: 30px;">This email referred to a “trick” of adding recent instrumental data to the end of temperature reconstructions that were based on proxy data.</p><p style="padding-left: 30px;">Phil Jones comments further: “One of the three temperature reconstructions was based entirely on a particular set of tree-ring data that shows a strong correlation with temperature from the 19th century through to the mid-20th century, but does not show a realistic trend of temperature after 1960. This is well known and is called the ‘decline’ or ‘divergence’. The use of the term ‘hiding the decline’ was in an email written in haste. CRU has not sought to hide the decline. 
Indeed, CRU has published a number of articles that both illustrate, and discuss the implications of, this recent tree-ring decline, including the article that is listed in the legend of the WMO Statement figure.”</p></blockquote><p>They also provide an extra graph in which they show the climate reconstruction and the recent instrumental data separately:</p><p style="text-align: center;"><a href="http://nichol.as/wp-content/uploads/2009/12/4052145227.jpg"><img class="size-full wp-image-182  aligncenter" title="separated data" src="http://nichol.as/wp-content/uploads/2009/12/4052145227.jpg" alt="separated data" width="476" height="198" /></a></p><p>So, as you can see, there isn&#8217;t really anything shocking to report.</p><p>Our viewpoint on climate change seems closely linked to our position on the political spectrum. In the red corner, we have the conservatives, who find any idea that might require them to change their way of living threatening. In the blue corner we have the progressives, who feel that change is a goal, not just a method. 
During the first round of the climategate boxing match we mainly heard the conservative viewpoint, represented by the <a href="http://blogs.telegraph.co.uk/news/jamesdelingpole/100017393/climategate-the-final-nail-in-the-coffin-of-anthropogenic-global-warming/">Telegraph</a>, <a href="http://www.foxnews.com/opinion/2009/11/24/john-lott-climate-change-emails-copenhagen/">Fox News</a>, <a title="The Global Cooling Cover Up" href="http://www.washingtontimes.com/news/2009/nov/27/the-global-cooling-cover-up/">Washington Times</a> and <a title="Climate gate gets worse" href="http://johnrlott.blogspot.com/2009/11/climate-gate-gets-worse.html">lots</a> of <a href="http://dancingczars.wordpress.com/2009/12/03/12-days-of-climate-gate-and-the-networks-still-ignore-the-scandal-ah-the-media-czars/">infuriated</a> <a href="http://jkshaws.wordpress.com/2009/12/04/climate-gate-heats-up-but-mainstream-media-ignore-firestorm/">bloggers</a>, but now that the round is over I think the focus will shift to a more progressive point of view. You see, whether or not climate change is happening, we will have to think about how we manage our environment. We are <a style="color: #14568a !important;" href="http://en.wikipedia.org/wiki/Oil_reserves">running out of resources</a> and we are <a style="color: #14568a !important;" href="http://en.wikipedia.org/wiki/Great_Pacific_Garbage_Patch">polluting our environment</a>. If we do not act accordingly we will end up like <a style="color: #14568a !important;" title="Jared Diamond on why Societies Collapse" href="http://www.ted.com/talks/jared_diamond_on_why_societies_collapse.html">Easter Island</a>.</p><h2>Round 2: The need for data sharing</h2><p>A positive result of this climate battle is the renewed focus on the public availability of data and methodologies. CRU claims that 95% of their data is already open to the public and that they will make the remaining 5% publicly available, which is great news.  
This movement of &#8216;data freeing&#8217; is a great initiative, certainly in this time of collective sharing. John Wilbanks of <a style="color: #14568a !important;" title="Science Commons" href="http://sciencecommons.org/">Science Commons</a> says the following:</p><blockquote><p style="padding-left: 30px;"><em>&#8220;the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery…..we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge&#8221;.</em></p></blockquote><p>Making our research widely available is a great way to catalyze progress in the broadest sense. This is probably better illustrated by the next video by <a style="color: #14568a !important;" title="WSJ article Dylan, son of and famous for wil.i.am" href="http://online.wsj.com/article/SB123872027544485035.html">Jesse Dylan</a>.</p><p><center><embed src="http://blip.tv/play/gpxS0PgiAg" type="application/x-shockwave-flash" width="480" height="270" allowscriptaccess="always" allowfullscreen="true"></embed></center></p><p>The importance of data sharing is already recognized by the government of the USA. They have created <a href="http://data.gov">data.gov</a> with the purpose of increasing public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government. It currently has over 118,000 different datasets, which really makes it a data miner&#8217;s wet dream.</p><h2>My efforts on data sharing</h2><p>In an effort to not just stand on the sidelines but participate in this &#8216;release your data&#8217; party, I have decided to put my master&#8217;s thesis and its results in the public domain. 
For <a title="Msc Thesis" href="http://nichol.as/papers/thesis.pdf">my master&#8217;s thesis</a> I implemented a system, mostly in Python, which does person recognition on static images. You can compare it with what <a href="http://picasa.google.com/">Google&#8217;s Picasa</a> does. However, I was able to outperform Picasa in recognition rate on a few datasets. I have already released some of the source code on <a href="http://bitbucket.org/nicholas/projects/">Bitbucket</a> and you can find a little bit more information on the <a href="http://nichol.as/projects">Projects</a> page.</p><p>In the next few months I am going to explain this approach in more detail and will put up my collected resources and a BibTeX file. I think this will be a great start for anyone interested in machine vision and person recognition. If you&#8217;re interested, <a href="http://twitter.com/nichol4s">just follow me on Twitter!</a></p> <img src="http://feeds.feedburner.com/~r/Nichol4s/~4/sMgBCM4_Qyo" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://nichol.as/climate-gate-boxing-match-start-sharing-data/feed</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://nichol.as/climate-gate-boxing-match-start-sharing-data</feedburner:origLink></item> <item><title>How Google is wasting your bandwidth</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/JZ9qHaLZvoI/how-google-is-wasting-your-bandwidth</link> <comments>http://nichol.as/how-google-is-wasting-your-bandwidth#comments</comments> <pubDate>Mon, 30 Nov 2009 15:37:06 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[cdn]]></category> <category><![CDATA[javascript]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[programming]]></category><guid isPermaLink="false">http://nichol.as/?p=112</guid> <description><![CDATA[
Using a Content Delivery Network (CDN) is a method to improve the performance of your website. Some of the reasons for using a CDN are: Placing content geographically close to the end user and thus lowering latency and increasing bandwidth.
Increasing the amount of parallel downloads at the client by distributing over different domains
Offload the burden on [...]]]></description> <content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fnichol.as%2Fhow-google-is-wasting-your-bandwidth"><br /> <img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fnichol.as%2Fhow-google-is-wasting-your-bandwidth&amp;source=nichol4s&amp;style=normal&amp;service=bit.ly&amp;service_api=R_067b6a3b78b750bb3ff5fdfa4e005c82" height="61" width="50" title="How Google is wasting your bandwidth" alt=" How Google is wasting your bandwidth" /><br /> </a></div><p>Using a <a href="http://en.wikipedia.org/wiki/Content_delivery_network">Content Delivery Network (CDN)</a> is a method  to improve the performance of your website. Some of the reasons for using a CDN are:</p><ul><li>Placing content geographically close to the end user and thus lowering latency and increasing bandwidth.</li><li>Increasing the amount of parallel downloads at the client by distributing over different domains</li><li>Offload the burden on your servers</li><li>Facilitate long term caching by using a robust source for libraries</li></ul><p>Especially this last point is why I looked at <a style="color: #14568a !important;" href="http://code.google.com/apis/ajaxlibs/">Google&#8217;s CDN for Ajax libraries</a>. It is a beautiful idea. When more people are using the same CDN, the cost of downloading an Ajax library can be ignored because it is very likely that the web browser will already have the library in its belly. 
Wonderful!</p><p>For example, when I try to download the <a href="http://ajax.googleapis.com/ajax/libs/prototype/1.6.1.0/prototype.js">Prototype library</a> everything goes well and Google&#8217;s CDN spits the following back:</p><blockquote><p>Content-Type:text/javascript; charset=UTF-8<br /> Date:Mon, 30 Nov 2009 14:31:50 GMT<br /> Expires:Tue, <strong>30 Nov 2010 </strong>13:56:34 GMT</p></blockquote><p>As you can see, Google tells your browser to cache the file for a full year, as it should. Now, let&#8217;s look at what happens when trying this with <a title="Jquery 1.3.2" href="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js">jQuery 1.3.2</a>:</p><blockquote><p>Content-Type:text/javascript; charset=UTF-8<br /> Date:Mon, 30 Nov 2009 14:40:15 GMT<br /> Expires:Tue, <strong>30 Nov 2010</strong> 14:40:15 GMT</p></blockquote><p>Again, everything is OK. Now, let&#8217;s try a different version, <a href="http://ajax.googleapis.com/ajax/libs/jquery/1.3/jquery.min.js">jQuery 1.3</a>:</p><blockquote><p>Content-Type:text/javascript; charset=UTF-8<br /> Date:Mon, 30 Nov 2009 14:41:42 GMT<br /> Expires:Mon, 30 Nov 2009 <strong>15:41:42 GMT</strong></p></blockquote><p>Huh? When requesting the 1.3 version, Google is basically telling us to <em>&#8216;remember it only for one hour&#8217;</em>. This is wrong imho. When you specify 1.3, you are telling Google you want &#8216;<em>the latest version in the 1.3 series</em>&#8217;. The <a style="color: #14568a !important;" href="http://jquery.com/">jquery.com</a> site links to the 1.3 version as well. This means that for a page hit on the jQuery website you will re-download 60 KB of minified jQuery goodness whenever your cached copy has expired, which happens after just one hour. A better approach would be for Google to let the client revalidate its cached copy (which it can do by sending the <em>If-Modified-Since</em> header).</p><p>But wait there is more. 
WordPress, for example, adds an extra version argument to the file (<em>?ver=&lt;bla&gt;</em>). This can be handy when you want to generate a certain script or CSS file dynamically. And it really should not be a problem with Google&#8217;s CDN, should it? Well, let&#8217;s see what happens when we request <a href="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js?ver=1.3.2">http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js?ver=1.3.2</a></p><blockquote><p>Content-Type:text/javascript; charset=UTF-8<br /> Date:Mon, 30 Nov 2009 15:00:00 GMT<br /> Expires:<strong>Fri, 01 Jan 1990</strong> 00:00:00 GMT<br /> Last-Modified:Mon, 23 Nov 2009 18:54:21 GMT</p></blockquote><p>Holy cow, Google invented time travel!  An Expires date in the past marks the response as already stale, so the browser has to go back to the server before it can reuse the file. The implications of this are pretty big: this may affect a lot of people with WordPress blogs who were &#8216;smart enough&#8217; to use the Google CDN but never really tested whether it worked.</p><p>Basically what this means is that http://ajax.googleapis.com isn&#8217;t really your performance safety net. You need to know exactly what you&#8217;re doing, otherwise it will come back to bite you and you might be better off just hosting those libraries on your own site. 
Thus my recommendation would be to use the Google CDN, but specify the exact version of the library you need (for example 1.3.2 rather than 1.3) and make sure you do not append any query arguments to the URL.</p> <img src="http://feeds.feedburner.com/~r/Nichol4s/~4/JZ9qHaLZvoI" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://nichol.as/how-google-is-wasting-your-bandwidth/feed</wfw:commentRss> <slash:comments>11</slash:comments> <feedburner:origLink>http://nichol.as/how-google-is-wasting-your-bandwidth</feedburner:origLink></item> <item><title>Hello world!</title><link>http://feedproxy.google.com/~r/Nichol4s/~3/ketjYAjc85Q/hello-world</link> <comments>http://nichol.as/hello-world#comments</comments> <pubDate>Mon, 16 Nov 2009 20:26:15 +0000</pubDate> <dc:creator>Nicholas Piël</dc:creator> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[Python]]></category><guid isPermaLink="false">http://nichol.as/wordpress/?p=1</guid> <description><![CDATA[
Yes, &#8220;Hello world!&#8221;, the first default post on a WordPress blog. After much consideration I decided to simply use the best piece of software there is for blogging purposes. Yeah, I know it is PHP code and not Python and trust me, I tried really hard to use some Python alternative and even started [...]]]></description> <content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fnichol.as%2Fhello-world"><br /> <img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fnichol.as%2Fhello-world&amp;source=nichol4s&amp;style=normal&amp;service=bit.ly&amp;service_api=R_067b6a3b78b750bb3ff5fdfa4e005c82" height="61" width="50" title="Hello world!" alt=" Hello world!" /><br /> </a></div><p>Yes, &#8220;Hello world!&#8221;, the first default post on a <a href="http://www.wordpress.org">WordPress</a> blog. After much consideration I decided to simply use the best piece of software there is for blogging purposes. Yeah, I know it is PHP code and not Python and trust me, I tried really hard to use some Python alternative and even started working (doesn&#8217;t everybody?) on my own blogging software written on top of <a href="http://pylonshq.com/">Pylons</a>, <a href="http://www.makotemplates.org/">Mako</a> and <a href="https://storm.canonical.com/">STORM</a>, but it just doesn&#8217;t compare with WP.</p><p>Well, I must admit <a href="http://zine.pocoo.org/">Zine</a> (Python blog software) looks nice, but still, WP integrates so nicely with everything else. Choosing WP isn&#8217;t just choosing a software package, it is becoming part of a huge community. 
It is hard to compete with the zillions of PHP coders that extend WordPress or integrate it with other platforms, no matter how much nicer your language of choice is.</p><p>The things that I was looking for, at minimum:</p><ul><li>Easy editing of posts, the WP web interface rocks!</li><li>RSS support</li><li>Editing over XML-RPC</li><li><a href="http://en.wikipedia.org/wiki/Pingback">Pingbacks</a></li><li>Spam filtering, WP integrates nicely with <a href="http://akismet.com/">Akismet</a></li></ul><p>But with WP plugins you easily get so much more:</p><ul><li><a href="http://en.gravatar.com/">Gravatars</a> (Globally Recognized Avatars)</li><li><a href="http://openid.net/">OpenID</a>: one password for all your sites</li><li><a title="WP to Twitter" href="http://www.joedolson.com/articles/wp-to-twitter/">Twitter</a> <a title="Twitter widget" href="http://rick.jinlabs.com/code/twitter">integration</a></li><li><a title="LinkedIN Application" href="http://www.linkedin.com/opensocialInstallation/preview?_ch_panel_id=1&amp;_applicationId=2200">LinkedIn integration</a></li><li>Easy integration with <a title="Google Analyticor" href="http://ronaldheft.com/code/analyticator/">Google Analytics</a></li><li><a title="autoptimize" href="http://www.turleando.com.ar/autoptimize/">Client-side optimization</a> with CSS sprites and data URIs</li><li>Robust performance <a title="WP Super Cache" href="http://ocaoimh.ie/wp-super-cache/">through caching</a></li><li>SEO <a title="Headspace 2" href="http://urbangiraffe.com/plugins/headspace2/">optimized</a></li><li>Automatic <a title="redirect plugin" href="http://urbangiraffe.com/plugins/redirection/">redirection of renamed pages</a></li></ul><p>So, here it is: my WordPress blog, hosted behind an NGINX proxy which handles all static data and some Apache workers that handle the PHP scripts.  I must say, I feel confident that it can handle a Digg. 
At least, I benchmarked it at 3000 requests per second without it breaking a sweat.</p> <img src="http://feeds.feedburner.com/~r/Nichol4s/~4/ketjYAjc85Q" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://nichol.as/hello-world/feed</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://nichol.as/hello-world</feedburner:origLink></item> </channel> </rss><!-- Dynamic page generated in 4.276 seconds. --><!-- Cached page generated by WP-Super-Cache on 2010-07-22 10:25:07 -->
