<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"> <channel><title>Giovanni Bajo's swapfile</title> <link>http://giovanni.bajo.it</link> <description>Because memory is volatile</description> <lastBuildDate>Sun, 11 Sep 2011 14:00:42 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=</generator> <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/giovannibajo" /><feedburner:info uri="giovannibajo" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><title>Golomb-coded sets: smaller than Bloom filters</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/U3Jp_m1MDjY/</link> <comments>http://giovanni.bajo.it/2011/09/golomb-coded-sets/#comments</comments> <pubDate>Sun, 04 Sep 2011 16:49:34 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[Optimizations]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[algorithms]]></category> <category><![CDATA[optimization]]></category> <category><![CDATA[Programming]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=197</guid> <description><![CDATA[While reading the super-interesting Imperial Violet, Adam Langley's weblog, I stumbled upon a new data structure that I had never heard of: the Golomb-coded sets (GCS). It is a probabilistic data structure conceptually similar to the famous Bloom filters, but with a more compact in-memory representation, and a slower query time.]]></description> <content:encoded><![CDATA[<p>While reading the super-interesting <a
href="http://www.imperialviolet.org/">Imperial Violet</a>, Adam Langley&#8217;s weblog, I stumbled upon a new data structure that I had never heard of: the <strong>Golomb-coded sets (GCS)</strong>. It is a probabilistic data structure conceptually similar to the famous <a
href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</a>, but with a more compact in-memory representation, and a slower query time.</p><p>A bloom filter with the optimal number of hash functions (= bits per element) usually occupies a space in memory that is N * log2(e) * log2(1/P) bits, where N is the number of elements that you want to store, and P is the false-positive probability. To put things into perspective, let&#8217;s say you want to store 100K elements with 1 false positive every 8K elements. Given that log2(8K) ≈ log2(8192) = 13, and log2(e) ≈ 1.44, you need 100K * 1.44 * 13 bits ≈ 1828 KiB ≈ 1.78 MiB.</p><p>The theoretical minimum for a similar probabilistic data structure would be N * log2(1/P), so a bloom filter is roughly using 44% more memory than theoretically necessary (log(e) = 1.44). GCS is a way to get closer to that minimum.</p><p>GCS is well suit in situations where to want to minimize the memory occupation and you can afford a slightly higher computation time, compared to a Bloom filter. Google Chromium, for instance, uses it to keep a local (client) set of <a
href="http://en.wikipedia.org/wiki/Revocation_list">SSL CRL</a>; they prefer lower memory occupation because it is specifically important in constrained scenarios (e.g.: mobile), and they can afford the structure to be a little bit slower than Bloom filter since it&#8217;s still much faster than a SSL handshake (double network roundtrip).</p><h2>Turning words into values</h2><p>GCS is actually quite simple, and I will walk you through it step by step. First, let&#8217;s agree on a dictionary of words you want to put into the set, for instance, the NATO alphabet:</p><pre class="brush: python; title: ; notranslate">
['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot',
 'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima', 'mike',
 'november', 'oscar', 'papa', 'quebec', 'romeo', 'sierra',
 'tango', 'uniform', 'victor', 'whiskey', 'xray', 'yankee',
 'zulu']
</pre><p>We want to create a data-structure for these words with a 1 on 64 false-positive probability. This means that we expect that we will be able to check the whole english dictionary against it, and about 1 word every 64 will result to be present even if it&#8217;s not.</p><p>We compute a single hash key for each different element, as integers in the range [0, N*P). Since N=26 (the length of the NATO alphabet) and P=64, the range is [0, 1664]. As in the case of Bloom filters, we want this hash to be uniformly distributed across the domain, so a cryptographic hash like MD5 or SHA1 is a good choice (and no, the fact that there are pre-image attacks on MD5 does not matter much in this scenario). We will need a way to convert a 128-bit or 160-bit hash to a number in the range [0, 1664], but the moral equivalent of the modulus is the enough to not affect the distribution. We will then compute the hash as follows:</p><pre class="brush: python; title: ; notranslate">
def gcs_hash(w, (N,P)):
    &quot;&quot;&quot;
    Hash value for a GCS with N elements and 1/P probability
    of false positives.
    We just need a hash that generates uniformally-distributed
    values for best results, so any crypto hash is fine. We
    default to MD5.
    &quot;&quot;&quot;
    h = md5(w).hexdigest()
    h = long(h[24:32],16)
    return h % (N*P)
</pre><p>If we apply this function over the input words, we get these hash values:</p><pre class="brush: python; title: ; notranslate">
[('alpha', 1017L), ('bravo', 591L), ('charlie', 1207L), ('delta', 151L),
 ('echo', 1393L), ('foxtrot', 1005L), ('golf', 526L), ('hotel', 208L),
 ('india', 461L), ('juliet', 1378L), ('kilo', 1231L), ('lima', 192L),
 ('mike', 1630L), ('november', 1327L), ('oscar', 997L), ('papa', 662L),
 ('quebec', 806L), ('romeo', 1627L), ('sierra', 866L), ('tango', 890L),
 ('uniform', 1134L), ('victor', 269L), ('whiskey', 512L), ('xray', 831L),
 ('yankee', 1418L), ('zulu', 1525L)]
</pre><p>Let&#8217;s now just get the hash values and sort them. This is the result:</p><pre class="brush: python; title: ; notranslate">
[151L, 192L, 208L, 269L, 461L,
 512L, 526L, 591L, 662L, 806L,
 831L, 866L, 890L, 997L, 1005L,
 1017L, 1134L, 1207L, 1231L,
 1327L, 1378L, 1393L, 1418L,
 1525L, 1627L, 1630L]
</pre><p>Remember that the range was [0, 1664). Given that we used a cryptographic hash, we expect these values to be uniformly distributed across that range. They look like it at glance, and obviously it gets much better with real-world data sets which are much larger. If we plot these values, we can double-check the distribution:</p><p> <img
style="margin-left:auto;margin-right:auto" src="http://giovanni.bajo.it/files/2011/09/gcs-distrib1.png" alt="gcs distrib1 Golomb coded sets: smaller than Bloom filters" border="0" width="415" height="398" title="Golomb coded sets: smaller than Bloom filters" /></p><h2>Let's compress</h2><p>We now want to compress this set of number in the most efficient way. General purpose algorithms like zlib are obviously the wrong choice here, since they work by finding repetition of strings, and the 16-bit or 32-bit encoding of the above numbers would look like random data to zlib. Some compression theory comes to the rescue: the best way to compress an unordered uniform data set is to compute the array of differences, which will be a geometric distribution, and then use the Golomb encoding. Did I lose you? Let's see it one step at a time.</p><p>If we compute the increments (differences) between a uniformly distribute set of values, the result is a <a
href="http://en.wikipedia.org/wiki/Geometric_distribution">geometric distribution</a>. Recall that we originally decided for a range of exactly 26*64, and then we picked 26 uniformly distributed values within it. If you were to bet on the most likely distance between a value and the next one, wouldn't you say "64"? Yes. And we can argue that most distances are going to be numbers pretty close to the value 64, and far larger values are extremely unlikely. This intuition matches with the geometric distribution (whose correspondent in the continuos domain is the exponential distribution).</p><p>In our example, this is the array of differences:</p><pre class="brush: python; title: ; notranslate">
[151L, 41L, 16L, 61L, 192L, 51L, 14L, 65L, 71L, 144L,
 25L, 35L, 24L, 107L, 8L, 12L, 117L, 73L, 24L, 96L,
 51L, 15L, 25L, 107L, 102L, 3L]
</pre><p>Again, we can check this with a little plot (after sorting them):</p><p><img
style="margin-left:auto;margin-right:auto" src="http://giovanni.bajo.it/files/2011/09/gcs-geom1.png" alt="gcs geom1 Golomb coded sets: smaller than Bloom filters" border="0" width="408" height="369" title="Golomb coded sets: smaller than Bloom filters" /></p><p>The parameter p of this geometric distribution should be exactly the false probability we chose above (1/64). To double-check, we can <a
href="http://en.wikipedia.org/wiki/Geometric_distribution#Parameter_estimation">estimate the parameter p</a> by dividing the number of values by their sum: 26 / 1438 = 0.0175320, which is close enough to 1 / 64 = 0.015625. Again, with a larger input set, the numbers would be even closer.</p><h2>Golomb encoding</h2><p>We now want to compress this set of differences with <a
href="http://en.wikipedia.org/wiki/Golomb_coding">Golomb encoding</a>. As Wikipedia says, "alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values". In fact, we are going to use simplified sub-case of Golomb encoding, in which the parameter p is a power of 2 (like 64 is, in our case). This sub-case is called Rice encoding.</p><p>Back to the intuition: we are going to compress values which are very likely to be as small as the value 64, and very unlikely to be much bigger; 128 is unlikely, 192 is very unlikely, 256 is very very unlikely, and so on. Golomb encoding splits each value in two parts: the quotient and the remainder of the division by the parameter. Given what we just said about the likeness, you must expect the quotient to be likely 0 or 1, unlikely to be 2, very unlikely to be 3, very very unlikely to be 4, etc. On the other hand, the remainder is probably just a random number we can't infer much about, it's the high frequency oscillation which is impossible to predict. Golomb (Rice) coding simply encodes the quotient in <i>base 1 (unary encoding)</i> and the remainder in <i>base 2 (binary encoding)</i>. Unary encoding might sound weird at first, but it's really simple:</p><table
style="border-spacing: 3em 0em"><tr><th>Number</th><th>Unary encoding</th></tr><tr><td>0</td><td>0</td></tr><tr><td>1</td><td>10</td></tr><tr><td>2</td><td>110</td></tr><tr><td>3</td><td>1110</td></tr><tr><td>4</td><td>11110</td></tr><tr><td>5</td><td>111110</td></tr><tr><td>6</td><td>1111110</td></tr></table><p>So we emit as many 1s as the number we want to encode (the quotient) followed by a zero. Then, we emit the binary encoding of the remainder using exactly 6 bits (since it will be a number between 0 and 63). Thus, a number between 0 and 63 will be exactly 7 bits long: 1 bit for the quotient (0) and 6 bits for the remainder. A number between 64 and 127 will be 8 bits long (quotient 10, plus the remainder); A number between 128 and 191 will be 9 bits long (quotient 110); and so on. Smaller numbers are as compact as possible, higher and unlikely numbers gets longer and longer. Let's see our array of differences properly encoded:</p><table
style="border-spacing: 3em 0em"><tr><th>Number</th><th>Quot</th><th>Rem</th><th>Golomb encoding</th></tr><tr><td>151</td><td>2</td><td>23</td><td
style="font-family: monospace">110&nbsp;&nbsp;010111</td></tr><tr><td>41</td><td>0</td><td>41</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;101001</td></tr><tr><td>16</td><td>0</td><td>16</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;010000</td></tr><tr><td>61</td><td>0</td><td>61</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;111101</td></tr><tr><td>192</td><td>3</td><td>0</td><td
style="font-family: monospace">1110&nbsp;000000</td></tr><tr><td>51</td><td>0</td><td>51</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;110011</td></tr><tr><td>14</td><td>0</td><td>14</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;001110</td></tr><tr><td>65</td><td>1</td><td>1</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;000001</td></tr><tr><td>71</td><td>1</td><td>7</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;000111</td></tr><tr><td>144</td><td>2</td><td>16</td><td
style="font-family: monospace">110&nbsp;&nbsp;010000</td></tr><tr><td>25</td><td>0</td><td>25</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;011001</td></tr><tr><td>35</td><td>0</td><td>35</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;100011</td></tr><tr><td>24</td><td>0</td><td>24</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;011000</td></tr><tr><td>107</td><td>1</td><td>43</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;101011</td></tr><tr><td>8</td><td>0</td><td>8</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;001000</td></tr><tr><td>12</td><td>0</td><td>12</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;001100</td></tr><tr><td>117</td><td>1</td><td>53</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;110101</td></tr><tr><td>73</td><td>1</td><td>9</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;001001</td></tr><tr><td>24</td><td>0</td><td>24</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;011000</td></tr><tr><td>96</td><td>1</td><td>32</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;100000</td></tr><tr><td>51</td><td>0</td><td>51</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;110011</td></tr><tr><td>15</td><td>0</td><td>15</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;001111</td></tr><tr><td>25</td><td>0</td><td>25</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;011001</td></tr><tr><td>107</td><td>1</td><td>43</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;101011</td></tr><tr><td>102</td><td>1</td><td>38</td><td
style="font-family: monospace">10&nbsp;&nbsp;&nbsp;100110</td></tr><tr><td>3</td><td>0</td><td>3</td><td
style="font-family: monospace">0&nbsp;&nbsp;&nbsp;&nbsp;000011</td></tr></table><p>And if we concatenate all the output, we get our final Golomb-coded set of numbers:</p><pre>
11001011 10101001 00100000 11110111
10000000 01100110 00111010 00000110
00011111 00100000 01100101 00011001
10001010 10110001 00000011 00101101
01100010 01001100 01010000 00110011
00011110 01100110 10101110 10011000
00011
</pre><p>197 bits (25 padded bytes) to encode 26 arbitrary-long words with a 1.5% of false positives. That's 7.57 bits per word.  Not bad! The theoretical minimum number of bits was 26 * log2(64) = 156, so we're still a little off in this example, but still better than an optimal Bloom filter which would require 225 bits. The example I chose is obviously too small and thus it's very impacted by the specific words and the output of the MD5. I ran the same algorithm over a 640K-words English dictionary with expected false probability 1/1024, and I got a 7,405,432 bits GCS, which is about 11.58 bits per word. An optimal Bloom filter for the same set would take 9,227,646 bits, while the theoretical minimum would be 6,396,530 bits. Quite an improvement, in fact.</p><h2>Decompression and query improvements</h2><p>So how do we now query for a word to see if it's in the set or not? We just need to reverse all the steps.<p>We start going through the bits. We extract the quotient Q by simply counting the number of consecutive 1s before the terminating 0; then we extract the fixed-size remainder R (exactly P bits, 6 in our example). We compute the original difference (Q*P+R), and we accumulate it into an integer so that we regenerate the original sorted set of hash values, one element at a time. We don't need to actually expand the whole set in memory: as we go through the bits and compute one hash value at a time, we can compare it with the one that we are being queried for, to see if there's a match.</p><p>After you reverse the encoding and difference steps, the underlying hash value set is sorted. So going through the set in linear order sounds like slower than it could be. One would think of doing a bisect search, but there is no easy way to jump to an arbitrary index in the encoded set, since the encoded elements have different size in bits, and they can be decoded only one at a time in linear order.</p><p>What we can do to improve query time a bit is to compute an index that allows to seek within the encoded set. For instance, if you want to make the query time 32 times faster, you can split the original domain [0...N*P) in 32 subdomains of equal size N*P/32. Then, for each subdomain, you find the smallest hash value that is part of the subdomain, and save its bit-index in the encoded GCS within the index. Since the hash values are uniformly distributed, each subdomain will contain roughly N/32 values, so by seeking into it while querying you will need to decode only 1/32th of the whole GCS, thus getting the 32x speed increase in query time. In the full English dictionary example cited above, an additional index made of 32 indices of 32-bits each is just 1,204 bits of additional memory; compared to a 7 million bits GCS, it's a good deal to obtain a 32x speed increase in query time!</p><h2>Play with the code</h2><p>The full code is <a
href="https://github.com/rasky/gcs">available on GitHub</a>. I wrote both a Python and a C++ implementation that you can compare. The Python code is meant to be really simple and not really optimized (e.g.: it even streams the set from the disk when you do a query). The C++ is a little bit more optimized, though still mostly an academic example. Notice that the code does not implement the index to speed up decompression. Even if it's good deal, it's obviously not mandatory and just an optimization.</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2011/09/golomb-coded-sets/feed/</wfw:commentRss> <slash:comments>1</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2011/09/golomb-coded-sets/</feedburner:origLink></item> <item><title>Is Convergence really solving the SSL problem?</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/WvnC_joujNk/</link> <comments>http://giovanni.bajo.it/2011/09/is-convergence-really-solving-the-ssl-problem/#comments</comments> <pubDate>Sun, 04 Sep 2011 16:13:36 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[BOFH]]></category> <category><![CDATA[Web]]></category> <guid isPermaLink="false">http://rasky.bloghosting.develer.net/?p=189</guid> <description><![CDATA[At DEFCON 2011, Moxie Marlinspike presented a possible solution to the "big SSL problem": Convergence, a clever way to remove the need of certificate authorities. But is it really going to solve it?]]></description> <content:encoded><![CDATA[<p>At DEFCON 2011, <a
href="http://blog.thoughtcrime.org/">Moxie Marlinspike</a> presented a possible solution to the &#8220;big SSL problem&#8221;: <a
href="http://convergence.io/">Convergence</a>, a clever way to remove the need of certificate authorities. But is it really going to solve it?</p><p>Let&#8217;s step back a little. We all know that SSL is kind of broken because of the need to rely on certificate authorities. Moxie himself has a <a
href="http://blog.thoughtcrime.org/ssl-and-the-future-of-authenticity">great blog post</a> on the subject. In a word, we don&#8217;t want to pay certificate authorities, we have too many of them (up to 70 in recent browsers/OS), we can&#8217;t really trust all of them, and we don&#8217;t have an easy way to revoke trust in the certificates they issue.</p><p>Convergence starts from the idea that it would be really great to avoid CAs altogether and use self-signed certificates, but self-signed certificates are vulnerable to man-in-the-middle (MITM) attacks. So the clever idea is noting that MITM is a local attack: it&#8217;s either someone next to you drinking a coffee at Starbucks, or someone that hacked your ISP&#8217;s DNS, or maybe someone working for a corrupted government that&#8217;s <a
href="http://www.foxnews.com/politics/2010/11/16/internet-traffic-reportedly-routed-chinese-servers/">hijacking traffic at the BGP level</a>. It&#8217;s unrealistic that the same MITM attack can affect you, someone in Iceland, someone in China, someone in West Virgina, and someone in Italy at the same time, right? And that&#8217;s what Convergence exploits: it gives you a way to compare SSL certificates fetched from the website you want to visit from many different servers in the world, called &#8220;notaries&#8221;. If they all match, a MITM attack is impossible, and you can trust the self-signed certificate and proceed logging in into your bank. Right?</p><p>Wrong. Because you know what else self-signed certificates are vulnerable too, in addition to MITM? <strong>Lies</strong>. And you know who lies? A phisher. So if a phisher registers bankoffamerica.com (pay attention to the typo!) and self-signs the website with a SSL certificate saying that the organization behind the website really is Bank of America Corporation incorporated in Delaware, all notaries will report that the certificate is exactly the same as fetched from different parts of the world, and you will get absolutely no warning.</p><p>And what is worse is that you will <strong>lose</strong> any EV indication while browsing with Convergence, since Convergence simply does not currently support any way to validate the identity of the website. As Moxie himself <a
href="https://twitter.com/moxie__/status/108305243212754945">says</a>, &#8220;Convergence does not enable EV for self-signed certs. It is concerned with authenticity, not identity&#8221;.</p><p>So to clarify: if you start browsing today with Convergence (and assuming you get a good set of notaries to bootstrap), you get the following effects:</p><ol><li>You will be immune to MITM attacks with rogue certificates, like in the current Diginotar&#8217;s debacle.</li><li>The CA list in your browser will be effectively ignored.</li><li>You will stop seeing any information from EV certificates (identities of the sites your browse).</li></ol><p>I personally don&#8217;t consider this a good compromise. It might be that I don&#8217;t live in China or Iran, but MITM attacks are an order of magnitude less common than phishing attacks, and people are slowly learning to trust EV certificates when browsing the Internet. It&#8217;s true that EV SSL certificates could be forged as well, I don&#8217;t dispute that. But right now with Convergence, I&#8217;m going to trade a simple protection against a common set of attacks with a good protection against a rare set of attacks.</p><p>I reached Moxie with these concerns, and he clarified that, technically speaking, Convergence could be enhanced to check for existing EV SSL certificates (through a custom notary), but that he doesn&#8217;t see EV certificates as solving any real problem nowadays. I beg to disagree: I don&#8217;t like EV per-se as well, but I think the identity problem is still something that must be solved on the Internet, and it&#8217;s probably even more important than solving the MITM problem.</p><p>Given that the number of websites which are targets of phishing attacks are relatively small because it&#8217;s mainly a group of high-profile sites (banks, web mails, social networks, etc.), and given that SSL certificates do not change so often (usually no more than once in a year for a high-profile website), there must be a way to conceive a global list of validated certificates for which an identity can be certified, even through a crowd-sourced mechanism. It has to be simpler than what GPG attempts to do with key-signing parties, because we don&#8217;t need to certify the identity of Mr John Green that you never met before, plus other 1 billion people; you just need to certify that Google is Google and PayPal is Paypal, for a one thousand of high profile websites. If you ask 10 people in 10 different countries to give you the SSL fingerprint for &#8220;Google, Inc.&#8221;, and you get 10 identical fingerprints, you can be 100% sure that the certificate you get is really for &#8220;Google, Inc.&#8221;. And if Google commits to use the same certificate for the next 3 months, you could globally cache this information, and distribute it to web users through notaries in a way that their address bar says &#8220;This is a certified Google Inc. website&#8221;. Or, in other words, &#8220;This is the same Google Inc. website that 1 million of people have visited in the last 2 hours, and 100 millions in the last 24 hours&#8221;.</p><p>If Convergence could be augmented to do something similar, I think we would be getting closer to the final solution of the SSL problem.</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2011/09/is-convergence-really-solving-the-ssl-problem/feed/</wfw:commentRss> <slash:comments>9</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2011/09/is-convergence-really-solving-the-ssl-problem/</feedburner:origLink></item> <item><title>The algorithms behind OTP tokens</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/HSy_7LLWJEg/</link> <comments>http://giovanni.bajo.it/2011/05/the-algorithms-behind-otp-tokens/#comments</comments> <pubDate>Sun, 29 May 2011 22:21:25 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[BOFH]]></category> <category><![CDATA[hash]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[security]]></category> <category><![CDATA[tokens]]></category> <guid isPermaLink="false">http://rasky.bloghosting.develer.net/?p=159</guid> <description><![CDATA[A friend asked me some time time ago how his bank&#8217;s OTP token worked. Most tokens that banks use (at least in Italy) are products of the &#8220;RSA SecurID&#8221; family, which are proprietary and secret (and rumored to have been compromised), but the general cryptography behind them is well-known and there are open standards that [...]]]></description> <content:encoded><![CDATA[<div
class="zemanta-img" style="margin: 1em"><div
class="wp-caption alignright" style="width: 310px"><a
href="http://commons.wikipedia.org/wiki/File:RSA_SecurID_SID800.jpg"><img
src="http://giovanni.bajo.it/files/2011/05/300px-RSA_SecurID_SID8007.jpg" alt="300px RSA SecurID SID8007 The algorithms behind OTP tokens" width="300" height="143" title="The algorithms behind OTP tokens" /></a><p
class="wp-caption-text">Image via Wikipedia</p></div></div><p>A friend asked me some time time ago how his bank&#8217;s <a
class="zem_slink" title="One-time password" rel="wikipedia" href="http://en.wikipedia.org/wiki/One-time_password">OTP</a> token worked. Most tokens that banks use (at least in Italy) are products of the &#8220;RSA SecurID&#8221; family, which are proprietary and secret (and rumored to have been <a
href="http://www.rsa.com/node.aspx?id=3872">compromised</a>), but the general cryptography behind them is well-known and there are open standards that can be easily deployed by companies that want to add an extra layer of security. An emerging standard for OTP generation is called <a
href="http://www.openauthentication.org/">OATH</a>, and given the general availability of free implementations, I will detail its algorithms in this post.</p><h2>Some background on CSPRNG</h2><p>OTPs (one-time passwords) are based on the concept of a so-called <em>cryptographically-secure pseudo-random number generator</em> (aka CSPRNG). As many programmers know, pseudo-random generator (as found in the standard library of most programming languages, such as C&#8217;s rand()) are algorithms that generate a repeatable sequence of numbers that are &#8220;random looking&#8221;; there are several way to measure <em>how random</em> a sequence is, but the important property that differentiate PRNG from CSPRNG is not really concerned with randomness per-se, but rather with <em>how easy is to predict the next number</em> just by looking at the previous ones. This is important in the OTP context because obviously an attacker might get to know the previous numbers generated by the system (eg. through a key-logger installed on user&#8217;s computer), so it is paramount to make sure that he cannot exploit this knowledge to generate the next number.</p><p>Once we have chosen a suitable CSPRNG, it is sufficient that the server and the client agrees on the &#8220;seed&#8221; to be used for the sequence; in fact, both sides will be able to independently generate the same sequence and thus for the server it will be easy to check if the numbers generated by the client are the same that it generates by itself. As long as the algorithm is truly secure and the &#8220;seed&#8221; is not leaked, the ability of generating the next number can be considered a good authentication mechanism, since nobody else should be able to generate the same sequence. By embedding the seed onto an off-line physical device (the token), leaking the seed becomes almost impossible. If the device is stolen, it is sufficient to revoke it and assign a new device to the user (with a new &#8220;seed&#8221;).</p><p>I&#8217;m putting the word seed between quotes because it does not tell all the truth. Any PRNG algorithm works by keeping an internal state; when the next number is requested, the algorithm manipulates (changes) the internal state in some way, and then produces a number as result of this computation. In the <a
href="http://en.wikipedia.org/wiki/Linear_congruential_generator">simplest of all random algorithms</a> (the same used by many implementation of C&#8217;s rand()), the internal state is simply the previous number being generated, but this is obviously not good for a CSPRNG. What we want is a way to generate a number from an internal state in a way not to leak <em>any</em> information (if possible at all) about the internal state. The &#8220;seed&#8221; is just a way to initializes the internal state, but if the PRNG is not secure, it will eventually leak enough information to let an attacker reconstruct the internal state, even if the seed itself was never leaked.</p><h2>Getting to the code</h2><p>So, how do we make sure that we can generate a number from a state without leaking information? What we are looking for is called a one-way function. Luckily, in cryptography, we have plenty of one-way functions available: they are called &#8220;secure hash functions&#8221;, MD5 and SHA1 being the most popular. If we decide that we trust SHA-1 as a good one-way function (that is, we accept that our CSPRNG be as strong as SHA-1, and broken as soon as SHA-1 will be broken), then we don&#8217;t need a complex internal state: we can simply use a progressive counter for that. SHA-1&#8242;s properties in fact already guarantee that, given SHA1(N), there is no way to reconstruct N; and for every N&#8217; very &#8220;similar&#8221; to N (eg: obtained by bit-flipping, or a simple increment), there is no correlation at all between SHA1(N&#8217;) and SHA1(N).</p><p>So, the sequence SHA1(0), SHA1(1), SHA1(2), etc. can be considered a CSPRNG. But it is just one sequence; it&#8217;s true that can be arbitrary long, but how can we obtain different sequences for different users? The first solution that comes to mind is to use the same technique that is used to differentiate hashes of the same password: a salt. If we assign each user a random salt S, we can prefix it to the counter and obtain a unique sequence by computing SHA1(S + N). Let&#8217;s see this at work with some basic Python:</p><pre class="brush: python; title: ; notranslate">
import os
from hashlib import sha1
def OTP(salt=None, n=0, digits=6):
    if salt is None:
        salt = os.urandom(8)
        print &quot;Salt:&quot;, salt.encode(&quot;hex&quot;)
    while True:
        hash = sha1(salt + repr(n)).hexdigest()
        num = long(hash, 16) % (10**digits)
        yield &quot;%06d&quot; % num
        n += 1
o = OTP()
print o.next()
print o.next()
</pre><p>which produces an output just like this:</p><pre class="brush: plain; gutter: false; title: ; notranslate">
Salt: e4cd2b5b4dbbed47
031049
163172
</pre><p>Up until now I have used the word &#8220;salt&#8221; but this is in fact incorrect. The term salt is supposed to indicate a random string of bytes which is generated for each invokation of the hash/cipher. In the CSPRNG case, instead, the random data is generated once per user, and reused through the life of the sequence. Together with the counter, this &#8220;salt&#8221; is actually the secret state of the algorithm. The correct term for that string is &#8220;secret key&#8221;; if it is leaked, the CSPRNG is basically broken (since recovering the current counter is just a matter of generating the whole sequence up until the match).</p><h2>The real OATH (RFC 4226)</h2><p>OATH is basically the above algorithm, with only a few variations to make it more secure in face of future attacks to the underlying SHA1 hash function:</p><ul><li>&#8220;Mutating&#8221; a hash through a secret key is a common need in cryptography, and there is a stronger algorithm to achieve it: HMAC. HMAC-SHA1(S,N) is basically the stronger version of SHA1(S+N), with S fixed and N changing. OATH uses HMAC-SHA1.</li><li>A longer (20 bytes) secret key is used, as required/suggest by HMAC-SHA1.</li><li>The counter is always converted to a 8-byte string (its big-endian binary representation). The above code was using repr as a simple way to obtain a string from a counter.</li><li>To extract the final 6 digits that form the OTP, we convert the whole SHA1 digest (20 bytes) into a number and then we calculate the modulo with 1,000,000. This means that we basically use the last few bytes of the SHA1 digest and discard the rest. This is not a problem right now because SHA1 is still secure, but to minimize risks OATH describes a way to extract 6 digits taking all 20 bytes into account. The exactly details can be read in the specifications.</li></ul><p>Even with the above changes, OATH is quite easy to implement, and in fact there exist many different free implementations. For instance, you can download a generic OATH client for both <a
href="http://itunes.apple.com/it/app/oath-token/id364017137?mt=8">iOS</a> and <a
href="http://code.google.com/p/androidtoken/">Android</a>, which are very good substitutes for a hardware token (with just the inconvenience that it is much much easier to extract the secret key in case an attack has physical access to the device for a while).</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2011/05/the-algorithms-behind-otp-tokens/feed/</wfw:commentRss> <slash:comments>3</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2011/05/the-algorithms-behind-otp-tokens/</feedburner:origLink></item> <item><title>Compile-time Function Execution in D</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/UKPSq_-KpdE/</link> <comments>http://giovanni.bajo.it/2010/05/compile-time-function-execution-in-d/#comments</comments> <pubDate>Sun, 30 May 2010 16:07:42 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[Optimizations]]></category> <category><![CDATA[C++]]></category> <category><![CDATA[Compile-time Function Execution]]></category> <category><![CDATA[D]]></category> <category><![CDATA[Languages]]></category> <category><![CDATA[Programming]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=121</guid> <description><![CDATA[This post will describe the basics of a very powerful feature of the D programming language: the Compile-time Function Execution (CTFE), which allows complicated functions to be fully evaluated at compile-time, irrespective of the optimization levels. C optimizations If you are an average C programmer, you know that simple code can be trusted to be [...]]]></description> <content:encoded><![CDATA[<p>This post will describe the basics of a very powerful feature of the D programming language: the <a
class="zem_slink" href="http://en.wikipedia.org/wiki/Compile_time_function_execution" title="Compile time function execution" rel="wikipedia">Compile-time Function Execution</a> (CTFE), which allows complicated functions to be fully evaluated at compile-time, irrespective of the optimization levels.</p><h2>C optimizations</h2><p>If you are an average C programmer, you know that simple code can be trusted to be evaluated at compile-time thanks to optimizers. For instance, if you write something like:</p><pre class="brush: cpp; title: ; notranslate">
void square(int x) { return x*x; }
void foo(void)
{
   int k = square(32);
   /* ... */
}
</pre><p>you trust your compiler to evaluate the square at compile-time, when optimizations are on. When things get more hairy:</p><pre class="brush: cpp; title: ; notranslate">
int factorial(int x)
{
    int result = x;
    while (--x)
         result *= x;
   return result;
}
void foo(void)
{
   int k = factorial(8);
   /* ... */
}
</pre><p>C programmers get immediately less confident about what will happen at run-time. For instance, would you say that your compiler is able to expand the above code at compile-time or not? Actually, the answer is &#8220;yes&#8221; in this particular case (unless you are using a very old compiler), but the point is still valid: this is not C code that one would write if he wants to be <b>sure</b> that the whole calculation be folded at compile-time.</p><p>There is also another issue: since the language does not mandate that the value is folded (and in fact, it is not folded when optimizations are disabled), you cannot create a constant out of it, such as by assigning it to a const variable.</p><h2>When things get hairy</h2><p>Now, let&#8217;s try with a (very naive and simple) solution of problem #1 of <a
href="http://projecteuler.net/index.php?section=problems&amp;id=1">Project Euler</a>:</p><pre class="brush: cpp; title: ; notranslate">
#include &lt;stdio.h&gt;
int euler1(int max)
{
   int i,res=0;
   for (i=1; i&lt;max; i++)
   {
       if ((i % 3) == 0 || (i % 5) == 0)
           res += i;
   }
   return res;
}
int main()
{
   int r10 = euler1(10);
   int r1000 = euler1(1000);
   printf(&quot;%d %d\n&quot;, r10, r1000);
   return 0;
}
</pre><p>This program simply calculates the sum of all divisors of 3 or 5 below 1000. But if you look at the generated code with GCC under <code>-O3</code>, you will see that the actual results are not computed at compile-time, but rather calculated at runtime. I believe any average C programmer would agree that we should not expect this code to be folded at compile time.</p><p>Now, meet the equivalent D code:</p><pre class="brush: cpp; title: ; notranslate">
int euler1(int max)
{
   int i,res=0;
   for (i=1; i&lt;max; i++)
   {
       if ((i % 3) == 0 || (i % 5) == 0)
           res += i;
   }
   return res;
}
int main()
{
   int r10 = euler1(10);
   int r1000 = euler1(1000);
   printf(&quot;%d %d\n&quot;, r10, r1000);
   return 0;
}
</pre><p>Deja-vu? Yes, it is <b>exactly</b> the same, barring the initial include statement that is not required (actually, there is no preprocessor in D and modules refer to each other with the <code>import</code> statement, but <code>printf</code> is a builtin). Of course, the above example was hand-crafted to make it both valid C and D code, but being D an evolution of C, the basic syntax is the same.</p><h2>Meet CTFE</h2><p>And now the hattrick: in D, we can request the compiler to evaluate <code>euler1</code> at compile-time by simply using the <code>static</code> keyword at invocation time:</p><pre class="brush: cpp; title: ; notranslate">
   static int r10 = euler1(10);
   static int r1000 = euler1(1000);
</pre><p>Great, isn&#8217;t it? Now the result of the above function call are evaluated by the compiler, irrespective of the optimization levels. If the function cannot be evaluated at compile-time (usually because it has side-effects, like any kind of I/O), it will trigger a compile-time error.</p><p>We can verify that the above constants really do appear in the generated code by compiling with <code>gdc -save-temps euler1.d</code> and then inspecting <code>euler1.s</code>:</p><pre class="brush: plain; highlight: [2,8]; title: ; notranslate">
D6euler14mainFZi3r10i:
        .long   23
.globl _D6euler14mainFZi5r1000i
        .align 4
        .type   _D6euler14mainFZi5r1000i, @object
        .size   _D6euler14mainFZi5r1000i, 4
_D6euler14mainFZi5r1000i:
        .long   233168
        .section        .rodata
.LC0:
        .string &quot;%d %d\n&quot;
        .text
.globl _Dmain
        .type   _Dmain, @function
_Dmain:
.LFB3:
        pushq   %rbp
.LCFI3:
        movq    %rsp, %rbp
.LCFI4:
        movl    _D6euler14mainFZi5r1000i(%rip), %edx
        movl    _D6euler14mainFZi3r10i(%rip), %esi
        movl    $.LC0, %edi
        movl    $0, %eax
        call    printf
        movl    $0, %eax
        leave
        ret
</pre><p>Notice how the compiler has calculated the values 23 and 233168 (respectively, results of <code>euler1(10)</code> and <code>euler1(1000)</code>) and put them in the data section of the executable.</p><p>If you are curious of what happens when the compiler cannot do the whole evaluation at compile-time, it is sufficient to stick a <code>printf()</code> call somewhere in the <code>euler()</code> function. Since <code>printf()</code> does some I/O, it breaks CFTE, and the compiler will happily tell you about that:</p><pre class="brush: plain; title: ; notranslate">
euler1.d:9: Error: cannot evaluate printf(&quot;hello, world!\x0a&quot;) at compile time
euler1.d:15: Error: cannot evaluate euler1(10) at compile time
euler1.d:16: Error: cannot evaluate euler1(1000) at compile time
</pre><h2>Closing words</h2><p>CTFE is simple yet very powerful. The fact that it is triggered at the call site (rather than being an attribute of the function, like the <code>inline</code> keyword) is a very smart design choice: it makes perfectly sense for the same function to be used at both run-time and compile-time, depending on the inputs.</p><p>For the C++ guys reading, C++0x has grown a <code>constexpr</code> keyword that, while looking superficially similar, it is a lot less powerful, since it can only be used on very simple functions (basically, one-liners). In fact, the keyword is meant to be use while declaring a function, and not at the call-site, so it has to apply only on small functions which can be proved to always yield a constant value.</p><p>In the next weeks, I will post a more complete real-world example of the power of CTFE from <a
href="http://www.bertos.org">BeRTOS</a>. Stay tuned!</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2010/05/compile-time-function-execution-in-d/feed/</wfw:commentRss> <slash:comments>88</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2010/05/compile-time-function-execution-in-d/</feedburner:origLink></item> <item><title>Grey on black: combining greylisting with blacklists</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/1-7Ekbq3lWs/</link> <comments>http://giovanni.bajo.it/2010/04/grey-on-black-combining-greylisting-with-blacklists/#comments</comments> <pubDate>Tue, 06 Apr 2010 21:22:01 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[BOFH]]></category> <category><![CDATA[DNSBL]]></category> <category><![CDATA[greylisting]]></category> <category><![CDATA[RBL]]></category> <category><![CDATA[SpamAssassin]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=109</guid> <description><![CDATA[In the never-ending spam-fighting war, many different technologies are implemented to prevent users from wadingthrough a daily batch of pill discounts and lottery winnings. Two of them, which are both very effective, are greylists and RBLs (aka blacklists); they both operate at the SMTP level (even before the mail body is transferred). Usually, these technologies [...]]]></description> <content:encoded><![CDATA[<p>In the never-ending spam-fighting war, many different technologies are implemented to prevent users from wadingthrough a daily batch of pill discounts and lottery winnings. Two of them, which are both very effective, are <em>greylists</em> and <em>RBLs</em> (aka blacklists); they both operate at the SMTP level (even before the mail body is transferred). Usually, these technologies are deployed separately, but even better results can be achieved when combined. How? Follow me.</p><h2>Greylisting</h2><p><a
class="zem_slink" title="Greylisting" rel="wikipedia" href="http://en.wikipedia.org/wiki/Greylisting">Greylisting</a> is a very common anti-spam technology: whenever a potential spam arrives, our <a
class="zem_slink" title="Message transfer agent" rel="wikipedia" href="http://en.wikipedia.org/wiki/Message_transfer_agent">MTA</a> replies with a temporary failure code (SMTP code: <code>4xx</code>), and writes down the IP address of the sender MTA (the server that is sending the e-mail) in a DB; when facing a 4xx code, any RFC-compliant MTA will simply try to deliver the same e-mail again after a while, at which point our MTA will let the mail through (because the IP address is already present in the DB). On the other hand, most spammers do not follow the RFC (how&nbsp;bizarre): they would not bother retrying and would simply move on; this is because each retry is a memory/CPU cost for them (they need to note down the server which is temporary failing, keep a queue around, etc.); after all, if it was a <em>real</em> temporary failure, that server might still be failing for a few hours, and it would be a real waste of bandwidth to try again, just to stop at the same temporary failure.</p><p>What it is interesting about greylisting is that such a simple idea can achieve impressive results. If you google around, you will find enthusiastic people reporting at least 80% of spam blocked, with few of them citing numbers as high as 95%. All modern mail servers support greylisting either out of the box or through a plugin. For the famous open source server Postfix, the <a
href="http://postgrey.schweikert.ch/" target="_blank">postgrey</a> extension is easily available and widely deployed.</p><p>There obviously are some cons to greylisting:</p><ul><li>Since normally greylisting is applied to all incoming e-mails, regular e-mail delivery is delayed. In fact, a RFC-compliant MTA will eventually retry, but how soon is an implementation detail. It might be one minute or one day (even though most servers usually retry in a few minutes). Obviously, mature greylisting implementations will keep the IP address of a good server in the DB for a long time (eg: one day or one week), effectively whitelisting it; so, the delay only happens on one e-mail per server. Most people don&#8217;t see this delaying as a problem, but I do: I believe that a <strong>good anti-spam pipeline should never impact regular e-mails whatsover</strong>. I don&#8217;t want a very important e-mail to be delayed for an unknown period of time because I cannot fight spam in other ways.</li><li>Some servers could be RFC-compliant and still fail to delivery e-mail to a server using greylisting. How &amp; why? Simple: think of &nbsp;large shop like gmail. They have dozens of SMTP servers. When a server tries to deliver an e-mail and it receives the 4xx code back, can you be absolutely sure that <em>the same<strong> </strong><span
style="font-style: normal">SMTP server will retry later? What if they have an unique queue of e-mail, and another server picks the same e-mail up and retries? This other server will </span><span
style="font-style: normal">also </span><span
style="font-style: normal">be greylisted. &nbsp;Can you see the problem now? I won&#8217;t enter details, but many greylist implementations suggest using manually-compiled (and maintained) whitelists. Uh-oh. Less funny now.</span></em></li></ul><h2 style="font-size: 1.5em">RBLs</h2><p>RBLs are blacklists of servers/zombies that deliver spam, listed by IP address, and queried through the DNS protocol. In short, if you want to ask <code>super-blacklist.example</code> if <code>1.2.3.4</code> is listed there, it is sufficient to try to resolve the hostname <code>4.3.2.1.super-blacklist.example</code>; if it resolves to an IP address (any IP address, it doesn&#8217;t matter), it&#8217;s listed; otherwise it is not listed.</p><p>Blacklists can be deployed within a mail server at different levels. For instance, you can use them in a score system to increase the probability for a mail to be treated as spam. Or you can use them to simply reject the e-mail at the SMTP level, even before the body is transferred (since everything you need to query the blacklist is the MTA IP address, and you have this information as soon as the TCP connection is initiated to your MTA).</p><p>Of course, not all blacklists are equal. If you rely on them for rejecting your e-mails, you better be sure that the blacklist is well-maintained, has a good reputation and that you are using it <em>correctly</em>. For instance, if we analyze the famous <a
href="http://www.spamhaus.org/zen/">Spamhaus Zen blacklist</a>, we find out that it is made of three different blacklists, one of which is the <a
href="http://www.spamhaus.org/pbl/">PBL</a>. The PBL is (quoting) &#8220;a DNSBL database of end-user IP address ranges which should not be delivering unauthenticated SMTP email to any Internet mail server except those provided for specifically by an ISP for that customer&#8217;s use&#8221;. Rephrasing: it is a list of all dynamic IP ranges in the world; all home ADSL users are listed here. That&#8217;s it. There is no connection with spam altogether, it is <strong>just a list of dynamic IP addresses</strong>. Now, while most dynamic IP addresses will never deliver an e-mail directly to your server (and instead go through their ISP&#8217;s MTA), are you ready to block these hosts at the SMTP level? What if someone (one in ten thousands) runs a legitimate MTA server on their ADSL line? They are not violating any RFC, so why should you blacklist them? In fact, SpamHaus itself suggests <em>against</em> using this list for outright rejection of e-mail.</p><p>So, is there anything we can do with PBL at the SMTP level, which is in the middle between outright rejection and a SpamAssassin-like score system? Yes we can: we can greylist!</p><h2 style="font-size: 1.5em">Greylisting &amp; RBLs together</h2><p>There are many different anti-spam technologies that can be done at the SMTP level, by looking at the so-called SMTP envelope (the SMTP conversation that happens before the actual e-mail is delivered). For instance, some people configure their servers so that they will <a
href="http://www.postfix.org/postconf.5.html#smtpd_helo_restrictions">reject e-mails based on the HELO command</a>, eg: if its argument is not a FQDN hostname, as required by the RFC. Others (including myself) consider an outright rejection too risky, because there might be misconfigured MTAs out there and it&#8217;s not their users fault (nor <em>your</em> users fault!). Traditionally, you could either choose to implement one of these enforcements or not. But there is an alternative: greylisting.</p><p>Let&#8217;s get back to the PBL example. Using PBL for filtering e-mails is attracting:</p><ul><li>Almost all regular e-mail will arrive from a non dynamic consumer IP address.</li><li>Almost all e-mails that you get from dynamic consumer IP addresses will be spam (basically, zombie Windows PCs running malwares that deliver spam).</li></ul><p>But <em>almost is not always</em>, and I don&#8217;t want to risk rejecting any regular e-mail, even if it happens once a year. I don&#8217;t want to impact regular e-mails whatsover. So, you can do like me and just <strong>greylist entries from IPs that are in the PBL</strong>. The worst thing that can happen is that those e-mails are slighly delayed, but given the unlikeness of such event, it is probably an acceptable compromise.</p><p>If you think of it, this is just a smarter alternative way of deploying greylisting. Instead of greylisting each and every e-mail and causing delays to all your users, you greylist only those e-mails that are <em>very dark grey but not really black</em>. And if you google around for SMTP anti-spam tricks, you can find many of them which follows in this <em>dark grey</em> area: they&#8217;re <em>so dark</em> that some mail admins reject e-mail altogether, but they&#8217;re not fully <em>black</em>, so other more conservative admins just punt. Don&#8217;t punt: greylist!</p><p>This is why I greylist all e-mails from IPs in the PBL. And <strong>I also greylist all e-mails whose HELO string is not a FQDN</strong>, so that the occasionally misconfigured server is just delayed a little while, but not rejected, and I still block lots of spam very early in the pipeline.</p><h2>And now, give me the code!</h2><p>As much as I find this kind of greylisting deployment so sexy, it is depriving to see that there is basically no support around in mail servers for such a setup. The abandonware spftware called <a
href="http://david.chanial.com/gl4rbl/">gl4rbl</a> is the best I could find around, and we succesfully deployed an enhanced version of it with qmail in <a
href="www.develer.com">Develer</a> for a long time.</p><p>When we switched to Postfix, we had to reimplement this, because the existing postgrey plugin does not allow conditional greylisting.</p><p>In the next days, we will post our own <a
href="http://www.postfix.org/SMTPD_POLICY_README.html">Postfix Policy Server</a> that implements greylisting on RBL. It is a very compact Python script, less than 200 lines including comments. We also have some ideas on making it more generically useful so that it can be used to implement other kinds of conditional greylisting, and I will discuss them in the same post.</p><p>In case you did not know, you can avoid missing it by subscribing to the <a
href="http://giovanni.bajo.it/feed">RSS feed</a>.</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2010/04/grey-on-black-combining-greylisting-with-blacklists/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2010/04/grey-on-black-combining-greylisting-with-blacklists/</feedburner:origLink></item> <item><title>C++ and copy-on-write data structures</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/z4lC8ulb_hc/</link> <comments>http://giovanni.bajo.it/2010/03/c-and-copy-on-write-data-structures/#comments</comments> <pubDate>Thu, 25 Mar 2010 01:49:20 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[api]]></category> <category><![CDATA[cow]]></category> <category><![CDATA[optimization]]></category> <category><![CDATA[qt]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=81</guid> <description><![CDATA[I generally like C++ and the way C++ feels, but there is one specific quirk which is caused by STL designers&#8217; choice of data structures which cannot be implemented with a copy-on-write (aka COW) semantic. Curious about which quirk? Keep on reading! COW might look like an implementation detail, but it can really change the [...]]]></description> <content:encoded><![CDATA[<p>I generally like C++ and the way C++ feels, but there is one specific quirk which is caused by STL designers&#8217; choice of data structures which cannot be implemented with a <a
class="zem_slink" title="Copy-on-write" rel="wikipedia" href="http://en.wikipedia.org/wiki/Copy-on-write">copy-on-write</a> (aka COW) semantic. Curious about which quirk? Keep on reading! COW might look like an implementation detail, but <strong>it can really change the way you design your APIs</strong>.</p><h2>What is a COW?</h2><div
class="zemanta-img" style="margin: 1em"><div
class="wp-caption alignright" style="width: 250px"><br
/> <img
src="http://giovanni.bajo.it/files/2010/03/300px-Friesian-Holstein.jpg" alt="300px Friesian Holstein C++ and copy on write data structures" width="240" height="168" title="C++ and copy on write data structures" /><br
/><p
class="wp-caption-text">COW can sound nasty at first, but it is a good C++ citizen</p></div></div><p>An object with COW semantic (sometimes also called &#8220;implicit sharing&#8221;) has an interesting property: whenever you copy it, it does not really do a &#8220;deep copy&#8221; but simply shares the underlying data between the original object and its copy, and increments a reference counter to track this. <strong>Only at the moment that any of the copies is further modified (written to), the deep copy is really performed</strong>.</p><p>Now, it is important to realize that the overhead imposed by the COW semantic is usually very tiny: the object just needs to bookkeep a counter at every copy, and check it before any modification. The latter is even lighter than it sounds because the compiler can strip off many checks thanks to constant propagations.</p><p>But what are the advantages of using COW? The first and foremost advantage is the possibility to have a <em>clean API with respect to return values</em>.</p><h2 style="font-size: 1.5em">Returning a COW</h2><p>If you have programmed C++ long enough, you know that all C++ functions that should return a complex data structure (anything which is larger than a small structure, like a container, a memory buffer, and so on) will instead accept a reference or a pointer to an object that the caller must provide and that will be &#8220;filled in&#8221; as return value. <strong>This is absolutely awful</strong>. If you don&#8217;t see it as awful, it&#8217;s about time you take a break from C++ and switch to other languages for a while. For instance:</p><pre class="brush: cpp; title: ; notranslate">
std::vector&lt;Node&gt; nodes;
cur_node.children(nodes);
printf(&quot;Number of children: %u\n&quot;, nodes.size());
</pre><p>The <code>Node::children()</code> function, in any other language, would have returned a vector of Nodes. But idiomatic C++ says that you should not impose the overhead of doing an additional copy of the vector onto your user, and thus ask him to pass you the container to fill. This is really unreadable because it breaks the common syntax of functions by intermixing return values to arguments. Moreover, it imposes additional troubles to the API; eg: what happens if you call <code>Node::children()</code> with a non-empty vector? Will the function just append its data to the vector? Or would it clear it at the beginning?</p><p>Now, think of a world where <code>std::vector</code> is a COW data structure. All these problems suddenly disappear because returning the vector <strong>would not cause a copy anymore</strong>. Bam, problem solved! And when you have a clear API, where return values really are return values, you get all the benefits from it:</p><pre class="brush: cpp; title: ; notranslate">
printf(&quot;Number of children: %u\n&quot;, cur_node.children().size());
</pre><p>Three lines merged into one. No more temporary named variable. And if <code>Node</code> happens to internally store its children as a <code>vector</code>, this code is <strong>even faster</strong> than the non-COW version, because there is no copy performed <em>at all</em>: you get the same speed as if <code>children()</code> returned a reference to the internal vector, even though it does not.</p><h2 style="font-size: 1.5em">COWs to the rescue</h2><p>Other advantages of COW data structures:</p><ol><li>You can nest COW data structures without overhead. Think of a <code>vector&lt;vector&lt;int&gt;&gt;</code>. Without COW, whenever the outer data structure resizes, the inner data structures will all be reallocated and copied. With COW, there is no copy at all: the internal data structures are basically <em>moved</em> over in constant time.</li><li>You can stop abusing const references everywhere. In C++, most functions will receive arguments by const references instead of simple values. Again, this is to save the overhead of a copy, and again this is almost useless with COW data structures. Const references are also nasty because they represent a first leak of <a
class="zem_slink" title="Const-correctness" rel="wikipedia" href="http://en.wikipedia.org/wiki/Const-correctness">const-correctness</a> in your code (I also hate the whole const-correctness in C++, but this is for another day; if you like it, you can still make your functions accept const values if you feel too).</li></ol><h2 style="font-size: 1.5em">Norwegian COWs</h2><p>Nokia&#8217;s Qt has designed all objects and data structures with COW. They even made them reentrant (for multithreading purposes) by using atomic reference counting through native CPU instructions. Finally, they expose the guts of COW through a simple QSharedData template that you can use to reimplement your own COW objects.</p><p>I could not agree with Qt&#8217;s designers more.</p><p>This is another example of how COW can positively influence API design:</p><pre class="brush: cpp; highlight: [6]; title: ; notranslate">
QByteArray fileHash(QString fn)
{
   QFile f(fn);
   QCryptographicHash h(QCryptographicHash::Sha1);
   while (!f.atEnd())
        h.addData(f.read(16*1024));
   return h.result();
}
</pre><p>See how I can simply pass the buffer returned by <code>QFile::read()</code> to <code>addData()</code> without having to worry about memory copying.</p><h2 style="font-size: 1.5em">Counting COWs</h2><p>So, the next time you design an API which asks for a return value among the arguments, think of a C++ world where COW is everywhere. And dream on.</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2010/03/c-and-copy-on-write-data-structures/feed/</wfw:commentRss> <slash:comments>16</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2010/03/c-and-copy-on-write-data-structures/</feedburner:origLink></item> <item><title>ctypes support in PyInstaller</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/-eiMqZMOwjM/</link> <comments>http://giovanni.bajo.it/2010/03/ctypes-support-in-pyinstaller/#comments</comments> <pubDate>Fri, 19 Mar 2010 02:58:49 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[Python]]></category> <category><![CDATA[ctypes]]></category> <category><![CDATA[py2app]]></category> <category><![CDATA[py2exe]]></category> <category><![CDATA[pyinstaller]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=64</guid> <description><![CDATA[Do you want to see how PyInstaller magically handles usages of ctypes, whilst py2exe/py2app would simply fail? Follow me in this in-depth tutorial! PyInstaller comprises many advanced features to make your life easier when packaging a Python program. In my opinion, its main advantage over py2exe/py2app (besides the obvious fact that it is a multi-platform [...]]]></description> <content:encoded><![CDATA[<p>Do you want to see how <a
href="http://www.pyinstaller.org">PyInstaller</a> magically handles usages of ctypes, whilst py2exe/py2app would simply fail? Follow me in this in-depth tutorial!</p><p>PyInstaller comprises many advanced features to make your life easier when packaging a Python program. In my opinion, its main advantage over py2exe/py2app (besides the obvious fact that it is a multi-platform tool) is that it <strong>includes many workarounds and quirks necessary to make applications work</strong> after packaging. What py2exe &#8220;fixes&#8221; by adding a page to their wiki, PyInstaller implements it properly, for everybody&#8217;s pleasure.</p><p>A very good example of this philosophy is the support for ctypes. I am sure most of you are familiar with this little gem, which allows a programmer to dynamicall bind to a shared library and invokes its functions without writing a binding. For instance, let&#8217;s say you have some existing and very complicated C code like the following:</p><pre class="brush: cpp; title: ; notranslate">
/* foobar.c */
#include &lt;stdio.h&gt;
void fooize(int a)
{
   printf(&quot;Fooized %d!\n&quot;, a);
}
</pre><p>Which is compiled to a shared library:</p><pre class="brush: plain; title: ; notranslate">
$ gcc -shared -fPIC -o foobar.so foobar.c
</pre><p>Now, you can access this from Python by simply doing something like:</p><pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# example.py
from ctypes import *
foobar = CDLL(&quot;foobar.so&quot;)
foobar.fooize(4)
</pre><p>which obviously produce the following output when executed:</p><pre class="brush: plain; title: ; notranslate">
$ LD_LIBRARY_PATH=. python example.py
Fooized 4!
</pre><p>ctypes handled all the magic for me: it called <code>dlopen(3)</code> to access the dynamic library, it called <code>dlsym(3)</code> to access the symbol <code>fooize</code>, and it even handled marshalling of arguments. This is great.</p><p>But what happens if I want to package this application? Well, <strong>with py2exe/py2app, it wouldn&#8217;t work</strong>. In fact, they would totally miss the dependency of the example program on the <code>foobar.so</code> file. This obviously happens because they support only the simplest forms of dependency: a plain Python <code>import</code> statement or dependencies between shared librarys (eg: <code>foo.dll</code> that depends on <code>bar.dll</code>).</p><p>Now, watch PyInstaller in action:</p><pre class="brush: plain; highlight: [1,5,24,27]; title: ; notranslate">
$ python ~/src/pyinstaller/Makespec.py --onefile example.py
wrote /tmp/ct/example.spec
now run Build.py to build the executable
$ python ~/src/pyinstaller/Build.py example.spec
checking Analysis
building Analysis because outAnalysis0.toc non existent
running Analysis outAnalysis0.toc
Analyzing: /home/rasky/src/pyinstaller/support/_mountzlib.py
Analyzing: /home/rasky/src/pyinstaller/support/useUnicode.py
Analyzing: example.py
Warnings written to /tmp/ct/warnexample.txt
checking PYZ
rebuilding outPYZ1.toc because outPYZ1.pyz is missing
building PYZ outPYZ1.toc
checking PKG
rebuilding outPKG3.toc because outPKG3.pkg is missing
building PKG outPKG3.pkg
checking EXE
rebuilding outEXE2.toc because example missing
building EXE from outEXE2.toc
Appending archive to EXE /tmp/ct/dist/example
$ ./dist/example
Fooized 4!
$ ls -la ./dist
totale 2927
drwxr-xr-x 2 rasky rasky      72 2010-03-19 03:33 .
drwxr-xr-x 4 rasky rasky     344 2010-03-19 03:33 ..
-rwxr-xr-x 1 rasky rasky 2991835 2010-03-19 03:33 example
</pre><p>It simply worked. What PyInstaller behind the hood is:</p><ul><li>Scan the bytecode of example.py, looking for usages of ctypes</li><li>Collect all the shared libraries used by ctypes calls. Notice that it can&#8217;t infer <em>all</em> filenames potentially used by the source code (how would you handle the case where the library named is specified interactively by the user?), but it handles all simple cases (see <a
href="http://www.pyinstaller.org/wiki/CtypesDependencySupport" target="_blank">this page</a> for more details).</li><li>Resolve pathnames using the OS&#8217; library search path.</li></ul><p>Notice how <code>foobar.so</code> was also included within the single-file package produced by PyInstaller. And how can this possibly work? Let strace show us the trick:</p><pre class="brush: plain; highlight: [1]; title: ; notranslate">
$ strace -ff ./dist/example 2&gt;&amp;1 | grep foobar.so
stat(&quot;/tmp/_MEIDG0H9t/foobar.so&quot;, 0x7fff92f09480) = -1 ENOENT (No such file or directory)
open(&quot;/tmp/_MEIDG0H9t/foobar.so&quot;, O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
[pid 13942] open(&quot;/tmp/_MEIDG0H9t/foobar.so&quot;, O_RDONLY) = 6
stat(&quot;/tmp/_MEIDG0H9t/foobar.so&quot;, {st_mode=S_IFREG|0700, st_size=7985, ...}) = 0
unlink(&quot;/tmp/_MEIDG0H9t/foobar.so&quot;)     = 0
</pre><p>What happens at runtime is that <code>foobar.so</code> is extracted in a temporary directory, so that ctypes can find it.</p><p>So: PyInstaller is designed to <b>do everything required to package an application</b>. If you like this philosophy and you are tired of chasing workarounds in wikis, you should give it a try!</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2010/03/ctypes-support-in-pyinstaller/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2010/03/ctypes-support-in-pyinstaller/</feedburner:origLink></item> <item><title>IE9: new features and GPU acceleration</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/cdHnPd4-B5U/</link> <comments>http://giovanni.bajo.it/2010/03/ie9-new-features-and-gpu-acceleration/#comments</comments> <pubDate>Tue, 16 Mar 2010 19:36:30 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[Web]]></category> <category><![CDATA[Canvas]]></category> <category><![CDATA[GPU]]></category> <category><![CDATA[H264]]></category> <category><![CDATA[HTML5]]></category> <category><![CDATA[IE9]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=43</guid> <description><![CDATA[So there is much chatting going on about the new IE9 and the fact that this time Microsoft tried to adhere to web standards much more than it used to do in the past. Namely: HTML5 for videos (H.264), which means that we will be holding a Adobe &#8220;Flash For Video&#8221; funeral pretty soon now. [...]]]></description> <content:encoded><![CDATA[<p>So there is much chatting going on about the <a
href="http://blogs.msdn.com/ie/archive/2010/03/16/html5-hardware-accelerated-first-ie9-platform-preview-available-for-developers.aspx" target="_blank">new IE9</a> and the fact that this time Microsoft tried to adhere to web standards much more than it used to do in the past. Namely:</p><ul><li><strong>HTML5 for videos (H.264)</strong>, which means that we will be holding a Adobe &#8220;Flash For Video&#8221; funeral pretty soon now. I notice that most geeks don&#8217;t care about usages of Flashes besides videos, but there <em>are</em> still many of them (both filling gaps in HTML standard, like in the case of multi-file uploads, and for animations/interactions/games where HTML5 Canvas+JS is still too slow).</li><li><strong>Fast JavaScript engine</strong>: aka JIT. Given how slow IE8 is in benchmarks, this is very good news for development of rich web applications on the Internet.</li><li><strong>CSS3, DOM, SVG</strong>: they are really supporting a lots of new features available in nowadays standards. I think the support for SVG will make many people wonder if Christmas came for Easter this year. And yes, it is indeed coming.</li></ul><p>A very interesting news is that Microsoft is announcing <strong>GPU Acceleration</strong> for IE9. Now, we are a little short on details (and I&#8217;m not at MIX10 where IE9 is being presented) but I still see legends floating around.</p><div
class="zemanta-img" style="margin:1em"><div
class="wp-caption alignleft" style="width: 220px"><img
class=" " src="http://giovanni.bajo.it/files/2010/03/300px-Microsoft_sign_closeup.jpg" alt="300px Microsoft sign closeup IE9: new features and GPU acceleration" width="210" height="158" title="IE9: new features and GPU acceleration" /><p
class="wp-caption-text">Microsoft: not so much graved in stone anymore, it seems  (Image via Wikipedia)</p></div></div><p>First, there is absolutely <strong>no way</strong> that IE9 (or any browser, for that matter) could offload to the GPU the whole Javascript engine (and not even the compilation step required by the JIT). Modern GPUs are massively paralleled calculators. They are very good at executing a 100-lines mathematical function at the same time in 4096 cores, which is exactly what you need to do graphics (2D, 3D, whatever) and heavy calculations like physics, simulations, and so on; but they can&#8217;t really go beyond this. Moreover, there is a measurable overhead in starting each GPU task (use DMA to fill internal memory, upload the task) and fetching results (DMA the results off the video memory into main RAM).  So there is basically no way to use such a horse for running custom Javascript code.</p><p>In fact, what Microsoft is saying is that it will <strong>recompile Javascript in a different thread/process</strong>, offloading it to a different core of the CPU. This is probably a step forward in the right direction. IE8 already had a multi-process architecture much like Chrome does, and offloading the JIT itself to another process looks like a very good idea. I wonder what Chrome developers think of it.</p><p>So, what about GPU acceleration? It will be used in these main areas:</p><ul><li><strong>Video decoding:</strong> Modern video cards have hardware support for H.264 decompression. This is already supported by Windows drivers, and if you use a Windows system to watch a H.264 HD video, you will see that the CPU usage is indeed pretty low. Alas, since H.264 is a <strong>patent encumbered</strong> format, this acceleration is not supported in Linux and probably will never be. The most Linux can achieve is YUV HW compositing, which is nothing like offloading the whole H.264 decoding to hardware.</li><li><strong>HTML5 Canvas</strong>: through the new <a
class="zem_slink" title="Direct2D" rel="wikipedia" href="http://en.wikipedia.org/wiki/Direct2D">Direct2D</a> API, it will be easier to leverage hardware acceleration within Canvas. Notice that Direct2D, to the best of my understanding, is just a better API (both at the user-space level and driver level) to access the existing programmable graphics hardware. It is not different from what Cairo or Qt&#8217;s QPainter already offer, but it is fully accelerated. If you use Firefox on Linux, you get cairo -&gt; xlib -&gt; xrender, and then you&#8217;re doomed.</li><li><strong>Font rendering</strong>: rendering text is a big part in a browser, and the new <a
class="zem_slink" title="DirectWrite" rel="wikipedia" href="http://en.wikipedia.org/wiki/DirectWrite">DirectWrite</a> API will take care of hardware acceleration for fonts. DirectWrite is a whole font-rendering stack that has been designed with hardware acceleration in mind. It is like Freetype of the next millennium. I believe there is absolutely no equivalent for DirectWrite in the OSS world.</li></ul><p>I plan to do a quick benchmark of IE9 this week. I will specifically be focusing on HTML5 canvas rendering, since that is the part where it is meant to shine, even compared to Chrome.</p><p>Stay tuned!</p><div
class="zemanta-pixie" style="margin-top: 10px;height: 15px"><img
class="zemanta-pixie-img" style="border: none;float: right" src="http://img.zemanta.com/pixy.gif?x-id=3303aca8-299e-43c2-a32a-f3857d578a38" alt=" IE9: new features and GPU acceleration"  title="IE9: new features and GPU acceleration" /></div> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2010/03/ie9-new-features-and-gpu-acceleration/feed/</wfw:commentRss> <slash:comments>1</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2010/03/ie9-new-features-and-gpu-acceleration/</feedburner:origLink></item> <item><title>I think therefore I am</title><link>http://feedproxy.google.com/~r/giovannibajo/~3/DsTxQVWpYLs/</link> <comments>http://giovanni.bajo.it/2010/03/i-think-therefore-i-am/#comments</comments> <pubDate>Tue, 16 Mar 2010 01:53:13 +0000</pubDate> <dc:creator>Giovanni Bajo</dc:creator> <category><![CDATA[Meta]]></category> <guid isPermaLink="false">http://giovanni.bajo.it/?p=23</guid> <description><![CDATA[This is my first attempt in many years to maintain a blog. Many people have encouraged me to open a blog in the past few years, but my previous attempts were not very successful. So let&#8217;s see how this works out. I&#8217;m not good at introductions (I do skip them myself, while starting books), so let&#8217;s [...]]]></description> <content:encoded><![CDATA[<div
class="zemanta-img" style="margin: 1em;float:left"> <img
src="http://farm4.static.flickr.com/3150/2413495787_d5a78a67ff_m.jpg" alt="2413495787 d5a78a67ff m I think therefore I am" width="144" height="125" title="I think therefore I am" /></div></div><p>This is my first attempt in many years to maintain a blog. Many people have encouraged me to open a blog in the past few years, but my previous attempts were not very successful. So let&#8217;s see how this works out. I&#8217;m not good at introductions (I do skip them myself, while starting books), so let&#8217;s call it a <del>day</del> post.</p> ]]></content:encoded> <wfw:commentRss>http://giovanni.bajo.it/2010/03/i-think-therefore-i-am/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://giovanni.bajo.it/2010/03/i-think-therefore-i-am/</feedburner:origLink></item> </channel> </rss>

