<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>Faruk Akgul</title>
    <link>http://faruk.akgul.org/blog</link>
    <description>My rants mostly on programming.</description>
    <pubDate>Mon, 12 Aug 2013 03:52:26 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>Python Web Frameworks Benchmark</title>
      <link>http://faruk.akgul.org/blog/python-web-frameworks-benchmark</link>
      <pubDate>Mon, 01 Jul 2013 00:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/python-web-frameworks-benchmark</guid>
      <description>Python Web Frameworks Benchmark</description>
      <content:encoded><![CDATA[<p><a href="http://falconframework.org/">Falcon</a> is a Python web framework developed by Rackspace. It also has a benchmarking tool to test the framework against other frameworks.</p>
<p>Out of curiosity, I wanted to see how other frameworks perform in this benchmark.</p>
<p>Frameworks of this benchmark:</p>
<ul>
<li><a href="http://bottlepy.org/docs/dev/">bottle</a></li>
<li><a href="http://www.cherrypy.org/">CherryPy</a></li>
<li><a href="https://www.djangoproject.com/">Django</a></li>
<li><a href="http://falconframework.org/">Falcon</a></li>
<li><a href="http://flask.pocoo.org/">Flask</a></li>
<li><a href="http://pecanpy.org/">Pecan</a>: This was already part of Falcon's default test. I had never heard of it.</li>
<li><a href="http://webpy.org/">web.py</a></li>
<li><a href="http://werkzeug.pocoo.org/">Werkzeug</a>: More like a utility library.</li>
<li><a href="http://pythonhosted.org/wheezy.web/">wheezy.web</a></li>
</ul>
<h1>Iteration 10</h1>
<p>In this test, benchmark is iterated 10 times. The result:</p>
<pre><code>1. wheezy.web........39,681 req/sec or  25 μs/req (28x)
2. Falcon............23,981 req/sec or  42 μs/req (17x)
3. Bottle.............9,479 req/sec or 106 μs/req (7x)
4. webpy..............6,246 req/sec or 160 μs/req (4x)
5. Werkzeug...........5,949 req/sec or 168 μs/req (4x)
6. Pyramid............3,573 req/sec or 280 μs/req (3x)
7. Flask..............3,079 req/sec or 325 μs/req (2x)
8. Pecan..............2,640 req/sec or 379 μs/req (2x)
9. django.............1,770 req/sec or 565 μs/req (1x)
10. CherryPy..........1,410 req/sec or 709 μs/req (1x)
</code></pre>
<p>wheezy.web looks like the winner.</p>
<h1>Iteration 100</h1>
<p>In this test, benchmark is iterated 100 times. wheezy.web is the winner of this iteration as well.</p>
<pre><code>1. wheezy.web........49,022 req/sec or  20 μs/req (35x)
2. Falcon............29,378 req/sec or  34 μs/req (21x)
3. Bottle............11,685 req/sec or  86 μs/req (8x)
4. webpy..............6,763 req/sec or 148 μs/req (5x)
5. Werkzeug...........6,533 req/sec or 153 μs/req (5x)
6. pyramid............3,932 req/sec or 254 μs/req (3x)
7. Flask..............3,334 req/sec or 300 μs/req (2x)
8. Pecan..............2,813 req/sec or 355 μs/req (2x)
9. django.............1,867 req/sec or 536 μs/req (1x)
10. CherryPy..........1,405 req/sec or 712 μs/req (1x)
</code></pre>
<h1>Iteration 1,000</h1>
<p>Benchmark is iterated 1,000 times. wheezy.web is the winner of this iteration.</p>
<pre><code>1. wheezy.web........51,343 req/sec or  19 μs/req (37x)
2. Falcon............29,742 req/sec or  34 μs/req (21x)
3. Bottle............12,030 req/sec or  83 μs/req (9x)
4. webpy..............6,795 req/sec or 147 μs/req (5x)
5. Werkzeug...........6,759 req/sec or 148 μs/req (5x)
6. pyramid............4,061 req/sec or 246 μs/req (3x)
7. Flask..............3,231 req/sec or 310 μs/req (2x)
8. Pecan..............2,839 req/sec or 352 μs/req (2x)
9. django.............1,875 req/sec or 533 μs/req (1x)
10. CherryPy..........1,386 req/sec or 722 μs/req (1x)
</code></pre>
<h1>Iteration 10,000</h1>
<p>Benchmark is iterated 10,000 times. wheezy.web is the winner of this iteration.</p>
<pre><code>1. wheezy.web........52,259 req/sec or  19 μs/req (45x)
2. Falcon............30,345 req/sec or  33 μs/req (26x)
3. Bottle............12,013 req/sec or  83 μs/req (10x)
4. webpy..............6,831 req/sec or 146 μs/req (6x)
5. Werkzeug...........6,723 req/sec or 149 μs/req (6x)
6. pyramid............4,061 req/sec or 246 μs/req (3x)
7. Flask..............3,263 req/sec or 306 μs/req (3x)
8. Pecan..............2,825 req/sec or 354 μs/req (2x)
9. django.............1,862 req/sec or 537 μs/req (2x)
10. CherryPy..........1,173 req/sec or 852 μs/req (1x)
</code></pre>
<h1>Iteration 50,000</h1>
<p>Benchmark is iterated 50,000 times. wheezy.web is the winner of this iteration.</p>
<pre><code>1. wheezy.web........52,245 req/sec or  19 μs/req (48x)
2. Falcon............30,195 req/sec or  33 μs/req (28x)
3. Bottle............11,977 req/sec or  83 μs/req (11x)
4. webpy..............6,950 req/sec or 144 μs/req (6x)
5. Werkzeug...........6,837 req/sec or 146 μs/req (6x)
6. pyramid............4,035 req/sec or 248 μs/req (4x)
7. Flask..............3,300 req/sec or 303 μs/req (3x)
8. Pecan..............2,881 req/sec or 347 μs/req (3x)
9. django.............1,882 req/sec or 531 μs/req (2x)
10. CherryPy..........1,090 req/sec or 917 μs/req (1x)
</code></pre>
<p>I've run Falcon's benchmark including my favorite web frameworks (especially wheezy.web) just to see how they perform. In overall, I'm quite surprised of Flask's performance. I knew wheezy.web was going to perform very well so I wasn't really surprised. Andriy's own <a href="http://mindref.blogspot.com/2012/09/python-fastest-web-framework.html">benchmark</a> shows similar results.</p>
<p>Here's benchmark as a chart.</p>
<script type="text/javascript" src="http://www.google.com/jsapi">
</script>

<script type="text/javascript">google.load('visualization', '1', {packages: ['corechart']});
</script>

<script type="text/javascript">
function drawVisualization() {
var data = google.visualization.arrayToDataTable([
['Framework', 'Iteration - 10', 'Iteration - 100', 'Iteration - 1,000', 'Iteration - 10,000', 'Iteration - 50,000'],
['wheezy.web', 39681, 49022, 51343, 52249, 52245],
['Falcon', 23981, 29378, 29742, 30345, 30195],
['Bottle', 9479, 11685, 12030, 12013, 11977],
['webpy', 6246, 6763, 6795, 6831, 6950],
['Werkzeug', 5949, 6533, 6759, 6723, 6837],
['pyramid', 3573, 3932, 4061, 4061, 4035],
['Flask', 3079, 3334, 3231, 3263, 3300],
['Pecan', 2640, 2813, 2839, 2825, 2881],
['Django', 1770, 1867, 1875, 1862, 1882],
['CherryPy', 1410, 1405, 1386, 1173, 1090],

]);

// Create and draw the visualization.
new google.visualization.ColumnChart(document.getElementById('visualization')).
draw(data,
{title:"Python Framework Benchmark (more is better)",
width:900, height:400,
fontSize: 12,
hAxis: {title: "Framework Name"},
vAxis: {title: "req/sec"}});
}
google.setOnLoadCallback(drawVisualization);
</script><div id="visualization" style="width: 600px; height: 400px;"></div>]]></content:encoded>
    </item>
    <item>
      <title>Flask and web.py meet JVM (or how to use Apache Shiro with them)</title>
      <link>http://faruk.akgul.org/blog/flask-webpy-meet-jvm-jython</link>
      <pubDate>Tue, 08 Jan 2013 00:00:00 PST</pubDate>
      <guid>http://faruk.akgul.org/blog/flask-webpy-meet-jvm-jython</guid>
      <description>Flask and web.py meet JVM (or how to use Apache Shiro with them)</description>
      <content:encoded><![CDATA[<p>This is a demonstration of Flask and web.py with WTForms running on Jython and uses Apache Shiro, Java security framework that performs authentication, authorization, cryptography, and session management.</p>
<p>This is a small proof-of-concept project that mixes Flask, web.py and Apache Shiro. I have tested on Jython 2.7a2.</p>
<p><a href="http://shiro.apache.org/">Apache Shiro</a> is a Java security framework that performs authentication, authorization, cryptography, and session management. You may also be interested in reading <a href="http://www.infoq.com/articles/apache-shiro">this article</a> as well.</p>
<p>This is a demonstration of <a href="http://flask.pocoo.org/">Flask</a> and <a href="http://webpy.org/">web.py</a> with <a href="http://wtforms.simplecodes.com/">WTForms</a> running on <a href="http://www.jython.org/">Jython</a> and uses <a href="http://shiro.apache.org/">Apache Shiro</a>.</p>
<p>Source code is on <a href="https://github.com/faruken/flask-web.py-jvm">GitHub</a> if you're interested.</p>
<p>Related Article:</p>
<ul>
<li><a href="http://faruk.akgul.org/blog/storing-passwords-password-hashing-apache-shiro/">Storing Passwords: Password Hashing</a></li>
</ul>]]></content:encoded>
    </item>
    <item>
      <title>ztree: A New Structure For ZeroMQ's CZMQ: High Level C API</title>
      <link>http://faruk.akgul.org/blog/ztree-new-structure-zeromq-czmq</link>
      <pubDate>Sat, 01 Dec 2012 00:00:00 PST</pubDate>
      <guid>http://faruk.akgul.org/blog/ztree-new-structure-zeromq-czmq</guid>
      <description>ztree: A New Structure For ZeroMQ's CZMQ: High Level C API</description>
      <content:encoded><![CDATA[<p>CZMQ is a high-level C binding for ZeroMQ. Its main purposes are:</p>
<ul>
<li>To hide differences between different versions of ZeroMQ.</li>
<li>To provide a space for development of more sophisticated API semantics.</li>
<li>To wrap ZeroMQ core API in semantics that are natural and lead to shorter and more readable applications.</li>
</ul>
<p>CZMQ provides two different structures out of the box:</p>
<ul>
<li>zlist: A Singly linked list.</li>
<li>zhash: A Hash table.</li>
</ul>
<p>I have implemented a new structure: <strong>ztree</strong>.</p>
<p>ztree is a generic binary search tree implementation. Sample usage:</p>
<div class="pygments_emacs"><pre><span class="kt">int</span> <span class="nf">main</span> <span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="k">const</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>

  <span class="n">zctx_t</span><span class="o">*</span> <span class="n">context</span> <span class="o">=</span> <span class="n">zctx_new</span><span class="p">();</span>
  <span class="kt">void</span><span class="o">*</span> <span class="n">socket</span> <span class="o">=</span> <span class="n">zsocket_new</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="n">ZMQ_REP</span><span class="p">);</span>
  <span class="n">zsocket_bind</span><span class="p">(</span><span class="n">socket</span><span class="p">,</span> <span class="s">&quot;tcp://*:5050&quot;</span><span class="p">);</span>

  <span class="n">printf</span><span class="p">(</span><span class="s">&quot;Starting server...</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>

  <span class="n">ztree_t</span><span class="o">*</span> <span class="n">tree</span> <span class="o">=</span> <span class="n">ztree_new</span><span class="p">((</span><span class="n">ztree_compare_fn</span><span class="p">)</span> <span class="n">compare</span><span class="p">);</span>
  <span class="n">ztree_insert</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="s">&quot;hello&quot;</span><span class="p">);</span>
  <span class="n">ztree_insert</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="s">&quot;world&quot;</span><span class="p">);</span>
  <span class="n">ztree_insert</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="s">&quot;bonjour&quot;</span><span class="p">);</span>
  <span class="kt">char</span><span class="o">*</span> <span class="n">s</span> <span class="o">=</span> <span class="n">ztree_find_max</span><span class="p">(</span><span class="n">tree</span><span class="p">);</span>
  <span class="k">for</span><span class="p">(;;)</span> <span class="p">{</span>

    <span class="kt">char</span><span class="o">*</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">zstr_recv</span><span class="p">(</span><span class="n">socket</span><span class="p">);</span>

    <span class="kt">char</span><span class="o">*</span> <span class="n">mss</span> <span class="o">=</span> <span class="n">ztree_find_rec</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">&quot;Received: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span>

    <span class="k">if</span><span class="p">(</span><span class="n">mss</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">&quot;Found: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">mss</span><span class="p">);</span>
      <span class="n">zstr_send</span><span class="p">(</span><span class="n">socket</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">&quot;Not Found: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">mss</span><span class="p">);</span>
      <span class="n">zstr_send</span><span class="p">(</span><span class="n">socket</span><span class="p">,</span> <span class="s">&quot;Not Found&quot;</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">free</span><span class="p">(</span><span class="n">msg</span><span class="p">);</span>
  <span class="p">}</span>

  <span class="n">zsocket_destroy</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="n">socket</span><span class="p">);</span>
  <span class="n">zctx_destroy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">context</span><span class="p">);</span>

  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<p>Source code is on <a href="https://github.com/faruken/czmq">github</a> if anyone is interested.</p>]]></content:encoded>
    </item>
    <item>
      <title>Hadoop: Finding The Most Important Words In Documents</title>
      <link>http://faruk.akgul.org/blog/hadoop-finding-most-important-words-documents</link>
      <pubDate>Mon, 23 Jul 2012 00:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/hadoop-finding-most-important-words-documents</guid>
      <description>Hadoop: Finding The Most Important Words In Documents</description>
      <content:encoded><![CDATA[<p>This is a follow up post on <a href="http://faruk.akgul.org/blog/finding-most-important-words-documents">Finding the most important words in documents</a>. This time I did setup a Hadoop cluster with only 1 node (Hey! I'm still with my Macbook in a hotel room).</p>
<p>I didn't do much customizations. It's a straightforward Hadoop that comes out of the box. I had my map code and reducer code and ran the code to find out the most important words of some documents I had on my computer since I don't have access to a corpus right now.</p>
<h1>Things to Note</h1>
<p>Hadoop sorts using Quicksort by default. If you want to change it to something else (Heapsort or Mergesort), you need to edit <code>mapred-site.xml</code>. For example;</p>
<div class="pygments_emacs"><pre><span class="nt">&lt;property&gt;</span>
    <span class="nt">&lt;name&gt;</span>map.sort.class<span class="nt">&lt;/name&gt;</span>
    <span class="nt">&lt;value&gt;</span>org.apache.hadoop.util.HeapSort<span class="nt">&lt;/value&gt;</span>
    <span class="nt">&lt;description&gt;</span>Use heap sort class for sorting keys.<span class="nt">&lt;/description&gt;</span>
<span class="nt">&lt;/property&gt;</span>
</pre></div>

<p><strong>Note</strong>:</p>
<ul>
<li>If you want to use Mergesort be careful: If the length of data is less than 7, then it will be sorted with Insertion sort. See line 40 in <code>org.apache.hadoop.util.Mergesort.java</code>.</li>
<li>If the depth of recursion goes too deep, then Quicksort will switch to Heapsort. See line 35 and line 76 in <code>org.apache.hadoop.util.Quicksort.java</code>.</li>
</ul>
<h1>Steps</h1>
<ol>
<li>
<p>Let's have a clean system.</p>
<p><code>hadoop namenode -format</code></p>
</li>
<li>
<p>Once the system has been formatted then we could start the nodes.</p>
<p><code>./start-all.sh</code></p>
</li>
<li>
<p>Let's create a new folder in HDFS.</p>
<p><code>hadoop dfs -mkdir input</code></p>
</li>
<li>
<p>Then run your map code and reduce code.</p>
</li>
</ol>
<p>When everything goes well, I could go to http://localhost:50030 and see how the jobs perform.</p>
<p class="article_image"><img src="/blog/img/h1.png" alt=""/><br/>Everything went better than expected</p>

<p>Let's have an error on reduce code (Yes, I made reduce code fail on purpose to take the screenshot). In that case, the graph looks like</p>
<p class="article_image"><img src="/blog/img/h2.png" alt=""/><br/>Houston, we have a problem</p>

<p>You can do anything with Hadoop. Anything at all. The only limit is yourself.</p>]]></content:encoded>
    </item>
    <item>
      <title>Finding The Most Important Words In Documents</title>
      <link>http://faruk.akgul.org/blog/finding-most-important-words-documents</link>
      <pubDate>Sun, 22 Jul 2012 00:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/finding-most-important-words-documents</guid>
      <description>Finding The Most Important Words In Documents</description>
      <content:encoded><![CDATA[<p>It's 04:23 AM and I'm sitting in a hotel room without <em>real</em> Internet connection so I've decided to write something on semantic web while listening Françoise Hardy. Anyway, let's say you want to detect which words are important of a given article.</p>
<h1>Why would you want to detect the important words?</h1>
<p>The answer is quite easy. Let's say you want to build keywords of a document to describe it. There are more complex ways of analysing the most important words but the simplest approach would be calculating the Tf.Idf scores of a document.</p>
<h1>What is Tf.Idf?</h1>
<ul>
<li><strong>Tf</strong>: Term Frequency</li>
<li><strong>Idf</strong>: Inverse Document Frequency</li>
</ul>
<h2>Tf (Term Frequency)</h2>
<p>Let's start with finding the <code>Tf</code> of a given document. The formula is simple:</p>
<pre><code>Tf(d, t)
</code></pre>
<ul>
<li><code>d</code>: Document</li>
<li><code>t</code>: Number of times term <code>t</code> appears in document <code>d</code>.</li>
</ul>
<h3>Term Frequency</h3>
<p>Here's an example on how to construct Tf table. The sentences are from Friedrich Nietzsche's Ecce Homo.</p>
<ul>
<li><strong>document1</strong>: how one becomes what one is.</li>
<li><strong>document2</strong>: the remotest idea of what one is.</li>
<li><strong>document3</strong>: one becomes what one is.</li>
</ul>
<table>
<tr>
    <th>Term `t`</th><th>Document `d`</th><th>Term Frequency `Tf(d, t)`</th>
</tr>
<tr>
    <td>one</td><td>d1</td><td>2</td>
</tr>
<tr>
    <td>one</td><td>d2</td><td>1</td>
</tr>
<tr>
    <td>one</td><td>d3</td><td>2</td>
</tr>
<tr><td></td><td></td><td></td></tr>

<tr>
    <td>becomes</td><td>d1</td><td>1</td>
</tr>
<tr>
    <td>becomes</td><td>d2</td><td>0</td>
</tr>
<tr>
    <td>becomes</td><td>d3</td><td>1</td>
</tr>

</table>

<p>The problem with term frequency is most frequent words don't tell much about the document and in a real-world example the most common words you're going to get will be; and, the, of, a, to....</p>
<p>Most common words are not the most important words of a document.</p>
<p>In a real-world situation you'd most likely want to split the words with a smart REGEX and by filtering out with a stop words list. Anyway, so if we want to calculate <code>Tf</code> scores then simplest approach would be;</p>
<div class="pygments_emacs"><pre><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">stop_words</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;();</span>

<span class="kd">static</span> <span class="o">{</span>
    <span class="n">stop_words</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;a&quot;</span><span class="o">);</span>
    <span class="n">stop_words</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;an&quot;</span><span class="o">);</span>
    <span class="n">stop_words</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;the&quot;</span><span class="o">);</span>
<span class="o">}</span>

<span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">build_words</span><span class="o">(</span><span class="n">String</span> <span class="n">document</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">Matcher</span> <span class="n">matcher</span> <span class="o">=</span> <span class="n">Pattern</span><span class="o">.</span><span class="na">compile</span><span class="o">(</span><span class="s">&quot;[a-z]{2,}&quot;</span><span class="o">).</span><span class="na">matcher</span><span class="o">(</span><span class="n">document</span><span class="o">);</span>
    <span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">list</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;();</span>
    <span class="k">while</span><span class="o">(</span><span class="n">matcher</span><span class="o">.</span><span class="na">find</span><span class="o">())</span> <span class="n">list</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">matcher</span><span class="o">.</span><span class="na">group</span><span class="o">());</span>
    <span class="k">return</span> <span class="n">list</span><span class="o">;</span>
<span class="o">}</span>

<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Int</span><span class="o">&gt;</span> <span class="n">tf</span><span class="o">(</span><span class="n">String</span> <span class="n">document</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Int</span><span class="o">&gt;</span> <span class="n">map</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Int</span><span class="o">&gt;();</span>
    <span class="n">String</span><span class="o">[]</span> <span class="n">list</span> <span class="o">=</span> <span class="n">document</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">);</span>
    <span class="k">for</span> <span class="o">(</span><span class="n">String</span> <span class="nl">s:</span> <span class="n">list</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">Int</span> <span class="n">counter</span> <span class="o">=</span> <span class="n">map</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">s</span><span class="o">);</span>
        <span class="k">if</span> <span class="o">(</span><span class="n">counter</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="n">map</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">s</span><span class="o">,</span> <span class="k">new</span> <span class="n">Int</span><span class="o">());</span>
        <span class="k">else</span> <span class="n">counter</span><span class="o">.</span><span class="na">incr</span><span class="o">();</span>
    <span class="o">}</span>
    <span class="k">return</span> <span class="n">map</span><span class="o">;</span>
<span class="o">}</span>
</pre></div>

<h2>Df (Document Frequency)</h2>
<p>The formula is:</p>
<pre><code>Df(c, t)
</code></pre>
<ul>
<li><code>c</code>: Documents (corpus) of a given dataset.</li>
<li><code>t</code>: Number of times term <code>t</code> appears in corpus <code>c</code>.</li>
</ul>
<p>Inverse Document Frequency Formula:</p>
<pre><code>1 / Df(c, t)
</code></pre>
<h3>Document Frequency</h3>
<p>Document frequency is the total number of documents that word <code>w</code> appears in.</p>
<ul>
<li><strong>document1</strong>: how one becomes what one is.</li>
<li><strong>document2</strong>: the remotest idea of what one is.</li>
<li><strong>document3</strong>: one becomes what one is.</li>
</ul>
<table>
<tr>
    <th>Term `t`</th><th>Document Frequency `Df(c, t)`</th><th>Inverse Document Frequency `1/Df(c, t)`</th>
</tr>
<tr>
    <td>one</td><td>3</td><td>0.33</td>
</tr>
<tr>
    <td>becomes</td><td>2</td><td>0.5</td>
</tr>
<tr>
    <td>what</td><td>3</td><td>0.33</td>
</tr>
<tr>
    <td>remotest</td><td>1</td><td>1</td>
</tr>
</table>

<p>When you calculate Idf values of words then you start getting somewhere but it's still not good enough.</p>
<div class="pygments_emacs"><pre><span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Int</span><span class="o">&gt;</span> <span class="n">df</span><span class="o">(</span><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">documents</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">list</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;();</span>
    <span class="k">for</span> <span class="o">(</span><span class="n">String</span> <span class="nl">s:</span> <span class="n">documents</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">Set</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">set</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashSet</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;();</span>
        <span class="n">set</span><span class="o">.</span><span class="na">addAll</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">s</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">)));</span>
        <span class="k">for</span> <span class="o">(</span><span class="n">String</span> <span class="nl">d:</span> <span class="n">set</span><span class="o">)</span> <span class="n">list</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">d</span><span class="o">);</span>
    <span class="o">}</span>
    <span class="k">return</span> <span class="nf">counter</span><span class="o">(</span><span class="n">list</span><span class="o">);</span>
<span class="o">}</span>

<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="n">idf</span><span class="o">(</span><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">documents</span><span class="o">,</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Int</span><span class="o">&gt;</span> <span class="n">map</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="n">map2</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;();</span>
    <span class="k">for</span> <span class="o">(</span><span class="n">String</span> <span class="nl">s:</span> <span class="n">documents</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">String</span><span class="o">[]</span> <span class="n">split</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">);</span>
        <span class="k">for</span><span class="o">(</span><span class="n">String</span> <span class="nl">d:</span> <span class="n">split</span><span class="o">)</span> <span class="o">{</span>
            <span class="k">if</span> <span class="o">(</span><span class="n">map</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">d</span><span class="o">)</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
                <span class="kt">double</span> <span class="n">val</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="o">(</span><span class="n">map</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">d</span><span class="o">).</span><span class="na">getCounter</span><span class="o">()</span> <span class="o">+</span> <span class="mi">1</span><span class="n">e</span><span class="o">-</span><span class="mi">100</span><span class="o">);</span>
                <span class="n">map2</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">d</span><span class="o">,</span> <span class="n">val</span><span class="o">);</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
    <span class="k">return</span> <span class="n">map2</span><span class="o">;</span>
<span class="o">}</span>
</pre></div>

<p>Please note that, these codes shouldn't be used in <em>real-world</em>. I'm just trying to make things easier and simple.</p>
<h2>Tf.Idf</h2>
<p>Now, we know how to calculate Tf and Idf, as it can be easily guessed Tf.Idf is the combined version of Tf and Idf.</p>
<p>The formula is:</p>
<pre><code>Tf.Idf(c, d, t) = Tf(d, t) / Df(c, t)
</code></pre>
<table>
<tr>
    <th>Term `t`</th><th>Document `d`</th><th>Term Frequency `Tf(d, t)`</th><th>Document Frequency `Df(c, t)`</th><th>`Tf(d, t) / Df(c, t)`</th>
</tr>
<tr>
    <td>one</td><td>d1</td><td>2</td><td>3</td><td>0.66</td>
</tr>
<tr>
    <td>one</td><td>d2</td><td>1</td><td>3</td><td>0.33</td>
</tr>
<tr>
    <td>one</td><td>d3</td><td>2</td><td>3</td><td>0.66</td>
</tr>
<tr><td></td><td></td><td></td><td></td><td></td></tr>

<tr>
    <td>becomes</td><td>d1</td><td>1</td><td>2</td><td>0.5</td>
</tr>
<tr>
    <td>becomes</td><td>d2</td><td>0</td><td>2</td><td>0.0</td>
</tr>
<tr>
    <td>becomes</td><td>d3</td><td>1</td><td>2</td><td>0.5</td>
</tr>

</table>

<h3>Le Problem</h3>
<p>The problem with the simplest Tf.Idf approach is it gives very high score for very rare words so we need to "tweak" the formula. The simplest way would be adding <code>log</code>.</p>
<p>The updated formula:</p>
<pre><code>Tf.Idf(c, d, t) = Tf(d, t) log (N / Df(c, t))
</code></pre>
<ul>
<li><strong>N</strong>: The number of total documents.</li>
</ul>
<h3>Le Problem</h3>
<p>We still have a problem with the tweaked formula. The longer documents will have higher Tf.Idf scores than the shorter documents. Solution would be dividing the Tf.Idf score by the document length.</p>
<p>The updated formula:</p>
<pre><code>Tf.Idf(c, d, t) = Tf.Idf(c, d, t) / |dN|
</code></pre>
<ul>
<li><strong>dN</strong>: Document length.</li>
</ul>
<p>So, finally the most important words:</p>
<div class="pygments_emacs"><pre><span class="k">def</span> <span class="nf">most_important_words</span><span class="p">(</span><span class="n">documents</span><span class="p">,</span> <span class="n">length</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
  <span class="n">mapp</span> <span class="o">=</span> <span class="n">tfidf</span><span class="p">(</span><span class="n">documents</span><span class="p">)</span>
  <span class="n">mapp</span> <span class="o">=</span> <span class="n">collections</span><span class="o">.</span><span class="n">OrderedDict</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">mapp</span><span class="o">.</span><span class="n">items</span><span class="p">()),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
  <span class="k">return</span> <span class="n">mapp</span><span class="o">.</span><span class="n">items</span><span class="p">()[:</span><span class="n">length</span><span class="p">]</span>
</pre></div>

<p>I've used stop words to get more accurate results. The stop words I've used are from <code>google.appengine.ext.search</code> package.</p>
<p>All words are important but some words are more important than the others.</p>]]></content:encoded>
    </item>
    <item>
      <title>A Python Development Story: Why web.py?</title>
      <link>http://faruk.akgul.org/blog/python-development-story-why-webpy</link>
      <pubDate>Fri, 13 Jul 2012 00:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/python-development-story-why-webpy</guid>
      <description>A Python Development Story: Why web.py?</description>
      <content:encoded><![CDATA[<p>web.py is a Python framework which was started by <a href="http://www.aaronsw.com/">Aaron Swartz</a> and was the framework that ran <a href="http://reddit.com/">reddit</a> for years.</p>
<p>The major features of web.py is; simplicity, minimalism, doesn't get in your way and plays well with other libraries.</p>
<h1>Why web.py?</h1>
<p>If you're after:</p>
<ul>
<li>simplicity</li>
<li>freedom</li>
<li>don't want to shoot yourself in the foot</li>
<li>writing clean code</li>
<li>minimalism</li>
<li>a solid web framework</li>
</ul>
<p>then web.py is the web framework you're looking for.</p>
<p>A quote from web.py's mailing list that explains web.py quite well:</p>
<blockquote>
<p>I don't want to be forced to abstract my database into objects and compartmentalize my code into MVC. I don't wanna have four technologies (html/css/js/php) intertwined all in one file. I don't want to compile all my code every time I change one line just to test it. Some of my objects actually have sub-classes six levels deep. Web.py empowered me to start working immediately.</p>
</blockquote>
<h1>Getting Started</h1>
<p>Skeleton! Your web application needs a skeleton. I had created a sample skeleton for web.py and put it on <a href="https://github.com/faruken/web.py-skeleton">GitHub</a>. You could have a look at it if you wish.</p>
<p>Here's my web.py skeleton and what each element does.</p>
<ul>
<li><strong>doc</strong>: Here goes the project's documentation files.</li>
<li><strong>licenses</strong>: That's where I put the licenses (GPL, MIT, BSD etc.) of the project and each library I use for the application.</li>
<li><strong>requirements</strong>: Required 3rd party libraries for the project.</li>
<li><strong>sh</strong>: Bash script files of the project.</li>
<li><strong>www</strong>: The project itself.<ul>
<li><strong>app</strong>: Contains the application modules.<ul>
<li><strong>controllers</strong>: The project itself is based on MVC architecture so this module contains the handler modules of controllers package.</li>
<li><strong>tools</strong>: Tools that is used for the project.</li>
<li><strong>views</strong>: Template files.</li>
<li><strong>models</strong>: Database models of the application. It's 120% chance that these are SQLAlchemy files.</li>
<li><strong>bridge</strong>: Sometimes you need to communicate with a server that's written in another language. That's where I put all those "communication" files.</li>
</ul>
</li>
<li><strong>lib</strong>: These are libraries I develop for the project but aren't really tools. Most likely these are written in C (Python modules though). You know, sometimes you need to "put the pedal to the metal". The difference between a <em>lib</em> and a <em>tools</em> is; the modules in libs could be used in another projects whereas tools are dependent to project itself and can't be used elsewhere.</li>
<li><strong>log</strong>: Application's log files. I log everything. EVERYTHING.</li>
<li><strong>public</strong>: These folder contains the minimized, compiled CSS, Javascript, CoffeeScript files and images so the files in this folder are production ready and can't be used in development.<ul>
<li><strong>css</strong>: Production ready CSS files which are minimized and compiled (from LESS).</li>
<li><strong>js</strong>: Generated Javascript files from CoffeeScript and minimized and compiled using Google Closure.</li>
<li><strong>img</strong>: Minimized images files. These are compressed using unusual techniques.</li>
</ul>
</li>
<li><strong>static</strong>: Contains the development CSS, CoffeeScript, Javascript, and images files of the project.<ul>
<li><strong>css</strong>: LESS/CSS files for development. For this project we didn't use LESS though.</li>
<li><strong>cs</strong>: CoffeeScript files.</li>
<li><strong>js</strong>: Javascript files in development mode.</li>
<li><strong>img</strong>: Images of the application.</li>
</ul>
</li>
<li><strong>test</strong>: Test files of the application. These are, as you could guess easily, automated by using Selenium and nose.</li>
<li><strong>tmp</strong>: Garbage files basically.</li>
<li><strong>main.py</strong>: The only files that's executed directly by the server (WSGI, FastCGI or whatever it's).</li>
<li><strong>main_development.py</strong>: This is the main executed file in development mode.</li>
<li><strong>settings.py</strong>: Global constants and settings of the application. This is somehow similar to Django's settings file.</li>
<li><strong>urls.py</strong>: Contains the URLs of the application.</li>
</ul>
</li>
</ul>
<h2>High Level Diagram</h2>
<p>Here's the high level diagram of this skeleton. I think it's self explanatory. The skeleton is based on MVC (model-view-controller) architecture.</p>
<p class="article_image"><img src="/blog/img/wpk0.png" alt=""/><br/>Package diagram - Architecture is based on MVC.</p>

<p class="article_image"><img src="/blog/img/wpk1.png" alt=""/><br/>High Level Package Diagram</p>

<p class="article_image"><img src="/blog/img/wpk2.png" alt=""/><br/>High Level Diagram</p>

<h1>web.py features</h1>
<h2>Database</h2>
<p>web.py has very interesting features. It comes with a db package which lets you use various of different databases. However, it's not an ORM. It's somehow similar to <code>sqlite3</code> package of Python but is capable of connecting to multiple databases (This feature was missing in Django for a quite long time but it was a feature of web.py from day 1). This is a compelling reason for people who are good at SQL and don't like to use an ORM.</p>
<p>As I said, the main thing about web.py is simplicity. If you're doing something simple this package is very suitable. However, I'm a big fan of SQLAlchemy and the last thing I did was not really a simple application so I didn't use <code>web.db</code>.</p>
<p>The best thing about this web.py module is you could just take it and use it independently. Here, I've created a small Flask application which uses web.py's database module <code>web.db</code>.</p>
<div class="pygments_emacs"><pre><span class="kn">from</span> <span class="nn">web</span> <span class="kn">import</span> <span class="n">database</span>
<span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span><span class="p">,</span> <span class="n">render_template</span><span class="p">,</span> <span class="n">abort</span><span class="p">,</span> <span class="n">flash</span><span class="p">,</span> <span class="n">redirect</span><span class="p">,</span> <span class="n">url_for</span><span class="p">,</span> <span class="n">request</span>

<span class="n">DATABASE</span> <span class="o">=</span> <span class="s">&#39;/tmp/flaskr.db&#39;</span>
<span class="n">SECRET_KEY</span> <span class="o">=</span> <span class="s">&#39;what&#39;</span>
<span class="n">DEBUG</span> <span class="o">=</span> <span class="bp">True</span>

<span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">app</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">from_object</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="n">conn</span> <span class="o">=</span> <span class="n">database</span><span class="p">(</span><span class="n">dbn</span><span class="o">=</span><span class="s">&#39;sqlite&#39;</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="n">DATABASE</span><span class="p">)</span>

<span class="nd">@app.route</span><span class="p">(</span><span class="s">&#39;/&#39;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">index</span><span class="p">():</span>
    <span class="n">entries</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&#39;entries&#39;</span><span class="p">,</span>
                          <span class="n">what</span><span class="o">=</span><span class="s">&#39;id, title, text&#39;</span><span class="p">,</span>
                          <span class="n">limit</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
                          <span class="n">order</span><span class="o">=</span><span class="s">&#39;id DESC&#39;</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">render_template</span><span class="p">(</span><span class="s">&#39;index.html&#39;</span><span class="p">,</span> <span class="n">entries</span><span class="o">=</span><span class="n">entries</span><span class="p">)</span>

<span class="nd">@app.route</span><span class="p">(</span><span class="s">&#39;/add&#39;</span><span class="p">,</span> <span class="n">methods</span><span class="o">=</span><span class="p">[</span><span class="s">&#39;GET&#39;</span><span class="p">,</span> <span class="s">&#39;POST&#39;</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">():</span>
    <span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">method</span> <span class="o">==</span> <span class="s">&#39;POST&#39;</span><span class="p">:</span>
        <span class="n">conn</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="s">&#39;entries&#39;</span><span class="p">,</span>
                    <span class="n">title</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">form</span><span class="p">[</span><span class="s">&#39;title&#39;</span><span class="p">],</span>
                    <span class="n">text</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">form</span><span class="p">[</span><span class="s">&#39;text&#39;</span><span class="p">])</span>
        <span class="n">flash</span><span class="p">(</span><span class="s">&#39;a new entry&#39;</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="n">url_for</span><span class="p">(</span><span class="s">&#39;index&#39;</span><span class="p">))</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">render_template</span><span class="p">(</span><span class="s">&#39;add.html&#39;</span><span class="p">)</span>

<span class="nd">@app.route</span><span class="p">(</span><span class="s">&#39;/entry/&lt;int:entry_id&gt;&#39;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">show_entry</span><span class="p">(</span><span class="n">entry_id</span><span class="p">):</span>
    <span class="n">entry</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&#39;entries&#39;</span><span class="p">,</span>
                        <span class="n">what</span><span class="o">=</span><span class="s">&#39;id, title, text&#39;</span><span class="p">,</span>
                        <span class="n">where</span><span class="o">=</span><span class="s">&#39;id = $id&#39;</span><span class="p">,</span>
                        <span class="nb">vars</span><span class="o">=</span><span class="p">{</span><span class="s">&#39;id&#39;</span><span class="p">:</span> <span class="n">entry_id</span><span class="p">},</span>
                        <span class="n">limit</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">render_template</span><span class="p">(</span><span class="s">&#39;entry.html&#39;</span><span class="p">,</span> <span class="n">entry</span><span class="o">=</span><span class="n">entry</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">abort</span><span class="p">(</span><span class="mi">404</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">a</span><span class="p">():</span>
    <span class="n">q</span> <span class="o">=</span> <span class="s">&quot;&quot;&quot;</span>
<span class="s">create table entries(</span>
<span class="s">id integer primary key autoincrement,</span>
<span class="s">title string not null,</span>
<span class="s">text string not null</span>
<span class="s">);</span>
<span class="s">&quot;&quot;&quot;</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">app</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">debug</span><span class="o">=</span><span class="n">DEBUG</span><span class="p">)</span>
</pre></div>

<p>web.py is so flexible and has flexible modules that you could wipe out whatever you want from it and use it with another web framework. The above code is proof of that. Entire application is on <a href="https://gist.github.com">GitHub</a>.</p>
<h2>Forms</h2>
<p>web.py comes with a forms package which let's you create forms and validators. Ironically, it doesn't have built-in protection against CSRF. You want to create a login form? You could use forms package to create yours. Here's a simple example;</p>
<div class="pygments_emacs"><pre><span class="k">class</span> <span class="nc">IFormLogin</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
  <span class="nd">@staticmethod</span>
  <span class="k">def</span> <span class="nf">render_login_form</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
    <span class="n">login_form</span> <span class="o">=</span> <span class="n">Form</span><span class="p">(</span>
      <span class="n">form</span><span class="o">.</span><span class="n">Textbox</span><span class="p">(</span><span class="s">&#39;username&#39;</span><span class="p">,</span> <span class="n">form</span><span class="o">.</span><span class="n">Validator</span><span class="p">(</span><span class="s">&#39;Please write your username&#39;</span><span class="p">,</span>
                                              <span class="k">lambda</span>
                                                  <span class="n">x</span><span class="p">:</span> <span class="n">USERNAME_MAX_LENGTH</span> <span class="o">&gt;</span>
                                                     <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">&gt;</span>
                                                     <span class="n">USERNAME_MIN_LENGTH</span><span class="p">),</span>
                   <span class="n">description</span><span class="o">=</span><span class="s">&#39;Username:&#39;</span><span class="p">,</span>
                   <span class="nb">id</span><span class="o">=</span><span class="s">&#39;username&#39;</span><span class="p">,</span>
                   <span class="n">class_</span><span class="o">=</span><span class="s">&#39;span4&#39;</span><span class="p">),</span>
      <span class="n">form</span><span class="o">.</span><span class="n">Password</span><span class="p">(</span><span class="s">&#39;password&#39;</span><span class="p">,</span> <span class="n">form</span><span class="o">.</span><span class="n">Validator</span><span class="p">(</span><span class="s">&#39;Please write your password&#39;</span><span class="p">,</span>
                                               <span class="k">lambda</span>
                                                   <span class="n">x</span><span class="p">:</span> <span class="n">PASSWORD_MAX_LENGTH</span>
                                                      <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">&gt;</span>
                                                      <span class="n">PASSWORD_MIN_LENGTH</span><span class="p">),</span>
                    <span class="n">description</span><span class="o">=</span><span class="s">&#39;Password:&#39;</span><span class="p">,</span>
                    <span class="nb">id</span><span class="o">=</span><span class="s">&#39;password&#39;</span><span class="p">,</span>
                    <span class="n">class_</span><span class="o">=</span><span class="s">&#39;span4&#39;</span><span class="p">),</span>
                <span class="n">form</span><span class="o">.</span><span class="n">Hidden</span><span class="p">(</span><span class="s">&#39;token&#39;</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="s">&#39;token&#39;</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">token</span><span class="p">),</span>
      <span class="n">validators</span><span class="o">=</span><span class="p">[</span><span class="n">form</span><span class="o">.</span><span class="n">Validator</span><span class="p">(</span><span class="s">&#39;Username or password is wrong&#39;</span><span class="p">,</span>
                                 <span class="n">Member</span><span class="o">.</span><span class="n">exists</span><span class="p">)]</span>
    <span class="p">)</span>
    <span class="k">return</span> <span class="n">login_form</span>
</pre></div>

<h3>Related Posts</h3>
<ol>
<li><a href="http://faruk.akgul.org/blog/jenkins-meets-webpy-pycharm/">A Python Development Story: Part 1 Jenkins-CI meets web.py and PyCharm</a></li>
<li><a href="http://faruk.akgul.org/blog/python-development-story-part-two/">A Python Development Story: Part 2</a></li>
</ol>]]></content:encoded>
    </item>
    <item>
      <title>A Python Development Story: Part Deux</title>
      <link>http://faruk.akgul.org/blog/python-development-story-part-two</link>
      <pubDate>Thu, 12 Jul 2012 00:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/python-development-story-part-two</guid>
      <description>A Python Development Story: Part Deux</description>
      <content:encoded><![CDATA[<p>For the last couple of months, I was part of a small team (only 10 people) for a project. This article is a summary of what I learnt what I still didn't learn what I should do in future and the reader may have some ideas as well.</p>
<h1>What we did</h1>
<p>I can't talk much about it much but the big picture was to analyze and visualize data sets. It wasn't a some sort of a blog engine that 99.99% of Django applications are about (That's the part I salute Cal Henderson).</p>
<h1>Tools</h1>
<p>It was a Python application with the following technologies;</p>
<ul>
<li>web.py</li>
<li>SQLAlchemy</li>
<li>Mako templates</li>
<li>CoffeeScript</li>
<li>and some C sauce</li>
</ul>
<h2>Framework Choice</h2>
<p>We couldn't choose Django, web2py or any other macro-framework since the application we built wouldn't be suitable for them. It wasn't a web application, it was simply a web service to analyze and visualize data sets. For example;</p>
<ul>
<li>Django's ORM would literally cry if we ever wanted to use it and its forms modules would be useless for us.</li>
<li>I've never used web2py so I didn't want to touch something I don't know and since it's not really small and I was afraid learning curve might be too long.</li>
</ul>
<p>By looking at these options we had 3 alternatives on the table;</p>
<ul>
<li><strong>web.py</strong>: The ultimate web framework that used to power <a href="http://reddit.com">reddit</a>.</li>
<li><strong>Flask</strong>: <em>New</em> Python micro framework that relies on Werkzeug.</li>
<li><strong>bottle</strong>: The web framework that's only 1 file.</li>
<li><strong>brubeck</strong>: The most interesting framework I've seen for a quite long time.</li>
</ul>
<h3>Why not Flask?</h3>
<p>If you bother to read the rest of the article, you'll notice that I mention "strict regex rules" somewhere.</p>
<p>What I want: Regex based URLs</p>
<p>Flask provides helpers such as <code>int/float</code> but they're useless in our application. We don't have integers in URLs nor names. We have UUIDs and bunch of other things.</p>
<h4>The Problem With Flask</h4>
<p>Maybe it's my ignorance but after searching the documents and the web, it's come to my attention that this is what I need to do if I want REGEX URLs in Flask:</p>
<p>First I need to create a Converter class:</p>
<div class="pygments_emacs"><pre><span class="kn">from</span> <span class="nn">werkzeug.routing</span> <span class="kn">import</span> <span class="n">BaseConverter</span>

<span class="k">class</span> <span class="nc">Converter</span><span class="p">(</span><span class="n">BaseConverter</span><span class="p">):</span>
  <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">mapper</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="nb">super</span><span class="p">(</span><span class="n">Converter</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="n">mapper</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">regex</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</pre></div>

<p>I've no clue why I needed this in the first place. I know Flask relies on Werkzeug but at the end of the day if I wanted to do something with Flask, I had to import a 3rd party library. This doesn't look very right to me and as far as I know other frameworks I've coded such as Django, Brubeck and Google App Engine's webapp don't have this kind of thing. After defining this converter class, this is what I need to do;</p>
<div class="pygments_emacs"><pre><span class="n">app</span><span class="o">.</span><span class="n">mapper</span><span class="o">.</span><span class="n">converters</span><span class="p">[</span><span class="s">&#39;regex&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">Converter</span>
</pre></div>

<p>After this, I could write REGEX URLs such as;</p>
<div class="pygments_emacs"><pre><span class="n">app</span><span class="o">.</span><span class="n">add_url_rule</span><span class="p">(</span><span class="s">r&#39;/u/[a-z]{5,10}&#39;</span><span class="p">,</span> <span class="n">view_func</span><span class="o">=</span><span class="n">IndexHandler</span><span class="o">.</span><span class="n">as_view</span><span class="p">(</span><span class="s">&#39;index&#39;</span><span class="p">))</span>
</pre></div>

<p>Maybe I looked at the wrong documentation but I really shouldn't needed to do all of these. In my point of view, this is a horror story. If you look at Django, web.py, Google App Engine's webapp or Brubeck you just write your REGEX URLs.</p>
<h5>Mako Support</h5>
<p>There's an extension for Flask to support Mako but I do remember I had run into some problems with it (and I ended up writing my own mako integration) and a quick search on StackOverFlow showed me that I wasn't the only one who had problems.</p>
<p>I didn't look at the rest of features Flask offers after these.</p>
<p>If I look at the bright side; Flask has an amazing documentation and is downloadable. Downloadable documentation is useful if you code in some place that you don't have Internet access. I wish more libraries would do that.</p>
<p>However, I had noticed Flask is good enough for creating small applications but I wouldn't touch it for creating complex applications.</p>
<h3>Why not bottle?</h3>
<p>I wanted class based controllers and as far as I know bottle doesn't provide such functionality. If you do bottle, please forgive my ignorance!</p>
<h3>Why not Brubeck?</h3>
<p>Brubeck relies on mongrel2 and some other cool technologies however we had to use Apache (we didn't have any alternatives and it wasn't our call to make a decision).</p>
<h3>Why web.py?</h3>
<p>web.py gives you freedom and does what you say. The thing I like about it is its flexibility thus it was very suitable for us. However, it wasn't all that good. I don't like a thing in web.py: URL definitions.</p>
<p>URL definitions in web.py are defined as in string pairs. For example;</p>
<div class="pygments_emacs"><pre><span class="n">urls</span> <span class="o">=</span> <span class="p">(</span><span class="s">r&#39;/hello&#39;</span><span class="p">,</span> <span class="n">Hello</span><span class="p">,</span> <span class="s">r&#39;/world&#39;</span><span class="p">,</span> <span class="n">World</span><span class="p">)</span>
</pre></div>

<p>I think this isn't very right. In my point of view, this looks more beautiful;</p>
<div class="pygments_emacs"><pre><span class="n">urls</span> <span class="o">=</span> <span class="p">[(</span><span class="s">r&#39;/hello&#39;</span><span class="p">,</span> <span class="n">Hello</span><span class="p">),</span> <span class="p">(</span><span class="s">r&#39;/world&#39;</span><span class="p">,</span> <span class="n">World</span><span class="p">)]</span>
</pre></div>

<p>Apparently I'm not the only one who <a href="http://www.youtube.com/watch?v=AYjPIMe0BhA#t=1208s">complains about this</a>.</p>
<p>However, web.py met all the mandatory conditions we looked for so it was suitable for us. If you're interested in more about why web.py, you may want to read the follow up <a href="http://faruk.akgul.org/blog/python-development-story-why-webpy/">article</a>.</p>
<h1>The Big 3: Development, Source Control, Bug Tracking</h1>
<h2>The Cycle</h2>
<p>The cycle of an application is;</p>
<ul>
<li>Develop</li>
<li>Deploy</li>
<li>Maintain</li>
</ul>
<h2>Development</h2>
<p>Development is the most important part since it's a continuos process. You need to select the right tools for the job. Some rules we followed:</p>
<ul>
<li>If a code can't pass PEP8*, I don't want to see it in codebase. I don't care if it works.</li>
<li>If PyLint scores the code less than 8/10, I don't want to see it. I don't care if it works.</li>
<li>If there's no comment, that code has no place in codebase.</li>
<li>If you introduce a new method and you didn't write the necessary test for it, I don't want to see it.</li>
<li>Don't forget the clone detection and code execution order analysis.</li>
</ul>
<p>*: We used a slightly modified PEP8.</p>
<h3>IDE Choice</h3>
<p>I'm an emacs user and I do development in emacs when it comes to Python, C, C++, Cilk++, Erlang etc. I use an IDE (<a href="http://www.jetbrains.com/idea/">IntelliJ IDEA</a>) only for Java development.</p>
<p>However, I had always wanted to try PyCharm and for this project I used PyCharm.</p>
<h4>PyCharm</h4>
<p>A Python IDE from JetBrains. It's just great. I loved it even more when it did <code>coverage</code> automatically. My only criticise about PyCharm is themes. There should be a way to import TextMate themes (if there's a way, I don't know).</p>
<p>It's not only a Python IDE though. It does lint checking, auto completion for Javascript, CoffeeScript and CSS. It supports Mako templates out of the box and provides GUI for version control systems and connects to database too.</p>
<p>And the price is bargain if you consider what it can do. There are some criticism about PyCharm being slow but I didn't have any problems.</p>
<p>The only problem I saw was it takes a little bit more time to load the project if you have lots of files and maybe it's my ignorance but I couldn't see a way to disable loading the files on some folders. It'd be nice if there was a setting to ignore loading the files on startup on some certain folders. This is a serious issue if you have more than 280,000 log files in your log folder. In this case PyCharm never loads the project and you give up and close it.</p>
<p>PyCharm does lint checking and error checking for Javascript and CoffeeScript files as well.</p>
<p class="article_image"><img src="/blog/img/co.jpg" alt=""/><br/>PyCharm - CoffeeScript Error Checking</p>

<p>Another cool feature of PyCharm is it allows you to see the auto-generated Javascript code of your CoffeeScript side by side.</p>
<p class="article_image"><a href="/blog/img/co1.jpg"><img src="/blog/img/co1_t.jpg" alt=""/></a><br/>CoffeScript code on the left and auto-generated Javascript code on the right</p>

<h2>Talk to me about Testing</h2>
<h3>Automated Tests</h3>
<p>Yes, all the cool kids use Selenium which is good enough and integrates with nose well. It automates all the tests. Takes screenshots when necessary. Taking screenshot is quite useful I think since it lets you see what a user experience. The visible error message are very useful too.</p>
<p>For example consider the following piece of Selenium test code;</p>
<ol>
<li><code>test_get_something</code>: Goes to the URL and finds the text named <code>My View</code> and clicks on it, then browser waits 10 seconds (since the other page is loading), then after 10 seconds find a text named <code>view</code>.</li>
<li><code>test_screenshot</code>: saves the screenshot of the given URL.</li>
</ol>
<div class="pygments_emacs"><pre><span class="k">def</span> <span class="nf">test_get_something</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
  <span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">url</span><span class="p">)</span>
  <span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="o">.</span><span class="n">find_element_by_link_text</span><span class="p">(</span><span class="s">&#39;My View&#39;</span><span class="p">)</span>
  <span class="n">text</span><span class="o">.</span><span class="n">click</span><span class="p">()</span>
  <span class="n">WebDriverWait</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">until</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">find_element_by_id</span><span class="p">(</span><span class="s">&#39;view&#39;</span><span class="p">))</span>
  <span class="n">find_text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="o">.</span><span class="n">find_element_by_id</span><span class="p">(</span><span class="s">&#39;view&#39;</span><span class="p">)</span>
  <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">find_text</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">test_screenshot</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
  <span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">url</span> <span class="o">+</span> <span class="s">&#39;/go_where&#39;</span><span class="p">)</span>
  <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
  <span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="o">.</span><span class="n">save_screenshot</span><span class="p">(</span><span class="s">&#39;test.jpg&#39;</span><span class="p">)</span>
  <span class="bp">self</span><span class="o">.</span><span class="n">browser</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>

<h3>Stress Testing</h3>
<p>It's all about testing the limits of your application. <code>ab</code> is your friend. I was quite happy when I found out our web.py application was capable of handling 40,000 requests and 165 multiple connections without any caching or anything else.</p>
<p>A sample <code>ab</code> command would be;</p>
<div class="pygments_emacs"><pre>ab -c 180 -n 40000 http://192.168.100.124/
</pre></div>

<h4>Performance</h4>
<p>Performance measurement of a web application has 2 parts:</p>
<ol>
<li>Client Side</li>
<li>Server Side</li>
</ol>
<h4>Client Side Performance</h4>
<p>This could be measured by YSlow, an extension for Firebug. Google has a tool for performance measurement but it's only for 32 bit architecture.</p>
<h4>Server Side Performance</h4>
<p>As I said <code>ab</code> is your friend. It does the job.</p>
<h3>Penetrate Testing</h3>
<p>There are a lot of tools you could use for penetrate testing. For example; XSSer for XSS testing. The main thing we (we means I by the way since I was the only developer) had no choice but to use offline testing tools. Online testing tools are useless for various of reasons;</p>
<ul>
<li>99.99% times your development servers don't have Internet connection and the only way to connect them is via VPS.</li>
<li>I'd like to run the test on development code not on production code.</li>
<li>Let's say I found a security vulnerability on production code. I can't fix it immediately. I know, I know, real men do make changes on production code using nano but I'm a coward.</li>
</ul>
<p>Online testing tools were useless in our case since our application servers didn't have Internet connection and we could only connect them via VPS.</p>
<h2>Source Control</h2>
<p>This part is important. You need to select something that won't screw you back or won't corrupt your files. Our choice was git whereas the other team (there were two teams on the project which designed different parts of the project) decided to go with subversion.</p>
<h3>Before Committing</h3>
<p>Before committing files to repository, the following steps were done:</p>
<ul>
<li>Do lint analysis</li>
<li>Optimize imports</li>
<li>PEP8 and PyLint analysis</li>
</ul>
<h2>Bug Tracking</h2>
<p>There are a lot of bug tracking software out there but the teams went with the following:</p>
<ul>
<li><strong>YouTrack</strong>: Issue tracker and bug tracker from JetBrains. It's awesome-o.</li>
<li><strong>Trac</strong>: It's good enough and was the choice of other team.</li>
</ul>
<h2>Security</h2>
<p>If you don't use a framework, or if you use a minimalist framework it's your responsibility to secure your application. Even if you use a framework such as Rails or Django, it's still your responsibility to secure your application. I'm not much a security guy but I took care of every vulnerability I know and took necessary steps to store the passwords. No, passwords are not stored in plain text nor are they hashed with <code>SHA1</code> without any salt key (I'm looking at you LinkedIn).</p>
<ul>
<li><strong>XSS</strong>: web.py provides a method to protect against XSS. However, I used markupsafe library.</li>
<li><strong>SQL-Injection</strong>: Since we used SQLAlchemy this wasn't really an issue.</li>
<li><strong>CSRF</strong>: web.py doesn't do any protection against CSRF and as far as I know there's not a 3rd party library so I had to code my own protection. The key point of CSRF is to generate a key that's hard to guess for an attacker so I generated random SHA256 keys for CSRF tokens.</li>
<li><strong>Cookie Poisoning</strong>: How I handled this is easy. You need to send the cookie to user for obvious reasons but the cookie the application sends to user is a SHA256 digest which stores the necessary information. There's an encrypt key, validation key and secret key which are used to generate the cookie and these keys are stored only on server side and the cookie is stored in database too. I don't want to go into too much detail how these different keys are used.</li>
</ul>
<p>Later on you need to take care of active network attackers and bunch of other things.</p>
<h3>Application Specific Security</h3>
<p>This part is about form validation and storing the passwords. Since I can't go into too much detail but we had lots of forms which needed their own validation and protection layers. This involved writing strict regex rules and strictly checking every single input.</p>
<h4>Storing Passwords</h4>
<p>User passwords are hashed N times where N is different for each user by using a global salt key and a salt key which is different for each user.</p>
<h2>Deployment (aka Production vs. Development)</h2>
<p>The development code and the production code are different than each other. You need to compile, minimize the static files such as Javascript, CSS and image files. Even the Python files are different on production and on development mode.</p>
<p>For this reason, I use my own tools. I'm not cool enough to use fabric (no, there's no sarcasm or arrogance when I say it).</p>
<h3>Compile Javascript</h3>
<p>There are lots of tools for this job. In our case the Javascript files were auto-generated from CoffeeScript.</p>
<ul>
<li><strong>Google Closure Compiler</strong>: My personal favorite.</li>
<li><strong>Yahoo Compressor</strong>: Never used it to compress Javascript. I have no clue.</li>
<li><strong>Dojo Compressor</strong>: Never used it. I have no clue.</li>
<li><strong>JSMin</strong>: Never used it. I have no clue.</li>
</ul>
<h3>JS Lint Checking</h3>
<p>PyCharm provides lint checking out of the box.</p>
<p class="article_image"><img src="/blog/img/de.jpg" alt=""/><br/>PyCharm - Javascript Lint Checking</p>

<h3>Minify CSS</h3>
<ul>
<li><strong>Yahoo Compressor</strong>: Good enough. Does the job.</li>
<li><strong>Less</strong>: LESS provides a minimizer as well, you could also use Yahoo Compressor with it though.</li>
<li><strong>Icey's Compressor</strong>: This is an online tool so there's no way to use it offline. That's the disadvantage I noticed first. Never really used it.</li>
</ul>
<h3>Gzip All The Things!</h3>
<div class="pygments_emacs"><pre>/usr/bin/gzip -cn9 main.js &gt; small.js.gz
</pre></div>

<p>Sorcery?</p>
<h1>Team Issues</h1>
<p>Every person has different point of views and skills and abilities. The most common issue is communication issues.</p>
<h2>Communication Issues</h2>
<p>This part is huge. What I understand is different than what you understand. People have different levels of skills and abilities. If another team member doesn't have the knowledge and technical background as you then the entire team is in trouble.</p>
<p>Another part is the language. English is the second or the third language for some people. Even if the person is the native speaker of English, if he doesn't have the technical background, level of knowledge differences end up with miscommunication problems.</p>
<h3>Communication</h3>
<p>Telephone, SMS, Skype, meetings and email are the tools you could use for communication. You could also "shout at each other" as a communication tool. It works but we didn't use it, well we rarely used it, maybe only once.</p>
<h1>The Thing That Should Not Be</h1>
<p>Software methodologies. Waterfall, agile, scrum, XP, you name it. I could write something about them but legendary <a href="http://www.programming-motherfucker.com/">Zed Shaw said it very well</a>.</p>
<h2>Lessons Learnt</h2>
<ol>
<li>Hubris will cause you problems.</li>
<li>Never say "what can go wrong" and take frequent backup of files and make sure you commit everything. EVERYTHING.</li>
<li>Make sure you commit and backup everything (I know I said this above). Seeing <code>node.right can't be null: ReferenceException</code> in the last minute isn't very pleasant.</li>
<li>C can play with Python quite well.</li>
<li>Even if it's a web application there's nothing wrong with "putting the pedal to the metal". C is your friend where suits.</li>
<li>Asynchronous all the things!</li>
<li>web.py is solid and capable of creating complex web services.</li>
</ol>
<h3>Related Posts</h3>
<ol>
<li><a href="http://faruk.akgul.org/blog/jenkins-meets-webpy-pycharm/">A Python Development Story: Part 1 Jenkins-CI meets web.py and PyCharm</a></li>
<li><a href="http://faruk.akgul.org/blog/python-development-story-why-webpy/">A Python Development Story: Part 3 - web.py Skeleton</a></li>
</ol>]]></content:encoded>
    </item>
    <item>
      <title>My Emacs Theme: Endless</title>
      <link>http://faruk.akgul.org/blog/my-emacs-theme-endless</link>
      <pubDate>Sun, 17 Jun 2012 02:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/my-emacs-theme-endless</guid>
      <description>My Emacs Theme: Endless</description>
      <content:encoded><![CDATA[<p>This theme is somehow a mixed version of Molokai and vim-sorcerer theme.  I've modified Molokai to look like vim-sorcerer theme. I provide some screenshots so you could have an idea how it looks like.</p>
<p>You could grab the theme from <a href="/blog/u/color-theme-endless.el">here</a>.</p>
<p><a href="/blog/img/emacs.jpg"><img alt="" src="/blog/img/emacs_t.jpg" /></a> <a href="/blog/img/emacs1.jpg"><img alt="" src="/blog/img/emacs1_t.jpg" /></a></p>]]></content:encoded>
    </item>
    <item>
      <title>A Python Development Story: Jenkins-CI meets web.py and PyCharm</title>
      <link>http://faruk.akgul.org/blog/jenkins-meets-webpy-pycharm</link>
      <pubDate>Sun, 06 May 2012 00:00:00 PDT</pubDate>
      <guid>http://faruk.akgul.org/blog/jenkins-meets-webpy-pycharm</guid>
      <description>A Python Development Story: Jenkins-CI meets web.py and PyCharm</description>
      <content:encoded><![CDATA[<p><a href="http://jenkins-ci.org/">Jenkins-CI</a> is a continuos integration platform. Setting up Jenkins and installing plugins is quite easy since it does everything automatically.</p>
<h2>Jenkins-CI</h2>
<p>We use the following tools with Jenkins-CI:</p>
<ul>
<li><strong>git</strong>: Our vcs choice</li>
<li><strong>nose</strong>: Unit test library.</li>
<li><strong>sloccount</strong>: Calculates the lines of code.</li>
<li><strong>PyLint</strong>: Static analyzer</li>
<li><strong>clonedigger</strong>: Detects duplicate code.</li>
<li><strong>coverage</strong>: Measures code coverage by monitoring the program, traces hooks, notes which parts have been executed, which parts could have been executed but wasn't.</li>
<li><strong>pep8</strong>: Coding style.</li>
</ul>
<p>The following plugins are necessary:</p>
<ul>
<li>git</li>
<li>Sloccount</li>
<li>Python</li>
<li>Violations</li>
<li>Warnings</li>
<li>xUnit</li>
</ul>
<p>All these plugins could be installed from plugin manager.</p>
<p>** There's also ShiningPanda plugin which adds some useful builders such as virtualenv but we've not tried it.</p>
<p>Jenkins takes care of everything. It clones your git repository and does the necessary steps.</p>
<h4>Shell Build Steps</h4>
<p>Shell build steps are straightforward. For example;</p>
<div class="pygments_emacs"><pre><span class="k">for </span>filename in <span class="sb">`</span>find . -name *.py | egrep -v <span class="s1">&#39;^./modules/&#39;</span><span class="sb">`</span>; <span class="k">do </span>pep8 --ignore<span class="o">=</span>E111 <span class="nv">$filename</span>; <span class="k">done</span> <span class="o">||</span> :
</pre></div>

<h2>PyCharm</h2>
<p><a href="http://www.jetbrains.com/pycharm/">PyCharm</a> is a Python IDE from JetBrains. In addition to plain Python, it can syntax highlight and check for errors for Cython, CoffeeScript, Mako/Jinja2 templates as well. You could connect to database and work on database tables too!</p>
<p>It supports git, virtualenv, pip (and probably lots of other things I've not noticed yet).</p>
<p>For example; Cython highlighting and error and spelling checking:</p>
<p class="article_image"><img src="/blog/img/p1.jpg" alt=""/><br/>Cython highlighting and error and spelling checking</p>

<p>and when you connect to your database in PyCharm;</p>
<p><img alt="" src="/blog/img/p2.jpg" /></p>
<p>Pretty cool, huh?</p>
<p>The rest is web.py, SQLAlchemy, Mako, gevent and ZeroMQ.</p>
<h3>Related Posts</h3>
<ol>
<li><a href="http://faruk.akgul.org/blog/python-development-story-part-two/">A Python Development Story: Part 2</a></li>
<li><a href="http://faruk.akgul.org/blog/python-development-story-why-webpy/">A Python Development Story: Part 3 - web.py Skeleton</a></li>
</ol>]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Algorithms: Patricia Trie</title>
      <link>http://faruk.akgul.org/blog/introduction-to-algorithms-patricia-trie</link>
      <pubDate>Wed, 29 Feb 2012 00:00:00 PST</pubDate>
      <guid>http://faruk.akgul.org/blog/introduction-to-algorithms-patricia-trie</guid>
      <description>Introduction to Algorithms: Patricia Trie</description>
      <content:encoded><![CDATA[<p>According to NIST[1] definition Patricia trie is; <em>a compact representation of a trie in which any node that's an only child is merged with its parent.</em></p>
<p>Patricia Trie was discovered by Donald Morrison in 1968. It takes <code>log N</code> bit for comparison of each search even though insert/delete operations run on <code>O(n)</code>.</p>
<p>Nodes in Patricia Trie contain a field which tracks the bit positions. We use a dummy node and initialize its bit to -1. Root of the trie is <code>root.left</code>.</p>
<div class="pygments_emacs"><pre><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="n">K</span><span class="p">,</span> <span class="k">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">class</span> <span class="nc">INode</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
  <span class="k">virtual</span> <span class="o">~</span><span class="n">INode</span><span class="p">()</span> <span class="p">{}</span>
  <span class="k">virtual</span> <span class="kt">unsigned</span> <span class="n">hash</span><span class="p">(</span><span class="n">K</span> <span class="n">key</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="n">bit</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">class</span> <span class="nc">StringKeyNode</span><span class="o">:</span> <span class="k">public</span> <span class="n">INode</span><span class="o">&lt;</span><span class="n">string</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>

<span class="k">private</span><span class="o">:</span>
  <span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">SIZE</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">2</span><span class="p">;</span>

<span class="k">public</span><span class="o">:</span>

  <span class="n">string</span> <span class="n">key</span><span class="p">;</span>
  <span class="n">T</span> <span class="n">value</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">bit</span><span class="p">;</span>
  <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">left</span><span class="p">;</span>
  <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">right</span><span class="p">;</span>

  <span class="n">StringKeyNode</span><span class="p">(</span><span class="n">string</span> <span class="n">key</span><span class="p">,</span> <span class="n">T</span> <span class="n">value</span><span class="p">,</span> <span class="kt">int</span> <span class="n">bit</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">key</span> <span class="o">=</span> <span class="n">key</span><span class="p">;</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">=</span> <span class="n">bit</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="o">~</span><span class="n">StringKeyNode</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">left</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">right</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">virtual</span> <span class="kt">unsigned</span> <span class="n">hash</span><span class="p">(</span><span class="n">string</span> <span class="n">key</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">empty</span><span class="p">())</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="n">index</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">unsigned</span><span class="o">&gt;</span><span class="p">(</span><span class="n">i</span> <span class="o">/</span> <span class="n">SIZE</span><span class="p">);</span>
    <span class="k">if</span><span class="p">(</span><span class="n">index</span> <span class="o">&gt;=</span> <span class="n">key</span><span class="p">.</span><span class="n">length</span><span class="p">())</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="n">val</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="n">i</span> <span class="o">&amp;</span> <span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">key</span><span class="p">.</span><span class="n">at</span><span class="p">(</span><span class="n">index</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">val</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">};</span>
</pre></div>

<h2>Insertion</h2>
<p>Inserting a new node takes <code>O(n)</code> time.</p>
<p>For example;</p>
<pre>
    -- NULL : initialization
    -- insert:  ababbb -> key
                123456 -> positions
    -- insert:  ababba:
                search: ababb**b** == ababba
                diff. at position 6;

         [6]
        /   \
       a     b
      /       \
    ababba   ababbb
</pre>

<p>Inserting a new node begins with searching. <code>search</code> method returns a unique key. Once we detect the leftmost position of the key we're inserting and the search key, we call the <code>insert_inner</code> method which recursively walks the tree and inserts the node where <code>node</code> is pointing at.</p>
<div class="pygments_emacs"><pre><span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">insert_inner</span><span class="p">(</span><span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">node</span><span class="p">,</span>
                                  <span class="n">K</span> <span class="n">key</span><span class="p">,</span>
                                  <span class="n">T</span> <span class="n">value</span><span class="p">,</span>
                                  <span class="kt">int</span> <span class="n">i</span><span class="p">,</span>
                                  <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">root</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">&gt;=</span> <span class="n">i</span> <span class="o">||</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">&lt;=</span> <span class="n">root</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">temp</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
      <span class="k">const</span> <span class="kt">bool</span> <span class="n">FLAG</span> <span class="o">=</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">hash</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">temp</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">;</span>
      <span class="n">temp</span><span class="o">-&gt;</span><span class="n">left</span> <span class="o">=</span> <span class="n">FLAG</span> <span class="o">?</span> <span class="n">node</span> <span class="o">:</span> <span class="n">temp</span><span class="p">;</span>
      <span class="n">temp</span><span class="o">-&gt;</span><span class="n">right</span> <span class="o">=</span> <span class="n">FLAG</span> <span class="o">?</span> <span class="n">temp</span> <span class="o">:</span> <span class="n">node</span><span class="p">;</span>
      <span class="k">return</span> <span class="n">temp</span><span class="p">;</span>
      <span class="k">delete</span> <span class="n">temp</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">hash</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">node</span><span class="o">-&gt;</span><span class="n">left</span> <span class="o">=</span> <span class="n">cilk_spawn</span> <span class="n">insert_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">node</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="n">node</span><span class="o">-&gt;</span><span class="n">right</span> <span class="o">=</span> <span class="n">insert_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">node</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">cilk_sync</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">node</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">insert</span><span class="p">(</span><span class="n">string</span> <span class="n">key</span><span class="p">,</span> <span class="n">T</span> <span class="n">value</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">node</span><span class="p">;</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">cilk_spawn</span> <span class="n">search_inner</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">cilk_sync</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">string</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">;</span>
      <span class="k">if</span><span class="p">(</span><span class="n">temp</span><span class="p">.</span><span class="n">compare</span><span class="p">(</span><span class="n">key</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">while</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">hash</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="o">==</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">hash</span><span class="p">(</span><span class="n">temp</span><span class="p">,</span> <span class="n">i</span><span class="p">))</span> <span class="o">++</span><span class="n">i</span><span class="p">;</span>
        <span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="o">-&gt;</span><span class="n">left</span> <span class="o">=</span> <span class="n">cilk_spawn</span> <span class="n">insert_inner</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span>
                                                    <span class="n">key</span><span class="p">,</span>
                                                    <span class="n">value</span><span class="p">,</span>
                                                    <span class="n">i</span><span class="p">,</span>
                                                    <span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="p">);</span>
        <span class="n">cilk_sync</span><span class="p">;</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">node</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
      <span class="p">}</span>
    <span class="p">}</span>
  <span class="p">}</span>
</pre></div>

<h2>Searching</h2>
<p>Important part of searching is the <code>search_inner</code> method which walks the trie recursively and finds a unique key which can have the record of <code>node</code>'s key by using the bits of the trie.</p>
<div class="pygments_emacs"><pre><span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">search_inner</span><span class="p">(</span><span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">node</span><span class="p">,</span> <span class="n">K</span> <span class="n">key</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">&lt;=</span> <span class="n">i</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">hash</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">search_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">search_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="n">T</span> <span class="n">search</span><span class="p">(</span><span class="n">string</span> <span class="n">key</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">node</span><span class="p">;</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">cilk_spawn</span> <span class="n">search_inner</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">cilk_sync</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">key</span> <span class="o">!=</span> <span class="n">key</span><span class="p">)</span> <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<h2>Inorder</h2>
<p>Putting the nodes in order is no different than an ordinary tree. The only difference is we check the bits of the nodes to detect if node's bit is smaller than parent's bit.</p>
<div class="pygments_emacs"><pre><span class="n">string</span> <span class="n">inorder_inner</span><span class="p">(</span><span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">node</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span> <span class="o">==</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="p">)</span> <span class="k">return</span> <span class="s">&quot;&quot;</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">&lt;=</span> <span class="n">i</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">inorder_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">)</span> <span class="o">+</span> <span class="n">inorder_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span>
                                                                  <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">);</span>
<span class="p">}</span>

<span class="n">string</span> <span class="n">inorder</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">inorder_inner</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>

<h2>Testing</h2>
<p>Here's a small code to insert some values to test the trie.</p>
<div class="pygments_emacs"><pre><span class="n">vector</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">generate_names</span><span class="p">(</span><span class="n">string</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// generating some strings to put into trie.</span>
  <span class="n">vector</span><span class="o">&lt;</span><span class="kt">char</span><span class="o">&gt;</span><span class="n">v</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">s</span><span class="p">.</span><span class="n">end</span><span class="p">());</span>
  <span class="n">vector</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">e</span><span class="p">;</span>
  <span class="n">e</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
  <span class="k">while</span><span class="p">(</span><span class="n">next_permutation</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">v</span><span class="p">.</span><span class="n">end</span><span class="p">()))</span> <span class="p">{</span>
    <span class="n">stringstream</span> <span class="n">ss</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">v</span><span class="p">.</span><span class="n">size</span><span class="p">();</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="n">ss</span> <span class="o">&lt;&lt;</span> <span class="n">v</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="n">e</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">ss</span><span class="p">.</span><span class="n">str</span><span class="p">());</span>
  <span class="p">}</span>
  <span class="k">return</span> <span class="n">e</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="n">factorial</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">int</span> <span class="n">res</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="n">res</span> <span class="o">*=</span> <span class="n">i</span><span class="p">;</span>
  <span class="k">return</span> <span class="n">res</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<h3>Benchmarking</h3>
<p>To see how our trie performs in various of scenarios and different workers, we could use <code>cilkview</code> for analyzing.</p>
<div class="pygments_emacs"><pre><span class="kt">int</span> <span class="n">cilk_main</span><span class="p">()</span> <span class="p">{</span>

  <span class="n">cilk</span><span class="o">::</span><span class="n">cilkview</span> <span class="n">cv</span><span class="p">;</span>

  <span class="n">string</span> <span class="n">generator</span> <span class="o">=</span> <span class="s">&quot;abc&quot;</span><span class="p">;</span>
  <span class="k">const</span> <span class="kt">int</span> <span class="n">LENGTH</span> <span class="o">=</span> <span class="n">factorial</span><span class="p">(</span><span class="n">generator</span><span class="p">.</span><span class="n">length</span><span class="p">());</span>
  <span class="n">vector</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">names</span> <span class="o">=</span> <span class="n">generate_names</span><span class="p">(</span><span class="n">generator</span><span class="p">);</span>

  <span class="n">cv</span><span class="p">.</span><span class="n">start</span><span class="p">();</span>

  <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">LENGTH</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="n">tree</span><span class="o">-&gt;</span><span class="n">insert</span><span class="p">(</span><span class="n">names</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">SomeValue</span><span class="p">(</span><span class="n">names</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">names</span><span class="p">[</span><span class="n">i</span><span class="p">]));</span>

  <span class="n">cv</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span>

  <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">&quot;tree filled in &quot;</span> <span class="o">&lt;&lt;</span> <span class="n">cv</span><span class="p">.</span><span class="n">accumulated_milliseconds</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1000.f</span> <span class="o">&lt;&lt;</span> <span class="s">&quot; seconds.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">;</span>
  <span class="n">cv</span><span class="p">.</span><span class="n">reset</span><span class="p">();</span>

<span class="p">}</span>
</pre></div>

<h2>Fun Stuff</h2>
<p>Since we've constructed our trie and it works like a champ so now we could play with it. Let's rotate the trie (for absolutely no particular reason)</p>
<div class="pygments_emacs"><pre><span class="kt">void</span> <span class="n">rotate_inner</span><span class="p">(</span><span class="n">StringKeyNode</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;*</span> <span class="n">node</span><span class="p">,</span> <span class="kt">int</span> <span class="n">bit</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">node</span> <span class="o">!=</span> <span class="nb">NULL</span> <span class="o">&amp;&amp;</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span> <span class="o">&gt;</span> <span class="n">bit</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">cilk_spawn</span> <span class="n">rotate_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">);</span>
      <span class="n">rotate_inner</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">bit</span><span class="p">);</span>
      <span class="n">cilk_sync</span><span class="p">;</span>
      <span class="n">swap</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">rotate</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">cilk_spawn</span> <span class="n">rotate_inner</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">root</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">cilk_sync</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<h3>Source Code</h3>
<p>You could get the source code on <a href="https://github.com/faruken/cilk-patricia">GitHub</a>. Feel free to fork and improve or fix it. Released under "do whatever you want to do with it" license.</p>
<h3>References:</h3>
<ol>
<li><a href="http://xlinux.nist.gov/dads//HTML/patriciatree.html">NIST</a></li>
</ol>]]></content:encoded>
    </item>
  </channel>
</rss>
