<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Honest to a Segfault » Python</title>
	
	<link>http://blog.cdleary.com</link>
	<description>__author__ = 'Chris Leary'</description>
	<lastBuildDate>Sun, 05 Sep 2010 20:41:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/VaporWarning-Python" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="vaporwarning-python" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>A prototypal binding trap</title>
		<link>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/</link>
		<comments>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/#comments</comments>
		<pubDate>Sun, 05 Sep 2010 20:29:30 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Trap]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=905</guid>
		<description><![CDATA[It always pains me to explain these little identifier resolution traps: #!/usr/bin/env python3 &#160; class Egg: &#160; _next_id = 1 &#160; def __init__&#40;self&#41;: self.id = self._next_id self._next_id += 1 assert Egg._next_id is self._next_id &#160; &#160; if __name__ == '__main__': f = Egg&#40;&#41; Fails the assertion. It&#8217;s decomposing the assignment-update into its constituent tmp = self._next_id [...]]]></description>
			<content:encoded><![CDATA[<p>It always pains me to explain these little identifier resolution traps:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Egg:
&nbsp;
    _next_id = <span style="color: #ff4500;">1</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: #008000;">id</span> = <span style="color: #008000;">self</span>._next_id
        <span style="color: #008000;">self</span>._next_id += <span style="color: #ff4500;">1</span>
        <span style="color: #ff7700;font-weight:bold;">assert</span> Egg._next_id <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">self</span>._next_id
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    f = Egg<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Fails the assertion. It&#8217;s decomposing the assignment-update into its constituent <tt class="docutils literal"><span class="pre">tmp</span> <span class="pre">=</span> <span class="pre">self._next_id</span> <span class="pre">+</span> <span class="pre">1;</span> <span class="pre">self._next_id</span> <span class="pre">=</span> <span class="pre">tmp</span></tt> components, but a programmer could reasonably expect a Lookup/Update hash map ADT operation to occur instead &#8212; get the slot by lookup, mutate the value found, and store back in an atomic sense, clobbering the class member with the updated value &#8212; but that&#8217;s <strong>not</strong> how it works.</p>
<p>This goes with the prototypal territory:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> Egg<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">id</span> <span style="color: #339933;">=</span> <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span> <span style="color: #339933;">+=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
    assertEq<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span><span style="color: #339933;">,</span> Egg.<span style="color: #660066;">prototype</span>.<span style="color: #660066;">next_id</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
Egg.<span style="color: #660066;">prototype</span>.<span style="color: #660066;">next_id</span> <span style="color: #339933;">=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
<span style="color: #003366; font-weight: bold;">var</span> e <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Egg<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// Error: Assertion failed: got 2, expected 1</span></pre></div></div>

<p>Fortunately, note that this behavior definitely has the semantics you want when you add inheritance to the mix. Is the <tt class="docutils literal"><span class="pre">self.viscosity</span></tt> that this class implementation is referring to a class member or an instance member in the base class?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> sauce
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> AwesomeSauce<span style="color: black;">&#40;</span>sauce.<span style="color: black;">Sauce</span><span style="color: black;">&#41;</span>:
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">super</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">viscosity</span> += <span style="color: #ff4500;">12</span> <span style="color: #808080; font-style: italic;"># Deca-viscoses per milliliter.</span></pre></div></div>

<p>The answer is that you don&#8217;t care &#8212; it&#8217;s being rebound on the <tt class="docutils literal"><span class="pre">AwesomeSauce</span></tt> instance no matter what.</p>
<p>Moral of the story is to be mindful when using update operations on class members &#8212; if you want to rebind a class member within an instance method, you&#8217;ve got to use the class name instead of <tt>self</tt>.</p>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/QfHA2bgNl5I" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Efficiency of list comprehensions</title>
		<link>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/</link>
		<comments>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 19:31:08 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Efficiency]]></category>
		<category><![CDATA[List Comprehensions]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=782</guid>
		<description><![CDATA[I&#8217;m psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I&#8217;ve restructured it for posterity. Efficiency of the more common usage Let&#8217;s look [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m psyched about the awesome comments on my previous entry, <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/">Python by example: list comprehensions</a>. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I&#8217;ve restructured it for posterity.</p>
<div class="section" id="efficiency-of-the-more-common-usage">
<h3>Efficiency of the more common usage</h3>
<p>Let&#8217;s look at the efficiency of list comprehensions in the more common usage, where the comprehension&#8217;s list result is actually relevant (or, in compiler-speak, live-out).</p>
<p>Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">dis</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">inspect</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">timeit</span>
&nbsp;
&nbsp;
programs = <span style="color: #008000;">dict</span><span style="color: black;">&#40;</span>
    loop=<span style="color: #483d8b;">&quot;&quot;&quot;
result = []
for i in range(20):
    result.append(i * 2)
&quot;&quot;&quot;</span>,
   loop_faster=<span style="color: #483d8b;">&quot;&quot;&quot;
result = []
add = result.append
for i in range(20):
    add(i * 2)
&quot;&quot;&quot;</span>,
    comprehension=<span style="color: #483d8b;">'result = [i * 2 for i in range(20)]'</span>,
<span style="color: black;">&#41;</span>
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> name, text <span style="color: #ff7700;font-weight:bold;">in</span> programs.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> name, <span style="color: #dc143c;">timeit</span>.<span style="color: black;">Timer</span><span style="color: black;">&#40;</span>stmt=text<span style="color: black;">&#41;</span>.<span style="color: #dc143c;">timeit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">code</span> = <span style="color: #008000;">compile</span><span style="color: black;">&#40;</span>text, <span style="color: #483d8b;">'&lt;string&gt;'</span>, <span style="color: #483d8b;">'exec'</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">dis</span>.<span style="color: black;">disassemble</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">code</span><span style="color: black;">&#41;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">loop 11.1495118141
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
&nbsp;
  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        &gt;&gt;   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)
&nbsp;
  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        &gt;&gt;   45 POP_BLOCK
        &gt;&gt;   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
loop_faster 8.36096310616
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
&nbsp;
  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)
&nbsp;
  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        &gt;&gt;   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)
&nbsp;
  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        &gt;&gt;   51 POP_BLOCK
        &gt;&gt;   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
comprehension 7.08145213127
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        &gt;&gt;   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        &gt;&gt;   37 DELETE_NAME              0 (_[1])
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE</pre></div></div>

<p>List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized <tt class="docutils literal"><span class="pre">LIST_APPEND</span></tt> bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).</p>
<p>In the <tt class="docutils literal"><span class="pre">loop_faster</span></tt> program, you avoid the overhead of the <tt class="docutils literal"><span class="pre">append</span></tt> attribute lookup by hoisting it out of the loop and placing the result in a fastlocal (bytecode 9-12), so it loops more quickly; however, the comprehension uses a specialized <tt class="docutils literal"><span class="pre">LIST_APPEND</span></tt> bytecode instead of incurring the overhead of a function call, so it still trumps.</p>
</div>
<div class="section" id="using-list-comprehensions-for-side-effects">
<h3>Using list comprehensions for side effects</h3>
<p>I want to address a point that was brought up in the <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comments">previous entry</a> as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I&#8217;ll discuss the subjective bit first, since that&#8217;s the least sciency part.</p>
<div class="section" id="readability">
<h4>Readability</h4>
<blockquote>
<p>Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…</p>
<p class="attribution">&mdash;<a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1539">Michael Foord</a></p>
</blockquote>
<p>First of all, thanks to Michael for his excellent and thought provoking comment!</p>
<p>My response is that removing the use of the result does indeed make it less readable, precisely <em>because</em> you&#8217;re using a result-producing control flow construct where the result is not needed. I suppose I&#8217;m positing that it&#8217;s inherently confusing to do that with your syntax: there&#8217;s a looping form that doesn&#8217;t produce a result, so that should be used instead. It&#8217;s expressing your semantic intention via syntax.</p>
<p>For advanced Pythonistas it&#8217;s easy for figure out what&#8217;s going on at a glance, but comprehension-as-loop definitely has a &quot;there&#8217;s more than one way to do it&quot; smell about it, which also makes it less amenable to people learning the language.</p>
<p>With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn&#8217;t require a result they now ask themselves, &quot;Can I fit this into the list comprehension form?&quot; Those mental branches are, to me, what &quot;one way to do it&quot; is designed to avoid. When I read Perl code, I take &quot;mental exceptions&quot; all the time because the author didn&#8217;t use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that &quot;no result needed&quot; should automatically imply a loop construct.</p>
</div>
<div class="section" id="efficiency">
<h4>Efficiency</h4>
<p>Consider two functions, <tt class="docutils literal"><span class="pre">comprehension</span></tt> and <tt class="docutils literal"><span class="pre">loop</span></tt>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> loop<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    accum = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span>:
        accum.<span style="color: black;">append</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> accum
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> comprehension<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    accum = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: black;">&#91;</span>accum.<span style="color: black;">append</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> accum</pre></div></div>

<p>N.B. This example is comparing the efficiency of a list comprehension <strong>where the result of the comprehension is ignored</strong> to a for loop that produces no result, as is discussed in the referenced entry, <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/">Python by example: list comprehensions</a>.</p>
<p><a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1539">Michael Foord</a> comments:</p>
<blockquote><p>
Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.</p></blockquote>
<p>However, the disassembly, obtained via <a class="reference external" href="http://docs.python.org/library/dis.html#dis.dis">dis.dis(func)</a> looks like the following for the loop:</p>

<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
&nbsp;
3           6 SETUP_LOOP              33 (to 42)
            9 LOAD_GLOBAL              0 (range)
           12 LOAD_CONST               1 (20)
           15 CALL_FUNCTION            1
           18 GET_ITER
      &gt;&gt;   19 FOR_ITER                19 (to 41)
           22 STORE_FAST               1 (i)
&nbsp;
4          25 LOAD_FAST                0 (accum)
           28 LOAD_ATTR                1 (append)
           31 LOAD_FAST                1 (i)
           34 CALL_FUNCTION            1
           37 POP_TOP
           38 JUMP_ABSOLUTE           19
      &gt;&gt;   41 POP_BLOCK
&nbsp;
5     &gt;&gt;   42 LOAD_FAST                0 (accum)
           45 RETURN_VALUE</pre></div></div>

<p>And it looks like the following for the comprehension:</p>

<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
&nbsp;
3           6 BUILD_LIST               0
            9 DUP_TOP
           10 STORE_FAST               1 (_[1])
           13 LOAD_GLOBAL              0 (range)
           16 LOAD_CONST               1 (20)
           19 CALL_FUNCTION            1
           22 GET_ITER
      &gt;&gt;   23 FOR_ITER                22 (to 48)
           26 STORE_FAST               2 (i)
           29 LOAD_FAST                1 (_[1])
           32 LOAD_FAST                0 (accum)
           35 LOAD_ATTR                1 (append)
           38 LOAD_FAST                2 (i)
           41 CALL_FUNCTION            1
           44 LIST_APPEND
           45 JUMP_ABSOLUTE           23
      &gt;&gt;   48 DELETE_FAST              1 (_[1])
           51 POP_TOP
&nbsp;
4          52 LOAD_FAST                0 (accum)
           55 RETURN_VALUE</pre></div></div>

<p>By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just &quot;syntactic sugar&quot; for the <tt class="docutils literal"><span class="pre">for</span></tt> loop, as mentioned by <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1540">nes</a> &#8212; they both lower down into the same control flow construct at a virtual machine level, at least in CPython.</p>
<p>The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead &#8212; it&#8217;s simply deleted in bytecode 48. Unless the <tt class="docutils literal"><span class="pre">POP_BLOCK</span></tt> operation (bytecode 41) of the loop disassembly is very expensive (I haven&#8217;t looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.</p>
<p>Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a <tt class="docutils literal"><span class="pre">for</span></tt> loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special <a class="reference external" href="http://www.cython.org/">Cython</a> construct, because <tt class="docutils literal"><span class="pre">LOAD_GLOBAL</span> <span class="pre">range</span></tt> could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper <a class="reference external" href="http://portal.acm.org/citation.cfm?id=1534530.1534550">The effect of unrolling and inlining for Python bytecode optimizations</a>.</p>
</div>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/eaAep1EWpOY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Learning Python by example: list comprehensions</title>
		<link>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/</link>
		<comments>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 16:00:59 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Introductory]]></category>
		<category><![CDATA[List Comprehensions]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=762</guid>
		<description><![CDATA[My friend, who is starting to learn Python 2.x, asked me what this snippet did: def collapse&#40;seq&#41;: # Preserve order. uniq = &#91;&#93; &#91;uniq.append&#40;item&#41; for item in seq if not uniq.count&#40;item&#41;&#93; return uniq This is not a snippet that should be emulated (i.e. it&#8217;s bad); however, it makes me happy: there are so many things [...]]]></description>
			<content:encoded><![CDATA[<p>My friend, who is starting to learn Python 2.x, asked me what this snippet did:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> collapse<span style="color: black;">&#40;</span>seq<span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># Preserve order.</span>
    uniq = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: black;">&#91;</span>uniq.<span style="color: black;">append</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> seq <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> uniq.<span style="color: black;">count</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> uniq</pre></div></div>

<p>This is not a snippet that should be emulated (i.e. <em>it&#8217;s bad</em>); however, it makes me happy: there are so many things that can be informatively corrected!</p>
<div class="section" id="what-is-a-list-comprehension">
<h3>What is a list comprehension?</h3>
<p>A list comprehension is a special brackety syntax to perform a <em>transform</em> operation with an optional <em>filter</em> clause that always produces a new sequence (list) object as a <em>result</em>. To break it down visually, you perform:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">new_range = <span style="color: black;">&#91;</span>i <span style="color: #66cc66;">*</span> i          <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>   <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">2</span> == <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Which corresponds to:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudocode" style="font-family:monospace;">*result*  = [*transform*    *iteration*         *filter*     ]</pre></div></div>

<p>The <em>filter</em> piece answers the question, &quot;should this item be transformed?&quot; If the answer is yes, then the <em>transform</em> piece is evaluated and becomes an element in the <em>result</em>. The <em>iteration</em> <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a> order is preserved in the <em>result</em>.</p>
<p>Go ahead and figure out what you expect <tt class="docutils literal"><span class="pre">new_range</span></tt> to be in the prior example. You can double check me in the Python shell, but I think it comes out to be:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> new_range = <span style="color: black;">&#91;</span>i <span style="color: #66cc66;">*</span> i <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">2</span> == <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> new_range
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span></pre></div></div>

<p>If it still isn&#8217;t clicking, we can try to make the example less noisy by getting rid of the transform and filter &#8212; can you tell what this will produce?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> new_range = <span style="color: black;">&#91;</span>i <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></pre></div></div>

</div>
<div class="section" id="so-what-s-wrong-with-that-first-snippet">
<h3>So what&#8217;s wrong with that first snippet?</h3>
<p>As we observed in the previous section, a list comprehension always produces a <em>result</em> list, where the elements of the result list are the <em>transformed</em> elements of the <em>iteration</em>. That means, if there&#8217;s no <em>filter</em> piece, there are exactly as many <em>result</em> elements as there were <em>iteration</em> elements.</p>
<p>Weird thing number one about the snippet &#8212; the list comprehension <em>result</em> is unused. It&#8217;s created, mind you &#8212; list comprehension always create a value, even if you don&#8217;t care what it is &#8212; but it just goes off to oblivion. (In technical terms, it becomes <em>garbage</em>.) When you don&#8217;t need the <em>result</em>, just use a <tt class="docutils literal"><span class="pre">for</span></tt> loop! This is better:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> colapse<span style="color: black;">&#40;</span>seq<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Preserve order.&quot;&quot;&quot;</span>
    uniq = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> seq:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> uniq.<span style="color: black;">count</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span>:
            uniq.<span style="color: black;">append</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> uniq</pre></div></div>

<p>It&#8217;s two more lines, but it&#8217;s less weird looking and wasteful. &quot;Better for everybody who reads <strong>and</strong> runs your code,&quot; means <a class="reference external" href="http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html">you should do it</a>.</p>
<p>Moral of the story: a list comprehension isn&#8217;t just, &quot;shorthand for a loop.&quot; It&#8217;s shorthand for a transform from an input sequence to an output sequence with an optional filter. If it gets too complex or weird looking, just make a loop.  It&#8217;s not that hard and readers of your code will thank you.</p>
<p>Weird thing number two: the transform, <tt class="docutils literal"><span class="pre">list.append(item)</span></tt>, produces <tt class="docutils literal"><span class="pre">None</span></tt> as its output value, because the return value from <tt class="docutils literal"><span class="pre">list.append</span></tt> is always <tt class="docutils literal"><span class="pre">None</span></tt>. Therefore, the <em>result</em>, even though it isn&#8217;t kept anywhere, is a list of <tt class="docutils literal"><span class="pre">None</span></tt> values of the same length as <tt class="docutils literal"><span class="pre">seq</span></tt> (notice that there&#8217;s no filter clause).</p>
<p>Weird thing number three: <tt class="docutils literal"><span class="pre">list.count(item)</span></tt> iterates over every element in the <tt class="docutils literal"><span class="pre">list</span></tt> looking for things that <tt class="docutils literal"><span class="pre">==</span></tt> to <tt class="docutils literal"><span class="pre">item</span></tt>. If you think through the case where you call <tt class="docutils literal"><span class="pre">collapse</span></tt> on an entirely unique sequence, you can tell that the collapse algorithm is O(n<sup>2</sup>).  In fact, it&#8217;s even worse than it may seem at first glance, because <tt class="docutils literal"><span class="pre">count</span></tt> will keep going all the way to the end of <tt class="docutils literal"><span class="pre">uniq</span></tt>, even if it finds <tt class="docutils literal"><span class="pre">item</span></tt> in the first index of <tt class="docutils literal"><span class="pre">uniq</span></tt>. What the original author really wanted was <tt class="docutils literal"><span class="pre">item</span> <span class="pre">not</span> <span class="pre">in</span> <span class="pre">uniq</span></tt>, which bails out early if it finds <tt class="docutils literal"><span class="pre">item</span></tt> in <tt class="docutils literal"><span class="pre">uniq</span></tt>.</p>
<p>Also worth mentioning for the computer-sciency folk playing along at home: if all elements of the sequence are comparable, you can bring that down to O(n * log n) by using a &quot;shadow&quot; sorted sequence and <a class="reference external" href="http://docs.python.org/library/bisect.html#module-bisect">bisecting</a> to test for membership. If the sequence is hashable you can bring it down to O(n), perhaps by using the <a class="reference external" href="http://docs.python.org/library/stdtypes.html#set">set</a> datatype if you are in Python &gt;= 2.3. Note that the common cases of strings, numbers, and tuples (any built-in immutable datatype, for that matter) are hashable.</p>
</div>
<div class="section" id="from-python-history">
<h3>From Python history</h3>
<p>It&#8217;s interesting to note that <a class="reference external" href="http://www.python.org/dev/peps/pep-0270/">Python Enhancement Proposal (PEP) #270</a> considered putting a <tt class="docutils literal"><span class="pre">uniq</span></tt> function into the language distribution, but withdrew it with the following statement:</p>
<blockquote><p>
Removing duplicate elements from a list is a common task, but there are only two reasons I can see for making it a built-in. The first is if it could be done much faster, which isn&#8217;t the case.  The second is if it makes it significantly easier to write code.  The introduction of <tt class="docutils literal"><span class="pre">sets.py</span></tt> eliminates this situation since creating a sequence without duplicates is just a matter of choosing a different data structure: a set instead of a list.</p></blockquote>
<p>Remember that sets can only contain hashable elements (same policy as dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as mentioned in the last paragraph of the previous section.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>&quot;Iteration&quot; is just a fancy word for &quot;step through the sequence, element by element, and give that element a name.&quot; In our case we&#8217;re giving the name <tt class="docutils literal"><span class="pre">i</span></tt>.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/U8CApnhMnc4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Code ☃ Unicode</title>
		<link>http://blog.cdleary.com/2010/04/code-%e2%98%83-unicode/</link>
		<comments>http://blog.cdleary.com/2010/04/code-%e2%98%83-unicode/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 19:00:36 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Sarcasm]]></category>
		<category><![CDATA[The Web]]></category>
		<category><![CDATA[Cpp]]></category>
		<category><![CDATA[Snowman]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=739</guid>
		<description><![CDATA[Let&#8217;s come to terms: angle brackets and forward slashes are overloaded. Between relational operators, templates, bitwise shift operators, XML tags, (HTML/squiggly brace language) comments, division, regular expressions, and path separators, what don&#8217;t they do? I think it&#8217;s clear to everyone that XML is the best and most human readable markup format ever conceived (for all [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s come to terms: angle brackets and forward slashes are overloaded. Between relational operators, templates, bitwise shift operators, XML tags, (HTML/squiggly brace language) comments, division, regular expressions, and path separators, what don&#8217;t they do?</p>
<p>I think it&#8217;s clear to everyone that XML is the best and most human readable markup format ever conceived (for all data serialization and database backing store applications without exception), so it&#8217;s time for all that crufty old junk from yesteryear to learn its place. Widely adopted web standards (such as <a class="reference external" href="http://en.wikipedia.org/wiki/Binary_XML">Binary XML</a> and <a class="reference external" href="http://en.wikipedia.org/wiki/ECMAScript_for_XML">E4X</a>) and well specified information exchange protocols (such as <a class="reference external" href="http://en.wikipedia.org/wiki/SOAP#Technical_critique">SOAP</a>) speak for themselves through the synergy they&#8217;ve utilized in enterprise compute environments.</p>
<p>The results of a confidential survey I conducted conclusively demonstrate beyond any possibility of refutation that you type more angle brackets in an average markup document than you will type angle-bracket relational operators for the next ten years.</p>
<p>In conclusion, your life expectancy decreases as you continue to use the less-than operator and forward slash instead of accepting XML into your heart as a first-class syntax. I understand that some may not enjoy life or the pursuit of happiness and that they will continue to use deprecated syntaxes. To each their own.</p>
<p>I have contributed a <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=556488">JavaScript parser patch</a> to rectify the situation: the ☃ operator is a heart-warming replacement for the (now XML-exclusive) pointy-on-the-left angle bracket and the commonly seen tilde diaeresis ⍨ replaces slash for delimiting regular expressions. I am confident this patch will achieve swift adoption, as it decreases the context sensitivity of the JavaScript parser, which is a clear and direct benefit for browser end users.</p>
<p>The (intolerably whitespace-sensitive) Python programming language nearly came to a similar conclusion to <a class="reference external" href="http://www.python.org/dev/peps/pep-3117/">use unicode more pervasively</a>, while simultaneously making it a <em>real</em> programming language by way of the use of types, but did not have enough garments to see it through.</p>
<p>Another interesting benefit: because JavaScript files may be UTF-16 encoded, this increases the utilization of bytes in the source text by filling the upper octets with non-zero values. This, in the aggregate, will increase the meaningful bandwidth utilization of the Internet as a whole.</p>
<p>Of course, I&#8217;d also recommend that C++ solve its nested template delimiter issue with ☃ and ☼ to close instead of increasing the context-sensitivity of the parser. <a class="footnote-reference" href="#id7" id="id6"><tt>[*]</tt></a> It clearly follows the logical flow of start/end delimiting.</p>
<p>As soon as <a class="reference external" href="http://sites.google.com/site/unicodesymbols/Home/emoji-symbols">Emoji are accepted as proper unicode code points</a>, I will revise my recommendation to suggest using the standard <a class="reference external" href="http://gmailblog.blogspot.com/2009/04/new-in-labs-extra-emoticons.html">poo</a> emoticon for a template start delimiter, because <a class="reference external" href="http://colloquy.info/project/ticket/1411">increased giggling</a> is demonstrated to reduce the likelihood of head-and-wall involved injuries during C++ compilation, second only to regular use of head protection while programming.</p>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id6"><tt>[*]</tt></a></td>
<td>Which provides a direct detriment to the end user &#8212; optimizing compilers spend most of their time in the parser.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/YfkSI1MSD1Q" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/code-%e2%98%83-unicode/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Two postfix operations redux: sequence points</title>
		<link>http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/</link>
		<comments>http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 19:00:43 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[C]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Language Lawyering]]></category>
		<category><![CDATA[Sequence Points]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=709</guid>
		<description><![CDATA[Describes sequence points and the postincrement operator in C, contrasts with the Java language, and notes the expressions in Python that unfortunately look like preincrement/predecrement.]]></description>
			<content:encoded><![CDATA[<p>Get ready for some serious language lawyering.</p>
<p>I was going back and converting my old entries to <a class="reference external" href="http://en.wikipedia.org/wiki/ReStructuredText">reStructuredText</a> when I found <a class="reference external" href="http://blog.cdleary.com/2007/09/two-postfix-operations-in-a-single-statement-in-gcc/">an entry in which I was wrong</a>! (Shocking, I know.)</p>
<div class="section" id="c">
<h3>C</h3>
<p>Stupid old me didn&#8217;t know about sequence points back in 2007: the effects of the <tt class="docutils literal"><span class="pre">++</span></tt> operator in the C expression <tt class="docutils literal"><span class="pre">i++</span> <span class="pre">*</span> <span class="pre">i++</span></tt> are in an indeterminate state of side-effect completion until one of the language-defined <a class="reference external" href="http://en.wikipedia.org/wiki/Sequence_point">sequence points</a> is encountered (i.e. a semicolon or function invocation).</p>
<p>From the C99 standard 6.5.4.2 item 2 regarding the postfix increment and decrement operators:</p>
<blockquote><p>
The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.</p></blockquote>
<p>Therefore, the compiler is totally at liberty to interpret that expression as:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">mov</span> lhs_result<span style="color: #339933;">,</span> i     <span style="color: #666666; font-style: italic;">; Copy the values of the postincrement evaluation.</span>
<span style="color: #00007f; font-weight: bold;">mov</span> rhs_result<span style="color: #339933;">,</span> i     <span style="color: #666666; font-style: italic;">; (Which is the original value of i.)</span>
<span style="color: #00007f; font-weight: bold;">mul</span> result<span style="color: #339933;">,</span> lhs_result<span style="color: #339933;">,</span> rhs_result
<span style="color: #00007f; font-weight: bold;">add</span> i<span style="color: #339933;">,</span> lhs_result<span style="color: #339933;">,</span> <span style="color: #0000ff;">1</span>
<span style="color: #00007f; font-weight: bold;">add</span> i<span style="color: #339933;">,</span> rhs_result<span style="color: #339933;">,</span> <span style="color: #0000ff;">1</span>  <span style="color: #666666; font-style: italic;">; Second increment clobbers with the same value!</span></pre></div></div>

<p>This results in the same result as the GCC compilation in the referenced entry: <tt class="docutils literal"><span class="pre">i</span></tt> is <tt class="docutils literal"><span class="pre">12</span></tt> and the result is <tt class="docutils literal"><span class="pre">121</span></tt>.</p>
<p>As I mentioned before, the reason this can occur is that nothing in the syntax <em>forces</em> the first postincrement to be evaluated before the second one. To give an analogy to concurrency constructs: you have a kind of compile-time &quot;race condition&quot; in your syntax between the two postincrements that could be solved with a sequence point &quot;barrier&quot;. <a class="footnote-reference" href="#id3" id="id2"><tt>[*]</tt></a></p>
<p>In this assembly, those <tt class="docutils literal"><span class="pre">add</span></tt>s can float anywhere they like after their corresponding <tt class="docutils literal"><span class="pre">mov</span></tt> instruction and can operate directly on <tt class="docutils literal"><span class="pre">i</span></tt> instead of the temporary if they&#8217;d prefer. Here&#8217;s an possible sequence that results in a value of <tt class="docutils literal"><span class="pre">132</span></tt> and <tt class="docutils literal"><span class="pre">i</span></tt> as <tt class="docutils literal"><span class="pre">13</span></tt>.</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">mov</span> lhs_result<span style="color: #339933;">,</span> i <span style="color: #666666; font-style: italic;">; Gets the original 11.</span>
<span style="color: #00007f; font-weight: bold;">inc</span> i             <span style="color: #666666; font-style: italic;">; Increment in-place after the start value is copied.</span>
<span style="color: #00007f; font-weight: bold;">mov</span> rhs_result<span style="color: #339933;">,</span> i <span style="color: #666666; font-style: italic;">; Gets the new value 12.</span>
<span style="color: #00007f; font-weight: bold;">inc</span> i             <span style="color: #666666; font-style: italic;">; Increment occurs in-place again, making 13.</span>
<span style="color: #00007f; font-weight: bold;">mul</span> result<span style="color: #339933;">,</span> lhs_result<span style="color: #339933;">,</span> rhs_result</pre></div></div>

<p>Even if you know what you&#8217;re doing, mixing two postfix operations, or <em>any</em> side effect, using the less obvious sequence points (like function invocation) is dangerous and easy to get wrong. Clearly it is not a best practice. <a class="footnote-reference" href="#id5" id="id4"><tt>[†]</tt></a></p>
</div>
<div class="section" id="java">
<h3>Java</h3>
<p>The postincrement operation appears to have sequence-point-like semantics in the Java language <a class="reference external" href="http://stackoverflow.com/questions/654715/in-java-how-does-a-post-increment-operator-act-in-a-return-statement/654735#654735">through experimentation</a>, and it does! From the Java language specification (page 416):</p>
<blockquote><p>
The Java programming language also guarantees that every operand of an operator (except the conditional operators &amp;&amp;, ||, and ? :) appears to be fully evaluated before any part of the operation itself is performed.</p></blockquote>
<p>Which combines with the definition of the postfix increment expression (page 485):</p>
<blockquote><p>
A postfix expression followed by a ++ operator is a postfix increment expression.</p></blockquote>
<p>As well as left-to-right expression evaluation (page 415):</p>
<blockquote><p>
The left-hand operand of a binary operator appears to be fully evaluated before any part of the right-hand operand is evaluated.</p></blockquote>
<p>To a definitive conclusion that <tt class="docutils literal"><span class="pre">i++</span> <span class="pre">*</span> <span class="pre">i++</span></tt> will always result in <tt class="docutils literal"><span class="pre">132</span> <span class="pre">==</span> <span class="pre">11</span> <span class="pre">*</span> <span class="pre">12</span></tt> and <tt class="docutils literal"><span class="pre">i</span> <span class="pre">==</span> <span class="pre">13</span></tt> when <tt class="docutils literal"><span class="pre">i</span> <span class="pre">==</span> <span class="pre">11</span></tt> to start.</p>
</div>
<div class="section" id="python">
<h3>Python</h3>
<p>Python has no increment operators specifically so you don&#8217;t have to deal with this kind of nonsense.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> count = <span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> count++
  File <span style="color: #483d8b;">&quot;&lt;stdin&gt;&quot;</span>, line <span style="color: #ff4500;">1</span>
    count++
          ^
<span style="color: #008000;">SyntaxError</span>: invalid syntax</pre></div></div>

<p>Annoyingly for newbies, though, it looks like <tt class="docutils literal"><span class="pre">++count</span></tt> is a valid expression that <a class="reference external" href="http://stackoverflow.com/questions/1485841/python-behaviour-of-increment-and-decrement-operators/1485854#1485854">happens to look like preincrement</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> count = <span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> ++count
<span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> --count
<span style="color: #ff4500;">0</span></pre></div></div>

<p>They&#8217;re actually two unary positive and negative operators, respectively. Just one of the hazards of a context free grammar, I suppose.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2"><tt>[*]</tt></a></td>
<td>I threw this in because the ordeal reminds me of the classic bank account concurrency problem. If it&#8217;s more confusing than descriptive, please ignore it. :-)</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4"><tt>[†]</tt></a></td>
<td>
<p class="first">Since function invocation defines sequence points, I <em>thought</em> this code sequence guaranteed those results:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;stdio.h&gt;</span>
&nbsp;
<span style="color: #993333;">int</span> identity<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> value<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #b1b100;">return</span> value<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">int</span> i <span style="color: #339933;">=</span> <span style="color: #0000dd;">11</span><span style="color: #339933;">;</span>
        <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;%d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> identity<span style="color: #009900;">&#40;</span>i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> identity<span style="color: #009900;">&#40;</span>i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;%d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> i<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>As <a class="reference external" href="http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/comment-page-1/#comment-1393">Dan points out</a>, the order of evaluation is totally unspecified &#8212; the left hand and right hand subexpression can potentially be evaluated <em>concurrently</em>.</p>
</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/z591_ULV5ow" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Why you should bother!</title>
		<link>http://blog.cdleary.com/2009/10/why-you-should-bother/</link>
		<comments>http://blog.cdleary.com/2009/10/why-you-should-bother/#comments</comments>
		<pubDate>Fri, 02 Oct 2009 07:40:34 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Interviews]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Pragmatic Programming]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=648</guid>
		<description><![CDATA[<p>I write this entry in response to <a class="reference external" href="http://sayamindu.randomink.org/ramblings/2009/04/29/why-should-i-bother/">Why Should I Bother?</a>. The answer, in short, is that I find Python to be a great language for getting things done, and you shouldn't let stupid interviewers deter you from learning a language that allows you to get more done.</p>
<p>I think I'm a pretty tough interviewer, so I also describe the things I'd recommend that a Python coder knows before applying to a Java position, based on my own technical-interview tactics.</p>]]></description>
			<content:encoded><![CDATA[<p>I write this entry in response to <a class="reference external" href="http://sayamindu.randomink.org/ramblings/2009/04/29/why-should-i-bother/">Why Should I Bother?</a>. The answer, in short, is that I find Python to be a great language for getting things done, and you shouldn&#8217;t let stupid interviewers deter you from learning a language that allows you to get more done.</p>
<p>I think I&#8217;m a pretty tough interviewer, so I also describe the things I&#8217;d recommend that a Python coder knows before applying to a Java position, based on my own technical-interview tactics.</p>
<div class="section" id="a-spectrum-of-formality">
<h3>A spectrum of formality</h3>
<p>As <a class="reference external" href="http://twitter.com/gok">my friend</a> pointed out to me long ago, many computer scientists don&#8217;t care about effective programming methods, because they prefer theory. Understandably, we computer scientists qua programmers (AKA software engineers) find ourselves rent in twain.</p>
<p>As a computer science degree candidate, you are inevitably enamored with complex formalities, terminology, <a class="footnote-reference" href="#id4" id="id3"><tt>[*]</tt></a> and a robust knowledge of mathematical models that you&#8217;ll use a few times per programming project (if you&#8217;re lucky). Pragmatic programming during a course of study often takes a back seat to familiar computer science concepts and conformance to industry desires.</p>
<p>As a programmer, I enjoy Python because I find it minimizes boilerplate, maximizes my time thinking about the problem domain, and permits me to use whichever paradigm works best. I find that I write programs more quickly and spend less time working around language deficiencies.  Importantly, the execution model fits in my brain.</p>
<p>Real Computer Scientists (TM) tend to love pure-functional programming languages because they fit into mathematical models nicely &#8212; founded on recursion, <a class="reference external" href="http://en.wikipedia.org/wiki/Curry-Howard_isomorphism">Curry-Howard isomorphism</a>, and what have you &#8212; whereas Python is strongly imperative and, <a class="reference external" href="http://www.youtube.com/watch?v=OKFeLZqLxzQ">in its dynamism, lacks the same sort of formality</a>.</p>
<p>Languages like Java sit somewhere in the middle. They&#8217;re still strongly imperative (there are no higher-order functions in Java), but there are more formalities. As the most notable example, compile-time type checking eliminates the possibility of type errors, which gives some programmers a sense of safety. <a class="footnote-reference" href="#id8" id="id7"><tt>[†]</tt></a> Such languages still let scientists chew on some computer sciencey problems; for example, where values clash with the type system, like <a class="reference external" href="http://www.youtube.com/watch?v=3d0YlZbY92U">provably eliminating NullPointerExceptions</a>, which <a class="reference external" href="http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake">is fun, but difficult</a>!</p>
<p>As the cost of increased formality, this class of languages is more syntax-heavy and leans on design patterns to get some of the flexibility dynamic typing gives you up front.</p>
<p>It&#8217;s debatable which category of languages is easiest to learn, but Java-like languages have footholds in the industry from historical C++ developer bases, Sun&#8217;s successful marketing of Java off of C++, and the more recent successes of the C# .NET platform.</p>
<p>It makes sense that we&#8217;re predominantly taught this category of languages in school: as a result, we can play the percentages and apply for most available developer jobs. Given that we have to learn it, you might as well do some <a class="reference external" href="http://en.wikipedia.org/wiki/Kata_%28programming%29">throw-away programming</a> in it now and again to keep yourself from forgetting everything; however, I&#8217;d recommend, as a programmer, that you save the fun projects for whichever language(s) that you find most intriguing.</p>
<p>I picture ease-and-rapidity of development-and-maintenance on a spectrum from low to high friction &#8212; other languages I&#8217;ve worked in fall somewhere on that spectrum as higher friction than Python. Though many computer scientists much smarter than I seem to conflate formality and safety, I&#8217;m fairly convinced I attain code completion and maintainability goals more readily with the imperative and flexible Python language. Plus, perhaps most importantly, <a class="reference external" href="http://imgs.xkcd.com/comics/python.png">I have fun</a>.</p>
</div>
<div class="section" id="my-technical-interview-protocol">
<h3>My technical-interview protocol</h3>
<p>The protocol I use to interview candidates is pretty simple. <a class="footnote-reference" href="#id14" id="id13"><tt>[‡]</tt></a></p>
<ul class="simple">
<li>Analyze their background experience for potential weaknesses that would affect their ability to perform in the position.</li>
<li>Don&#8217;t let the candidate talk about anything until the last <em>n</em> minutes, when they can ask questions or comment on the interview experience.</li>
<li>Hone in on the areas of potential weakness to determine how bad they are (if they&#8217;re bad at all).</li>
<li>Evaluate how easy it is to overcome those weaknesses. Determine where their level in relevant areas falls in the <a class="reference external" href="http://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisition">Dreyfus model</a> and how much it matters to the description of the position.</li>
</ul>
</div>
<div class="section" id="potential-java-interview-weaknesses">
<h3>Potential Java interview weaknesses</h3>
<p>Interviewing a candidate whose background is primarily Python based for a <em>generic</em> Java developer position (as in <a class="reference external" href="http://sayamindu.randomink.org/ramblings/2009/04/29/why-should-i-bother/">Sayamindu&#8217;s entry</a>), I would immediately flag the following areas as potential weaknesses:</p>
<dl class="docutils">
<dt><strong>Primitive data types</strong></dt>
<dd>
<p class="first">A programmer can pretty much get away never knowing how a number works in Python, since you typically overflow to appropriately sized data types automatically.</p>
<p class="last">The candidate needs to know what <a class="reference external" href="http://java.sun.com/docs/books/tutorial/java/nutsandbolts/datatypes.html">all the Java primitives</a> are when the names are provided to them, and must be able to describe why you would choose to use one over another. Knowing pass-by-value versus pass-by-reference is a plus. In Python there is a somewhat similar distinction between mutable and immutable types &#8212; if they understand the subtleties of identifier binding, learning by-ref versus by-value will be a cinch. If they don&#8217;t know either, I&#8217;ll be worried.</p>
</dd>
<dt><strong>Object oriented design</strong></dt>
<dd>
<p class="first">The candidate&#8217;s Python background must not be entirely procedural, or they won&#8217;t fare well in a Java environment (which forces object orientation). Additionally, this would indicate that they probably haven&#8217;t done much design work: even if they&#8217;re an object-orientation iconoclast, they have to know what they&#8217;re rebelling against and why.</p>
<p>They need to know:</p>
<ul class="simple">
<li>When polymorphism is appropriate.</li>
<li>What should be exposed from an encapsulation perspective.</li>
<li>What the benefits of interfaces are (in a statically typed, single-inheritance language).</li>
</ul>
<p class="last">Basically, if they don&#8217;t know the fundamentals of object oriented design, I&#8217;ll assume they&#8217;ve only ever written &quot;scripts,&quot; by which I mean, &quot;Small, unimportant code that glues the I/O of several real applications together.&quot; I don&#8217;t use the term lightly.</p>
</dd>
<dt><strong>Unit testing</strong></dt>
<dd>
<p class="first">If they&#8217;ve been writing real-world Python without a single unit test or doctest, they&#8217;ve been Doing it Wrong (TM).</p>
<p class="last"><tt class="docutils literal"><span class="pre">unittest</span></tt> is purposefully modeled on xUnit. They may have to learn the new jUnit 4 decorator syntax when they start work, but they should be able to claim they&#8217;ve worked with a jUnit 3 -like API.</p>
</dd>
<dt><strong>Abstract data structures</strong></dt>
<dd>
<p class="first">Python has tuples, lists and dictionaries &#8212; all polymorphic containers &#8212; and they&#8217;ll do everything that your programmer heart desires. <a class="footnote-reference" href="#id18" id="id17"><tt>[§]</tt></a> Some other languages don&#8217;t have such nice abstractions.</p>
<p>It&#8217;d be awesome if they knew:</p>
<ul class="simple">
<li>The difference between injective and bijective and how those terms are important to hash function design. If they can tell me this, I&#8217;ll let them write my high-performance hash functions.</li>
</ul>
<p>They must know (in ascending importance):</p>
<ul class="simple">
<li>The difference between a <a class="reference external" href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html">HashMap</a> and a <a class="reference external" href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html">TreeMap</a>.</li>
<li>The difference between a vector and a linked list, or when one should preferred over the other.  The names are unimportant &#8212; I&#8217;d clarify that a vector was a dynamically growing array.</li>
<li>The &quot;difference&quot; between a tree and a graph.</li>
</ul>
<p>Turning attention to you, the reader: if you&#8217;re lacking in data structures knowledge, I recommend you read a data structures book and actually implement the data structures. Then, take a few minutes to figure out where you&#8217;d actually use them in an application. They stick in your head fairly well once you&#8217;ve implemented them once.</p>
<p class="last">Some interviewers will ask stupid questions like how to implement sorting algorithms. Again, just pick up a data structures book and implement them once, and you&#8217;ll get the gist. Refresh yourself before the interview, because these are a silly favorite &#8212; <a class="reference external" href="http://en.wikipedia.org/wiki/Qsort_%28C_standard_library%29">very few people have to implement sorting functions anymore</a>.</p>
</dd>
<dt><strong>Design patterns</strong></dt>
<dd>
<p class="first">Design patterns serve several purposes:</p>
<ul class="simple">
<li>They establish a common language for communicating proposed solutions to commonly found problems.</li>
<li>They prevent developers for inventing stupid solutions to a solved class of problems.</li>
<li>They contain a suite of workarounds for inflexibilities in statically typed languages.</li>
</ul>
<p>I would want to assure myself that you had an appropriate knowledge of relevant design patterns. More important than the names: if I describe them to you, will you recognize them and their useful applications?</p>
<p class="last">For example, have you ever used the observer pattern? Adapter pattern? Proxying? Facade? You almost certainly had to use all of those if you&#8217;ve done major design work in Python.</p>
</dd>
<dt><strong>Background concepts</strong></dt>
<dd>
<p class="first">These are some things that I would feel extra good about if the candidate knew and could accurately describe how they relate to their Python analogs:</p>
<ul class="last simple">
<li>The importance of string builders (Python list joining idiom)</li>
<li>Basic idea of how I/O streams work (Python <tt class="docutils literal"><span class="pre">file</span></tt>s under the hood)</li>
<li>Basic knowledge of typecasting (Python has implicit polymorphism)</li>
</ul>
</dd>
</dl>
</div>
<div class="section" id="practical-advice">
<h3>Practical advice</h3>
<p>Some (bad) interviewers just won&#8217;t like you because you don&#8217;t know their favorite language. If you&#8217;re interviewing for a position that&#8217;s likely to be Java oriented, find the easiest IDE out there and write an application in it for fun. Try porting a Python application you wrote and see how the concepts translate &#8212; that&#8217;s often an eye-opener. Or <a class="reference external" href="http://en.wikipedia.org/wiki/Kata_%28programming%29">katas</a>!</p>
<p>If you find yourself unawares in an interview with these &quot;language crusaders,&quot; there&#8217;s nothing you can do but <strong>show that you have the capacity to learn their language in the few weeks vacation you have before you start</strong>. If it makes you feel better, keep a mental map from languages to number of jerks you&#8217;ve encountered &#8212; even normalizing by developer-base size the results can be surprising. ;-)</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[*]</tt></a></td>
<td>Frequently <em>unncessary</em> terminology, often trending towards hot enterprise jargon, since that&#8217;s what nets the most jobs and grant money.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id7"><tt>[†]</tt></a></td>
<td>Dynamic typing proponents are quick to point out that this doesn&#8217;t prevent flaws in reasoning, which are the more difficult class of errors, and that you&#8217;ll end up writing tests for these anyway.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id13"><tt>[‡]</tt></a></td>
<td>Clearly candidates could exploit a vulnerability in my interview protocol: leave off things they know I&#8217;m likely to test that they know particularly well; however, I generally ask them to stop after I&#8217;m satisfied they know something. Plus, the less I know about their other weaknesses the more unsure I am about them, and thus the less likely I am to recommend them.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id18" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id17"><tt>[§]</tt></a></td>
<td>Though not necessarily in a performant way; i.e. note the existence of <tt class="docutils literal"><span class="pre">collections.deque</span></tt> and <tt class="docutils literal"><span class="pre">bisect</span></tt>. Knowing Python, I&#8217;d quiz the candidate to see if they knew of the performant datatypes.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/Im1kIelW468" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/10/why-you-should-bother/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Learning Python by example: RC4</title>
		<link>http://blog.cdleary.com/2009/09/learning-python-by-example-rc4/</link>
		<comments>http://blog.cdleary.com/2009/09/learning-python-by-example-rc4/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 05:54:29 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Generators]]></category>
		<category><![CDATA[Introductory]]></category>
		<category><![CDATA[Kata]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=619</guid>
		<description><![CDATA[Writes an introductory RC4 program by translating pseudo-code (courtesy of Programming Praxis) to Python and adding a front-end. Aimed at people who already know how to program and have been looking for a tidbit to try in Python.]]></description>
			<content:encoded><![CDATA[<p>One of my friends at work was fakeplaining <a class="footnote-reference" href="#id3" id="id1"><tt>[*]</tt></a> that he had been on the Python programming mailing list at work for some time, yet still did not know Python. Being hopelessly suggestible in the face of obvious sarcasm, I decided to sacrifice a few hours of sleep to the god of blog. <a class="footnote-reference" href="#id4" id="id2"><tt>[†]</tt></a></p>
<p>Note that this entry is aimed at people who already know how to program and have been looking for a tidbit to try in Python.</p>
<p>There are a lot of side notes I&#8217;ve left out for simplicity of explanation; however, I also attempted to make the experience interesting by introducing one of Python&#8217;s more advanced features, called &quot;generator functions,&quot; into the mix. Hopefully it strikes a balance. Please comment if you are utterly confused by generators &#8212; I may add an alternate section that allows the reader to avoid them altogether.</p>
<div class="section" id="you-kata-wanna">
<h3>You kata wanna&#8230;</h3>
<p>A number of leaders in the programming community are hot on this trend called &quot;<a class="reference external" href="`codekatas`">code katas</a>.&quot; I&#8217;m actually a big fan of this trend, mainly because I&#8217;ve been writing code for no reason, hating on it, throwing it away, and subsequently rewriting it for my <em>entire life</em>, but I now get to call it something cool- and ninja-sounding. Doing such things in my spare time is no longer considered &quot;inexcusable nerdiness;&quot; rather, it&#8217;s my small endeavor to <a class="reference external" href="`professionalism`">bring professionalism to the field of software engineering</a>. <em>*cough*</em></p>
<p>One reason that I <em>really</em> enjoy this new trend is that programmers are posting their own morsel-sized programming problems left and right, giving ample opportunities to explore new languages (and dusty corners of ones you know well) without accidentally becoming BDFL of a seminal framework or utility. <a class="footnote-reference" href="#id8" id="id5"><tt>[‡]</tt></a></p>
</div>
<div class="section" id="rc4-pseudocode">
<h3>RC4 Pseudocode</h3>
<p>Case in point, I&#8217;ll use the recent <a class="reference external" href="http://programmingpraxis.com/2009/09/04/rons-cipher-4/">kata from Programming Praxis</a> for this Python exercise, as they provide straightforward pseudocode.  Here&#8217;s the encryption algorithm named RC4, as quoted from Programming Praxis:</p>
<blockquote>
<p>The key array K is initialized like this:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudocode" style="font-family:monospace;">for i from 0 to 255
    K[i] := i
&nbsp;
j := 0
&nbsp;
for i from 0 to 255
    j := (j + K[i] + key[i mod klen]) mod 256
    swap K[i], K[j]</pre></div></div>

<p>Once the key array is initialized, the pseudo-random byte generator uses a similar calculation to build the stream of random bytes:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudocode" style="font-family:monospace;">i := 0
j := 0
&nbsp;
start:
    i := (i + 1) mod 256
    j := (j + K[i]) mod 256
    swap K[i], K[j]
    output K[ (K[i] + K[j]) mod 256 ]
    goto start</pre></div></div>

</blockquote>
<p>The first step in writing our RC4 program is to translate this pseudocode to Python, while the second step is to add a command line front-end for some off-the-cuff implementation experience.</p>
<p>If you&#8217;d like to look at the final program ahead of time, grab a copy of my <a class="reference external" href="http://gist.github.com/188393">reference implementation</a>.</p>
</div>
<div class="section" id="porting-the-initialization">
<h3>Porting the initialization</h3>
<p>For initialization, we use a provided key to calculate a 256 entry integer sequence. Open a new file called <tt class="docutils literal"><span class="pre">rc4.py</span></tt> and write the following function:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> initialize<span style="color: black;">&#40;</span>key<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Produce a 256-entry list based on `key` (a sequence of numbers) as
    the first step in RC4.
    Note: indices in key greater than 255 will be ignored.
    &quot;&quot;&quot;</span>
    k = <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">256</span><span style="color: black;">&#41;</span>
    j = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">256</span><span style="color: black;">&#41;</span>:
        j = <span style="color: black;">&#40;</span>j + k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span> + key<span style="color: black;">&#91;</span>i <span style="color: #66cc66;">%</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>key<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">256</span>
        k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>, k<span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span> = k<span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span>, k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> k</pre></div></div>

<p>The simplicity of the translation demonstrates why Python is sometimes called &quot;executable pseudocode&quot;. Breaking it down line by line:</p>
<ol class="arabic simple" start="0">
<li><strong>def</strong>ines a function named <tt class="docutils literal"><span class="pre">initialize</span></tt> that takes a single argument, <tt class="docutils literal"><span class="pre">key</span></tt>.</li>
<li>A documentation string (&quot;docstring&quot; for short). In Python, documentation is associated with a function <em>even at runtime</em>, in contrast to traditional JavaDoc or POD.  <a class="footnote-reference" href="#id14" id="id10"><tt>[§]</tt></a> If the first statement in a function is a string literal, it is used as the docstring for that function. <a class="footnote-reference" href="#id15" id="id11"><tt>[¶]</tt></a></li>
</ol>
<ol class="arabic" start="5">
<li>
<p class="first">The built-in <tt class="docutils literal"><span class="pre">range</span></tt> function returns a list of values. <a class="footnote-reference" href="#id16" id="id12"><tt>[#]</tt></a> &quot;Built-in&quot; is the terminology used for items that are &quot;available all the time without explicitly importing anything.&quot;</p>
<p>This function also has a two-argument form, <tt class="docutils literal"><span class="pre">range(start,</span> <span class="pre">stop)</span></tt>; however, in the single argument form, <tt class="docutils literal"><span class="pre">start</span></tt> has a default of 0, so the function invocation returns a list of all the integers in the mathematical interval <tt class="docutils literal"><span class="pre">[0,</span> <span class="pre">256)</span></tt>, for a total of 256 values.</p>
</li>
</ol>
<ol class="arabic simple" start="7">
<li>There is only one <tt class="docutils literal"><span class="pre">for</span></tt> loop syntax: <tt class="docutils literal"><span class="pre">for</span> <span class="pre">[identifier]</span> <span class="pre">in</span> <span class="pre">[iterable]</span></tt>. Lists are iterable because they contain a sequence of objects.</li>
<li>Finite collections also support the built-in function <tt class="docutils literal"><span class="pre">len([sizable])</span></tt>. The way that numerical arithmetic works and sequence indexing via <tt class="docutils literal"><span class="pre">seq[idx]</span></tt> should be familiar.</li>
<li>Python has an elegant swap capability &#8212; what&#8217;s important to note is that the entire right hand side is evaluated, then assigned to the left hand side.</li>
<li>Python functions optionally return a value. If no return statement is encountered, <tt class="docutils literal"><span class="pre">None</span></tt> is returned, which indicates the absence of a value (<a class="reference external" href="http://docs.python.org/library/stdtypes.html#the-null-object">docs</a>).</li>
</ol>
</div>
<div class="section" id="generators-functions-that-pause">
<h3>Generators: functions that pause</h3>
<p>Python has a convenient feature, called &quot;generator functions,&quot; that allows you to create a stream of values using function-definition syntax. <a class="footnote-reference" href="#id22" id="id17"><tt>[♠]</tt></a> You can think of generator functions as special functions that can pause and resume, returning a value each time it pauses.</p>
<p>The canonical example illustrates the concept well &#8212; use the interactive Python shell to explore how generator functions work, by running the <tt class="docutils literal"><span class="pre">python</span></tt> command without arguments. Make sure the version is <tt class="docutils literal"><span class="pre">python2.3</span></tt> or above. Once you&#8217;re in the interactive shell, type the following:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">def</span> gen_counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
...     <span style="color: black;">i</span> = <span style="color: #ff4500;">0</span>
...     <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
...         <span style="color: #ff7700;font-weight:bold;">yield</span> i
...         <span style="color: black;">i</span> += <span style="color: #ff4500;">1</span>
...
<span style="color: #66cc66;">&gt;&gt;&gt;</span></pre></div></div>

<p>Note the use of a <tt class="docutils literal"><span class="pre">yield</span></tt> statement, which tells Python that it is a generator function. Calling a generator function creates an iterable <em>generator object</em>, which can then produce a potentially infinite series of values:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> counter = gen_counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> counter
<span style="color: #66cc66;">&lt;</span>generator <span style="color: #008000;">object</span> gen_counter at 0xb7e3fbe4<span style="color: #66cc66;">&gt;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> counter.<span style="color: black;">next</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> counter.<span style="color: black;">next</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff4500;">1</span></pre></div></div>

<p>Note that because the stream of values is potentially infinite and lazily evaluated, there&#8217;s no concept of length: it&#8217;s not representative of a container so much as a sequence:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>counter<span style="color: black;">&#41;</span>
Traceback <span style="color: black;">&#40;</span>most recent call last<span style="color: black;">&#41;</span>:
  File <span style="color: #483d8b;">&quot;&lt;stdin&gt;&quot;</span>, line <span style="color: #ff4500;">1</span>, <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #66cc66;">&lt;</span>module<span style="color: #66cc66;">&gt;</span>
<span style="color: #008000;">TypeError</span>: <span style="color: #008000;">object</span> of <span style="color: #008000;">type</span> <span style="color: #483d8b;">'generator'</span> has no <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Also important to note is that none of the local values in the generator instances are shared; i.e. instantiating a second generator has no effect on the first:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> one_counter = gen_counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> another_counter = gen_counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> one_counter.<span style="color: black;">next</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> another_counter.<span style="color: black;">next</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff4500;">0</span></pre></div></div>

<p>Since generators are iterable, you can use them in a <tt class="docutils literal"><span class="pre">for</span></tt> loop just like containers. (But watch out for infinite generators and no <tt class="docutils literal"><span class="pre">break</span></tt> condition in your <tt class="docutils literal"><span class="pre">for</span></tt> loop!)</p>
<p>If you&#8217;re still confused, the <a class="reference external" href="http://docs.python.org/tutorial/">official tutorial</a>&#8216;s <a class="reference external" href="http://docs.python.org/tutorial/classes.html#generators">section on generators</a> may help to clarify things. For an in-depth look at generators and why they&#8217;re awesome, <a class="reference external" href="http://www.dabeaz.com/generators/">David M. Beazley</a>&#8216;s <a class="reference external" href="http://www.dabeaz.com/generators/Generators.pdf">PyCon presentation on generators</a> is excellent.</p>
</div>
<div class="section" id="applying-generators-to-rc4">
<h3>Applying generators to RC4</h3>
<p>This dove-tails nicely with the second part of the algorithm, which requires a stream of values to XOR against. The generator is nearly a direct translation from the pseudocode, which you may also add to <tt class="docutils literal"><span class="pre">rc4.py</span></tt>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> gen_random_bytes<span style="color: black;">&#40;</span>k<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Yield a pseudo-random stream of bytes based on 256-byte array `k`.&quot;&quot;&quot;</span>
    i = <span style="color: #ff4500;">0</span>
    j = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
        i = <span style="color: black;">&#40;</span>i + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">256</span>
        j = <span style="color: black;">&#40;</span>j + k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">256</span>
        k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>, k<span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span> = k<span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span>, k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">yield</span> k<span style="color: black;">&#91;</span><span style="color: black;">&#40;</span>k<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span> + k<span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">256</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Each time <tt class="docutils literal"><span class="pre">.next()</span></tt> is called on the generator instance, the function executes until the first <tt class="docutils literal"><span class="pre">yield</span></tt> statement is encountered, produces that value, and saves the function state for later.</p>
<p>Yes, we <em>could</em> create a big list of pseudo-random values the length of the text, but creating them all at the same time adds <tt class="docutils literal"><span class="pre">O(len(text))</span></tt> memory overhead, whereas the generator is constant memory overhead (and computationally efficient).</p>
</div>
<div class="section" id="tying-it-together">
<h3>Tying it together</h3>
<p>Now we just need a function that does the XORing, which teaches us about strings and characters.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> run_rc4<span style="color: black;">&#40;</span>k, text<span style="color: black;">&#41;</span>:
    cipher_chars = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    random_byte_gen = gen_random_bytes<span style="color: black;">&#40;</span>k<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> char <span style="color: #ff7700;font-weight:bold;">in</span> text:
        byte = <span style="color: #008000;">ord</span><span style="color: black;">&#40;</span>char<span style="color: black;">&#41;</span>
        cipher_byte = byte ^ random_byte_gen.<span style="color: black;">next</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        cipher_chars.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">chr</span><span style="color: black;">&#40;</span>cipher_byte<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">''</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>cipher_chars<span style="color: black;">&#41;</span></pre></div></div>

<p>Line by line:</p>
<ol class="arabic simple">
<li>An empty list <em>cipher character accumulator</em> is created.</li>
<li>The generator object is instantiated by calling the generator function.</li>
<li>As you can see from the <tt class="docutils literal"><span class="pre">for</span></tt> loop, Python strings are iterable as sequences of characters. Characters in Python are just strings of length one, so you can think of a string iterator as stepping over all of its one-character substrings in order.</li>
</ol>
<ol class="arabic simple" start="5">
<li>To convert a textual character into its character-code numerical value, the built-in <tt class="docutils literal"><span class="pre">ord</span></tt> function is used (<a class="reference external" href="http://docs.python.org/library/functions.html#ord">docs</a>).</li>
<li>The meat of the algorithm: XOR the textual character with the next pseudo-random byte from the byte stream.</li>
<li>After obtaining the cipher-byte through the XOR, we want to convert back to a textual (character) representation, which we do via the built-in <tt class="docutils literal"><span class="pre">chr</span></tt> function (<a class="reference external" href="http://docs.python.org/library/functions.html#chr">docs</a>). We then place that character into a sequence of cipher characters. <a class="footnote-reference" href="#id29" id="id23"><tt>[♥]</tt></a></li>
<li>To join together characters to form a string, we use the <tt class="docutils literal"><span class="pre">str.join([iterable])</span></tt> method (<a class="reference external" href="http://docs.python.org/library/stdtypes.html#str.join">docs</a>). <a class="footnote-reference" href="#id30" id="id24"><tt>[♦]</tt></a> Note that, on some platforms, this is <em>much</em> more efficient than using <tt class="docutils literal"><span class="pre">+=</span></tt> (for string concatenation) over and over again. It&#8217;s a best practice to use this sequence-joining idiom to avoid possible concatenation overhead. <a class="footnote-reference" href="#id31" id="id25"><tt>[♣]</tt></a></li>
</ol>
</div>
<div class="section" id="front-end-fun">
<h3>Front-end fun</h3>
<p>If you thought that the pseudo-code translation looked like a piece of cake, you may feel up to a challenge: write a command line interface that:</p>
<ol class="arabic simple">
<li>Asks for an encryption key.</li>
<li>Turns the key to a sequence of integer values and initializes with it.</li>
<li>Continually asks for user-provided text to translate and spits out the corresponding cipher text.</li>
</ol>
<div class="section" id="what-you-need-to-know">
<h4>What you need to know</h4>
<ul>
<li>
<p class="first">In Python 2.x <tt class="docutils literal"><span class="pre">print</span></tt> is a statement that is followed by comma-separated values, where each comma turns into a space. The <tt class="docutils literal"><span class="pre">print</span></tt> statement puts a newline at the end of its output by default:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'a'</span>, <span style="color: #483d8b;">'b'</span>, <span style="color: #483d8b;">'c'</span>, <span style="color: #483d8b;">'d'</span>
a b c d</pre></div></div>

<p>To suppress the newline (for example, in a loop), leave on a trailing comma:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">for</span> char <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#91;</span><span style="color: #483d8b;">'a'</span>, <span style="color: #483d8b;">'b'</span>, <span style="color: #483d8b;">'c'</span>, <span style="color: #483d8b;">'d'</span><span style="color: black;">&#93;</span>:
...     <span style="color: #ff7700;font-weight:bold;">print</span> char,
...
<span style="color: black;">a</span> b c d</pre></div></div>

<p>If there&#8217;s something that you can&#8217;t do using the above, I&#8217;ll refer you to the Python tutorial&#8217;s section on <a class="reference external" href="http://docs.python.org/tutorial/inputoutput.html#fancier-output-formatting">fancier output formatting</a>.</p>
</li>
<li>
<p class="first">The built-in function called <tt class="docutils literal"><span class="pre">raw_input</span></tt> (<a class="reference external" href="http://docs.python.org/library/functions.html#raw_input">docs</a>) displays a message and then requests user input as a command line, returning the user input as a string. For example:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">name = <span style="color: #008000;">raw_input</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'What is your name? '</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Your name is'</span>, name</pre></div></div>

</li>
<li>
<p class="first">The built-in function called <tt class="docutils literal"><span class="pre">repr</span></tt> (<a class="reference external" href="http://docs.python.org/library/functions.html#repr">docs</a>) returns the Python representation of a string (or any object) &#8212; this is useful for escaping strings with funky, non- <a class="reference external" href="http://docs.python.org/library/string.html#string.printable">printable characters</a> in it, as our cipher algorithm is likely to do. For example, you&#8217;ll probably want to use something along the lines of:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">cipher_text = run_rc4<span style="color: black;">&#40;</span>k, text<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Cipher text:'</span>, <span style="color: #dc143c;">repr</span><span style="color: black;">&#40;</span>cipher_text<span style="color: black;">&#41;</span></pre></div></div>

</li>
<li>
<p class="first">The character-escaping performed by <tt class="docutils literal"><span class="pre">repr</span></tt> can be reversed using the string decode method: <tt class="docutils literal"><span class="pre">str.decode('string_escape')</span></tt>. For example:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> text_with_escapes = <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\\</span>x61<span style="color: #000099; font-weight: bold;">\\</span>x62<span style="color: #000099; font-weight: bold;">\\</span>x63'</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> text_with_escapes
\x61\x62\x63
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> text_with_escapes.<span style="color: black;">decode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'string_escape'</span><span style="color: black;">&#41;</span>
abc</pre></div></div>

<p>So if you want to allow the user to enter ciphertext at the command prompt, you can read it in and decode it from the escaped format.</p>
</li>
<li>
<p class="first">If you just put the CLI front-end code to execute at the bottom of the <tt class="docutils literal"><span class="pre">rc4.py</span></tt> file, it will work; however, it&#8217;s a best practice to test to make sure that <tt class="docutils literal"><span class="pre">rc4.py</span></tt> is the file that&#8217;s being executed. To do this, wrap the command line interface code in a test like the following:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>: <span style="color: #808080; font-style: italic;"># This is the file being executed.</span>
    main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

</li>
</ul>
</div>
<div class="section" id="if-you-need-help">
<h4>If you need help</h4>
<p>I wrote a <a class="reference external" href="http://gist.github.com/188393">reference implementation</a> and posted it to <a class="reference external" href="http://github.com">github</a> &#8212; feel free to check it out if you get stuck.</p>
<p>Here&#8217;s a sample usage of my implementation:</p>

<div class="wp_syntax"><div class="code"><pre class="none" style="font-family:monospace;">===========
RC4 Utility
===========
The output values are valid Python strings. They may contain escape characters
of the form \xhh to avoid confusing your terminal emulator. Only the first 256
characters of the encryption key are used.
&nbsp;
Enter an encryption key: an encryption key!
&nbsp;
Enter plain or cipher text: Victory is mine!
Your RC4 text is: '\xecL\xce(\x16\x8e3\xf02!\xcd\xc6\x9a\xc0j\x98'
&nbsp;
Enter plain or cipher text: '\xecL\xce(\x16\x8e3\xf02!\xcd\xc6\x9a\xc0j\x98'
Unescaping ciphertext...
Your RC4 text is: 'Victory is mine!'</pre></div></div>

<p>Once you find that your cipher is reversible, you&#8217;ve probably got it right!</p>
<p>Again, please comment if anything is unclear.</p>
</div>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>Also known as &quot;fitching.&quot; Often performed by those in brillig, slithy toves.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2"><tt>[†]</tt></a></td>
<td>Could God make a blog entry so long and boring that God would proclaim &quot;TL:DR?&quot;</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5"><tt>[‡]</tt></a></td>
<td>If I had a nickel for every time this happened to me, I would have no nickels.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id10"><tt>[§]</tt></a></td>
<td>This allows you to reflect on things and extract their documentation, which comes in handy when you&#8217;re running in an interactive Python session or spitting out module-level documentation in a command line argument parser.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id15" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id11"><tt>[¶]</tt></a></td>
<td>This same rule applies to classes and modules, which are beyond the scope of this entry.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id16" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12"><tt>[#]</tt></a></td>
<td>Python lists are mutable sequences, implemented as vector ADTs under the hood.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id22" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id17"><tt>[♠]</tt></a></td>
<td>The same task can be accomplished with a custom iterator class, but generators are much more concise and more readable &#8212; note that the generator that we end up with reads just like the pseudocode!</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id29" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id23"><tt>[♥]</tt></a></td>
<td>Note that the language having a built-in <tt class="docutils literal"><span class="pre">join(iterable)</span></tt> method on its string datatype eliminates the need for every iterable type to implement some form of <tt class="docutils literal"><span class="pre">iterable.join(str)</span></tt>.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id30" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id24"><tt>[♦]</tt></a></td>
<td>There&#8217;s a way to use generators here as well, but the list of characters makes things simpler to understand for the moment. If you&#8217;re feeling confident, convert this function to be a generator function at the end of the exercise and make it work with the rest of the program.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id31" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id25"><tt>[♣]</tt></a></td>
<td>It&#8217;s bad practice to assume you&#8217;ll always be running on CPython &#8212; there are also JVM and .NET (CLR) interpreters. Remember, <a class="reference external" href="http://www.lysator.liu.se/c/ten-commandments.html">thou shalt not claimeth</a> that, &quot;<a class="reference external" href="http://dictionary.die.net/vaxocentrism">all the world&#8217;s a VAX</a>!&quot;</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/jAM8SAbL8bs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/09/learning-python-by-example-rc4/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Python 3.1, you shouldn’t have!</title>
		<link>http://blog.cdleary.com/2009/07/python-3-1-you-shouldnt-have/</link>
		<comments>http://blog.cdleary.com/2009/07/python-3-1-you-shouldnt-have/#comments</comments>
		<pubDate>Wed, 01 Jul 2009 18:24:55 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Birthday]]></category>
		<category><![CDATA[Py3k]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=591</guid>
		<description><![CDATA[I highly appreciate the presents that the Python 3.1 team (unwittingly) got me for my birthday this year. This morning I wrote the following snippet to determine the day-frequency of birthday occurrences: [*] #!/usr/bin/env python3.1 &#160; import collections import datetime from operator import itemgetter &#160; &#160; birthday = &#123;part: int&#40;input&#40;'Enter birth {}: '.format&#40;part&#41;&#41;&#41; for part [...]]]></description>
			<content:encoded><![CDATA[<p>I highly appreciate the presents that the Python 3.1 team (unwittingly) got me for my birthday this year. This morning I wrote the following snippet to determine the day-frequency of birthday occurrences: <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3.1</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">collections</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">datetime</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">operator</span> <span style="color: #ff7700;font-weight:bold;">import</span> itemgetter
&nbsp;
&nbsp;
birthday = <span style="color: black;">&#123;</span>part: <span style="color: #008000;">int</span><span style="color: black;">&#40;</span><span style="color: #008000;">input</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Enter birth {}: '</span>.<span style="color: black;">format</span><span style="color: black;">&#40;</span>part<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> part <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #483d8b;">'year month day'</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#125;</span>
now = <span style="color: #dc143c;">datetime</span>.<span style="color: #dc143c;">datetime</span>.<span style="color: black;">now</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
days = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> year <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>birthday<span style="color: black;">&#91;</span><span style="color: #483d8b;">'year'</span><span style="color: black;">&#93;</span>, now.<span style="color: black;">year</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>:
    iter_birthday = <span style="color: #008000;">dict</span><span style="color: black;">&#40;</span>birthday<span style="color: black;">&#41;</span>
    iter_birthday<span style="color: black;">&#91;</span><span style="color: #483d8b;">'year'</span><span style="color: black;">&#93;</span> = year
    date = <span style="color: #dc143c;">datetime</span>.<span style="color: black;">date</span><span style="color: black;">&#40;</span><span style="color: #66cc66;">**</span>iter_birthday<span style="color: black;">&#41;</span>
    days.<span style="color: black;">append</span><span style="color: black;">&#40;</span>date.<span style="color: black;">strftime</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'%A'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
counter = <span style="color: #dc143c;">collections</span>.<span style="color: black;">Counter</span><span style="color: black;">&#40;</span>days<span style="color: black;">&#41;</span>
fmt = <span style="color: #483d8b;">'{:15} {:&gt;3}'</span>
<span style="color: #ff7700;font-weight:bold;">for</span> day, frequency <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">sorted</span><span style="color: black;">&#40;</span>counter.<span style="color: black;">items</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, key=itemgetter<span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>,
        reverse=<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>fmt.<span style="color: black;">format</span><span style="color: black;">&#40;</span>day, frequency<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>fmt.<span style="color: black;">format</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Total'</span>, <span style="color: #008000;">sum</span><span style="color: black;">&#40;</span>counter.<span style="color: black;">values</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Check out the Python 3.1 goodness: <a class="reference external" href="http://docs.python.org/dev/py3k/whatsnew/3.1.html#other-language-changes">automatically numbered format strings</a> and <a class="reference external" href="http://docs.python.org/dev/py3k/whatsnew/3.1.html#new-improved-and-deprecated-modules">collections.Counter</a>. It&#8217;s also a relief that I no longer have to type <tt class="docutils literal"><span class="pre">iter</span></tt>, <tt class="docutils literal"><span class="pre">x</span></tt>, or <tt class="docutils literal"><span class="pre">raw</span></tt> &#8212; Python is even prettier without the 2.x cruft.</p>
<p>Turns out that my original day of birth is still in the lead!</p>
<div class="section" id="updates">
<h3>Updates</h3>
<ol class="arabic simple">
<li>Added dict comprehensions per astute comments. :-)</li>
<li>Replaced lambda argument to <tt class="docutils literal"><span class="pre">sorted</span></tt> with the more readable <tt class="docutils literal"><span class="pre">operator.itemgetter</span></tt>, which I didn&#8217;t realize exists! Thanks <a class="reference external" href="http://blog.cdleary.com/2009/07/python-3-1-you-shouldnt-have/#comment-1145">&#64;xtian</a>.</li>
</ol>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>Note that the script assumes your birthday has already occurred this year.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/HTNruhqt5YU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/07/python-3-1-you-shouldnt-have/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Registry pattern trumps import magic</title>
		<link>http://blog.cdleary.com/2009/06/registry-pattern-trumps-import-magic/</link>
		<comments>http://blog.cdleary.com/2009/06/registry-pattern-trumps-import-magic/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 19:00:37 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Design Patterns]]></category>
		<category><![CDATA[Dynamism]]></category>
		<category><![CDATA[Jargon]]></category>
		<category><![CDATA[Magic]]></category>
		<category><![CDATA[Principle of Least Surprise]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=562</guid>
		<description><![CDATA[[...] Today's more flexible languages provide the programmer with a significant amount of power at runtime, making the barrier to "accidental magic" much lower. As a programmer who works with dynamic languages, there's an important responsibility to keep in mind: err on the side of caution with the `Principle of Least Surprise`. [...]]]></description>
			<content:encoded><![CDATA[<p>The other night I saw an interesting tweet in the <a class="reference external" href="http://twitter.com/#search?q=%23Python">#Python Twitter channel</a> &#8212; <a class="reference external" href="http://twitter.com/pvankouteren">Patrick</a> was <a class="reference external" href="http://www.vankouteren.eu/blog/?p=62">looking to harness the dynamism</a> of a language like Python in a way that many <a class="reference external" href="http://en.wikipedia.org/wiki/Pythonista#Neologisms">Pythonistas</a> would consider magical. <a class="footnote-reference" href="#id3" id="id1"><tt>[*]</tt></a> Coming from languages with more rigid execution models, it&#8217;s understandably easy to confuse <em>dynamic</em> and <em>magical</em>. <a class="footnote-reference" href="#id4" id="id2"><tt>[†]</tt></a></p>
<div class="section" id="what-is-magic">
<h3>What is magic?</h3>
<p>To quote the <a class="reference external" href="http://catb.org/jargon/html/M/magic.html">jargon file</a>, magic is:</p>
<blockquote><p>
Characteristic of something that works although no one really understands why (this is especially called <a class="reference external" href="http://catb.org/jargon/html/B/black-magic.html">black magic</a>).</p></blockquote>
<p>Taken in the context of programming, magic refers to code that works without a straightforward way of determining <em>why</em> it works.</p>
<p>Today&#8217;s more <em>flexible</em> languages provide the programmer with a significant amount of power at <em>runtime</em>, making the barrier to &quot;accidental magic&quot; much lower. As a programmer who works with dynamic languages, there&#8217;s an important responsibility to keep in mind: err on the side of caution with the <a class="reference external" href="http://www.faqs.org/docs/artu/ch11s01.html">Principle of Least Surprise</a>.</p>
<blockquote><p>
[T]o design usable interfaces, it&#8217;s best when possible not to design an entire new interface model. Novelty is a barrier to entry; it puts a learning burden on the user, so minimize it.</p></blockquote>
<p>This principle indicates that <strong>using well known design patterns and language idioms is a &quot;best practice&quot; in library design.</strong> When you follow that guideline, people will already have an understanding of the interface that you&#8217;re providing; therefore, they will have one less thing to worry about in leveraging your library to write their code.</p>
</div>
<div class="section" id="discovery-mechanism-proposals">
<h3>Discovery Mechanism Proposals</h3>
<p>Patrick is solving a common category of problem: he wants to allow clients to <em>flexibly extend</em> his <a class="reference external" href="http://www.vankouteren.eu/blog/?p=62">parsing library&#8217;s capabilities</a>. For example, if his module knows how to parse <tt class="docutils literal"><span class="pre">xml</span></tt> and <tt class="docutils literal"><span class="pre">yaml</span></tt> files <a class="reference external" href="http://en.wikipedia.org/wiki/Out_of_the_box">out of the box</a>, programmers using his library should be able to add their own <tt class="docutils literal"><span class="pre">rst</span></tt> and <tt class="docutils literal"><span class="pre">html</span></tt> parser capabilities with ease.</p>
<p><strong>Patrick&#8217;s proposal</strong> is this:</p>
<ul class="simple">
<li>Have the programmer place all extension modules that might contain parser classes in a known directory.</li>
<li>In a factory class constructor, take a directory listing of the known directory.</li>
<li>Import every module present in that listing.</li>
<li>Inspect each module imported this way for class members.</li>
<li>For each class found, add it to an accumulator if it inherits from a <tt class="docutils literal"><span class="pre">Parser</span></tt> abstract base class provided by the module.</li>
</ul>
<p>If you <em>were</em> to do this, you would use the various utilities in the <a class="reference external" href="http://docs.python.org/library/imp.html#module-imp">imp module</a> to load the modules dynamically, then determine the appropriate classes via the <a class="reference external" href="http://docs.python.org/library/inspect.html#inspect.getmembers">inspect module</a>. <a class="footnote-reference" href="#id7" id="id6"><tt>[‡]</tt></a></p>
<p><strong>My counter-proposal</strong> is this, which is also known as the <a class="reference external" href="http://martinfowler.com/eaaCatalog/registry.html">Registry Pattern</a>, a form of runtime configuration and behavior extension:</p>
<ul class="simple">
<li>Have the programmer import a decorator from our module.</li>
<li>Let them decorate any class <a class="footnote-reference" href="#id9" id="id8"><tt>[§]</tt></a> that conforms to the implicit <tt class="docutils literal"><span class="pre">Parser</span></tt> interface.</li>
</ul>
<p><strong>Parser library:</strong></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> UnknownMimetypeException<span style="color: black;">&#40;</span><span style="color: #008000;">Exception</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">pass</span>
<span style="color: #ff7700;font-weight:bold;">class</span> ParseError<span style="color: black;">&#40;</span><span style="color: #008000;">Exception</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> IParser:
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;&quot;Reference interface for parser classes -- inheritance is not
    necessary.&quot;&quot;&quot;</span>
&nbsp;
    parseable_mimetypes = <span style="color: #008000;">set</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">file</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: #008000;">file</span> = <span style="color: #008000;">file</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">doctree</span> = <span style="color: #008000;">None</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> parse<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;Parse :ivar:`file` and place the parsed document tree into
        :ivar:`doctree`.
        &quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">NotImplementedError</span>
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> ParserFacade:
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;&quot;Assumes that there can only be one parser per mimetype.
    :ivar mimetype_to_parser_cls: Storage for parser registry.
    &quot;&quot;&quot;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">mimetype_to_parser_cls</span> = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> register_parser<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, cls<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> mimetype <span style="color: #ff7700;font-weight:bold;">in</span> cls.<span style="color: black;">parseable_mimetypes</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">mimetype_to_parser_cls</span><span style="color: black;">&#91;</span>mimetype<span style="color: black;">&#93;</span> = cls
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> parse<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">file</span>, mimetype<span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;Determine the appropriate parser for the mimetype, create a
        parser to parse the file, and perform the parsing.
&nbsp;
        :return: The parser object.
        &quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">try</span>:
            parser_cls = <span style="color: #008000;">self</span>.<span style="color: black;">mimetype_to_parser_cls</span><span style="color: black;">&#91;</span>mimetype<span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">KeyError</span>:
            <span style="color: #ff7700;font-weight:bold;">raise</span> UnknownMimetypeException<span style="color: black;">&#40;</span>mimetype<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #dc143c;">parser</span> = parser_cls<span style="color: black;">&#40;</span><span style="color: #008000;">file</span><span style="color: black;">&#41;</span>
        <span style="color: #dc143c;">parser</span>.<span style="color: black;">parse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># May raise ParseError</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #dc143c;">parser</span>
&nbsp;
&nbsp;
default_facade = ParserFacade<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
register_parser = default_facade.<span style="color: black;">register_parser</span>
parse = default_facade.<span style="color: black;">parse</span></pre></div></div>

<p><strong>Client code:</strong></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> parser_lib <span style="color: #ff7700;font-weight:bold;">import</span> register_parser
&nbsp;
@register_parser
<span style="color: #ff7700;font-weight:bold;">class</span> SpamParser:
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;&quot;Parses ``.spam`` files.
    Conforms to implicit parser interface of `parser_lib`.
    &quot;&quot;&quot;</span>
&nbsp;
    parseable_mimetypes = <span style="color: black;">&#123;</span><span style="color: #483d8b;">'text/spam'</span><span style="color: black;">&#125;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">file</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: #008000;">file</span> = <span style="color: #008000;">file</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">doctree</span> = <span style="color: #008000;">None</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> parse<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">NotImplementedError</span></pre></div></div>

<p>After the client code executes, the <tt class="docutils literal"><span class="pre">SpamParser</span></tt> will then be available for parsing <tt class="docutils literal"><span class="pre">text/spam</span></tt> mimetype files via <tt class="docutils literal"><span class="pre">parser_lib.parse</span></tt>.</p>
<p>Here are some of my considerations in determining which of these is the least magical:</p>
<ul class="simple">
<li>Which <em>interface</em> is the easiest to explain?</li>
<li>Which <em>implementation</em> will be the easiest to explain?</li>
<li>Which is more <em>fragile</em>? (Which is most likely to break when &quot;special case uses&quot; crop up?)</li>
<li>Which is easier to test?</li>
</ul>
</div>
<div class="section" id="magical-allure">
<h3>Magical Allure</h3>
<p>The problem with magic is that it is <a class="reference external" href="http://www.youtube.com/watch?v=PG5RUNlxtkA">freaking cool</a> and it <a class="reference external" href="http://www.youtube.com/watch?v=---HldMdNzE">drives all the ladies crazy</a>. <a class="footnote-reference" href="#id12" id="id10"><tt>[¶]</tt></a> As a result, the right hemisphere of your developer-brain yearns for your library clients to read instructions like:</p>
<blockquote>
<p>Drag and drop your Python code into my directory &#8212; I&#8217;ll take care of it from there.</p>
<p>That&#8217;s right, that&#8217;s all there is to it.</p>
<p>Oh, I know what you&#8217;re thinking &#8212; yes, I&#8217;m available &#8212; check out <tt class="docutils literal"><span class="pre">parser_lib.PHONE_NUMBER</span></tt> and give me a call sometime.</p>
</blockquote>
<p>But, as you envision phone calls from sexy Pythonistas, the left hemisphere of your brain is screaming at the top of its lungs! <a class="footnote-reference" href="#id13" id="id11"><tt>[#]</tt></a></p>
<p>Magic leaves the audience wondering how the trick is done, and the analytical side of the programmer mind <em>hates</em> that. It implies that there&#8217;s a non-trivial abstraction somewhere that does reasonably complex things, but it&#8217;s unclear where it can be found or how to leverage it differently.</p>
<p>Coders need control and understanding of their code and, by extension, as much control and understanding over third party code as is reasonably possible. Because of this, concise, <a class="reference external" href="http://en.wikipedia.org/wiki/Loosely_coupled">loosely coupled</a>, and extensible abstractions are always preferred to the imposition of elaborate usage design ideas on clients of your code. It&#8217;s best to assume that people will want to leverage the <em>functionality</em> your code provides, but that you can&#8217;t foresee the <em>use cases</em>.</p>
</div>
<div class="section" id="to-reiterate-dynamic-does-not-imply-magical">
<h3>To Reiterate: Dynamic does not Imply Magical</h3>
<p>Revisiting my opening point: anecdotal evidence suggests that some members of the static typing camp see we programming-dynamism dynamos as anarchic lovers of programming <em>chaos</em>. Shoot-from-the-hip cowboys, strolling into lawless towns of code, type checking blowing by the vacant sheriff&#8217;s station as tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it&#8217;s easy to see why you would start doing all sorts of fancy things when you cross into dynamism town &#8212; little do you know, we don&#8217;t take kindly to that &#8217;round these parts.</p>
<p>In other, more intelligble words, this is a serious misconception &#8212; dynamism isn&#8217;t a free pass to disregard the Principle of Least Surprise &#8212; dynamism proponents still want order in the programming universe. Perhaps we value our sanity even <em>more</em>! The key insight is that programming dynamism does <em>allow</em> you additional flexibility when it&#8217;s <em>required</em> or <em>practical</em> to use. More rigid execution models require you to use workarounds, laboriously at times, for a similar degree of flexibility.</p>
<p>As demonstrated by <a class="reference external" href="http://blog.cdleary.com/2009/04/monstrous-polymorphism-and-a-python-post-import-hook-decorator/#comment-1066">Marius&#8217; comment</a> in <a class="reference external" href="http://blog.cdleary.com/2009/04/monstrous-polymorphism-and-a-python-post-import-hook-decorator/">my last entry</a>, Python coders have a healthy respect for the power of <a class="reference external" href="http://docs.python.org/tutorial/classes.html#instance-objects">late binding</a>, <a class="reference external" href="http://docs.python.org/tutorial/modules.html#more-on-modules">arbitrary code execution on module import</a>, and <a class="reference external" href="http://docs.python.org/library/os.html#module-os">seamless platform integration</a>. Accompanying this is a healthy <em>wariness</em> of black magic.</p>
</div>
<div class="section" id="caveat">
<h3>Caveat</h3>
<p>It&#8217;s possible that Patrick was developing a closed-system application (e.g. the <a class="reference external" href="http://www.eclipse.org/">Eclipse IDE</a>) and <em>not</em> a library like I was assuming.</p>
<p>In the <em>application</em> case, extensions <em>are</em> typically discovered (though not necessarily activated) by enumerating a directory. When the user activates such an extension, the modules found within it are loaded into the application. This is the commonly found <a class="reference external" href="http://en.wikipedia.org/wiki/Plugin#Mechanism">plugin model</a> &#8212; it&#8217;s typically more difficult to wrap the application interface and do configurations at load time, so the application developer must provide an extension hook.</p>
<p>However, the registration pattern should <em>still</em> be preferred to reflection in this case! When the extension is activated and the extension modules load, the registration decorator will be executed along with all the other top-level code in the extension modules.</p>
<p>The extension has the capability to inform the application of the extension&#8217;s functionality instead having the application query the plugin for its capabilities. This is a form of loosely coupled <em>cooperative configuration</em> that eases the burden on the application and eliminates the requirement to foresee needs of the extensions. <a class="footnote-reference" href="#id15" id="id14"><tt>[♠]</tt></a></p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>Note that you can&#8217;t call it <a class="reference external" href="http://en.wikipedia.org/wiki/Dynamic_programming">dynamic programming</a>, as that would alias a well known term from the branch of computer science concerned with algorithms. <a class="reference external" href="http://en.wikipedia.org/wiki/Dynamic_programming_language">Programming language dynamism</a> it is!</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2"><tt>[†]</tt></a></td>
<td>Much like a dehydrated wanderer in the desert mistakes a shapely pile of sand for an oasis!</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id6"><tt>[‡]</tt></a></td>
<td>As of the date of this publishing, Patrick&#8217;s implementation seems to have gone a bit astray with text processing of Python source files. Prefer dynamic module loading and inspection to text processing source code! Enumerating the reasons this is preferred is beyond the scope of this article.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id9" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id8"><tt>[§]</tt></a></td>
<td>
<p class="first">In Python &lt; 3.0 you can perform class decoration without the decorator syntax. Decorator syntax is just syntactic sugar for &quot;invoke this method and rebind the identifier in this scope&quot;, like so:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> SomeClass<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">pass</span>
SomeClass = my_class_decorator<span style="color: black;">&#40;</span>SomeClass<span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># Decorate the class.</span></pre></div></div>

</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id12" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id10"><tt>[¶]</tt></a></td>
<td>Perhaps men as well, but I&#8217;ve never seen any TV evidence to justify that conclusion.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id13" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id11"><tt>[#]</tt></a></td>
<td>Yes, in this analogy brains have lungs. If you&#8217;ve read this far you&#8217;re probably not a biologist anyway.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id15" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id14"><tt>[♠]</tt></a></td>
<td>Of course, the plugin model always has security implications. Unless you go out of your way to make a sandboxed Python environment for plugins, you need to trust the plugins that you activate &#8212; they have the ability to execute arbitrary code.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/iILxzenmxrw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/06/registry-pattern-trumps-import-magic/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Monstrous polymorphism and a Python post-import hook decorator</title>
		<link>http://blog.cdleary.com/2009/04/monstrous-polymorphism-and-a-python-post-import-hook-decorator/</link>
		<comments>http://blog.cdleary.com/2009/04/monstrous-polymorphism-and-a-python-post-import-hook-decorator/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 18:00:29 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Operating Systems]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Dynamism]]></category>
		<category><![CDATA[Object Orientation]]></category>
		<category><![CDATA[Polymorphism]]></category>
		<category><![CDATA[Reflection]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=521</guid>
		<description><![CDATA[[...] This entry is a small walk-through for code to detect interface conformity by inspection, enumerate the classes in the environment, manipulate classes in place, and add an import hook to manipulate classes loaded from future modules. [...]]]></description>
			<content:encoded><![CDATA[<p>I queue up a few thousand things to do before I get on an airplane: synchronize two-thousand <a href='http://www.google.com/reader'>Google Reader</a> entries, load up a bunch of websites I&#8217;ve been meaning to read, and make sure for-fun projects are pulled from their most updated branches.</p>
<p>Then, once I get up in the air, I realize that <strong>I don&#8217;t really want to do 90% of those things crammed into a seat with no elbow room.</strong> I end up doing one or two. Along with reading <a href='http://steve.yegge.googlepages.com/when-polymorphism-fails'>Stevey&#8217;s Drunken Blog Rant: When Polymorphism Fails</a>, this entry is all the productivity I can claim. The <a href='http://bitbucket.org/cdleary/vaporwarning-monsters/'>full code repository for this entry</a> is online if you&#8217;d like to follow along.</p>
<h3>Polymorphism Recap</h3>
<p>The word &#8220;polymorphic&#8221; comes from Greek roots meaning &#8220;many shaped.&#8221; (Or they lied to me in school &#8212; one of those.) From a worldly perspective I can see this meaning two things:</p>
<ul>
<li>A single object can take on many shapes, or</li>
<li>Requirements for a general &#8220;shape&#8221; can be satisfied by different categories of objects.</li>
</ul>
<p>As it turns out, both of these concepts apply to the Object-Oriented programming, but the canonical meaning is the latter. <tt>[*]</tt> As Yegge says:</p>
<blockquote><p>If you have a bunch of similar objects [...], and they&#8217;re all supposed to respond differently to some situation, then you add a virtual method to them and implement it differently for each object.</p></blockquote>
<p>(If you don&#8217;t know what a virtual method is, <a href="http://en.wikipedia.org/wiki/Polymorphism_in_object-oriented_programming">the Wikipedia page</a> has an alternate explanation.)</p>
<h3>Yegge&#8217;s Example</h3>
<p>Yegge demonstrates that strictly adhering to the principles of polymorphism does not always produce the best design:</p>
<blockquote><p>
Let&#8217;s say you&#8217;ve got a big installed base of monsters. [...] Now let&#8217;s say one of your users wants to come in and write a little OpinionatedElf monster. [...] Let&#8217;s say the OpinionatedElf&#8217;s sole purpose in life is to proclaim whether it likes other monsters or not. It sits on your shoulder, and whenever you run into, say, an Orc, it screams bloodthirstily: &#8220;I hate Orcs!!! Aaaaaargh!!!&#8221; (This, incidentally, is how I feel about C++.)</p>
<p>The polymorphic approach to this problem is simple: go through every one of your 150 monsters and add a doesMrOpinionatedElfHateYou() method.
</p></blockquote>
<p>This is a great counterexample &#8212; it induces an instant recognition of absurdity.</p>
<p>He then touches on the fact that dynamic languages allow you to do neat things consistent with polymorphism due to the flexibility of the object structure (which is typically just a hash map from identifiers to arbitrary object values):</p>
<blockquote><p>
I guess if you could somehow enumerate all the classes in the system, and check if they derive from Monster, then you could do this whole thing in a few lines of code. In Ruby, I bet you can&#8230; but only for the already-loaded classes. It doesn&#8217;t work for classes still sitting on disk! You could solve that, but then there&#8217;s the network&#8230;
</p></blockquote>
<p>This is clearly impractical, but I figured there was some exploratory value to implementing this challenge in Python. This entry is a small walk-through for code to detect interface conformity by inspection, enumerate the classes in the environment, manipulate classes in place, and add an import hook to manipulate classes loaded from future modules.</p>
<h3>The Antagonist</h3>
<p>Double entendre intended. :-)</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> OpinionatedElf<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
&nbsp;
    is_liked_by_class_name = <span style="color: black;">&#123;</span>
        <span style="color: #483d8b;">'OpinionatedElf'</span>: <span style="color: #008000;">True</span>,
        <span style="color: #483d8b;">'Orc'</span>: <span style="color: #008000;">False</span>,
        <span style="color: #483d8b;">'Troll'</span>: <span style="color: #008000;">False</span><span style="color: black;">&#125;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, name<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">name</span> = name
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> be_scary<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;I'm small, ugly, and don't like the cut of your jib!&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> proclaim_penchance<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, other<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> IMonster.<span style="color: black;">is_conforming</span><span style="color: black;">&#40;</span>other<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;I can't even tell what that is!&quot;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">return</span>
        is_liked = other.<span style="color: black;">is_liked_by_elf</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        class_name = other.__class__.__name__
        <span style="color: #ff7700;font-weight:bold;">if</span> is_liked <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span>:
            <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;I'm not sure how I feel about %s&quot;</span> <span style="color: #66cc66;">%</span> class_name<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">return</span>
        emotion = <span style="color: #483d8b;">'love'</span> <span style="color: #ff7700;font-weight:bold;">if</span> is_liked <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: #483d8b;">'hate'</span>
        <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'I %s %s!!! Aaaaaargh!!!'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>emotion, other.__class__.__name__<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<h3>Determining which Classes are Monsters</h3>
<p>First of all, Python doesn&#8217;t require (nor does it encourage) a rigid type hierarchy. Python&#8217;s all about the interfaces, which are often implicit. <strong>Step one is to create a way to recognize classes that implement the monster interface</strong>: <tt>[%]</tt></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> IMonster<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
&nbsp;
    required_methods = <span style="color: black;">&#91;</span><span style="color: #483d8b;">'be_scary'</span><span style="color: black;">&#93;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> be_scary<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">NotImplementedError</span>
&nbsp;
    @<span style="color: #008000;">classmethod</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> is_conforming<span style="color: black;">&#40;</span>cls, <span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
        result = <span style="color: #008000;">all</span><span style="color: black;">&#40;</span><span style="color: #008000;">callable</span><span style="color: black;">&#40;</span><span style="color: #008000;">getattr</span><span style="color: black;">&#40;</span><span style="color: #008000;">object</span>, attr_name, <span style="color: #008000;">None</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">for</span> attr_name <span style="color: #ff7700;font-weight:bold;">in</span> cls.<span style="color: black;">required_methods</span><span style="color: black;">&#41;</span>
        <span style="color: #dc143c;">logging</span>.<span style="color: black;">debug</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'%s conforms? %s'</span>, <span style="color: #008000;">object</span>, result<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> result
&nbsp;
<span style="color: #ff7700;font-weight:bold;">assert</span> IMonster.<span style="color: black;">is_conforming</span><span style="color: black;">&#40;</span>IMonster<span style="color: black;">&#41;</span></pre></div></div>

<p>This is a simple little class &#8212; there are better third party libraries to use if you want <a href="http://wiki.zope.org/Interfaces/FrontPage">real interface functionality</a> (i.e. more generic conformity testing and <a href="http://en.wikipedia.org/wiki/Design_by_contract#Description">Design By Contract</a>).</p>
<h3>Enumerating the Classes in the Environment</h3>
<p>All of the modules that have been loaded into the Python environment are placed into <a href='http://docs.python.org/library/sys.html#sys.modules'><tt>sys.modules</tt></a>. By inspecting each of these modules, we can manipulate the classes contained inside if they conform to our monster interface.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">for</span> name, module <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">sys</span>.<span style="color: black;">modules</span>.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    extend_monsters<span style="color: black;">&#40;</span>module<span style="color: black;">&#41;</span></pre></div></div>

<p>The <tt>extend_monsters</tt> function is a bit nuanced because immutable modules also live in <tt>sys.modules</tt>. We skip those, along with abstract base classes, which have trouble with <tt>inspect.getmembers()</tt>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> extend_monsters<span style="color: black;">&#40;</span>module, extension_tag=<span style="color: #483d8b;">'_opinionated_extended'</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Extend monsters in the module's top-level namespace to
    tell if they are liked by the :class:`OpinionatedElf`.
    and tag it with the :param:`extension_tag` as a flag name.
    Do not attempt to extend already-flagged modules.
    Do not clobber existing methods with the extension method name.
&nbsp;
    Warning: swallows exceptional cases where :param:`module`
        is builtin, frozen, or None.
    &quot;&quot;&quot;</span> 
    name = module.__name__ <span style="color: #ff7700;font-weight:bold;">if</span> module <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: #008000;">None</span>
    <span style="color: #dc143c;">logging</span>.<span style="color: black;">info</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Instrumenting module %s'</span>, name<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> module <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #dc143c;">imp</span>.<span style="color: black;">is_builtin</span><span style="color: black;">&#40;</span>name<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #dc143c;">imp</span>.<span style="color: black;">is_frozen</span><span style="color: black;">&#40;</span>name<span style="color: black;">&#41;</span> \
            <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #008000;">getattr</span><span style="color: black;">&#40;</span>module, extension_tag, <span style="color: #008000;">False</span><span style="color: black;">&#41;</span>:
        <span style="color: #dc143c;">logging</span>.<span style="color: black;">info</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Skipping module: %s'</span>, name<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span>
    module._opinionated_instrumented = <span style="color: #008000;">True</span>
    classes = <span style="color: #dc143c;">inspect</span>.<span style="color: black;">getmembers</span><span style="color: black;">&#40;</span>module, <span style="color: #dc143c;">inspect</span>.<span style="color: black;">isclass</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> name, cls <span style="color: #ff7700;font-weight:bold;">in</span> classes:
        <span style="color: #dc143c;">logging</span>.<span style="color: black;">debug</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'%s: %s'</span>, name, cls<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">try</span>:
            conforming = IMonster.<span style="color: black;">is_conforming</span><span style="color: black;">&#40;</span>cls<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">AttributeError</span>, e:
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #483d8b;">'__abstractmethods__'</span> <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>e<span style="color: black;">&#41;</span>: <span style="color: #808080; font-style: italic;"># Abstract class.</span>
                <span style="color: #ff7700;font-weight:bold;">continue</span>
            <span style="color: #ff7700;font-weight:bold;">raise</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> conforming:
            <span style="color: #ff7700;font-weight:bold;">continue</span>
        class_name = cls.__name__
        <span style="color: #dc143c;">logging</span>.<span style="color: black;">debug</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Instrumenting class %s'</span>, class_name<span style="color: black;">&#41;</span>
        attr_name = <span style="color: #483d8b;">'is_liked_by_elf'</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">hasattr</span><span style="color: black;">&#40;</span>cls, attr_name<span style="color: black;">&#41;</span>: <span style="color: #808080; font-style: italic;"># Don't clobber existing methods.</span>
            <span style="color: #dc143c;">logging</span>.<span style="color: black;">warn</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Method already exists: %s'</span>, cls<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">continue</span>
        <span style="color: #dc143c;">logging</span>.<span style="color: black;">info</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Setting %s on %s'</span>, attr_name, class_name<span style="color: black;">&#41;</span>
        <span style="color: #008000;">setattr</span><span style="color: black;">&#40;</span>cls, attr_name,
            <span style="color: #ff7700;font-weight:bold;">lambda</span> <span style="color: #008000;">self</span>: OpinionatedElf.<span style="color: black;">is_liked_by_class_name</span>.<span style="color: black;">get</span><span style="color: black;">&#40;</span>
                <span style="color: #008000;">self</span>.__class__.__name__, <span style="color: #008000;">None</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>If we were going to be thorough, we would recurse on the members of the class to see if the class scope was enclosing any more <tt>IMonster</tt> classes, but you&#8217;re never really going to find them all: if a module defines a monster class in a function-local scope, there&#8217;s no good way to get the local class statement and modify it through inspection.</p>
<p>In any case, we&#8217;re at the point where we can modify all monsters in the top-level namespace of already-loaded modules.  What about modules that we have yet to load?</p>
<h3>Post-import Hook</h3>
<p>There is no standard post-import hook (that I know of) in Python. <a href='http://www.python.org/dev/peps/pep-0369/'>PEP 369</a> looks promising, but I couldn&#8217;t find any record of additional work being done on it. The current import hooks, described in <a href='http://www.python.org/dev/peps/pep-0302/'>PEP 302</a>, are all pre-import hooks. As such, you have to <em>decorate</em> the <tt>__import__</tt> builtin, wrapping the original with your intended post-input functionality, like so: <tt>[^]</tt></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> import_decorator<span style="color: black;">&#40;</span>old_import, post_processor<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;
    :param old_import: The import function to decorate, most likely
        ``__builtin__.__import__``.
    :param post_processor: Function of the form
        `post_processor(module) -&gt; module`.
    :return: A new import function, most likely to be assigned to
        ``__builtin__.__import__``.
    &quot;&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">assert</span> <span style="color: #008000;">all</span><span style="color: black;">&#40;</span><span style="color: #008000;">callable</span><span style="color: black;">&#40;</span>fun<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> fun <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#40;</span>old_import, post_processor<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> new_import<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kwargs<span style="color: black;">&#41;</span>:
        module = old_import<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kwargs<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> post_processor<span style="color: black;">&#40;</span>module<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> new_import</pre></div></div>

<p>After which we can replace the old <tt>__import__</tt> with its decorated counterpart:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #dc143c;">__builtin__</span>.<span style="color: #008000;">__import__</span> = import_decorator<span style="color: black;">&#40;</span><span style="color: #dc143c;">__builtin__</span>.<span style="color: #008000;">__import__</span>,
    extend_monsters<span style="color: black;">&#41;</span></pre></div></div>

<h3>The Network</h3>
<p>Yegge brings up the issue of dynamically generated classes by mentioning network communications, calling to mind examples such as <a href="http://en.wikipedia.org/wiki/Java_remote_method_invocation">Java&#8217;s RMI</a> and <a href="http://en.wikipedia.org/wiki/CORBA">CORBA</a>. This is a scary place to go, even just conceptualizing. If metaclasses are used, I don&#8217;t see any difficulty in decorating <tt>__new__</tt> with the same kind of inspection we employed above; however, code generation presents potentially insurmountable problems.</p>
<p>Decorating <a href="http://docs.python.org/library/functions.html#eval">the <tt>eval</tt> family of functions</a> to modify new classes created seems <em>possible</em>, but it would be challenging and requires additional research on my part. <tt>exec</tt> is a keyword/statement, which I would think is a hopeless cause.</p>
<h3>Footnotes</h3>
<p><tt>[*]</tt> In accordance with the former, an object can implement many interfaces.<br />
<tt>[^]</tt> This function is actually a generic decorate-with-post-processing closure, but I added the references to import for more explicit documentation.</p>
<img src="http://feeds.feedburner.com/~r/VaporWarning-Python/~4/tMaEEmFVDb4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/04/monstrous-polymorphism-and-a-python-post-import-hook-decorator/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
