<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Honest to a Segfault</title>
	
	<link>http://blog.cdleary.com</link>
	<description>__author__ = 'Chris Leary'</description>
	<lastBuildDate>Sun, 05 Sep 2010 20:41:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/cdleary" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="cdleary" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>A prototypal binding trap</title>
		<link>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/</link>
		<comments>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/#comments</comments>
		<pubDate>Sun, 05 Sep 2010 20:29:30 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Trap]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=905</guid>
		<description><![CDATA[It always pains me to explain these little identifier resolution traps: #!/usr/bin/env python3 &#160; class Egg: &#160; _next_id = 1 &#160; def __init__&#40;self&#41;: self.id = self._next_id self._next_id += 1 assert Egg._next_id is self._next_id &#160; &#160; if __name__ == '__main__': f = Egg&#40;&#41; Fails the assertion. It&#8217;s decomposing the assignment-update into its constituent tmp = self._next_id [...]]]></description>
			<content:encoded><![CDATA[<p>It always pains me to explain these little identifier resolution traps:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Egg:
&nbsp;
    _next_id = <span style="color: #ff4500;">1</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: #008000;">id</span> = <span style="color: #008000;">self</span>._next_id
        <span style="color: #008000;">self</span>._next_id += <span style="color: #ff4500;">1</span>
        <span style="color: #ff7700;font-weight:bold;">assert</span> Egg._next_id <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">self</span>._next_id
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    f = Egg<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Fails the assertion. It&#8217;s decomposing the assignment-update into its constituent <tt class="docutils literal"><span class="pre">tmp</span> <span class="pre">=</span> <span class="pre">self._next_id</span> <span class="pre">+</span> <span class="pre">1;</span> <span class="pre">self._next_id</span> <span class="pre">=</span> <span class="pre">tmp</span></tt> components, but a programmer could reasonably expect a Lookup/Update hash map ADT operation to occur instead &#8212; get the slot by lookup, mutate the value found, and store back in an atomic sense, clobbering the class member with the updated value &#8212; but that&#8217;s <strong>not</strong> how it works.</p>
<p>This goes with the prototypal territory:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> Egg<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">id</span> <span style="color: #339933;">=</span> <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span> <span style="color: #339933;">+=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
    assertEq<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span><span style="color: #339933;">,</span> Egg.<span style="color: #660066;">prototype</span>.<span style="color: #660066;">next_id</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
Egg.<span style="color: #660066;">prototype</span>.<span style="color: #660066;">next_id</span> <span style="color: #339933;">=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
<span style="color: #003366; font-weight: bold;">var</span> e <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Egg<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// Error: Assertion failed: got 2, expected 1</span></pre></div></div>

<p>Fortunately, note that this behavior definitely has the semantics you want when you add inheritance to the mix. Is the <tt class="docutils literal"><span class="pre">self.viscosity</span></tt> that this class implementation is referring to a class member or an instance member in the base class?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> sauce
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> AwesomeSauce<span style="color: black;">&#40;</span>sauce.<span style="color: black;">Sauce</span><span style="color: black;">&#41;</span>:
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">super</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">viscosity</span> += <span style="color: #ff4500;">12</span> <span style="color: #808080; font-style: italic;"># Deca-viscoses per milliliter.</span></pre></div></div>

<p>The answer is that you don&#8217;t care &#8212; it&#8217;s being rebound on the <tt class="docutils literal"><span class="pre">AwesomeSauce</span></tt> instance no matter what.</p>
<p>Moral of the story is to be mindful when using update operations on class members &#8212; if you want to rebind a class member within an instance method, you&#8217;ve got to use the class name instead of <tt>self</tt>.</p>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/QfHA2bgNl5I" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>House rental, showers, and auctions</title>
		<link>http://blog.cdleary.com/2010/07/house-rental-showers-and-auctions/</link>
		<comments>http://blog.cdleary.com/2010/07/house-rental-showers-and-auctions/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 07:32:19 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Life]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Auction]]></category>
		<category><![CDATA[Efficiency]]></category>
		<category><![CDATA[Peopleware]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=898</guid>
		<description><![CDATA[I&#8217;m currently in the prologue of the house-rental process. Our sights are set on a four-bedroom house with an interesting property: the room resources are of differing value &#8212; not in terms of square-footage, but other, less easily scalable factors. One bedroom has an attached bathroom and a slightly larger closet (the master bedroom). One [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently in the prologue of the house-rental process. Our sights are set on a four-bedroom house with an interesting property: the room resources are of differing value &#8212; <em>not</em> in terms of square-footage, but other, less easily scalable factors.</p>
<ul class="simple">
<li>One bedroom has an attached bathroom and a slightly larger closet (the master bedroom).</li>
<li>One bedroom is on its own floor with a half-bathroom. <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a></li>
<li>The other two share a full bathroom, whose shower is also a resource shared with the bedroom mentioned above.</li>
</ul>
<p>So how do we dole out the resources of unequal value in a manner that everybody considers fair?</p>
<p>(Talk about first-world problems&#8230;)</p>
<div class="section" id="proposed-solutions">
<h3>Proposed solutions</h3>
<p>Three systems for deciding the room allocations have been proposed, and one has been rejected:</p>
<ul class="simple">
<li>Randomly assign choosing priority: this was dismissed because the person with the last priority inevitably feels &quot;stuck with&quot; a room.</li>
<li>Entirely random: we take four playing cards from a deck, associate each card with a room, shuffle the cards until everybody is satisfied, then deal each person a card. That&#8217;s the room that they get. Everybody is equally &quot;stuck with&quot; a room, and not due to the actions of any other person.</li>
<li>Bidding system: the auction would move from perceived-most-valuable-room to perceived-least-valuable-room. Bidding starts with the master bedroom at the evenly-divided rental rate. Once a winner/price is determined for that room, the process is repeated for the room with the half-bathroom, with a newly-divided rental rate. Once a point is reached where nobody is willing to place a bid on a room, the rest are assigned at random. Each person, through their own decision to bid or not to bid, ends up with the room of their choosing.</li>
</ul>
<p>I&#8217;m personally a fan of the simplicity and total lack of competition in the random system, but if the bidding system is more likely to lead to the optimal outcome, it must be considered! (Of course, care must be taken to address possible social ramifications.)</p>
<p>During casual discussion we noted that the bidding system would be strategically interesting. If a person were not <em>really</em> interested in getting a high-value room but didn&#8217;t care all that much if they did, it is clearly in their favor to drive bids on the early rooms as high as possible &#8212; this would entail a decrease in the payment rate for a later room.</p>
<p>I perused the Wikipedia entry on <a class="reference external" href="http://en.wikipedia.org/wiki/Auction_theory">auction theory</a> and realize that we were assuming an open/ascending bid auction, but a first-price/sealed-bid auction might be less socially stressful. <a class="footnote-reference" href="#id5" id="id3"><tt>[†]</tt></a> My main fear is that open bidding would get carried away (it&#8217;s exciting, after all) and participants would come to regret their bids.</p>
<p>In any case, it&#8217;s an interesting problem I thought I would share. Suggestions are welcome; otherwise, I&#8217;ll write a follow-up entry as to how it turns out.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>This means it has no shower. Stupid terminology, but I suppose it has a nicer ring than &quot;toilet room&quot;.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[†]</tt></a></td>
<td>As an aside, the game theory described on that page also seems to discuss provable equilibrium of a single auction, whereas our auctions have dependent equilibrium values.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/7rBReSGGwUU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/07/house-rental-showers-and-auctions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Coding style as a feature of language design</title>
		<link>http://blog.cdleary.com/2010/07/coding-style-as-a-feature-of-language-design/</link>
		<comments>http://blog.cdleary.com/2010/07/coding-style-as-a-feature-of-language-design/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 16:00:09 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Languages]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Data Structures]]></category>
		<category><![CDATA[Parsing]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=886</guid>
		<description><![CDATA[roc recently posted a thought-provoking entry titled, &#34;Coding Style as a Failure of Language Design&#34;, in which he states: Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax was sufficiently constrained, projects would not feel [...]]]></description>
			<content:encoded><![CDATA[<p><a class="reference external" href="http://weblogs.mozillazine.org/roc/">roc</a> recently posted a thought-provoking entry titled, <a class="reference external" href="http://weblogs.mozillazine.org/roc/archives/2010/07/coding_style_as.html">&quot;Coding Style as a Failure of Language Design&quot;</a>, in which he states:</p>
<blockquote><p>
Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax was sufficiently constrained, projects would not feel the need to do it. Syntax would be uniform within and across projects, and developers would not need to learn multiple variants of the same language.</p></blockquote>
<p>I totally agree with roc&#8217;s point that there is overhead in learning-and-conforming-to local style guidelines. I also agree that this overhead is unnecessary and that language implementers should find ways to eliminate it; however, I think that imposing additional arbitrary constraints on the syntax is heading in the wrong direction.</p>
<p>Your language&#8217;s execution engine <a class="footnote-reference" href="#id6" id="id3"><tt>[*]</tt></a> already has a method of normalizing crazy styles: it forms an abstract syntax tree. Before the abstract syntax tree (AST) is mutated <a class="footnote-reference" href="#id7" id="id4"><tt>[†]</tt></a> it is in perfect correspondence with the original source text, modulo the infinite number of possible formatting preferences. This <em>is</em> the necessary set of constraints on the syntax that can actually result in your program being executed as it is written. <a class="footnote-reference" href="#id8" id="id5"><tt>[‡]</tt></a></p>
<p>So, why don&#8217;t we just lug <em>that</em> thing around instead of the source text itself?</p>
<div class="section" id="the-dream">
<h3>The dream</h3>
<p>The feature that languages should offer is a <a class="reference external" href="http://en.wikipedia.org/wiki/Multiplexer">mux/demux</a> service: mux an infinite number of formatting preferences into an AST (via a traditional parser); demux the AST into source text via an AST-decompiler, parameterized by an arbitrarily large set of formatting options. Language implementations could ship with a pair of standalone binaries. Seriously, the <em>reference language implementation</em> should understand its own formatting parameters at least as well as Eclipse does. <a class="footnote-reference" href="#id11" id="id9"><tt>[§]</tt></a></p>
<p>Once you have the demux tool, you run it on your AST files as a post-checkout hook in your revision control system for instant style personalization. If the engine accepts the AST directly as input, you would only need to demux the files you planned to work on &#8212; if the engine accepted an AST directly as input in lieu of source text, this could even be an optimization.</p>
<p>Different execution engines are likely to use different ASTs, but there should be little problem with composability: checked-in AST goes through standalone demux with an arbitrary set of preferences, then through the alternate compiler&#8217;s mux. So long as the engines have the same language grammar for the source text, everybody&#8217;s happy, and you don&#8217;t have to waste time writing silly AST-to-AST-prime transforms.</p>
<p>In this model, linters are just composable AST observers/transforms that have no ordering dependencies. You could even offer a service for simple grammatical extensions without going so far as <a class="reference external" href="http://en.wikipedia.org/wiki/Perl_6_rules">language level support</a>. Want a block-end delimiter in the Python code you look at? <a class="footnote-reference" href="#id14" id="id12"><tt>[¶]</tt></a> Why not, just use a transform to rip it out before it leaves the front-end of the execution engine.</p>
</div>
<div class="section" id="reality">
<h3>Reality</h3>
<p>Of course, the set of languages we know and love has some overlap with the set of languages that totally suck to parse, whether due to <a class="reference external" href="http://en.wikipedia.org/wiki/C_preprocessor">preprocessors</a> or <a class="reference external" href="http://en.wikipedia.org/wiki/JavaScript_syntax#Whitespace_and_semicolons">context sensitivity</a> or <a class="reference external" href="http://en.wikipedia.org/wiki/Black_Perl">the desire to parse poems</a>, but I would bet good money that there are solutions for such languages. In any case, the symmetric difference between those two sets could get with it, and new languages would be kind to follow suit. It would certainly be an interesting post-FF4 experiment for SpiderMonkey, as we&#8217;ve got a plan on file to clean up the parser interfaces for an <a class="reference external" href="http://wiki.ecmascript.org/doku.php?id=strawman:ast">intriguing ECMAScript strawman proposal</a> anywho.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[*]</tt></a></td>
<td>Interpreter, compiler, translator, whatever.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4"><tt>[†]</tt></a></td>
<td>To do constant folding or what have you.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5"><tt>[‡]</tt></a></td>
<td>Oh yeah, and comments. We would have to keep those around too. They&#8217;re easy enough to throw away during the first pass over the AST.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id11" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id9"><tt>[§]</tt></a></td>
<td>Even more ideal, you&#8217;d move all of that formatting and autocompletion code out of IDEs into a language service API.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12"><tt>[¶]</tt></a></td>
<td>Presumably because you despise all that is good and righteous in the world? ;-)</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/PlNy9Evmaws" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/07/coding-style-as-a-feature-of-language-design/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>B&amp;B++: bed and breakfast for programmers</title>
		<link>http://blog.cdleary.com/2010/07/bb-bed-and-breakfast-for-programmers/</link>
		<comments>http://blog.cdleary.com/2010/07/bb-bed-and-breakfast-for-programmers/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 16:00:15 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Sarcasm]]></category>
		<category><![CDATA[Peopleware]]></category>
		<category><![CDATA[Profit]]></category>
		<category><![CDATA[Sprint]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=875</guid>
		<description><![CDATA[1. Collect background This is the latest in my steal-my-idea-but-give-me-free-stuff-after-you-do series, with slightly more earning potential than my last installment, &#34;Strike a Cord&#34;. I recently spoke to some Mozillians who had participated in a &#34;code retreat&#34; &#8212; I&#8217;d only heard tale of such a thing in lore and folk song, but it seems like a [...]]]></description>
			<content:encoded><![CDATA[<div class="section" id="collect-background">
<h3>1. Collect background</h3>
<p>This is the latest in my steal-my-idea-but-give-me-free-stuff-after-you-do series, with slightly more earning potential than my last installment, <a class="reference external" href="http://www.google.com/buzz/cdleary/UhXogs6mJrd/Chris-crazy-idea-for-a-business-5823-Strike-a-Cord">&quot;Strike a Cord&quot;</a>.</p>
<p>I recently spoke to some Mozillians who had participated in a &quot;code retreat&quot; &#8212; I&#8217;d only heard tale of such a thing in lore and folk song, but it seems like a brilliant concept.</p>
<p>The idea is this: a small think tank (of one or more persons) <em>requires</em> a large amount of code <em>throughput</em> on a task which requires a high degree of <em>focus</em>. To facilitate that, they run far from the middling issues of civilized society and deep into the wilderness <a class="footnote-reference" href="#id5" id="id2"><tt>[*]</tt></a> to code &quot;butt-necked in harmony with Eywa&quot;. <a class="footnote-reference" href="#id6" id="id3"><tt>[†]</tt></a> Through single-minded concentration and a dearth of non-maskable interrupts, they emerge victorious. <a class="footnote-reference" href="#id8" id="id4"><tt>[‡]</tt></a></p>
</div>
<div class="section" id="id11">
<h3>2. ?</h3>
<p>Follow these simple steps to steal my idea:</p>
<ol class="arabic simple" start="0">
<li>Assume that the aforementioned code retreat process is awesome.</li>
<li>Make a bed-and-breakfast in the outskirts of a city that&#8217;s attractive to programmers (for whatever reason).</li>
<li>Offer retreats with high-speed internet access, offices with whiteboards, mirra chairs, height-adjustable desks, pay-as-you-go phone conference equipment, high-res DLP projectors, disco balls, whatever. Make it clearly &quot;the works&quot;. If you want to go <em>even further</em>, mount speakers and sound-proof the walls. <a class="footnote-reference" href="#id14" id="id12"><tt>[§]</tt></a></li>
<li>Make the experience as luxurious and classy as reasonably possible so that the programmers respect the &quot;sanctity&quot; of the retreat: chef-prepared meals, an indisputably good coffee machine, a <a class="reference external" href="http://blog.cdleary.com/2010/07/virtues-of-extreme-programming-practices/">Z80</a> prominently featured as a piece of wall art, and a complimentary bag-o-munchy-chips regimen. Beautiful scenery in which one can walk and think would definitely be a plus, and proximity to a nerd-friendly bar never hurt a nerdy establishment either.</li>
</ol>
<p>The patrons have a good degree of flexibility as a result of this setup. They might hole themselves away in offices 95% of the time, emerging only to sleep, gather delicious food, and scuttle back into their offices. Alternatively, if they&#8217;re on a more casual endeavor (coding vacation?), they might choose to strike up conversations with people at meals and go out to see the sites.</p>
</div>
<div class="section" id="profit">
<h3>3. Profit!</h3>
<p>Please do steal my idea and make a lot of money for yourself (share it with no one!) &#8212; I only ask that you offer me a free stay once you get off the ground.</p>
<p>I&#8217;ll leave you off with a little marketing campaign idea:</p>
<blockquote><p>
B&amp;B++: universally evaluated as the way to B, and, after each bed and breakfast, we get a little bit better. Until we overflow. <a class="footnote-reference" href="#id16" id="id15"><tt>[¶]</tt></a></p></blockquote>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2"><tt>[*]</tt></a></td>
<td>Or a hotel.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[†]</tt></a></td>
<td>Sadly, <a class="reference external" href="http://en.wikipedia.org/wiki/Kanban">I can&#8217;t take credit</a> for this phrase.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4"><tt>[‡]</tt></a></td>
<td>Readers familiar with <a class="reference external" href="http://en.wikipedia.org/wiki/Zilog_Z80">XP</a> may draw a parallel to the practice of <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=561948#c1">Kanban</a>, which has a fascinating backstory, and acknowledges the awesome power of JIT.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12"><tt>[§]</tt></a></td>
<td>For the mercy of those who dislike techno.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id16" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id15"><tt>[¶]</tt></a></td>
<td>Hey, I&#8217;m giving this advice away for free, you can&#8217;t expect it to all be good. No company ever survived giving their excellent primary product away for free. <a class="footnote-reference" href="#id18" id="id17"><tt>[#]</tt></a></td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id18" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id17"><tt>[#]</tt></a></td>
<td>Ugh, too much meta-humor. If you&#8217;ve read and understood up to this point, I apologize.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/NaRZl5VmqhA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/07/bb-bed-and-breakfast-for-programmers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tool teams should work like sleeper cells</title>
		<link>http://blog.cdleary.com/2010/07/tool-teams-should-work-like-sleeper-cells/</link>
		<comments>http://blog.cdleary.com/2010/07/tool-teams-should-work-like-sleeper-cells/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 16:00:05 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Efficiency]]></category>
		<category><![CDATA[Peopleware]]></category>
		<category><![CDATA[Tool Development]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=869</guid>
		<description><![CDATA[I&#8217;ve had some unique experiences interacting-with and participating-in tool development at previous companies that I&#8217;ve worked for, with the quality of those experiences in the broad spectrum from train-wreck to near-satisfactory. From that mental scarring has emerged the weighty goo of an idea, which may be interesting food for thought. [*] How it starts At [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had some unique experiences interacting-with and participating-in tool development at previous companies that I&#8217;ve worked for, with the quality of those experiences in the broad spectrum from train-wreck to near-satisfactory. From that mental scarring has emerged the weighty goo of an idea, which may be interesting food for thought. <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a></p>
<div class="section" id="how-it-starts">
<h3>How it starts</h3>
<p>At some point in a company&#8217;s growth, management notices that there is a lot of sub-par, <a class="footnote-reference" href="#id4" id="id3"><tt>[†]</tt></a> redundant, and distributed tool development going on. Employees have been manufacturing quick-and-dirty tools in order to perform their jobs more efficiently.</p>
<p>Management then ponders the benefit of centralizing that tool development. It seems like an easy sell:</p>
<ul class="simple">
<li>Help ensure ongoing productivity gains &#8212; mission-critical tools that stop working or contain heinous bugs are a hidden liability</li>
<li>Foster tool developer expertise on a specialized team</li>
<li>Eliminate needless repetition by creating shared code bases and consolidating infrastructure resources</li>
</ul>
<p>Good management will also consider the negative repercussions of turning distributed and independent resources into a shared and centrally managed resource:</p>
<ul class="simple">
<li>Everybody wants a slice of the tool pie, because tools make our life better, and there always seems to be some stupid crap to automate.</li>
<li>Everybody wants a piece of the compute resources, because more compute makes parallelizable analyses go faster.</li>
<li><em>If you build it, they will come.</em></li>
</ul>
</div>
<div class="section" id="how-i-ve-seen-it-work-warning-depressing-hyperbolic">
<h3>How I&#8217;ve seen it work (warning: depressing, hyperbolic)</h3>
<ol class="arabic">
<li>
<p class="first">A group at the company makes a strong enough case to the centralized-tool-management machinery &#8212; a request for tool development is granted.</p>
</li>
<li>
<p class="first">A series of inevitably painful meetings are scheduled where the customer <em>dictates</em> their requirements, after which the tool team either rejects them or misunderstands/mis-prioritizes them because: a) that&#8217;s not how it works &#8212; they have to actively <em>gather</em> the requirements, and b) they don&#8217;t have enough time to do all the silly little things that the customer wants.</p>
<p>Because people are fighting each other to get what they want, everybody forgets that the customers haven&#8217;t really described the problem domain in any relevant detail.</p>
</li>
<li>
<p class="first">The tool team developers are happy to go code in peace, without going back for more painful meetings. They create a tool according to their understanding of the requirements during the first iteration.</p>
</li>
<li>
<p class="first">The customer has no idea how the tool team came up with a product that was nothing like their expectation. They say something overly dramatic like, &#8220;it&#8217;s all wrong,&#8221; pissing off the tool team, and lose faith in the ability of the tool team to deliver the product they want.</p>
</li>
<li>
<p class="first">The customer goes back to doing it manually or continue to develop their own tools, expecting that the tool team will fail.</p>
</li>
<li>
<p class="first">The tool team fails because the customer lost interest in telling them what they actually needed and giving good feedback. It wasn&#8217;t the tool that anybody was looking for because the process doomed it from the start.</p>
</li>
</ol>
<p>I say that this scenario is depressing because <strong>tool teams exist to make life better for everybody</strong> &#8212; they <em>enjoy</em> writing software that makes <em>your</em> life easier. Working with a tool team should not be painful. You should want to jump for joy when you start working with them and take them out to beers when you&#8217;re finished working with them, because they&#8217;re just <em>that</em> good. I think that, by taking a less traditional approach, you will be able to achieve much better results&#8230;</p>
</div>
<div class="section" id="how-it-should-work">
<h3>How it <em>should</em> work</h3>
<ol class="arabic simple">
<li>A group at the company makes a strong enough case to the centralized-tool-management machinery &#8212; a request for tool development is granted.</li>
<li>A small handful of tool team operatives <a class="footnote-reference" href="#id7" id="id5"><tt>[‡]</tt></a>, probably around two or three people, split off from the rest of the tool team and are placed in the company hierarchy under the team of the customers. They sit the customers&#8217; cube farm, go to their meetings to listen (but no laptops!), etc., just like a typical team member would.</li>
<li>The customer team brings the operatives up to speed on the automatable task that must be performed each day <em>through immersion</em>. Depending on the frequency, breadth, and duration of the manual processes, the operatives must perform this manual process somewhere on the scale from weeks to months, until they develop a full understanding of the variety of manual processes that must be performed. <a class="footnote-reference" href="#id8" id="id6"><tt>[§]</tt></a> All operatives should be 100% assigned to the manual tasks for this duration, temporarily offloading members of customer team after their ramp-up.</li>
<li>Bam! With an unquestionably solid understanding of the problem domain, the tool team sleeper cells activate. 80% of the manual task load is transitioned off of the operatives so that they can begin development work. Agile-style iterations of 1-2 weeks should be used.</li>
<li>After each iteration there must be a usable product (by definition of an iteration). As a result of this, a percentage of the manual task load is shifted back onto the operatives each iteration, augmenting the original 20%. If the tool is actually developing properly, the operatives will be able to cope with the increased load over time.</li>
<li>As the feature set begins to stabilize or the manual task load approaches zero (because it has all been automated), the product is released to the customers for feedback and a limited amount of future-proofing is considered for final iterations.</li>
<li>Most customer feedback is ignored, but a small and reasonable subset is acted on. If the operatives were able to make do with the full task load plus development, it&#8217;s probably a lot better than it used to be, and the customer is just getting greedy.</li>
<li>The customer takes the operatives out for beers, since the tool team saved them a crapload of time and accounted for all the issues in the problem domain.</li>
<li>A single operative hangs back with the customer for a few more iterations to eyeball maintenance concerns and maybe do a little more future-proofing while the rest head back to the tool team. The one who hangs back gets some kind of special reward for being a team player.</li>
</ol>
</div>
<div class="section" id="conclusion">
<h3>Conclusion</h3>
<p>In the sleeper cell approach, the operatives have a clear understanding of what&#8217;s important through <strong>first hand knowledge and experience</strong> and, consequently, know the ways in which the software has to be flexible. It emulates the way that organic tool development is found in the wild, as described in the introductory paragraph, but puts the task of creating the actual tool in the hands of experienced tool developers (our operatives!).</p>
<p>I think it&#8217;s also noteworthy that this approach adheres to a reasonable principle: <strong>to write a good program to automate a task, you have to know/understand the variety of ways in which you might perform that task by hand</strong>, across all the likely variables.</p>
<p>The operatives are forced to live with the fruits of their labor; i.e. a defect like slow load times will be <em>more</em> painful for them, because they have to work with their tool regularly and take on larger workloads on an ongoing basis, before developers can ever get their hands on it.</p>
<p>Notice that there&#8217;s still the benefit through centralization of tool developers: central contact point for tool needs, cultivating expertise in developers, knowledge of shared code base, understanding of infrastructure and contact points for infrastructural resource needs; however, you avoid the weird customer disconnect that comes with time slicing a traditional tool team.</p>
<p>Tools developers may also find that they enjoy the team that they&#8217;re working in so much that they request to stay on that team! How awesome of a pitch is that to new hires? &#8220;Do you have a strong background in software development? Work closely with established software experts, make connections to people who will love you when you&#8217;re done awesome-ing their lives, and take a whirlwind tour of the company within one year.&#8221;</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label">
<col></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>Yes, I&#8217;m suggesting you digest my mind-goo.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label">
<col></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[†]</tt></a></td>
<td>For some definition of par.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label">
<col></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5"><tt>[‡]</tt></a></td>
<td>I&#8217;m calling them operatives now, because their roles are different from tool developers, as you&#8217;ll see.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label">
<col></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id6"><tt>[§]</tt></a></td>
<td>It is beneficial if a small seed of hatred for the manual task begins to develop, though care should be taken not to allow operatives to be consumed by said hatred.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/YNuS3-sRifg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/07/tool-teams-should-work-like-sleeper-cells/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Virtues of Extreme Programming practices</title>
		<link>http://blog.cdleary.com/2010/07/virtues-of-extreme-programming-practices/</link>
		<comments>http://blog.cdleary.com/2010/07/virtues-of-extreme-programming-practices/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 16:00:32 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Extreme Programming]]></category>
		<category><![CDATA[Peopleware]]></category>
		<category><![CDATA[SQRRR]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=867</guid>
		<description><![CDATA[Aside: I&#8217;ve changed the name of my blog to reflect a new writing approach. I&#8217;ve found, with good consistency, that being near-pathologically honest and forward is a boon to my learning productivity. Sometimes it causes temporary setbacks (embarrassment, remorse) when I step into areas that I don&#8217;t fully understand, but the increased rate of progress [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Aside</strong>: I&#8217;ve changed the name of my blog to reflect a new writing approach. I&#8217;ve found, with good consistency, that being near-pathologically honest and forward is a boon to my learning productivity. Sometimes it causes temporary setbacks (embarrassment, remorse) when I step into areas that I don&#8217;t fully understand, but the increased rate of progress is worthwhile. For example, this approach should help me get some <a class="reference external" href="http://en.wikipedia.org/wiki/SQ3R">SQRRR</a> accomplished more readily, as I can get more ideas out in the open and don&#8217;t need to feel like an expert on everything I write about.</p>
<p>In my limited experience with <a class="reference external" href="http://en.wikipedia.org/wiki/Extreme_programming">Extreme Programming</a> (XP) practices, I&#8217;ve felt it was a long-term benefit for myself and my teammates. Unfortunately, because of XP&#8217;s deviation from the more standard programming practices that I was taught, the activities originally carry a certain weirdness and unapproachability about them. Tests up front? Two people writing a single piece of code? Broadcasting the concrete tasks you accomplished on a daily basis?</p>
<p>After shaking off the inevitable willies, I&#8217;ve found that those activities improve relationships between myself and other team members and help to solidify code understanding and emphasize maintainability. From what I&#8217;ve read, this is what the developers of XP were trying to help optimize: the productivity that results from accepting the social aspect of coding. It is strictly more useful to form good working relationships with humans than with <a class="reference external" href="http://www.c2.com/cgi/wiki?RubberDucking">rubber ducks</a>.</p>
<p>A nice secondary effect from the social coding activity is an increased flow of institutional knowledge. Everybody knows little secrets about the corners of your code base or have figured out optimized workflows &#8212; somewhat obviously, interpersonal flow of info helps keep more people in the know. When it takes five to ten minutes to explain a code concept to someone, both parties start to get the feeling it should be documented somewhere.</p>
<p>It reads a bit dramatic, but this snippet from <a class="reference external" href="http://www.extremeprogramming.org/">the XP website</a> has been fairly accurate in my experience:</p>
<blockquote><p>
Extreme Programming improves a software project in five essential ways; communication, simplicity, feedback, respect, and courage. Extreme Programmers constantly communicate with their customers and fellow programmers. They keep their design simple and clean.</p></blockquote>
<p>The cons that I&#8217;ve witnessed are some minor bikeshedding and the increased overhead that seems to accompany these tasks:</p>
<ul class="simple">
<li>Stand-up meetings take time from coding.</li>
<li>Unit tests take time away from &#8220;product code&#8221;.</li>
<li>Pair programming has social transaction costs.</li>
</ul>
<p>On the other hand, I&#8217;ve also witnessed these costs get amortized away:</p>
<ul class="simple">
<li>Small amounts of bikeshedding can get the issue out in the open instead of letting it simmer in a passive-aggressive stew. Reasonable people seem to handle this just fine.</li>
<li>Telling other people on your team are what you&#8217;re doing feels inclusive and keeps everybody in the know. (Plus, you can do it in twenty seconds per person once you&#8217;ve reached the steady state.)</li>
<li>Even a single, representative unit test forces some loosely coupled design-for-testability thinking and shows how you intend for the code to be used.</li>
</ul>
<p>At Mozilla we seem to have a decent code review process down, which is one of my favorite social coding practices when it&#8217;s done well. At the moment, my team doesn&#8217;t seem too keen on some of the other practices I&#8217;ve found helpful, and it&#8217;s certainly not something you should force. In any case, I&#8217;m happy to be the guy who talks about how great I&#8217;ve found these practices when the topic comes up until somebody comes around. ;-)</p>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/25hnOxjn2pI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/07/virtues-of-extreme-programming-practices/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Notes from the JS pit: closure optimization</title>
		<link>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-closure-optimization/</link>
		<comments>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-closure-optimization/#comments</comments>
		<pubDate>Tue, 11 May 2010 10:01:33 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Dijkstra display]]></category>
		<category><![CDATA[Funarg Problem]]></category>
		<category><![CDATA[JS Pit]]></category>
		<category><![CDATA[Parsing]]></category>
		<category><![CDATA[Static Analysis]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=848</guid>
		<description><![CDATA[In anticipation of a much-delayed dentist appointment tomorrow morning and under the assumption that hard liquor removes plaque, I&#8217;ve produced [*] an entry in the spirit of Stevey&#8217;s Drunken Blog Rants, s/wine/scotch/g. I apologize for any and all incomprehensibility, although Stevey may not mind since it&#8217;s largely an entry about funargs, which he seems to [...]]]></description>
			<content:encoded><![CDATA[<p>In anticipation of a much-delayed dentist appointment tomorrow morning and under the assumption that hard liquor removes plaque, I&#8217;ve produced <a class="footnote-reference" href="#id4" id="id1"><tt>[*]</tt></a> an entry in the spirit of <a class="reference external" href="http://steve-yegge.blogspot.com/2008_04_01_archive.html">Stevey&#8217;s Drunken Blog Rants</a>, <tt class="docutils literal"><span class="pre">s/wine/scotch/g</span></tt>. I apologize for any and all incomprehensibility, although Stevey may not mind since it&#8217;s largely an entry about funargs, which he seems to have a thing for. (Not that I blame him &#8212; I&#8217;m thinking about them while drinking&#8230;) It also appears I may need to prove myself worthy of emigration to <a class="reference external" href="http://planet.mozilla.org/">planet Mozilla</a>, so hopefully an entry filled with funarg debauchery will serve that purpose as well.</p>
<div class="section" id="background">
<h3>Background</h3>
<p>Lately, I&#8217;ve been doing a little work on closure optimization, as permitted by static analysis; i.e. the parser/compiler marks which functions can be optimized into various closure forms.</p>
<p>In a language that permits nested functions and functions as first-class values, there are a few things you need to ask about each function before you optimize it:</p>
<ul class="simple">
<li>Can it <em>escape</em> (leave the scope it&#8217;s defined in)?</li>
<li>Does it refer to <em>free variables</em> (identifiers declared outside the function definition)?</li>
<li>Are the free variables it refers to potentially <em>redefined</em> after the function definition?</li>
</ul>
</div>
<div class="section" id="function-escape-the-funarg-problem">
<h3>Function escape (the funarg problem)</h3>
<p>If a function can execute outside the scope in which it was lexically defined, it is said to be a &quot;funarg&quot;, a fancy word for &quot;potentially escaping outside the scope where it&#8217;s defined&quot;. We call certain functions in the JS runtime <em>Algol-like closures</em> if they are immediately applied function expressions, like so:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> x <span style="color: #339933;">=</span> <span style="color: #CC0000;">12</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span> cubeX<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> x <span style="color: #339933;">*</span> x <span style="color: #339933;">*</span> x<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The function <tt class="docutils literal"><span class="pre">cubeX</span></tt> can never execute outside the confines of <tt class="docutils literal"><span class="pre">outer</span></tt> &#8212; there&#8217;s no way for the function definition to escape. It&#8217;s as if you just took the expression <tt class="docutils literal"><span class="pre">x</span> <span class="pre">*</span> <span class="pre">x</span> <span class="pre">*</span> <span class="pre">x</span></tt>, wrapped it in a lambda (function expression), and immediately executed that expression. <a class="footnote-reference" href="#id6" id="id5"><tt>[†]</tt></a></p>
<p>Apparently a lot of Algol programmers had the hots for this kinda thing &#8212; the whole function-wrapping thing was totally optional, but you chose to do it, Algol programmers, and we respect your choice.</p>
<p>You can optimize this case through static analysis. As long as there&#8217;s no possibility of escape between a declaration and its use in a nested function, the nested function knows exactly how far to reach up the stack to retrieve/manipulate the variable &#8212; the activation record stack is totally determined at compile time. Because there&#8217;s no escaping, there&#8217;s not even any need to import the upvar into the Algol-like function.</p>
<div class="section" id="dijkstra-s-display-optimization">
<h4>Dijkstra&#8217;s display optimization</h4>
<p>To optimize this Algol-like closure case we used a construct called a &quot;Dijkstra display&quot; (or something named along those lines). You just keep an array of stack frame pointers, with each array slot representing the frame currently executing at that function nesting level. When outer is called in the above, outer&#8217;s stack frame pointer would be placed in the display array at nesting level 0, so the array would look like:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: NULL
Level 2: NULL
...</pre></div></div>

<p>Then, when cubeX is invoked, it is placed at nesting level 1:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: &amp;cubeX
Level 2: NULL
...</pre></div></div>

<p>At parse time, we tell <tt class="docutils literal"><span class="pre">cubeX</span></tt> that it can reach up to level 0, frame slot 0 to retrieve the <tt class="docutils literal"><span class="pre">jsval</span></tt> for x. <a class="footnote-reference" href="#id8" id="id7"><tt>[‡]</tt></a> Even if you have &quot;parent&quot; frame references in each stack frame, this array really helps when a function is reaching up many levels to retrieve an upvar, since you can do a single array lookup instead of an <em>n</em> link parent chain traversal. Note that this is only useful when you know the upvar-referring functions will never escape, because the display can only track stack frames for functions that are currently executing.</p>
<p>There&#8217;s also the possibility that two functions at the same nesting level are executing simultaneously; i.e.</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> x <span style="color: #339933;">=</span> <span style="color: #CC0000;">24</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerFirst<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> x<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerSecond<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #003366; font-weight: bold;">var</span> x <span style="color: #339933;">=</span> <span style="color: #CC0000;">42</span><span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">return</span> innerFirst<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000066; font-weight: bold;">return</span> innerSecond<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>To deal with this case, each stack frame has a pointer to the &quot;chained&quot; display stack frame for that nesting level, which is restored when the executing function returns. To go through the motions:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: &amp;innerSecond
Level 2: NULL
...</pre></div></div>

<p>Which then activates <tt class="docutils literal"><span class="pre">innerFirst</span></tt> at the same static level (1), which saves the pointer that it&#8217;s clobbering in the display array.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: &amp;innerFirst (encapsulates &amp;innerSecond)
Level 2: NULL
...</pre></div></div>

<p>Then, when <tt class="docutils literal"><span class="pre">innerFirst</span></tt> looks up the static levels for <tt class="docutils literal"><span class="pre">x</span></tt>, it gets the correct value, restoring <tt class="docutils literal"><span class="pre">innerSecond</span></tt> when it&#8217;s done executing in a <tt class="docutils literal"><span class="pre">return</span></tt>-style bytecode (which would be important if there were further function nesting in <tt class="docutils literal"><span class="pre">innerSecond</span></tt>). <a class="footnote-reference" href="#id10" id="id9"><tt>[§]</tt></a></p>
<p>Okay, hopefully I&#8217;ve explained that well enough, because now I get to tell you that we&#8217;ve found this optimization to be fairly useless in SpiderMonkey experimental surveys and we hope to rip it out at some point. The interesting case that we actually care about (flat closures) is discussed in the second to last section.</p>
</div>
</div>
<div class="section" id="free-variable-references">
<h3>Free variable references</h3>
<p>Because JS is a lexically scoped language <a class="footnote-reference" href="#id14" id="id11"><tt>[¶]</tt></a> we can determine which enclosing scope a free variable is defined in. <a class="footnote-reference" href="#id15" id="id12"><tt>[#]</tt></a> If a function&#8217;s free variables only refer to bindings in the global scope, then it doesn&#8217;t need any information from the functions that enclose it. For these functions the set of free variables in nested functions is the null set, so we call it a <em>null closure</em>. Top-level functions are null closures. <a class="footnote-reference" href="#id16" id="id13"><tt>[♠]</tt></a></p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #003366; font-weight: bold;">function</span> cube<span style="color: #009900;">&#40;</span>x<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> x <span style="color: #339933;">*</span> x <span style="color: #339933;">*</span> x<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// Null closure - no upvars.</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Free variables are termed <em>upvars</em>, since they are identifiers that refer to variables in higher (enclosing) scopes. At parse time, when we&#8217;re trying to find a declaration to match up with a use, they&#8217;re called <em>unresolved lexical dependencies</em>. Though JavaScript scopes are less volatile &#8212; and, as some will undoubtedly point out, less flexible &#8212; I believe that the name upvar comes from this construct in Tcl, which lets you inject vars into and read vars from arbitrary scopes as determined by the runtime call stack: <a class="footnote-reference" href="#id18" id="id17"><tt>[♥]</tt></a></p>

<div class="wp_syntax"><div class="code"><pre class="tcl" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">set</span> x <span style="color: #ff4500;">7</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">proc</span> most_outer <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span> <span style="color: black;">&#123;</span>
    <span style="color: #ff7700;font-weight:bold;">proc</span> outer <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span> <span style="color: black;">&#123;</span>
        <span style="color: #ff7700;font-weight:bold;">set</span> x <span style="color: #ff4500;">24</span>
        <span style="color: #ff7700;font-weight:bold;">proc</span> upvar_setter <span style="color: #483d8b;">{level}</span> <span style="color: black;">&#123;</span>
            <span style="color: #ff7700;font-weight:bold;">upvar</span> <span style="color: #ff3333;">$level</span> x x
            <span style="color: #ff7700;font-weight:bold;">set</span> x <span style="color: #ff4500;">42</span>
        <span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">proc</span> upvar_printer <span style="color: #483d8b;">{level}</span> <span style="color: black;">&#123;</span>
            <span style="color: #ff7700;font-weight:bold;">upvar</span> <span style="color: #ff3333;">$level</span> x x
            <span style="color: #008000;">puts</span> <span style="color: #ff3333;">$x</span>
        <span style="color: black;">&#125;</span>
        upvar_printer <span style="color: #ff4500;">1</span>
        upvar_setter <span style="color: #ff4500;">1</span>
        upvar_printer <span style="color: #ff4500;">1</span>
        upvar_setter <span style="color: #ff4500;">2</span>
        upvar_printer <span style="color: #ff4500;">2</span>
        upvar_printer <span style="color: #ff4500;">3</span>
        upvar_setter <span style="color: #ff4500;">3</span>
        upvar_printer <span style="color: #ff4500;">3</span>
    <span style="color: black;">&#125;</span>
    outer
<span style="color: black;">&#125;</span>
most_outer <span style="color: #808080; font-style: italic;"># Yields the numbers 24, 42, 42, 7, and 42.</span></pre></div></div>

</div>
<div class="section" id="upvar-redefinitions">
<h3>Upvar redefinitions</h3>
<p>If you know that the upvar is never redefined after the nested function is created, it is effectively <em>immutable</em> &#8212; similar to the effect of Java&#8217;s partial closures in anonymous inner classes via the <tt class="docutils literal"><span class="pre">final</span></tt> keyword. In this case, you can create an optimized closure in a form we call a <em>flat closure</em> &#8212; if, during static analysis, you find that none of the upvars are redefined after the function definition, you can <em>import</em> the upvars into the closure, effectively copying the immutable <tt class="docutils literal"><span class="pre">jsval</span></tt>s into extra function slots.</p>
<p>On the other hand, if variables in enclosing scopes <em>are</em> (re)defined after the function definition (and thus, don&#8217;t appear immutable to the function), a shared environment object has to be created so that nested functions can correctly see when the updates to the <tt class="docutils literal"><span class="pre">jsval</span></tt>s occur. Take the following example:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> communicationChannel <span style="color: #339933;">=</span> <span style="color: #CC0000;">24</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerGetter<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066; font-weight: bold;">return</span> communicationChannel<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerSetter<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        communicationChannel <span style="color: #339933;">=</span> <span style="color: #CC0000;">42</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #009900;">&#91;</span>innerGetter<span style="color: #339933;">,</span> innerSetter<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

</div>
<div class="section" id="closing-over-references">
<h3>Closing over references</h3>
<p>In this case, <tt class="docutils literal"><span class="pre">outer</span></tt> must create an environment record outside of the stack so that when <tt class="docutils literal"><span class="pre">innerGetter</span></tt> and <tt class="docutils literal"><span class="pre">innerSetter</span></tt> escape on return, they can see both communicate through the upvar. This is the nice encapsulation-effect you can get through closure-by-reference, and is often used in the JS &quot;constructor-pattern&quot;, like so:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> MooCow<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> hasBell <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">false</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">var</span> noise <span style="color: #339933;">=</span> <span style="color: #3366CC;">&quot;Moo.&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #009900;">&#123;</span>
        pontificate<span style="color: #339933;">:</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> hasBell<span style="color: #339933;">?</span> noise <span style="color: #339933;">+</span> <span style="color: #3366CC;">&quot; &lt;GONG!&gt;&quot;</span> <span style="color: #339933;">:</span> noise<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
        giveBell<span style="color: #339933;">:</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> hasBell <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">true</span><span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>It&#8217;s interesting to note that all the languages I work with these days perform closure-by-reference, as opposed to closure-by-value. In constrast, closure-by-value would snapshot all identifiers in the enclosing scope, so immutable types (strings, numbers) would be impossible to change.</p>
<p>Sometimes, closure-by-reference can produce side effects that <a class="reference external" href="http://math.andrej.com/2009/04/09/pythons-lambda-is-broken/">surprise developers</a>, such as:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> surprise<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    funs = <span style="color: black;">&#91;</span><span style="color: #ff7700;font-weight:bold;">lambda</span>: x <span style="color: #66cc66;">**</span> <span style="color: #ff4500;">2</span> <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">assert</span> funs<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">25</span></pre></div></div>

<p>This occurs because <tt class="docutils literal"><span class="pre">x</span></tt> is bound in function-local scope, and all the lambdas close over it by reference. When <tt class="docutils literal"><span class="pre">x</span></tt> is mutated in further iterations of the list comprehension (at least in Python 2.x), the lambdas are closed over the environment record of surprise, and <em>all of them</em> see the last value that <tt class="docutils literal"><span class="pre">x</span></tt> was updated to.</p>
<p>I can sympathize. In fact, I&#8217;ve wrote a program to do so:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> lambdas <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #003366; font-weight: bold;">var</span> condolences <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;You're totally right&quot;</span><span style="color: #339933;">,</span>
        <span style="color: #3366CC;">&quot;and I understand what you're coming from, but&quot;</span><span style="color: #339933;">,</span>
        <span style="color: #3366CC;">&quot;this is how closures work nowadays&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #000066; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">var</span> i <span style="color: #339933;">=</span> <span style="color: #CC0000;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> condolences.<span style="color: #660066;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> condolence <span style="color: #339933;">=</span> condolences<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    lambdas.<span style="color: #660066;">push</span><span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> condolence<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000066;">print</span><span style="color: #009900;">&#40;</span>lambdas<span style="color: #009900;">&#91;</span><span style="color: #CC0000;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Keep in mind that <tt class="docutils literal"><span class="pre">var</span></tt> delcarations are hoisted to function scope in JS.</p>
<p>I implore you to note that comments will most likely be received while I&#8217;m sober.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>Vomited?</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5"><tt>[†]</tt></a></td>
<td>Cue complaints about the imperfect lambda abstraction in JavaScript. Dang Ruby kids, go play with your blocks! ;-)</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id7"><tt>[‡]</tt></a></td>
<td>Roughly. Gory details left out for illustrative purposes.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id10" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id9"><tt>[§]</tt></a></td>
<td>There&#8217;s also the case where the display array runs out of space for the array. I believe we emit unoptimized name-lookups in this case, but I don&#8217;t entirely recall.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id11"><tt>[¶]</tt></a></td>
<td>With a few insidious dynamic scoping constructs thrown in. I&#8217;ll get to that in a later entry.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id15" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12"><tt>[#]</tt></a></td>
<td>Barring enclosing <tt class="docutils literal"><span class="pre">with</span></tt> statements and injected eval scopes.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id16" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id13"><tt>[♠]</tt></a></td>
<td>Unless they contain an <tt class="docutils literal"><span class="pre">eval</span></tt> or <tt class="docutils literal"><span class="pre">with</span></tt>, in which case we call them &quot;heavyweight&quot; &#8212; though they still don&#8217;t need information from enclosing functions, they must carry a stack of environment records, so they&#8217;re not optimal. I love how many footenotes I make when I talk about the JavaScript language. ;-)</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id18" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id17"><tt>[♥]</tt></a></td>
<td>As a result, it&#8217;s extremely difficult to optimize accesses like these without whole propgram analysis.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/pOthPWE_hnI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-closure-optimization/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Notes from the JS pit: lofty goals, humble beginnings</title>
		<link>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-lofty-goals-humble-beginnings/</link>
		<comments>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-lofty-goals-humble-beginnings/#comments</comments>
		<pubDate>Sun, 02 May 2010 06:40:14 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Compilation]]></category>
		<category><![CDATA[JS Pit]]></category>
		<category><![CDATA[Parsing]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=801</guid>
		<description><![CDATA[I&#8217;ve been working at Mozilla for about two months on the JavaScript (JS) engine team, the members of which sit in an area affectionately known as the &#34;JS pit&#34;. Mozillians appear to try to blog on a regular basis, so I&#8217;ll be starting a series of entries prefixed &#34;notes from the JS pit&#34; to explain [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working at Mozilla for about two months on the JavaScript (JS) engine team, the members of which sit in an area affectionately known as the &quot;JS pit&quot;.</p>
<p>Mozillians appear to try to blog on a regular basis, so I&#8217;ll be starting a series of entries prefixed &quot;notes from the JS pit&quot; to explain what I&#8217;ve been working on and/or thinking about.</p>
<p>Notably, I feel fortunate to work for a company that encourages this kind of openness.</p>
<div class="section" id="goals">
<h3>Goals</h3>
<p>I always feel stupid writing down goals &#8212; they seem so self-evident; however, it helps to put the work I&#8217;m doing into perspective and gives me something that I can to refer back to.</p>
<p>I&#8217;m also a big believer in the effectiveness of public accountability, so publishing those goals seems prudent &#8212; my notes from the JS pit are poised to help me stay motivated more than anything else.</p>
<p>My goals are to:</p>
<ul class="simple">
<li>Do my part to meet and exceed performance targets in the JS engine. It&#8217;s an exciting time with lots of healthy competition to motivate this.</li>
<li>Immerse myself in language design concepts and seek out important implementation details, even if those details don&#8217;t pertain to something that I&#8217;m working on directly. This also entails some independent research and small projects to gain better understanding of foreign language concepts.</li>
<li>Deliberately take time to think constructively about alternative or innovative ways to accomplish our language goals. Taking things at face value as &quot;the way we&#8217;ve always done it&quot; is the way that innovative things get overlooked &#8212; this is relatively easy to do with a pair of fresh eyes, but similarly difficult because you&#8217;re the team noob.</li>
</ul>
<p>Working with compiler engineers from diverse language backgrounds, it&#8217;s prime time for sucking knowledge out of people&#8217;s heads, comparing and contrasting it, and listening to them argue with each other. Heck, just look at the concepts behind JS: an imperative, Scheme-inspired, prototypal, C-and-Java-syntax conforming language that&#8217;s irrevocably tied to a practical platform, the web. It&#8217;s bound to be a fun ride.</p>
</div>
<div class="section" id="from-start-to-present">
<h3>From start to present</h3>
<p>I started off implementing the simpler opcodes for <a class="reference external" href="https://wiki.mozilla.org/JaegerMonkey">JaegerMonkey</a> (JM) and getting an understanding the JM code base. Not too long into it, I was told that looking into quick-and-easy parser optimizations was a priority &#8212; somebody had reported that a significant fraction of the GMail load bar time could be attributed to JS parsing. <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a></p>
<p>Now, JavaScript isn&#8217;t the easiest language in the world to parse; for example, <a class="reference external" href="http://en.wikipedia.org/wiki/JavaScript_syntax#Whitespace_and_semicolons">automatic semicolon insertion</a> creates some non-traditional obstacles for generated shift/reduce parsers <a class="footnote-reference" href="#id4" id="id3"><tt>[†]</tt></a> &#8212; it effectively makes an error correction algorithm part of the normal parse procedure. The details are for another entry, but suffice it to say that our recursive descent parser code gets complex, especially due to our <a class="reference external" href="https://developer.mozilla.org/en/E4X">E4X support</a> and some of the static analyses we perform for optimizations before bytecode emission.</p>
<p>In pursuing JS parser optimization I assembled a suite of parsing benchmarks from sites on the web with &quot;large&quot; JS payloads &#8212; I call this suite <em>parsemark</em>. After getting some speedups from simple inlining, I attempted a somewhat fundamental change to the parser to reduce the number of branch mispredictions, in converting it to always have a token &quot;pre-lexed&quot; as opposed to the prior &quot;lex-on-demand&quot; model. Roughly, this required adding a &quot;there&#8217;s always a lexed token&quot; invariant to the lexer and hoisting lexer calls/modesets from substitution rules into their referring nonterminals in the parser. The details for this are also entry fodder. Sadly, it demonstrated negligible performance gains for the increase in complexity. Sure taught me a lot about our parser, though.</p>
<p>The biggest performance win was obtained through a basic fix to our parser arena-allocation chunk sizing. <a class="reference external" href="http://blog.mozilla.com/rob-sayre/">sayrer</a> noticed that a surprising amount of time was being spent in kernel space, so we tracked the issue down. It was frustrating to work for a few weeks on a fundamental change and then realize that multiplying a constant by four can get you a 20% parsing speedup, but I&#8217;ve certainly learned to look a lot more closely at the vmem subsystem when things are slow. I have some speedup statistics and a comparison to V8 (with all its lazy parsing and parse-caching bits ripped out), but I don&#8217;t have much faith that my environment hasn&#8217;t changed in the course of all the historical data measurements &#8212; writing a script to verify speedup over changesets seems like a viable option for future notes.</p>
<p>In the miscellany department, I&#8217;ve been trying to do a good amount of work <a class="reference external" href="http://www.artima.com/intv/fixit.html">fixing broken windows</a> via cleanup patches. I&#8217;m finding it difficult to strike a balance here, since there&#8217;s a lot of modularity-breaking interdependencies in the code base &#8212; what appear to be simple cleanups tend to unravel into large patches that get stale easily. However, cleanup does force you to read through the code you&#8217;re modifying, which is always good when you&#8217;re learning a new code base.</p>
<p>Looking back on it, it doesn&#8217;t seem like a lot of work; of course, my hope is that the time I spend up-front getting accustomed to the codebase will let me make progress on my goals more rapidly.</p>
<p>Stay tuned for more JS pittage &#8212; unpredictable time, but predictable channel.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>To date, I haven&#8217;t looked into this myself. Ideally, I should have verified it before starting on the parser work, but I was eager to start working on things rather than investigate the reasons behind them.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[†]</tt></a></td>
<td>Though I&#8217;ve seen that Webkit&#8217;s JavaScriptCore uses Bison &#8212; I&#8217;m going to have to check out that implementation at some point.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/8aDmap0extM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-lofty-goals-humble-beginnings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Efficiency of list comprehensions</title>
		<link>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/</link>
		<comments>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 19:31:08 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Efficiency]]></category>
		<category><![CDATA[List Comprehensions]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=782</guid>
		<description><![CDATA[I&#8217;m psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I&#8217;ve restructured it for posterity. Efficiency of the more common usage Let&#8217;s look [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m psyched about the awesome comments on my previous entry, <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/">Python by example: list comprehensions</a>. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I&#8217;ve restructured it for posterity.</p>
<div class="section" id="efficiency-of-the-more-common-usage">
<h3>Efficiency of the more common usage</h3>
<p>Let&#8217;s look at the efficiency of list comprehensions in the more common usage, where the comprehension&#8217;s list result is actually relevant (or, in compiler-speak, live-out).</p>
<p>Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">dis</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">inspect</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">timeit</span>
&nbsp;
&nbsp;
programs = <span style="color: #008000;">dict</span><span style="color: black;">&#40;</span>
    loop=<span style="color: #483d8b;">&quot;&quot;&quot;
result = []
for i in range(20):
    result.append(i * 2)
&quot;&quot;&quot;</span>,
   loop_faster=<span style="color: #483d8b;">&quot;&quot;&quot;
result = []
add = result.append
for i in range(20):
    add(i * 2)
&quot;&quot;&quot;</span>,
    comprehension=<span style="color: #483d8b;">'result = [i * 2 for i in range(20)]'</span>,
<span style="color: black;">&#41;</span>
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> name, text <span style="color: #ff7700;font-weight:bold;">in</span> programs.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> name, <span style="color: #dc143c;">timeit</span>.<span style="color: black;">Timer</span><span style="color: black;">&#40;</span>stmt=text<span style="color: black;">&#41;</span>.<span style="color: #dc143c;">timeit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">code</span> = <span style="color: #008000;">compile</span><span style="color: black;">&#40;</span>text, <span style="color: #483d8b;">'&lt;string&gt;'</span>, <span style="color: #483d8b;">'exec'</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">dis</span>.<span style="color: black;">disassemble</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">code</span><span style="color: black;">&#41;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">loop 11.1495118141
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
&nbsp;
  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        &gt;&gt;   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)
&nbsp;
  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        &gt;&gt;   45 POP_BLOCK
        &gt;&gt;   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
loop_faster 8.36096310616
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
&nbsp;
  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)
&nbsp;
  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        &gt;&gt;   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)
&nbsp;
  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        &gt;&gt;   51 POP_BLOCK
        &gt;&gt;   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
comprehension 7.08145213127
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        &gt;&gt;   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        &gt;&gt;   37 DELETE_NAME              0 (_[1])
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE</pre></div></div>

<p>List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized <tt class="docutils literal"><span class="pre">LIST_APPEND</span></tt> bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).</p>
<p>In the <tt class="docutils literal"><span class="pre">loop_faster</span></tt> program, you avoid the overhead of the <tt class="docutils literal"><span class="pre">append</span></tt> attribute lookup by hoisting it out of the loop and placing the result in a fastlocal (bytecode 9-12), so it loops more quickly; however, the comprehension uses a specialized <tt class="docutils literal"><span class="pre">LIST_APPEND</span></tt> bytecode instead of incurring the overhead of a function call, so it still trumps.</p>
</div>
<div class="section" id="using-list-comprehensions-for-side-effects">
<h3>Using list comprehensions for side effects</h3>
<p>I want to address a point that was brought up in the <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comments">previous entry</a> as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I&#8217;ll discuss the subjective bit first, since that&#8217;s the least sciency part.</p>
<div class="section" id="readability">
<h4>Readability</h4>
<blockquote>
<p>Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…</p>
<p class="attribution">&mdash;<a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1539">Michael Foord</a></p>
</blockquote>
<p>First of all, thanks to Michael for his excellent and thought provoking comment!</p>
<p>My response is that removing the use of the result does indeed make it less readable, precisely <em>because</em> you&#8217;re using a result-producing control flow construct where the result is not needed. I suppose I&#8217;m positing that it&#8217;s inherently confusing to do that with your syntax: there&#8217;s a looping form that doesn&#8217;t produce a result, so that should be used instead. It&#8217;s expressing your semantic intention via syntax.</p>
<p>For advanced Pythonistas it&#8217;s easy for figure out what&#8217;s going on at a glance, but comprehension-as-loop definitely has a &quot;there&#8217;s more than one way to do it&quot; smell about it, which also makes it less amenable to people learning the language.</p>
<p>With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn&#8217;t require a result they now ask themselves, &quot;Can I fit this into the list comprehension form?&quot; Those mental branches are, to me, what &quot;one way to do it&quot; is designed to avoid. When I read Perl code, I take &quot;mental exceptions&quot; all the time because the author didn&#8217;t use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that &quot;no result needed&quot; should automatically imply a loop construct.</p>
</div>
<div class="section" id="efficiency">
<h4>Efficiency</h4>
<p>Consider two functions, <tt class="docutils literal"><span class="pre">comprehension</span></tt> and <tt class="docutils literal"><span class="pre">loop</span></tt>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> loop<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    accum = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span>:
        accum.<span style="color: black;">append</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> accum
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> comprehension<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    accum = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: black;">&#91;</span>accum.<span style="color: black;">append</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> accum</pre></div></div>

<p>N.B. This example is comparing the efficiency of a list comprehension <strong>where the result of the comprehension is ignored</strong> to a for loop that produces no result, as is discussed in the referenced entry, <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/">Python by example: list comprehensions</a>.</p>
<p><a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1539">Michael Foord</a> comments:</p>
<blockquote><p>
Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.</p></blockquote>
<p>However, the disassembly, obtained via <a class="reference external" href="http://docs.python.org/library/dis.html#dis.dis">dis.dis(func)</a> looks like the following for the loop:</p>

<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
&nbsp;
3           6 SETUP_LOOP              33 (to 42)
            9 LOAD_GLOBAL              0 (range)
           12 LOAD_CONST               1 (20)
           15 CALL_FUNCTION            1
           18 GET_ITER
      &gt;&gt;   19 FOR_ITER                19 (to 41)
           22 STORE_FAST               1 (i)
&nbsp;
4          25 LOAD_FAST                0 (accum)
           28 LOAD_ATTR                1 (append)
           31 LOAD_FAST                1 (i)
           34 CALL_FUNCTION            1
           37 POP_TOP
           38 JUMP_ABSOLUTE           19
      &gt;&gt;   41 POP_BLOCK
&nbsp;
5     &gt;&gt;   42 LOAD_FAST                0 (accum)
           45 RETURN_VALUE</pre></div></div>

<p>And it looks like the following for the comprehension:</p>

<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
&nbsp;
3           6 BUILD_LIST               0
            9 DUP_TOP
           10 STORE_FAST               1 (_[1])
           13 LOAD_GLOBAL              0 (range)
           16 LOAD_CONST               1 (20)
           19 CALL_FUNCTION            1
           22 GET_ITER
      &gt;&gt;   23 FOR_ITER                22 (to 48)
           26 STORE_FAST               2 (i)
           29 LOAD_FAST                1 (_[1])
           32 LOAD_FAST                0 (accum)
           35 LOAD_ATTR                1 (append)
           38 LOAD_FAST                2 (i)
           41 CALL_FUNCTION            1
           44 LIST_APPEND
           45 JUMP_ABSOLUTE           23
      &gt;&gt;   48 DELETE_FAST              1 (_[1])
           51 POP_TOP
&nbsp;
4          52 LOAD_FAST                0 (accum)
           55 RETURN_VALUE</pre></div></div>

<p>By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just &quot;syntactic sugar&quot; for the <tt class="docutils literal"><span class="pre">for</span></tt> loop, as mentioned by <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1540">nes</a> &#8212; they both lower down into the same control flow construct at a virtual machine level, at least in CPython.</p>
<p>The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead &#8212; it&#8217;s simply deleted in bytecode 48. Unless the <tt class="docutils literal"><span class="pre">POP_BLOCK</span></tt> operation (bytecode 41) of the loop disassembly is very expensive (I haven&#8217;t looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.</p>
<p>Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a <tt class="docutils literal"><span class="pre">for</span></tt> loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special <a class="reference external" href="http://www.cython.org/">Cython</a> construct, because <tt class="docutils literal"><span class="pre">LOAD_GLOBAL</span> <span class="pre">range</span></tt> could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper <a class="reference external" href="http://portal.acm.org/citation.cfm?id=1534530.1534550">The effect of unrolling and inlining for Python bytecode optimizations</a>.</p>
</div>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/eaAep1EWpOY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Learning Python by example: list comprehensions</title>
		<link>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/</link>
		<comments>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 16:00:59 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Introductory]]></category>
		<category><![CDATA[List Comprehensions]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=762</guid>
		<description><![CDATA[My friend, who is starting to learn Python 2.x, asked me what this snippet did: def collapse&#40;seq&#41;: # Preserve order. uniq = &#91;&#93; &#91;uniq.append&#40;item&#41; for item in seq if not uniq.count&#40;item&#41;&#93; return uniq This is not a snippet that should be emulated (i.e. it&#8217;s bad); however, it makes me happy: there are so many things [...]]]></description>
			<content:encoded><![CDATA[<p>My friend, who is starting to learn Python 2.x, asked me what this snippet did:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> collapse<span style="color: black;">&#40;</span>seq<span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># Preserve order.</span>
    uniq = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: black;">&#91;</span>uniq.<span style="color: black;">append</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> seq <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> uniq.<span style="color: black;">count</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> uniq</pre></div></div>

<p>This is not a snippet that should be emulated (i.e. <em>it&#8217;s bad</em>); however, it makes me happy: there are so many things that can be informatively corrected!</p>
<div class="section" id="what-is-a-list-comprehension">
<h3>What is a list comprehension?</h3>
<p>A list comprehension is a special brackety syntax to perform a <em>transform</em> operation with an optional <em>filter</em> clause that always produces a new sequence (list) object as a <em>result</em>. To break it down visually, you perform:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">new_range = <span style="color: black;">&#91;</span>i <span style="color: #66cc66;">*</span> i          <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>   <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">2</span> == <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Which corresponds to:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudocode" style="font-family:monospace;">*result*  = [*transform*    *iteration*         *filter*     ]</pre></div></div>

<p>The <em>filter</em> piece answers the question, &quot;should this item be transformed?&quot; If the answer is yes, then the <em>transform</em> piece is evaluated and becomes an element in the <em>result</em>. The <em>iteration</em> <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a> order is preserved in the <em>result</em>.</p>
<p>Go ahead and figure out what you expect <tt class="docutils literal"><span class="pre">new_range</span></tt> to be in the prior example. You can double check me in the Python shell, but I think it comes out to be:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> new_range = <span style="color: black;">&#91;</span>i <span style="color: #66cc66;">*</span> i <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">2</span> == <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> new_range
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span></pre></div></div>

<p>If it still isn&#8217;t clicking, we can try to make the example less noisy by getting rid of the transform and filter &#8212; can you tell what this will produce?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> new_range = <span style="color: black;">&#91;</span>i <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></pre></div></div>

</div>
<div class="section" id="so-what-s-wrong-with-that-first-snippet">
<h3>So what&#8217;s wrong with that first snippet?</h3>
<p>As we observed in the previous section, a list comprehension always produces a <em>result</em> list, where the elements of the result list are the <em>transformed</em> elements of the <em>iteration</em>. That means, if there&#8217;s no <em>filter</em> piece, there are exactly as many <em>result</em> elements as there were <em>iteration</em> elements.</p>
<p>Weird thing number one about the snippet &#8212; the list comprehension <em>result</em> is unused. It&#8217;s created, mind you &#8212; list comprehension always create a value, even if you don&#8217;t care what it is &#8212; but it just goes off to oblivion. (In technical terms, it becomes <em>garbage</em>.) When you don&#8217;t need the <em>result</em>, just use a <tt class="docutils literal"><span class="pre">for</span></tt> loop! This is better:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> colapse<span style="color: black;">&#40;</span>seq<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Preserve order.&quot;&quot;&quot;</span>
    uniq = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> seq:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> uniq.<span style="color: black;">count</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span>:
            uniq.<span style="color: black;">append</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> uniq</pre></div></div>

<p>It&#8217;s two more lines, but it&#8217;s less weird looking and wasteful. &quot;Better for everybody who reads <strong>and</strong> runs your code,&quot; means <a class="reference external" href="http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html">you should do it</a>.</p>
<p>Moral of the story: a list comprehension isn&#8217;t just, &quot;shorthand for a loop.&quot; It&#8217;s shorthand for a transform from an input sequence to an output sequence with an optional filter. If it gets too complex or weird looking, just make a loop.  It&#8217;s not that hard and readers of your code will thank you.</p>
<p>Weird thing number two: the transform, <tt class="docutils literal"><span class="pre">list.append(item)</span></tt>, produces <tt class="docutils literal"><span class="pre">None</span></tt> as its output value, because the return value from <tt class="docutils literal"><span class="pre">list.append</span></tt> is always <tt class="docutils literal"><span class="pre">None</span></tt>. Therefore, the <em>result</em>, even though it isn&#8217;t kept anywhere, is a list of <tt class="docutils literal"><span class="pre">None</span></tt> values of the same length as <tt class="docutils literal"><span class="pre">seq</span></tt> (notice that there&#8217;s no filter clause).</p>
<p>Weird thing number three: <tt class="docutils literal"><span class="pre">list.count(item)</span></tt> iterates over every element in the <tt class="docutils literal"><span class="pre">list</span></tt> looking for things that <tt class="docutils literal"><span class="pre">==</span></tt> to <tt class="docutils literal"><span class="pre">item</span></tt>. If you think through the case where you call <tt class="docutils literal"><span class="pre">collapse</span></tt> on an entirely unique sequence, you can tell that the collapse algorithm is O(n<sup>2</sup>).  In fact, it&#8217;s even worse than it may seem at first glance, because <tt class="docutils literal"><span class="pre">count</span></tt> will keep going all the way to the end of <tt class="docutils literal"><span class="pre">uniq</span></tt>, even if it finds <tt class="docutils literal"><span class="pre">item</span></tt> in the first index of <tt class="docutils literal"><span class="pre">uniq</span></tt>. What the original author really wanted was <tt class="docutils literal"><span class="pre">item</span> <span class="pre">not</span> <span class="pre">in</span> <span class="pre">uniq</span></tt>, which bails out early if it finds <tt class="docutils literal"><span class="pre">item</span></tt> in <tt class="docutils literal"><span class="pre">uniq</span></tt>.</p>
<p>Also worth mentioning for the computer-sciency folk playing along at home: if all elements of the sequence are comparable, you can bring that down to O(n * log n) by using a &quot;shadow&quot; sorted sequence and <a class="reference external" href="http://docs.python.org/library/bisect.html#module-bisect">bisecting</a> to test for membership. If the sequence is hashable you can bring it down to O(n), perhaps by using the <a class="reference external" href="http://docs.python.org/library/stdtypes.html#set">set</a> datatype if you are in Python &gt;= 2.3. Note that the common cases of strings, numbers, and tuples (any built-in immutable datatype, for that matter) are hashable.</p>
</div>
<div class="section" id="from-python-history">
<h3>From Python history</h3>
<p>It&#8217;s interesting to note that <a class="reference external" href="http://www.python.org/dev/peps/pep-0270/">Python Enhancement Proposal (PEP) #270</a> considered putting a <tt class="docutils literal"><span class="pre">uniq</span></tt> function into the language distribution, but withdrew it with the following statement:</p>
<blockquote><p>
Removing duplicate elements from a list is a common task, but there are only two reasons I can see for making it a built-in. The first is if it could be done much faster, which isn&#8217;t the case.  The second is if it makes it significantly easier to write code.  The introduction of <tt class="docutils literal"><span class="pre">sets.py</span></tt> eliminates this situation since creating a sequence without duplicates is just a matter of choosing a different data structure: a set instead of a list.</p></blockquote>
<p>Remember that sets can only contain hashable elements (same policy as dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as mentioned in the last paragraph of the previous section.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>&quot;Iteration&quot; is just a fancy word for &quot;step through the sequence, element by element, and give that element a name.&quot; In our case we&#8217;re giving the name <tt class="docutils literal"><span class="pre">i</span></tt>.</td>
</tr>
</tbody>
</table>
</div>
<img src="http://feeds.feedburner.com/~r/cdleary/~4/U8CApnhMnc4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
