<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" version="2.0">

<channel>
	<title>xion.log</title>
	
	<link>http://xion.org.pl</link>
	<description>Programming on (many) different levels of abstraction</description>
	<lastBuildDate>Tue, 14 Feb 2012 20:32:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/xion" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="xion" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><geo:lat>53.25</geo:lat><geo:long>21</geo:long><creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" /><image><link>http://xion.org.pl</link><url>http://xion.org.pl/wp-content/themes/xion.log/images/rss-logo.jpg</url><title>xion.log</title></image><item>
		<title>Adventures in Haskelland</title>
		<link>http://xion.org.pl/2012/02/14/adventures-in-haskelland/</link>
		<comments>http://xion.org.pl/2012/02/14/adventures-in-haskelland/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 20:31:51 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[functors]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[monads]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3371</guid>
		<description><![CDATA[In a post from top of the year I mentioned that I was looking into Haskell programming language, and promised to give some insight on how it fares compared to others. Well, I think it's time to deliver on that promise, for the topic of this particular language - and functional programming in general - [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://xion.org.pl/2012/01/01/looking-back-and-maybe-even-forward/">a post from top of the year</a> I mentioned that I was looking into Haskell programming language, and promised to give some insight on how it fares compared to others. Well, I think it's time to deliver on that promise, for the topic of this particular language - and functional programming in general - is indeed very insightful.</p>
<p><img style="float:right" src="http://xion.org.pl/wp-content/uploads/2012/01/haskell-logo.png" alt="" title="Haskell logo" />I do not claim to have achieved any kind of proficiency in Haskell, of course; I might very well be just scratching the surface. However, this is exactly the sort of perspective I wanted to employ when evaluating language usefulness: a practical standpoint, held by programmer who is looking to use it for actual tasks, without having mastered it in great detail - at least initially. This is, by the way, a pretty common setting when tackling all new things related to coding, be it frameworks, software platforms or languages.</p>
<p>Still, I knew it was almost <em>supposed</em> to be a rough ride. A language whose tutorials purposefully shy away from classic <em>Hello World</em> example (typically inserting a factorial function instead) looks like something designed specifically to melt brains of poor programmers who dare to venture forth to investigate it. Those few who make their way back are told about in folk tales, portrayed as mildly crazy types who profess this weird idea of "purity", and utter the word 'monad' with great contempt.</p>
<p>Okay, I may be exaggerating... slightly. But there is no denying that Haskell attained <a href="http://www.xent.com/pipermail/fork/Week-of-Mon-20070219/044101.html">a certain kind of reputation</a>, something like a quirky cousin in the big and diverse family of languages. Unlike Lisp, he isn't picked on due to any (particular (and (rather) obvious)) property. There is just this general aura of weirdness and impracticality that he supposedly radiates, repelling all but the most adventurous of coders.</p>
<p>If it really exists, then pity: it certainly failed to repel <strong>me</strong>. As a result, I may have a story (or two) to share with those who want to learn what's this functional business is really about, and why should they care.</p>
<p>Grab a chair and something to drink - it's going to take a while. But in the end, you shouldn't regret staying.<br />
<span id="more-3371"></span></p>
<h3>Introduction of main protagonist</h3>
<p>Regardless of what prior knowledge you may already have, it is certainly useful to begin with the basics.</p>
<div style="float:right; margin:5px; font-size: small; text-align:center"><img src="http://xion.org.pl/wp-content/uploads/2012/02/haskell-curry.jpeg" alt="" title="Haskell Curry" /><br/>Haskell Curry (1900-1982)</div>
<p><a href="http://www.haskell.org">Haskell</a> is a general purpose, compiled, statically (and strongly) typed, purely functional programming language with lazy evaluation, expansive and modular standard library, a rich <a href="http://www.haskell.org/hoogle/">repository</a> of external packages, support for all major operating systems via the <a href="http://www.haskell.org/ghc/">GHC compiler</a>, and peculiar ability of making blogposts sound like <a href="http://en.wikipedia.org/wiki/Haskell">Wikipedia articles</a>. It doesn't seem to have a single creator (in a way that, for instance, Python has Guido von Rossum), but its namesake is <a href="http://en.wikipedia.org/wiki/Haskell_Curry">Haskell Curry</a>, a 20th century mathematician and logician.</p>
<p>As for popularity, it currently sports a not-very-impressive 42th place on the <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html">Tiobe Index</a> - notably above Go, Clojure, Dart and Scala, and unsurprisingly below most languages you would think about. GitHub currently hosts around <a href="https://github.com/search?q=language%3AHaskell&#038;type=Everything&#038;repo=&#038;langOverride=&#038;start_value=1">6700 repositories</a> whose projects use Haskell in any way - compared to, say, about 80,000 for Python, 40,000 for C and 160,000 for Ruby (yikes!). A little below 6000 questions tagged <em>haskell</em> has been asked on <a href="http://stackoverflow.com/questions/tagged/haskell">StackOverflow</a>, putting this tag on the same page as <em>boost</em>, <em>jquery-plugins</em>, <em>version-control</em> and <em>soap</em> (eww!).</p>
<h3>Motivation</h3>
<p>So it goes without saying that Haskell seems to be rather hip..., <em>ahem</em>, niche language. In the programming world, this should be considered a liability, since larger community around a language naturally results in more third-party libraries, open source projects and yes, job offers. On the other hand, we know the old saying about quantity triumphing quality, which we are inclined to believe at least in some cases (<em>*cough*</em> Java).</p>
<p>If not popularity, then what are the Haskell's immediately recognizable treats that could appeal to newcomers?... For one, it's the GHC package, which stands for Glasgow Haskell Compiler. Easily available on any desktop platform, it's not only a compiler but also an interpreter which can run scripts (via <kbd>runghc</kbd> command), and act as a REPL (Read-Eval-Print Loop) for quick experiments (via <kbd>ghci</kbd>). That last thing is especially important, as I've grown to perceive languages that refuse to be poked at with their own shell as just plain barbaric. Haskell proves it's entirely possible for a compiled language to offer sensible, efficient and useful REPL.</p>
<h3>Beware of big concepts in small pieces</h3>
<p>What about writing real code? As for actual programs, they are organized into modules (.<em>hs</em> files) which form a nice hierarchy of namespaces, just like you would expect when coming from Java, .NET, etc. One neat thing here is how external packages fit into the whole picture. Rather than rolling out an independent root namespace for every custom package, convention orders to place third-party modules within <a href="http://lambda.haskell.org/platform/doc/current/index.html">existing tree</a>. As a result, stuff related to networking goes always into <code>Network</code> namespace, text utilities (e.g. parsers) into <code>Text</code>, data structures into <code>Data</code>, OS stuff into <code>System</code>, and so on.</p>
<p>On more local level, the most basic unit of code structure is of course a <em>function</em>. Haskell functions are slightly different than usual, because they are required to be <strong>referentially transparent</strong>: they must always return the same value for the same arguments. Also, they cannot have the so-called <strong>side effects</strong>, which excludes pretty much anything outside of actual computation - most notably I/O.<br />
How this can be reconciled with being able to, well, <em>do</em> anything practical shall be left for now. (Some may know that it's about the infamous <a href="http://en.wikipedia.org/wiki/Monad_%28functional_programming%29">M-word</a>). What I'd like to point out instead is that a function in Haskell is just a name for (parametrized) expression. If you're a Python programmer, you can imagine functions  being always defined with <code>lambda</code> rather than typical <code>def</code>. Because there really isn't much you can cram into a single expression, you are basically required to aggressively <strong>split things</strong> into smaller and smaller bits, using <code>let</code>-<code>in</code> or <code>where</code> syntax. Here's a <a href="https://github.com/Xion/wikilate/blob/40ac1b40cdd87bda5912d2e061bf5a35ec5a2e83/Wikilate.hs#L121">simple example</a> of such splitting, done for a function that is already pretty trivial - it just formats an URL:</p>
<div class="syntax_hilite">
<div id="code-11">
<div class="code">wikipediaUrl :: <span style="">String</span> -&gt; <span style="">String</span> -&gt; <span style="">Maybe</span> String -&gt; <span style="">URI</span><br />
wikipediaUrl sourceLang phrase continue =<br />
&nbsp; &nbsp; fromJust $ parseURI url<br />
&nbsp; &nbsp; where<br />
&nbsp; &nbsp; &nbsp; &nbsp; url = concat <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#CC0000;">"http://"</span>, sourceLang, <span style="color:#CC0000;">".wikipedia.org/w/api.php?"</span>, urlEncodeVars urlArgs<span style="color:#006600; font-weight:bold;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; urlArgs = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"action"</span>, <span style="color:#CC0000;">"query"</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"prop"</span>, <span style="color:#CC0000;">"langlinks"</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"format"</span>, <span style="color:#CC0000;">"json"</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"titles"</span>, phrase<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ++ maybe <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#93;</span> continueArg continue<br />
&nbsp; &nbsp; &nbsp; &nbsp; continueArg c = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"llcontinue"</span>, c<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span></div>
</div>
</div>
<p>
Hopefully I don't have to elaborate overmuch on how this is a <em>good</em> thing, to have your code organized into many small, manageable parts that clearly relate to each other. In Haskell, this is almost the only way you can actually produce anything, which is both marvelous and scary at the same time. There are simply no excuses for quick &#038; dirty hacks: refactoring happens <em>while</em> you are coding, not afterwards.</p>
<h3>Types of types</h3>
<p>Besides algorithms (functions), we know that programs are composed of data structures, and in case of Haskell they translate into data types. Quite amusingly, if we don't count aliases, there is just one sort of types that can be defined here. It's called <em>algebraic data type</em> (ADT) and it's entirely sufficient, because it simply has <em>everything</em>. And by that I mean literally everything, because it allows to define stuff like enums:</p>
<div class="syntax_hilite">
<div id="code-12">
<div class="code">data Axis = XAxis | YAxis | ZAxis</div>
</div>
</div>
<p>
as well as structures (records):</p>
<div class="syntax_hilite">
<div id="code-13">
<div class="code">data Point2D = Point2D Int Int<br />
data Contact = Contact <span style="color:#006600; font-weight:bold;">&#123;</span><br />
&nbsp; &nbsp; contactFirstName :: <span style="">String</span>,<br />
&nbsp; &nbsp; contactLastName :: <span style="">String</span>,<br />
&nbsp; &nbsp; contactEmail :: <span style="">String</span><br />
<span style="color:#006600; font-weight:bold;">&#125;</span></div>
</div>
</div>
<p>
and unions:</p>
<div class="syntax_hilite">
<div id="code-14">
<div class="code">data Shape = Rect <span style="color:#006600; font-weight:bold;">&#123;</span><br />
&nbsp; &nbsp; rectLeft :: <span style="">Int</span>,<br />
&nbsp; &nbsp; rectTop :: <span style="">Int</span>,<br />
&nbsp; &nbsp; rectRight :: <span style="">Int</span>,<br />
&nbsp; &nbsp; rectBottom :: <span style="">Int</span><br />
<span style="color:#006600; font-weight:bold;">&#125;</span> | Circle <span style="color:#006600; font-weight:bold;">&#123;</span><br />
&nbsp; &nbsp; circleMiddle :: <span style="">Point2D</span>,<br />
&nbsp; &nbsp; circleRadius :: <span style="">Float</span><br />
<span style="color:#006600; font-weight:bold;">&#125;</span></div>
</div>
</div>
<p>
Not only that - it also permits types to have <em>parameters</em> (with constraints):</p>
<div class="syntax_hilite">
<div id="code-15">
<div class="code">data Num a =&gt; Interval a = Interval a a<br />
data Tree t = Leaf t | Node <span style="color:#006600; font-weight:bold;">&#40;</span>Tree t<span style="color:#006600; font-weight:bold;">&#41;</span> t <span style="color:#006600; font-weight:bold;">&#40;</span>Tree t<span style="color:#006600; font-weight:bold;">&#41;</span></div>
</div>
</div>
<p>
which easily satisfies the definition of 'template', 'generic type' and 'concept' from other languages. And if you haven't noticed, we just combined half of semantics of C and the most powerful feature of C++ into <strong>one syntactical construct</strong> - and rather simple one, too. I don't know about you, but for me it just oozes awesomeness.</p>
<h3>Really strong typing</h3>
<p>The perks of Haskell type system are not limited to ADTs, though. As I have mentioned earlier, Haskell is a compiled and statically typed language. For many programmers this may bring immediate associations with cryptic error messages and <a href="http://www.youtube.com/watch?v=5kj5ApnhPAE#t=131s">pointless things littering the code</a> just to please the compiler. In reality, static typing was always meant as a tool to reinforce code correctness; but often, it just didn't seem to bring enough benefits to justify the headaches involved.</p>
<p>Haskell appears to be offering a proper balance. Because it's very good at figuring out the types by itself (via <em>type inference</em>), it seldom requires explicit type declarations. (They are still recommended for top-level functions, though). And since the type system is extremely powerful, it can often serve as a sufficient proof of program's correctness. Basically: if it compiles, it has pretty high chance of working - much higher than in other statically typed languages, at least.</p>
<p>To be honest, this happens not only due to elaborate mixing and matching of types, but also because of referential transparency. Constraining the class of possible actions that a particular piece of code can undertake goes a long way towards preventing errors. Furthermore, it greatly simplifies the testing process, up to the point of rendering many typical kinds of unit tests obsolete. Indeed, Haskell's testing facility known as <a href="http://en.wikipedia.org/wiki/QuickCheck">QuickCheck</a> can <strong>generate test cases automatically</strong>, based on a set of properties that we expect our functions to hold.</p>
<h3>You don't think FP-dimensionally</h3>
<p>Going through all this features and learning about those wonders of modern technology, you might be left thinking: where's the trick here?... It's wise to not let yourself be caught off-guard, for indeed there are some downsides. Several of them, actually.</p>
<p>An obvious one seems to be the functional paradigm itself, especially if coming from a language that wasn't significantly influenced by FP concepts. However, the pool of languages that still lack at least basic functional features is steadily shrinking. A relatively recent example could be C++, which has just gained a superior support for closures and functions as first-order values, supplementing its already existing capabilities. Others - like popular scripting languages - are usually going even further in their support for functional style.</p>
<p>So if you have done some serious coding in any of those languages, you had good chance of seeing the elements of FP in action. Heck, there is even a non-zero probability that you have <em>used</em> them, which should definitely help with cushioning the Haskell's blow. And if structuring your code using maps, folds, pattern matching and guards turns out to be insufficient, there is always a place for plain ol' loops. Sure, they are called "tail recursion" and may not have an obvious 1:1 correspondence to iterative constructs from imperative languages, but they are close enough to work in every reasonable case - if only somewhat inelegantly.</p>
<h3>Which style do you choose?</h3>
<p>Unfortunately, getting past the initial obstacle and learning to frame your thoughts into purely functional code is just a start. See, on some level Haskell can be viewed as conglomerate of several sublanguages, defined by functions and operators applicable to specific <em>typeclasses</em>. A <a href="http://en.wikipedia.org/wiki/Type_class">typeclass</a> is rather simple concept here; it can be explained as slightly more powerful version of an <strong>interface</strong> from traditional OOP. Concrete types can be instances of one or more typeclass, just like "normal" classes are said to be implementing interfaces.</p>
<p>In Haskell, however, <a href="http://www.haskell.org/haskellwiki/Typeclassopedia">some typeclasses</a> are so influential that their operators and functions essentially dictate the overall shape of a piece of code. Therefore, you can often hear about few different "styles" of Haskell code, and the most prominent ones seem to be: applicative, arrow-based and monadic style.</p>
<p>When writing any more complicated piece of logic, you are generally required to decide early on which style you are going to use. The choice is often non-obvious, as the foundational bases behind different styles are closely related, often through inclusion (like applicative functors and monads). Each style has a distinct "feel" to it, though, and correspondingly, it looks noticeably different in actual code. And while it's entirely possible to mix several styles together, it requires great care if result is to be in any way comprehensible later on.</p>
<h3>It's not hard; it's abstract</h3>
<p>At this point it's quite natural to ask: what are those things anyway, those functors, arrows and <em>*deep breath*</em> monads?... Actually, I could add a few more to the mix. Meet categories, foldables, traversables, monad transformers and monoids; I'm sure it's a beginning of a long-lasting friendship!</p>
<p>Jokes aside: the answer is both simple and complicated. These are abstract concepts, originating mostly from <a href="http://en.wikipedia.org/wiki/Category_theory">category theory</a> and pertaining to either computational processes, or entities that those processes are operating on. Note that by 'abstract', I mean <strong>real</strong>, mathematical abstraction - not the flamboyant, object-oriented stuff that's making wimpy coders shudder because they've seen too many service provider factories. That's just flexibility at best, and complexity at worst - but nothing there requires cracking big, mental roadblocks in order to understand.</p>
<p>With terms I've listed above, it's entirely different situation. Sure, they are all built from quite simple, elementary building blocks: functions, their combinations and parametrized data types. But somewhere along the way up, there is always an initial mental gap which has to be filled with our own, individual analogies that allow us to grasp these abstract concepts. Those analogies are likely specific to given individual and might very well be impossible to put into words. Because of that, there is no royal road to enlightenment: one must simply bash their head against those ideas long enough for those helpful analogies to finally emerge.</p>
<p>Oh, and don't think you'll be able to communicate them to others. It's been tried enough times as to attain a dedicated term: the <a href="http://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/"><em>monad tutorial fallacy</em></a>.</p>
<h3>parseJSON :: Result Headache</h3>
<p>You may argue - and be absolutely correct - that this situation isn't really specific to Haskell or functional programming. Indeed, it happens when we start to tackle any new programming paradigm, and especially when just learning to code for the first time. But many of us simply aren't used to be at that level anymore, especially if their last time was few years ago or even further in the past. Thus, in a sense, taking up Haskell often begins with a lesson in humility. Hardly a pleasant experience, but apparently a necessary one.</p>
<p>I would be totally fine with intricate semantics, though, if things that are obviously fixable weren't getting in the way so very often. One of those things is the miserable excuse for "documentation" that many Haskell packages are being shipped with. And I mean packages with absolutely crucial functionality, such as <a href="http://lukeplant.me.uk/blog/posts/haskell-api-docs-suck-a-lot/">regular expressions' library</a> or - my personal nemesis - <a href="http://hackage.haskell.org/package/json-0.4.3">JSON parser package</a>. In the latter case, the <a href="http://hackage.haskell.org/packages/archive/json/0.4.3/doc/html/Text-JSON.html">documentation</a> is so overwhelmingly useless that I had to spend better part of an hour just to figure out any kind of starting point to work my way from. Then, fleshing out the details for a final parsing logic of a <a href="https://github.com/Xion/wikilate">a very simple project</a> took <strong>more than one evening</strong>.</p>
<p>To emphasize: I'm talking about handling JSON here, a bread and butter of most contemporary applications that have anything to do with the Internet. This is something that should be as trivial as putting a string together, or sorting a collection via custom comparator. It's not like there aren't any <a href="http://docs.python.org/library/json.html">precedents</a> here, showing that it is entirely doable, and thus a reasonable thing to request from a language or its standard library. Yet it is apparently not the case in Haskell, which is just shameful and sad.</p>
<h3>How much magic can you take?</h3>
<p>Naturally, this isn't something that cannot be amended, and I hope for Haskell packages' docs to steadily improve over time. Besides, hiccups like that are not nearly enough to dispel the irrefutable "magic" that Haskell emanates - although admittedly, not every shade of it is something to be satisfied with either.</p>
<p>Surely, there's plenty of magic of <a href="http://en.wikipedia.org/wiki/Clarke%27s_Three_Laws">the Arthur C. Clarke type</a>. Some of this good stuff I briefly touched upon in the beginning, but there's a lot more things worthy of attention here. One of those may be an out-of-the-box concurrency, which allows to automatically parallelize computing of pure code. Combined with support for <a href="http://www.haskell.org/haskellwiki/Software_transactional_memory"><em>software transactional memory</em></a> (STM), it can turn hard problems of concurrent programming into something much more manageable.</p>
<p>But unfortunately, it seems that the other kind of coding magic - of bad type - is also abundant in Haskell. It's the magic of doing a whole lot of stuff <strong>behind the scenes</strong>, out of programmer's sight. It is present in the very idea of lazy evaluation, resulting in detachment between the place where expression is put in code and the actual moment where it is computed. This may sometimes result in unwanted performance implications, forcing us to use <code>seq</code> or bangs (e.g. <code>$!</code> for function application) in order to regain some control over evaluation.</p>
<p>Hidden magic can also lie in syntax, obscuring the details of what particular piece of code really does. Sometimes it might even allow for one snippet to do several totally different things, as if you had overloaded the semicolon "operator" in C-like language. 'Programmable semicolons' is actually one of the many epithets associated with monads, and the reason for that can be easily seen in the following function <code>f</code>:</p>
<div class="syntax_hilite">
<div id="code-16">
<div class="code">-- a monadic function<br />
f x y = do<br />
&nbsp; &nbsp; a &lt;- x<br />
&nbsp; &nbsp; b &lt;- y<br />
&nbsp; &nbsp; return <span style="color:#006600; font-weight:bold;">&#40;</span>a, b<span style="color:#006600; font-weight:bold;">&#41;</span></div>
</div>
</div>
<p>
If we execute it within context of <code>IO</code> monad, we will get rather predictable result. (And, as a bonus, we'll finally see that I/O in Haskell is very possible):</p>
<div class="syntax_hilite">
<div id="code-17">
<div class="code">main = do<br />
&nbsp; &nbsp; x &lt;- getLine&nbsp; &nbsp; -- read line from stdin<br />
&nbsp; &nbsp; y &lt;- getLine<br />
&nbsp; &nbsp; print $ f x y&nbsp; &nbsp;-- print value to stdout</p>
<p>-- alternatively<br />
main = print =&lt;&lt;<span style="color:#006600; font-weight:bold;">&#40;</span>f getLine getLine<span style="color:#006600; font-weight:bold;">&#41;</span></div>
</div>
</div>
<p>
But if we use <code>f</code> on two lists, we will get something entirely different:</p>
<div class="syntax_hilite">
<div id="code-18">
<div class="code">main = print $ f <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#800000;">1</span>..<span style="color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#800000;">1</span>..<span style="color:#800000;">3</span><span style="color:#006600; font-weight:bold;">&#93;</span><br />
-- prints: <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">1</span>,<span style="color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#41;</span>,<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">1</span>,<span style="color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#41;</span>,<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">1</span>,<span style="color:#800000;">3</span><span style="color:#006600; font-weight:bold;">&#41;</span>,<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">2</span>,<span style="color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#41;</span>,<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">2</span>,<span style="color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#41;</span>,<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">2</span>,<span style="color:#800000;">3</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span></div>
</div>
</div>
<p>
By some inexplicable sleight of hand, the function has been executed six times, resulting in a Cartesian product of its arguments. For an untrained eye, this has to be really bizarre sort of polymorphism, with no obvious explanation (as I imagine that "list is also a monad" isn't <em>nearly</em> enough).</p>
<h3>Eighteen-hole course</h3>
<p>You may be blinking now, and wondering: who could have used that in real code, anyway? After all, it's an incredibly effective technique to rapidly decrease the quality of code, as measured by t<a href="http://www.osnews.com/story/19266/WTFs_m">he one and only, true metric</a>. Surely, real haskellers must have much more sanity than that, right?...</p>
<p>By the way, did you notice the names I've used in the preceding samples? Not counting library functions and mandatory <code>main</code>, they were all one-letter names: <code>f</code>, <code>x</code>, <code>y</code>, etc. I'm sure you have thought it's only because this is just an example, unlikely to be replicated in actual code. I must therefore inform you right way: you're greatly mistaken.<br />
Using one-letter identifiers looks like a long-standing Haskell tradition, up to the point of utter confusion. When <code>f</code> can stand for both function or functor (not nearly the same thing); <code>m</code> could be a monoid or monad; <code>t</code> may be traversable, monad transformer or just an arbitrary type - it's hardly surprising to find yourself staring at piece of code in dazing bewilderment.</p>
<p>It won't be <em>only</em> because of cryptic identifiers, though. I noticed that haskellers exhibit a strange fondness to <em>very</em> terse code, dubbing this property "expressiveness". For instance, the following function could be considered relatively typical:</p>
<div class="syntax_hilite">
<div id="code-19">
<div class="code">-- Retrieves a list of meaningful filesystem entries in given directory<br />
getDirEntries :: <span style="">FilePath</span> -&gt; <span style="">IO</span> <span style="color:#006600; font-weight:bold;">&#91;</span>String<span style="color:#006600; font-weight:bold;">&#93;</span><br />
getDirEntries = liftM <span style="color:#006600; font-weight:bold;">&#40;</span>filter <span style="color:#006600; font-weight:bold;">&#40;</span>`notElem` <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#CC0000;">"."</span>, <span style="color:#CC0000;">".."</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span> . <span style="">getDirectoryContents</span></div>
</div>
</div>
<p>
Notice how this function must take an argument (directory's path):</p>
<div class="syntax_hilite">
<div id="code-20">
<div class="code">ghci&gt; getDirEntries <span style="color:#CC0000;">"/home/xion/experiments/haskell"</span><br />
<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#CC0000;">"Test.hs"</span>,<span style="color:#CC0000;">"bouncing-balls"</span>,<span style="color:#CC0000;">"coded4"</span><span style="color:#006600; font-weight:bold;">&#93;</span></div>
</div>
</div>
<p>
yet there is no identifier that would stand for it in function's declaration or body. It is defined entirely in terms of applying some <em>combinators</em> to functions treated as first-order values, which is often referred to as <strong>point-free style</strong>, and greatly encouraged. Since there exists plethora of different combinators, this technique can be - and often is - taken really far, exceeding the readability threshold of all but the most seasoned hackers.</p>
<p>This is called "code golf" - a term I've heard for the first time only when diving into Haskell. Frankly, other languages are <a href="http://www.ioccc.org/">much more honest</a> when talking about similar practices.</p>
<h3>Jumping to conclusions</h3>
<p>So, is Haskell beyond salvation?... Hardly. In terms of applications, few areas are already explored very well, with concurrent programs, distributed computing and web programming (yes!) being probably the most important ones. Some others are still relatively pristine, and some can be placed in between: where serious effort is being made, but there aren't yet any universally acclaimed solutions and practices (e.g. programming graphical UIs).</p>
<p>But despite obvious capabilities of performing very practical tasks, Haskell ecosystem as a whole still appears to suffer from the ivory tower syndrome. There is little doubt that learning Haskell is an overall mind-expanding experience, ultimately leading to better code in any language you choose to write in. However, <em>using</em> Haskell for everyday programming tasks is often striding through unknown territory, exploring strange new paradigms, seeking out new techniques and methodologies, to boldly go where no monad has gone before.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/B08B3r9SggY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/02/14/adventures-in-haskelland/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generator Pitfalls</title>
		<link>http://xion.org.pl/2012/02/08/generator-pitfalls/</link>
		<comments>http://xion.org.pl/2012/02/08/generator-pitfalls/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 18:34:53 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[caveats]]></category>
		<category><![CDATA[collections]]></category>
		<category><![CDATA[generators]]></category>
		<category><![CDATA[higher-order functions]]></category>
		<category><![CDATA[iterables]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3296</guid>
		<description><![CDATA[I'm a big fan of Python generators. With seemingly little effort, they often allow to greatly reduce memory overhead. They do so by completely eliminating intermediate results of list manipulations. Along with functions defined within itertools package, generators also introduce a basic lazy computation capabilities into Python. This combination of higher-order functions and generators is [...]]]></description>
			<content:encoded><![CDATA[<p>I'm a big fan of Python generators. With seemingly little effort, they often allow to greatly reduce memory overhead. They do so by completely eliminating intermediate results of list manipulations. Along with functions defined within <code>itertools</code> package, generators also introduce a basic <em>lazy computation</em> capabilities into Python.<br />
This combination of higher-order functions and generators is truly remarkable. Why? Because compared to ordinary loops, it gains us both speed <strong>and</strong> readability <strong>at the same time</strong>. That's quite an achievement; typically, optimizations sacrifice one for the other.</p>
<p>All this goodness, however, comes with few strings attached. There are some ways in which we can be bitten by improperly used generators, and I think it's helpful to know about them. So today, I'll try to outline some of those traps.</p>
<h3>Generators are lazy</h3>
<p>You don't say, eh? That's one of the main reasons we are so keen on using them, after all! But what I'm implying is the following: if a generator-based code is to actually <strong>do</strong> anything, there must be a point where all this laziness "bottoms out". Otherwise, you are just building expression thunks and not really evaluating anything.</p>
<p>How such circumstances might occur, though? One case involves using generators for their <em>side effects</em> - a rather questionable practice, mind you:</p>
<div class="syntax_hilite">
<div id="python-26">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> process_kvp<span style="color: black;">&#40;</span>key, value=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> key,<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> value: <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">"="</span>, value</p>
<p><span style="color: #dc143c;">itertools</span>.<span style="color: black;">starmap</span><span style="color: black;">&#40;</span>process_kvp, some_dict.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
The <code>starmap</code> call does not print anything here, because it just constructs a generator. Only when this generator is iterated over, dictionary elements would be fed to <code>process_kvp</code> function. Such iteration would of course require a loop (or <a href="http://docs.python.org/library/itertools.html#recipes"><code>consume</code> recipe</a> from <code>itertools</code> docs), so we might as well do away with generator altogether and just stick with plain old <code>for</code>:</p>
<div class="syntax_hilite">
<div id="python-27">
<div class="python"><span style="color: #ff7700;font-weight:bold;">for</span> kv <span style="color: #ff7700;font-weight:bold;">in</span> some_dict.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; process_kvp<span style="color: black;">&#40;</span>*kv<span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
In real code the judgement isn't so obvious, of course. We'll most likely receive the generator from some external function - a one that probably refers to its result only as an <em>iterable</em>. As usual in such cases, we must not <strong>assume anything</strong> beyond what is explicitly stated. Specifically, we cannot expect that we have any kind of existing, tangible collection with actual elements. An iterable can very well be just a generator, whose items are yet to be evaluated. This fact can impact performance by moving some potentially heavy processing into unexpected places, resulting in e.g. executing database queries while rendering website's template.</p>
<h3>Generators are truthy</h3>
<p>The above point was concerned mostly with performance, but what about correctness? You may think it's not hard to conform to iterables' contract, which should in turn guarantee that they don't backfire on us with unexpected behavior. Yet how many times did you find yourself reading (or writing!) code such as this:</p>
<div class="syntax_hilite">
<div id="python-28">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> do_stuff<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; items = get_stuff_to_do<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># returns iterable</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> items:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #dc143c;">logging</span>.<span style="color: black;">debug</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">"No stuff to do!"</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># ...do something with 'items'... </span></div>
</div>
</div>
<p>
It's small and may look harmless, but it still manages to make an ungrounded (thus potentially false) assumption about an iterable. The problem lies in <code>if</code> condition, meant to check whether we have any <code>items</code> to do stuff with. It <em>would</em> work correctly if <code>items</code> were a list, <code>dict</code>, <code>deque</code> or any other actual collection. But for generators, it <strong>will not</strong> work that way - even though they are still undoubtedly iterable!</p>
<p>Why? Because generators are not collections; they are just suppliers. We can only tell them to return their <code>next</code> value, but we cannot peek inside to see if the value is really there. As such, generators are not susceptible to boolean coercion in the same way that collections are; it's not possible to check their "emptiness". They behave like regular Python objects and are <strong>always truthy</strong>, even if it's "obvious" they won't yield any value when asked:</p>
<div class="syntax_hilite">
<div id="code-29">
<div class="code">&gt;&gt;&gt; g = <span style="color:#006600; font-weight:bold;">&#40;</span>x for x in <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
&gt;&gt;&gt; if g: print <span style="color:#CC0000;">"Truthy!"</span><br />
Truthy!</div>
</div>
</div>
<p>
Going back to previous sample, we can see that <code>if</code> block won't be executed in case of <code>get_stuff_to_do</code> returning an "empty" generator. Consequences of this fact may vary from barely noticeable to disastrous, depending on how the rest of <code>do_stuff</code> function looks like. Nevertheless, that code will run with one of its invariants violated: a fertile ground for any kind of unintended effects.</p>
<h3>Generators are once-off</h3>
<p>An intuitive, informal understanding of the term 'iterable' is likely to include one more unjustified assumption: that it can iterated over, and over, i.e. multiple times. Again, this is very much true if we're dealing with a collection, but generators simply don't carry enough information to repeat the sequence they yield. In other words, they cannot be rewound: once we go through a generator, it's stuck in its final state, not usable for anything more.</p>
<p>Just like with previous caveats, failure to account for that can very well go unnoticed - at least until we spot some weird behavior it results in. Continuing our superficial example from preceding paragraph, let's pretend the rest of <code>do_stuff</code> function requires going over <code>items</code> at least twice. It could be, for example, an iterable of objects in a game or physics simulation; objects that can potentially move <em>really fast</em> and thus require some more sophisticated collision detection (based on e.g. intersection of segments):</p>
<div class="syntax_hilite">
<div id="python-30">
<div class="python">new_positions = calculate_displacement<span style="color: black;">&#40;</span>items, delta_time<span style="color: black;">&#41;</span><br />
collision_pairs, rest = detect_collisions<span style="color: black;">&#40;</span>items, new_positions<span style="color: black;">&#41;</span><br />
collided = compute_collision_response<span style="color: black;">&#40;</span>collision_pairs<span style="color: black;">&#41;</span><br />
<span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">itertools</span>.<span style="color: black;">chain</span><span style="color: black;">&#40;</span>collided, rest<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; item.<span style="color: black;">draw</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
Even assuming the incredible coincidence of getting all the required math right (;-)), we wouldn't see any action whatsoever if <code>items</code> is a generator. The reason for that is simple: when <code>calculate_displacement</code> goes through <code>items</code> once (vigorously applying the Eulerian integration, most likely), it fully expends the generator. For any subsequent traversal (like the one in <code>detect_collitions</code>), the generator will appear empty. In this particular snippet, it will most likely result in blank screen, which hopefully is enough of a hint to figure out what's going on :P</p>
<h3>Generators are not lists</h3>
<p>An overarching conclusion of the above-mentioned pitfalls is rather evident and seemingly contrary to statement from the beginning. Indeed, generators may not be a drop-in replacement for lists (or other collections) if we are very much relying on their "listy" behavior. And unless memory overhead proves troublesome, it's also not worth it to inject them into older code that already uses lists.</p>
<p>For new code, however, sticking with generators right off the bat has numerous benefits which I mentioned at the start. What it requires, though, is evicting some habits that might have emerged after we spent some time manipulating lists. I think I managed to pinpoint the most common ways in which those habits result in incorrect code. Incidentally, they all origin from accidental, unfounded expectations towards iterables in general. That's no coincidence: generators simply happen to be the "purest" of iterables, supporting only the bare minimum of required operations.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/3yIkGV7Sfdc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/02/08/generator-pitfalls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checking Whether IP is Within a Subnet</title>
		<link>http://xion.org.pl/2012/02/04/checking-whether-ip-is-within-a-subnet/</link>
		<comments>http://xion.org.pl/2012/02/04/checking-whether-ip-is-within-a-subnet/#comments</comments>
		<pubDate>Sat, 04 Feb 2012 21:11:28 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[computer networks]]></category>
		<category><![CDATA[IP address]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3361</guid>
		<description><![CDATA[Recently I had to solve a simple but very practical coding puzzle. The task was to check whether an IPv4 address (given in traditional dot notation) - is within specified network, described as a CIDR block. You may notice that this is almost as ubiquitous as a programming problem can get. Implementations of its simplified [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I had to solve a simple but very practical coding puzzle. The task was to check whether an IPv4 address (given in traditional dot notation) - is within specified network, described as a <a href="http://en.wikipedia.org/wiki/CIDR">CIDR</a> block.<br />
You may notice that this is almost as ubiquitous as a programming problem can get. Implementations of its simplified version are executed literally millions (if not billions) of times per second, for every IP packet at every junction of this immensely vast Internet. Yet, when it comes to doing such thing in Python, one is actually left without much help from the standard library and must simply <em>Do It Yourself</em>.</p>
<p>It's not particularly hard, of course. But before jumping to a solution, let's look how we expect our function to behave:</p>
<div class="syntax_hilite">
<div id="code-33">
<div class="code">&gt;&gt;&gt; ip_in_network<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"10.0.0.0"</span>, <span style="color:#CC0000;">"10.0.0.0/8"</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
True<br />
&gt;&gt; ip_in_network<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"127.0.56.34"</span>, <span style="color:#CC0000;">"127.0.0.0/8"</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
True<br />
&gt;&gt; ip_in_network<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"192.168.0.0"</span>, <span style="color:#CC0000;">"192.168.224.0/24"</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
False</div>
</div>
</div>
<p>
Pretty obvious, isn't it? Now, if you recall how packet routing works under the hood, you might remember that there are some bitwise operations involved. They are necessary to determine whether specific IP address can be found within given network. However low-level this may sound, there is really no escape from doing it this way. Yes, even in Python :P</p>
<p>It goes deeper, though. Most of the interface of <a href="http://docs.python.org/library/socket.html">Python's <code>socket</code> module</a> is actually carbon-copied from POSIX <em>socket.h</em> and related headers, down to exact names of particular functions. As a result, solving our task in C isn't very different from doing it in Python. I'd even argue that the C version is clearer and easier to follow. This is how it could look like:</p>
<div class="syntax_hilite">
<div id="c-34">
<div class="c"><span style="color: #339933;">#include &lt;stdbool.h&gt;</span><br />
<span style="color: #339933;">#include &lt;stdlib.h&gt;</span><br />
<span style="color: #339933;">#include &lt;string.h&gt;</span><br />
<span style="color: #339933;">#include &lt;arpa/inet.h&gt;</span></p>
<p>bool ip_in_network<span style="color: #66cc66;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span>* addr, <span style="color: #993333;">const</span> <span style="color: #993333;">char</span>* net<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #993333;">struct</span> in_addr ip_addr;<br />
&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>!inet_aton<span style="color: #66cc66;">&#40;</span>addr, &amp;ip_addr<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>&nbsp;<span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">false</span>;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span style="color: #993333;">char</span> network<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">32</span><span style="color: #66cc66;">&#93;</span>;<br />
&nbsp; &nbsp; strncpy<span style="color: #66cc66;">&#40;</span>network, net, strlen<span style="color: #66cc66;">&#40;</span>net<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span style="color: #993333;">char</span>* slash = strstr<span style="color: #66cc66;">&#40;</span>network, <span style="color: #ff0000;">"/"</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>!slash<span style="color: #66cc66;">&#41;</span>&nbsp;<span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">false</span>;<br />
&nbsp; &nbsp; <span style="color: #993333;">int</span> mask_len = atoi<span style="color: #66cc66;">&#40;</span>slash + <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>;</p>
<p>&nbsp; &nbsp; *slash = <span style="color: #ff0000;">'<span style="color: #000099; font-weight: bold;">\0</span>'</span>;&nbsp; <br />
&nbsp; &nbsp; <span style="color: #993333;">struct</span> in_addr net_addr;<br />
&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>!inet_aton<span style="color: #66cc66;">&#40;</span>network, &amp;net_addr<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>&nbsp;<span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">false</span>;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span style="color: #993333;">unsigned</span> ip_bits = ip_addr.<span style="color: #202020;">s_addr</span>;<br />
&nbsp; &nbsp; <span style="color: #993333;">unsigned</span> net_bits = net_addr.<span style="color: #202020;">s_addr</span>;<br />
&nbsp; &nbsp; <span style="color: #993333;">unsigned</span> netmask = net_bits &amp; <span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span> &lt;&lt;mask_len<span style="color: #66cc66;">&#41;</span> - <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #b1b100;">return</span> <span style="color: #66cc66;">&#40;</span>ip_bits &amp; netmask<span style="color: #66cc66;">&#41;</span> == net_bits;<br />
<span style="color: #66cc66;">&#125;</span></div>
</div>
</div>
<p>
A similar thing <a href="http://stackoverflow.com/a/819420/434799">done in Python</a> obviously requires less scaffolding, but it also introduces its own intricacies via <a href="http://docs.python.org/library/struct.html"><code>struct</code> module</a> (for unpacking bytes). All in all, it seems like there is not much to be gained here from Python's high level of abstraction.</p>
<p>And that's perfectly OK: no language is a silver bullet. Sometimes we need to do things the quirky way.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/51Lq8X5JJ_8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/02/04/checking-whether-ip-is-within-a-subnet/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The “Wat” of Python</title>
		<link>http://xion.org.pl/2012/01/31/the-wat-of-python/</link>
		<comments>http://xion.org.pl/2012/01/31/the-wat-of-python/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 20:19:43 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[caveats]]></category>
		<category><![CDATA[memes]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[talks]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3347</guid>
		<description><![CDATA[It is quite likely you are familiar with the Wat talk by Gary Bernhardt. It is sweeping through the Internet, giving some good laugh to pretty much anyone who watches it. Surely it did to me! The speaker is making fun of Ruby and JavaScript languages (although mostly the latter, really), showing totally unexpected and [...]]]></description>
			<content:encoded><![CDATA[<p>It is quite likely you are familiar with the <a href="https://www.destroyallsoftware.com/talks/wat"><em>Wat</em> talk</a> by Gary Bernhardt. It is sweeping through the Internet, giving some good laugh to pretty much anyone who watches it. Surely it did to me!<br />
The speaker is making fun of Ruby and JavaScript languages (although mostly the latter, really), showing totally unexpected and baffling results of some seemingly trivial operations - like adding two arrays. It turns out that in JavaScript, the result is an empty string. (And <a href="http://stackoverflow.com/a/9033306/434799">the reasons for that</a> provoke even bigger "wat").</p>
<p><img style="float:right; margin-left:8px" src="http://xion.org.pl/wp-content/uploads/2012/01/wat-gigantic-duck.jpg" alt="" title="Wat the duck!" />After watching the talk for about five times (it hardly gets old), I started to wonder whether it is only those two languages that exhibit similarly confusing behavior... The answer is of course "No", and that should be glaringly obvious to anyone who knows at least a bit of C++ ;) But beating on <em>that</em> horse would be way too easy, so I'd rather try something more ambitious.<br />
Hence I ventured forth to search for "wat" in Python 2.x. The journey wasn't short enough to stop at mere addition operator but nevertheless - and despite me being nowhere near Python expert - I managed to find some candidates rather quickly.</p>
<p>I strove to keep with the original spirit of Gary's talk, so I only included those quirks that can be easily shown in interactive interpreter. The final result consists of three of them, arranged in the order of increasing puzzlement. They are given without explanation or rationale, hopefully to encourage some thought beyond amusement :)</p>
<p>Behold, then, the <em>Wat</em> of Python!</p>
<p><span id="more-3347"></span></p>
<h3>3. Strange loop</h3>
<div class="syntax_hilite">
<div id="code-38">
<div class="code">&gt;&gt;&gt; f = lambda: map<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#40;</span>yield<span style="color:#006600; font-weight:bold;">&#41;</span>, range<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;">10</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
&gt;&gt;&gt; for x in f<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>: print x<br />
...<br />
<span style="">None</span></div>
</div>
</div>
<p>
<code>None</code> shall <code>pass</code>!</p>
<h3>2. Can't catch me</h3>
<div class="syntax_hilite">
<div id="code-39">
<div class="code">&gt;&gt;&gt; try: raise KeyError<br />
... <span style="">except</span> ValueError, KeyError:<br />
...&nbsp; &nbsp; &nbsp;<span style="">print</span> <span style="color:#CC0000;">"Batman!"</span><br />
...<br />
<span style="">Traceback</span> <span style="color:#006600; font-weight:bold;">&#40;</span>most recent call last<span style="color:#006600; font-weight:bold;">&#41;</span>:<br />
&nbsp; File <span style="color:#CC0000;">"&lt;stdin&gt;"</span>, line <span style="color:#800000;">1</span>, in &lt;module&gt;<br />
KeyError</div>
</div>
</div>
<p>
It takes much more to summon Batman, you know.</p>
<h3>1. The truth is out there</h3>
<div class="syntax_hilite">
<div id="code-40">
<div class="code">&gt;&gt;&gt; True, False = False, True<br />
&gt;&gt;&gt; True<br />
False</div>
</div>
</div>
<p>
You had probably seen it coming.... Yeah, you certainly had.</p>
<p><small>If you know any similar quirks of Python, be sure to present them in comments!</small></p>
<img src="http://feeds.feedburner.com/~r/xion/~4/_ajxnrjWKs4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/31/the-wat-of-python/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Python’s setdefault Considered Harmful</title>
		<link>http://xion.org.pl/2012/01/28/pythons-setdefault-considered-harmful/</link>
		<comments>http://xion.org.pl/2012/01/28/pythons-setdefault-considered-harmful/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 18:26:52 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[defaultdict]]></category>
		<category><![CDATA[dictionaries]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[setdefault]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3341</guid>
		<description><![CDATA[Python dictionaries have an inconspicuous method named setdefault. Asking for its description, we'll be presented with rather terse interpretation: &#62;&#62;&#62; help&#40;dict.setdefault&#41; setdefault&#40;...&#41; &#160; &#160; D.setdefault&#40;k&#91;,d&#93;&#41; -&#62; D.get&#40;k,d&#41;, also set D&#91;k&#93;=d if k not in D While it might not be immediately obvious what we could use this method for, we can actually find quite a [...]]]></description>
			<content:encoded><![CDATA[<p>Python dictionaries have an inconspicuous method named <code>setdefault</code>. Asking for its description, we'll be presented with rather terse interpretation:</p>
<div class="syntax_hilite">
<div id="code-52">
<div class="code">&gt;&gt;&gt; help<span style="color:#006600; font-weight:bold;">&#40;</span>dict.<span style="">setdefault</span><span style="color:#006600; font-weight:bold;">&#41;</span></p>
<p>setdefault<span style="color:#006600; font-weight:bold;">&#40;</span>...<span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; D.<span style="">setdefault</span><span style="color:#006600; font-weight:bold;">&#40;</span>k<span style="color:#006600; font-weight:bold;">&#91;</span>,d<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span> -&gt; <span style="">D</span>.<span style="">get</span><span style="color:#006600; font-weight:bold;">&#40;</span>k,d<span style="color:#006600; font-weight:bold;">&#41;</span>, also set D<span style="color:#006600; font-weight:bold;">&#91;</span>k<span style="color:#006600; font-weight:bold;">&#93;</span>=d if k not in D</div>
</div>
</div>
<p>
While it might not be immediately obvious what we could use this method for, we can actually find quite a lot of applications if we only pay a little attention. The main advantage of <code>setdefault</code> seems to lie in elimination of <code>if</code>s; a feat we can feel quite smug about. As an example, consider a function which groups list of key-value pairs (with possible duplicate keys) into a dictionary of lists:</p>
<div class="syntax_hilite">
<div id="python-53">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> group_assoc_list_<span style="color: black;">&#40;</span>a_list<span style="color: black;">&#41;</span>: <span style="color: #808080; font-style: italic;"># with if</span><br />
&nbsp; &nbsp; res = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> k, v <span style="color: #ff7700;font-weight:bold;">in</span> a_list:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #ff7700;font-weight:bold;">in</span> res:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; res<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>v<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; res<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = <span style="color: black;">&#91;</span>v<span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> res</p>
<p><span style="color: #ff7700;font-weight:bold;">def</span> group_assoc_list<span style="color: black;">&#40;</span>a_list<span style="color: black;">&#41;</span>: <span style="color: #808080; font-style: italic;"># with setdefault</span><br />
&nbsp; &nbsp; res = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> k, v <span style="color: #ff7700;font-weight:bold;">in</span> a_list:<br />
&nbsp; &nbsp; &nbsp; &nbsp; res.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>k, <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>v<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> res</div>
</div>
</div>
<p>
This is something which we could do when parsing query string of an URL, if there weren't a readily available API for that:</p>
<div class="syntax_hilite">
<div id="python-54">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> parse_query_string<span style="color: black;">&#40;</span>qs<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span>arg.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'='</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> arg <span style="color: #ff7700;font-weight:bold;">in</span> qs.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'&amp;'</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></p>
<p>&gt;&gt;&gt; group_assoc_list<span style="color: black;">&#40;</span>parse_query_string<span style="color: black;">&#40;</span><span style="color: #483d8b;">'a=1&amp;b=2&amp;c=3&amp;b=4&amp;a=5'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
<span style="color: black;">&#123;</span><span style="color: #483d8b;">'a'</span>: <span style="color: black;">&#91;</span><span style="color: #483d8b;">'1'</span>, <span style="color: #483d8b;">'5'</span><span style="color: black;">&#93;</span>, <span style="color: #483d8b;">'b'</span>: <span style="color: black;">&#91;</span><span style="color: #483d8b;">'2'</span>, <span style="color: #483d8b;">'4'</span><span style="color: black;">&#93;</span>, <span style="color: #483d8b;">'c'</span>: <span style="color: black;">&#91;</span><span style="color: #483d8b;">'3'</span><span style="color: black;">&#93;</span><span style="color: black;">&#125;</span></div>
</div>
</div>
<p>
So, with <code>setdefault</code> we can get the same job done more succinctly. It would seem that it is a nifty little function, and something we should keep in mind as part of our toolbox. Let's remember it and move on, shall we?...</p>
<p>Actually, no. <code>setdefault</code> is really <strong>not</strong> a good piece of <code>dict</code>'s interface, and the main reason we should remember about it is mostly for its caveats. Indeed, there are quite a few of them, enough to shrink the space of possible applications to rather tiny size. As a result, we should be cautious whenever we see (or write) <code>setdefault</code> in our own code.</p>
<p>Here's why.<br />
<span id="more-3341"></span></p>
<h3>This is not the setter you are looking for</h3>
<p>A small and obvious, yet important detail: the name of <code>setdefault</code> begins with <code>set</code>. To an untrained eye, it is therefore evident that the method is a kind of <em>setter</em>, and that we can expect a certain behavior to occur if we choose to use it. Specifically, we would assume the method will actually <em>set</em> something, not happily ignore everything we pass to it and do nothing.</p>
<p>But this is <em>exactly</em> what <code>setdefault</code> does, in vast majority of cases! For any pair of arguments <code>k</code> and <code>v</code>, the call:</p>
<div class="syntax_hilite">
<div id="python-55">
<div class="python">d.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>k, v<span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
is almost guaranteed to not do anything. It's only the very first invocation for given <code>k</code> that <em>may</em> actually have any effect - unless, of course, the <code>k</code> is already present in <code>d</code>. Provided we do not remove it, any subsequent call will not really set anything, and this is hardly a behavior we would expect from a <code>set<em>X</em></code> method.</p>
<p>What should <code>setdefault</code> do, then? Given its name, a more reasonable functionality could revolve around the concept of "default value" for given key, independent from its <em>actual</em> value. It would be returned by <code>[]</code> operator if <code>k</code> were not in the dictionary, instead of <code>KeyError</code> being raised. And any call to <code>setdefault(k, ...)</code> would <em>overwrite</em> the <code>k</code>'s default value, whether or not it currently has an actual value. This way, <code>setdefault</code> would deliver on its promise of behaving like a setter.<br />
Of course, this whole notion of "default value" is mighty confusing and doesn't seem useful at all. Even if it was, it could be easily emulated by two <code>dict</code>s working together. In fact, this would be a preferred way of handling such arcane use case; there is simply no room for it in standard library.</p>
<h3>Are we really getting it?</h3>
<p>As described above, <code>setdefault</code> looks like a deeply flawed version of something which strives to be a setter. But at the same time, it is almost equally eager to pretend to be a <em>getter</em>, for it is often its return value that we are really interested in.</p>
<p>Take another look at the following line in <code>group_assoc_list</code> function:</p>
<div class="syntax_hilite">
<div id="python-56">
<div class="python">res.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>k, <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>v<span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
Notice how we have a <code>set<em>X</em></code> function which does not only <em>return some value</em>, but definitely a non-trivial one: list! (We are <code>append</code>ing to it, after all.) It is unclear where this complex value really came from, unless we are aware that <code>setdefault</code> is actually a variant of <code>dict.get</code> method. In fact, it has exactly the same interface as <code>get</code> - including the return value, which is either an object stored under given key, or a specific default value. Yes, you've heard that correctly: on the surface, <code><strong>set</strong>default</code> is really the same as <strong><code>get</code></strong>! The only difference is in side effects, which the latter does not produce.</p>
<p>Granted, in <code>group_assoc_list</code> we are vitally interested in those side effects. After all, they build up our dictionary of lists: the very structure this function is about. This might not always be the case, though. Quite often, we will be concerned only with value for particular key, dictionary itself being just incidental.<br />
And this brings us to the next point.</p>
<h3>Memoize this!</h3>
<p>There is this simple yet powerful - and extremely general - optimization technique, called <strong>memoization</strong>. The idea behind it is to introduce a layer of "persistent caching" in front of computationally expensive operation. As a result, such operation is performed only once for particular set of input data, with result saved into cache for cheap future access. When we ask for it, the cache is always polled first; if (and only if) result is not found there, it is computed from scratch and then cached (memoized).</p>
<p>What it has to do with dictionaries and <code>setdefault</code>, though?... Well, the common way for implementing memoization is nothing else but a dictionary. Suppose that we have a following expensive function (unary one, for the sake of simplicity):</p>
<div class="syntax_hilite">
<div id="python-57">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> computation<span style="color: black;">&#40;</span>arg<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># (expensive calculations here) </span></div>
</div>
</div>
<p>
It looks like an easy task to slap a memoization facility over it. And if we indeed use a dictionary, this also seems like a great application of <code>setdefault</code>, for it does the exact operation we want to perform on our cache: check value for given key, and optionally store one. Overall, it could look a lot like this:</p>
<div class="syntax_hilite">
<div id="python-58">
<div class="python">_cache = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span></p>
<p><span style="color: #ff7700;font-weight:bold;">def</span> computation<span style="color: black;">&#40;</span>arg<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> _cache.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>arg, _computation<span style="color: black;">&#40;</span>arg<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></p>
<p><span style="color: #ff7700;font-weight:bold;">def</span> _computation<span style="color: black;">&#40;</span>arg<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># (expensive calculation here) </span></div>
</div>
</div>
<p>
Hopefully we have profiled our code before and after this supposed optimization, because what we are about to find is <em>no performance gain whatsoever</em>! This is easily explicable by the fact that Python is not a <strong>lazy</strong> language, as it requires all function arguments to be evaluated prior to the call. What it means is that expesive <code>_computation(arg)</code> has to be calculated <em>always</em>, whether or not <code>arg</code> is present in the <code>_cache</code>. It is only afterwards when <code>setdefault</code> performs its check, and that's already too late for it to have any positive impact on performance.</p>
<p>So despite a seemingly perfect interface, <code>setdefault</code> is actually ill-suited for memoization or caching. Changing that would require it to be promoted into full-fledged language mechanism, similar to <code>if-else</code> ternary expressions. This is both extremely unlikely and totally unnecessary, given the narrow scope of resulting construct. As for caching, there are plenty of (correct) alternatives, such as the following:</p>
<div class="syntax_hilite">
<div id="python-59">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> computation<span style="color: black;">&#40;</span>arg<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> _cache<span style="color: black;">&#91;</span>arg<span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">KeyError</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; _cache<span style="color: black;">&#91;</span>arg<span style="color: black;">&#93;</span> = val = _computation<span style="color: black;">&#40;</span>arg<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> val</div>
</div>
</div>
<p></p>
<h3>The last stand</h3>
<p>So we have ruled out caching, but what about our original example? While <code>setdefault</code> may indeed suffer from very poor naming, there is little doubt that in <code>group_assoc_list</code> it is the best tool for the job. Or even more generally, <code>setdefault</code> still looks very useful if we want to make sure that specific keys exist in the dictionary - as in this code:</p>
<div class="syntax_hilite">
<div id="python-60">
<div class="python">DEFAULT_CONFIG = <span style="color: black;">&#123;</span> ... <span style="color: black;">&#125;</span></p>
<p><span style="color: #ff7700;font-weight:bold;">def</span> read_config<span style="color: black;">&#40;</span>config_file<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; cfg = read_config_from_file<span style="color: black;">&#40;</span>config_file<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> k, v <span style="color: #ff7700;font-weight:bold;">in</span> DEFAULT_CONFIG.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; cfg.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>k, v<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> cfg</div>
</div>
</div>
<p>
But often it's just a matter of looking at things from different perspective; we can easily stumble on alternative (and arguably better) phrasing:</p>
<div class="syntax_hilite">
<div id="python-61">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> read_config<span style="color: black;">&#40;</span>config_file<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; cfg = DEFAULT_CONFIG.<span style="color: #dc143c;">copy</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cfg.<span style="color: black;">update</span><span style="color: black;">&#40;</span>read_config_from_file<span style="color: black;">&#40;</span>config_file<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> cfg</div>
</div>
</div>
<p>
Finally, even a seemingly ideal solution might be superseded by a more tailored one. Starting from Python 2.5, we have access to the <a href="http://docs.python.org/library/collections.html#collections.defaultdict"><code>collections.defaultdict</code> utility class</a> - one that readily provides dictionary with defaults. If we use it instead of the ordinary <code>dict</code>, we can pretend like it had all the values already inserted; <code>defaultdict</code> will just create them on the fly.<br />
Here's how our <code>group_assoc_list</code> function benefits from using this class:</p>
<div class="syntax_hilite">
<div id="python-62">
<div class="python"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">collections</span> <span style="color: #ff7700;font-weight:bold;">import</span> defaultdict<br />
<span style="color: #ff7700;font-weight:bold;">def</span> group_assoc_list<span style="color: black;">&#40;</span>a_list<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; res = defaultdict<span style="color: black;">&#40;</span><span style="color: #008000;">list</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> k, v <span style="color: #ff7700;font-weight:bold;">in</span> a_list:<br />
&nbsp; &nbsp; &nbsp; &nbsp; res<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>v<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> res</div>
</div>
</div>
<p>
Note that <code>defaultdict</code> is still a pretty much <code>dict</code>, so users of above function are unlikely to be surprised by unfamiliar interface.</p>
<h3>Final words: be careful</h3>
<p>Phew, that was a long writeup, and a summary is in order.</p>
<p>As I've shown, the <code>setdefault</code> method is full of surprises and inconsistencies, causing noticeable confusion. Yet it often <a href="https://github.com/search?langOverride=&#038;language=Python&#038;q=%22setdefault%28%22">creeps up</a> in code written by experienced pythonistas. Sometimes the reasons for it <a href="http://stackoverflow.com/a/3483890/434799">look legitimate</a>, even in face of everything I've said here.<br />
But my statement from the beginning still stands. If you encounter <code>setdefault</code>, raise your alertness - it is might be indirect indicator of <a href="http://en.wikipedia.org/wiki/Code_smell">code smell</a>. If you find yourself <em>using</em> <code>setdefault</code>, consider thrice whether it's really the best solution. And be especially mindful of the erroneous caching pattern that I gave a lengthy description of.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/CGSHKPQ4Ow4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/28/pythons-setdefault-considered-harmful/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How IT Failed to Make People Care</title>
		<link>http://xion.org.pl/2012/01/24/how-it-failed-to-make-people-care/</link>
		<comments>http://xion.org.pl/2012/01/24/how-it-failed-to-make-people-care/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 23:28:23 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[Culture]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[ACTA]]></category>
		<category><![CDATA[intellectual property]]></category>
		<category><![CDATA[PIPA]]></category>
		<category><![CDATA[piracy]]></category>
		<category><![CDATA[society]]></category>
		<category><![CDATA[SOPA]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3333</guid>
		<description><![CDATA[Unless you were living under an enormous, media-impervious rock for at least few weeks, you must have heard of the recent ruckus about laws concerned with the so-called intellectual property. It was mostly due to infamous SOPA and PIPA bills that were to pass in the US, and the temporary "blackout" of various big websites [...]]]></description>
			<content:encoded><![CDATA[<p>Unless you were living under an enormous, media-impervious rock for at least few weeks, you must have heard of the recent ruckus about laws concerned with the so-called intellectual property. It was mostly due to infamous SOPA and PIPA bills that were to pass in the US, and the temporary "blackout" of various big websites whose owners voiced their protest against those bills. But we also have something of a <a href="http://ireport.cnn.com/docs/DOC-734863">local spin-off</a>, relevant to the EU and specifically Poland: an outcry against ACTA, a similar albeit slightly less ridiculous piece of law, which (as of this writing) is due to be signed in a few days.</p>
<p>Even though we do not really know yet how those issues are going to be resolved, there are already few important lessons to be learned here. And I think that the biggest one is not actually concerned with the merit of laws in question. Instead, it draws attention to the problem of <strong>communication</strong> between IT industry and general public.</p>
<h3>It's no longer OK to be ignorant of the Internet</h3>
<p>It never was, really. If anything, recent events served as a good wake-up call, reminding that the Internet and all related technological infrastructure is something that we take for granted way too often. While I might exaggerate a bit, I don't think it's very far-fetched to say that for average person, the Internet is pretty much <em>magic</em>. You pop up "the Internet app", type whatever you are looking for, and few moments later, voilà: you just got it, by the sheer magic of intertubes. Assuming, of course, that there is actually something specific you have in mind; otherwise, you can always look at what your friends have "shared", and from there start your clicky journey through the nether. It's awesome, virtually boundless, and it just works... right?</p>
<p>So far, consequences of such technical ignorance were also mostly technical, surfacing as security issues, loss of data, malware spread and so on. But that's alright! We're dabbling into arcane and invoking supernatural powers, so it's no wonder we sometimes accidentally summon few annoying daemons. Should that happen, we can always call an exorcist in a form of friendly geek-next-door, or (at worst) tech support.</p>
<p>Now, however, failure to grasp the fundamental nature of the Internet can result in much more dramatic backlash. See, this technical stuff underneath is not just a plumbing that can be safely ignored. The foundational, idealistic principles of the 'net - decentralization, freedom, knowledge, progress, innovation, flexibility, and so on - are woven into its very fabric. It is by exposure to those "boring details" that those ideas may influence and reshape minds, helping to do away with flawed and outdated notions - including, for example, the 19th century's concept of intellectual property. I cannot even fathom how someone with reasonable understanding of how software, Internet and IT work could conceive something as outrageous as those infamous IP laws. <em>It just doesn't compute</em>.</p>
<h3>It is no longer OK to be ignorant of the society</h3>
<p>Yet it was conceived, put into words, formed into a legal document and officially proposed - more than once, in fact. Obviously, such things don't happen by itself, especially when powerful interest groups are at play. And that's precisely the reason why we have various public institutions, from parliaments to international organizations: to weed out blatantly bad solutions, and sometimes even let the good ones pass through.</p>
<p>At least, that's the theory. It's naive to postulate virtues among politicians (i.e. those willingly aspiring for power), so we make them cling to one value they'll always embrace, for it is needed to maintain a ruling position: <em>popularity</em>. This should roughly translate to caring for the same things the voters care for. Roughly, because there will be always some disconnection due to effects of scale, perception, biases and multitudes of other factors.</p>
<p>Still, this is pretty much how it works. In order to push an agenda we are vitally interested in (or obstruct one we are strongly against), we need to gain enough <em>publicity</em>. We need to make people share our values and consider as utility those things that we consider as such. That's how we set the appropriate casual chain in motion, eventually leading to fulfillment of our goals.</p>
<p>And this is where the IT industry has failed miserably. It's the reason why we're frantically looking for support, rolling out our biggest cannons, hitting the news with blackouts of absolutely crucial Internet services and DDoS attacks on government sites. We have failed in making the society <em>share our values</em> in elegant, gradual and systematic manner, so now we need to condone a shock therapy to compensate for this negligence. Actually, in some cases we have been actively making things <em>worse</em>, professing the exact opposites of those values mentioned before: centralization, limitation, monocultures, stagnation and rehashing of old concepts.</p>
<p>Now we are just reaping what we have sown.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/zWukKueETXg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/24/how-it-failed-to-make-people-care/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On Clever Usage of Python ‘with’ Clauses</title>
		<link>http://xion.org.pl/2012/01/21/on-clever-usage-of-python-with-clauses/</link>
		<comments>http://xion.org.pl/2012/01/21/on-clever-usage-of-python-with-clauses/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 19:21:25 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[control structures]]></category>
		<category><![CDATA[Google App Engine]]></category>
		<category><![CDATA[goto]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[resources]]></category>
		<category><![CDATA[with]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3323</guid>
		<description><![CDATA[As the Python docs state, with statement is used to encapsulate common patterns of try-except-finally constructs. It is most often associated with managing external resources in order to ensure that they are properly freed, released, closed, disconnected from, or otherwise cleaned up. While at times it is sufficient to just write the finally block directly, [...]]]></description>
			<content:encoded><![CDATA[<p>As the Python docs <a href="http://docs.python.org/reference/compound_stmts.html#the-with-statement">state</a>, <code>with</code> statement is used to encapsulate common patterns of <code>try</code>-<code>except</code>-<code>finally</code> constructs. It is most often associated with managing external <em>resources</em> in order to ensure that they are properly freed, released, closed, disconnected from, or otherwise cleaned up. While at times it is sufficient to just write the <code>finally</code> block directly, repeated occurrences ask for using this language goodness more consciously, including writing our own <em>context managers</em> for specialized needs.</p>
<p>Those managers - basically a <code>with</code>-enabled, helper objects - are strikingly similar to small local objects involved in the RAII technique from C++. The acronym expands to <em>Resource Acquisition Is Initialization</em>, further emphasizing the resource management part of this pattern. But let us not be constrained by that notion: the usage space of <code>with</code> is much wider.<br />
<span id="more-3323"></span></p>
<h3>Interlude on basics</h3>
<p>To create our own context manager, we can implement a class with <code>__enter__</code> and <code>__exit__</code> methods. The former is called when <code>with</code> block is about to be entered, which is typically associated with obtaining an external resource: file handle, database connection, native memory pool, and so on.</p>
<p>Similarly, once we are about to leave the <code>with</code> block, the <code>__exit__</code> method will be invoked. If it was an exception that has kicked us out, relevant information will be passed to <code>__exit__</code> as its arguments; we'll be able to inspect them if necessary.</p>
<p>More details on both of these methods can be of course found in the <a href="http://docs.python.org/reference/datamodel.html#object.__enter__">documentation</a>, which I encourage to peek at.</p>
<h3>Universal cleanup</h3>
<p>It is perfectly valid for either of those methods to be a no-op, although there is seldom a point in having only <code>__enter__</code>, Context manager with just <code>__exit__</code>, on the other hand, can be quite useful for executing some finishing action that absolutely <strong>must</strong> be done. If we also have to account for possible errors, custom <code>with</code> is good alternative to doubly nested <code>try</code> statements that would have to be used instead.</p>
<p>Let's look at a practical example. On App Engine, there is a <a href="http://code.google.com/intl/pl/appengine/docs/python/blobstore/">Blobstore</a> service which allows to upload &#038; store big files that are infeasible to keep as text or binary data in database. When it comes to uploading them, most of the relevant logic is already present, so we only have to implement the <code>BlobstoreUploadHandler</code>:</p>
<div class="syntax_hilite">
<div id="python-70">
<div class="python"><span style="color: #ff7700;font-weight:bold;">from</span> google.<span style="color: black;">appengine</span>.<span style="color: black;">ext</span>.<span style="color: black;">webapp</span> <span style="color: #ff7700;font-weight:bold;">import</span> blobstore_handlers</p>
<p><span style="color: #ff7700;font-weight:bold;">class</span> UploadHandler<span style="color: black;">&#40;</span>blobstore_handlers.<span style="color: black;">BlobstoreUploadHandler</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> post<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; upload = <span style="color: #008000;">self</span>.<span style="color: black;">get_uploads</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># ...do something with uploaded file...</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">redirect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/upload_complete?status=ok'</span><span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
The required part here is performing the HTTP <code>redirect</code> at the end, <em>regardless</em> of whether we successfully did something with the uploaded blob (file) or not. But if "doing something" fails, we would like to delete the blob; otherwise, it will just needlessly take space.</p>
<p>Such behavior can be rather easily contained within a specialized context manager with just the <code>__exit__</code> function:</p>
<div class="syntax_hilite">
<div id="python-71">
<div class="python"><span style="color: #ff7700;font-weight:bold;">class</span> blobstore_guard<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, handler<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">handler</span> = handler<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, exc_type, exc_value, <span style="color: #dc143c;">traceback</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> exc_type:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> _, u <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">handler</span>.<span style="color: black;">get_uploads</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u.<span style="color: black;">delete</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; status = <span style="color: #483d8b;">'error'</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; status = <span style="color: #483d8b;">'ok'</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">handler</span>.<span style="color: black;">redirect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/upload_complete?status='</span> + status<span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
If we now put the upload handler's code inside a <code>with</code> statement with above guard, we will always have a redirect issued, and our blobs will be cleaned up properly in case of error:</p>
<div class="syntax_hilite">
<div id="python-72">
<div class="python"><span style="color: #ff7700;font-weight:bold;">class</span> UploadHandler<span style="color: black;">&#40;</span>blobstore_handlers.<span style="color: black;">BlobstoreUploadHandler</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> post<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; with blobstore_guard<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; upload = <span style="color: #008000;">self</span>.<span style="color: black;">get_uploads</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># ... </span></div>
</div>
</div>
<p></p>
<h3>The great escape (from <code>with</code> block)</h3>
<p>Typical uses of <code>with</code> take advantage of its <code>as</code> part, which is almost always used to capture some kind of resource handle. A <code>file</code> object is a canonical example of such practice:</p>
<div class="syntax_hilite">
<div id="python-73">
<div class="python">with <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'foo.txt'</span>, <span style="color: #483d8b;">'r'</span><span style="color: black;">&#41;</span> as f:<br />
&nbsp; &nbsp; content = f.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
If we want to take advantage of this behavior for our own context managers, we shall return a result from the <code>__enter__</code> method. It will then be given a name (specified after <code>as</code>), and we'll be able to use it within the <code>with</code> block, as shown above.</p>
<p>But referencing a resource is by no means the only legitimate use for name declared via <code>as</code>. It can actually point to <em>any</em> object, no matter what it is or does.<br />
To illustrate this point, I have came up with rather devious semantical construct which I dubbed a "section". In some way it's the closest thing to <code>goto</code> that we can have in Python without resorting to dirty bytecode hacks. It doesn't support actual jumps (i.e. moving instruction pointer into arbitrary place), but allows to prematurely "bail" out of the <code>with</code> scope, if desired. Hence, it is analogous to the <code>break <em>label</em></code> mechanism from Java, while also serving similar purpose to infamous <a href="http://blog.slickedit.com/2007/05/do-while-false/"><code>do-while(false)</code> construct</a>:</p>
<div class="syntax_hilite">
<div id="python-74">
<div class="python">with section<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> as s:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">xrange</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> j <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">xrange</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> i * j&gt; <span style="color: #ff4500;">20</span>: s.<span style="color: black;">bail</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></p>
<p><span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">"This will be executed after bail()"</span></div>
</div>
</div>
<p>
Of course, there aren't many uses for such a thing, and I would prefer <em>not</em> to maintain code that actually employs this trick. But it's entirely sufficient as an example - albeit somewhat silly. Here's how it could be implemented:</p>
<div class="syntax_hilite">
<div id="python-75">
<div class="python"><span style="color: #ff7700;font-weight:bold;">class</span> section<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">class</span> Bail<span style="color: black;">&#40;</span><span style="color: #008000;">Exception</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">section</span> = s<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, exc_type, exc_value, <span style="color: #dc143c;">traceback</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> exc_type == Bail <span style="color: #ff7700;font-weight:bold;">and</span> exc_value.<span style="color: black;">section</span> == <span style="color: #008000;">self</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> bail<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">raise</span> Bail<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span></div>
</div>
</div>
<p>
As you can see, we're using a special exception as means for "bailing out" of the section. The interesting bits happen mostly inside <code>__exit__</code>, where we examine the type of exception (if any) and the section it came from (to account for possible nesting). We are then returning a boolean value, indicating whether the exception should be squelched.<br />
The <code>__enter__</code> function is also notable, since it's quite common to just <code>return self</code> from it. This way, the context manager itself becomes the <code>as</code>-object. Not accidentally, this is also the case with the <code>file</code> type, which is what allows for <code>open</code> function to be used both normally and in <code>with</code>-<code>as</code> clause. For <code>section</code>, such behavior is not required, but <code>__enter__</code> returning <code>self</code> is simply the most straightforward solution.</p>
<h3>More ideas</h3>
<p>Looking at slightly bigger picture, we can see that <code>with</code> statements create something which might be called a <em>programmable scope</em>. Thanks to <code>__enter__</code> and <code>__exit__</code> methods, we are able to capture the events of going in and out of these blocks, and act upon them.</p>
<p>I suspect that when used cleverly, this may result in some pretty powerful and impressive programming techniques, not really related to any sort of resource management. I tried to show a glimpse, but I believe there are many more fruits to be picked from that tree - including:</p>
<ul>
<li><strong>Intrinsic profiling</strong>. For one, it wouldn't be hard to implement something like a frame rate counter for logic contained in a <code>with</code> block. This could be further expanded into measurements of varying granularity, thanks to <code>with</code> blocks' nesting.</li>
<li><strong>Parsing</strong>. Done by hand, parsing involves rather cumbersome state management. We have to save current position within parsed text, and restore it should we need to try a different variant. This looks very much like a pattern to be encapsulated in specialized <code>with</code> blocks. If done correctly, it could result in nice and neat API that wouldn't hide the control flow of parsing process - unlike most (all?) libraries that rely on explicit specification of grammar.</li>
<li><strong>Testing utilities</strong>. As one almost obvious example, <code>with</code> could be used to check whether test code throws an expected exception. It would be much more convenient (and readable) than <code>assertRaises</code> method from standard <code>unittest</code> package.</li>
<li><strong>"Important" assignments</strong>. As purely syntactic device, we could use <code>with</code> to put emphasis on certain notable <em>assignments</em> in our code. This is vaguely similar to <code>let</code>-<code>in</code> or <code>where</code> clauses from functional languages, but of course it doesn't have any purity implications. The downside: we would often need to use a "fake" context manager if object being assigned isn't already one:
<div class="syntax_hilite">
<div id="python-76">
<div class="python"><span style="color: #ff7700;font-weight:bold;">class</span> var<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, obj<span style="color: black;">&#41;</span>: <span style="color: #008000;">self</span>.<span style="color: black;">obj</span> = obj<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">obj</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, et, ev, t<span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">pass</span></div>
</div>
</div>
<p></li>
</ul>
<p>This list is of course nowhere near complete. I encourage to look at the <code>with</code> statement in similarly unorthodox way, for there are likely many semantic gems (or at least polished rocks ;]) to be uncovered here.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/7X1ItB68IMA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/21/on-clever-usage-of-python-with-clauses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Self-Replacing Script Blocks for Dynamic Lists</title>
		<link>http://xion.org.pl/2012/01/17/self-replacing-script-blocks-for-dynamic-lists/</link>
		<comments>http://xion.org.pl/2012/01/17/self-replacing-script-blocks-for-dynamic-lists/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 07:52:19 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[AJAX]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Jinja]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[templates]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3316</guid>
		<description><![CDATA[On contemporary websites and web applications, it is extremely common task to display a list of items on page. In any reasonable framework and/or templating engine, this can be accomplished rather trivially. Here's how it could look like in Jinja or Django templates: &#60;ul&#62; &#123;% for item in items %&#125; &#160; &#160; &#60;li&#62;&#123;&#123; item &#125;&#125;&#60;/li&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>On contemporary websites and web applications, it is extremely common task to display a list of items on page. In any reasonable framework and/or templating engine, this can be accomplished rather trivially. Here's how it could look like in Jinja or Django templates:</p>
<div class="syntax_hilite">
<div id="code-82">
<div class="code">&lt;ul&gt;<br />
<span style="color:#006600; font-weight:bold;">&#123;</span>% for item in items %<span style="color:#006600; font-weight:bold;">&#125;</span><br />
&nbsp; &nbsp; &lt;li&gt;<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#123;</span> item <span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#125;</span>&lt;/li&gt;<br />
<span style="color:#006600; font-weight:bold;">&#123;</span>% endfor %<span style="color:#006600; font-weight:bold;">&#125;</span><br />
&lt;/ul&gt;</div>
</div>
</div>
<p>
But it's usually not too long before our list grows big and it becomes unfeasible to render it all on the server. Or maybe, albeit less likely, we want it to be updated in real-time, without having to reload the whole page. Either case requires to incorporate some JavaScript code, talking to the server and obtaining next portion of data whenever it's needed.</p>
<p>Obviously, that data has to be rendered as well, and there is one option of doing it on the server side, serving actual HTML directly to JS. An arguably better solution is to respond with JSON or similar representation of our items. This is conceptually simpler, feels less messy and is potentially reusable as a part of website's external API. There is just one drawback: it forces rendering to be done also in JavaScript.<br />
<span id="more-3316"></span><br />
While it isn't a hindrance by itself (there are many great tools to help, including <a href="https://github.com/janl/mustache.js">Mustache</a> or - sadly discontinued - <a href="http://api.jquery.com/category/plugins/templates/">jQuery Templates</a>), it raises an issue of code duplication. Ideally, we would still want some initial data to be displayed just when the page itself is loaded. And at first, this seems as to require <em>both</em> server-side <em>and</em> client-side rendering functions... I hope I do not really have to explain why it would be a mighty bad idea to create, use and (try to) maintain them.</p>
<h3>Client is king</h3>
<p>That's why we would opt for rendering exclusively in the browser. Fortunately, it is not only possible, but also quite easy if we already have the logic for handling AJAX calls. Using it, we can prepare initial items in JSON format and put them directly inside some <code>&lt;script></code> block:</p>
<div class="syntax_hilite">
<div id="javascript-83">
<div class="javascript">&lt;script type=<span style="color: #3366CC;">"text/javascript"</span>&gt;<br />
&nbsp; &nbsp; $<span style="color: #66cc66;">&#40;</span>document<span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">ready</span><span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> items = <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#123;</span> items|json <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#125;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; renderItems<span style="color: #66cc66;">&#40;</span>$<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">'#items'</span><span style="color: #66cc66;">&#41;</span>, items<span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>;<br />
&lt;/script&gt;</div>
</div>
</div>
<p>
Here, the <code>|json</code> filter represents the before mentioned logic, while <code>renderItems</code> draws the objects-turned items as children of given element. What it does inside can vary from manipulating HTML strings directly (eww!), creating DOM objects (slightly better) or invoking some JS templating engine (best).</p>
<h3>Striving for modularity</h3>
<p>A simple solution, outlined above, looks sufficient for many practical applications. It is far from ideal, though. For one, it suffers from an acute disconnection, since the actual place where the data is displayed (<code>$('#items')</code>) can be arbitrarily far from the <code>&lt;script></code> tag. But more importantly, it scales poorly and lacks modularity: should we need to use our list on more than one site, we may need to carefully tailor several moving parts - such as the <code>'#items'</code> selector.</p>
<p>Incidentally, both of those issues can be resolved at once. Let's create a <em>self-replacing</em> <code>&lt;script></code>: quite similar to the one above, but with some interesting twists. Rather than referencing some DOM element via hard-coded selector, it shall render the items "where they stand": exactly where the very <code>&lt;script></code> is located. And just as a final touch, the script will even remove itself from the DOM tree when it's no longer needed.</p>
<h3>The solution...</h3>
<p>Sounds like a pretty clever trick, but it's not exactly complicated at all. In its entirety, it can look somewhat like this:</p>
<div class="syntax_hilite">
<div id="javascript-84">
<div class="javascript">&lt;script type=<span style="color: #3366CC;">"text/javascript"</span>&gt;<br />
&nbsp; &nbsp; <span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> $script = $<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">'script'</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">last</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; $<span style="color: #66cc66;">&#40;</span>document<span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">ready</span><span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> items = <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#123;</span> items|json <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#125;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; renderItems<span style="color: #66cc66;">&#40;</span>$script.<span style="color: #006600;">parent</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>, items<span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $script.<span style="color: #006600;">remove</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;<br />
&lt;/script&gt;</div>
</div>
</div>
<p>
What makes it work is the fact that <code>var $script = ...;</code> line lies <em>outside</em> of the <code>$document.ready</code> function. Thanks to that, it is executed as soon as the <code>&lt;script></code> block is parsed - before the rest of document is processed, and way before the <code>$(document).ready</code> callback is invoked.<br />
At that early time, this script is therefore the last <code>&lt;script></code> element in the DOM tree. In other words, it's exactly what <code>$('script').last()</code> will return! What we are doing in the first line is therefore nothing else, but obtaining a reference to the enclosing <code>&lt;script></code>.</p>
<p>How's that useful, though?... Well, if we have indeed dropped this <code>&lt;script></code> right where we wanted our dynamic list to display, its <code>parent()</code> would be the element to render our items in. And because we don't mind cleaning up after ourselves when done, we finish this by <code>remove()</code>-ing the script. Nice and tidy.</p>
<h3>...and how to use it</h3>
<p>What makes it a pretty powerful technique is easy and flexible integration with modern templating engines. We can almost effortlessly made this snippet into a reusable template part, such as Jinja macro:</p>
<div class="syntax_hilite">
<div id="javascript-85">
<div class="javascript"><span style="color: #66cc66;">&#123;</span>% macro render<span style="color: #66cc66;">&#40;</span>items<span style="color: #66cc66;">&#41;</span> %<span style="color: #66cc66;">&#125;</span><br />
&lt;script type=<span style="color: #3366CC;">"text/javascript"</span>&gt;<br />
&nbsp; &nbsp; <span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> $script = $<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">'script'</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">last</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; $<span style="color: #66cc66;">&#40;</span>document<span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">ready</span><span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> items = <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#123;</span> items|json <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#125;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; renderItems<span style="color: #66cc66;">&#40;</span>$script.<span style="color: #006600;">parent</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>, items<span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $script.<span style="color: #006600;">remove</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;<br />
&lt;/script&gt;<br />
<span style="color: #66cc66;">&#123;</span>% endmacro %<span style="color: #66cc66;">&#125;</span></div>
</div>
</div>
<p>
The final usage would be then extremely clean and simple:</p>
<div class="syntax_hilite">
<div id="html-86">
<div class="html"><span style="color: #009900;"><a href="http://december.com/html/4/element/div.html"><span style="color: #000000; font-weight: bold;">&lt;div</span></a> <span style="color: #000066;">class</span>=<span style="color: #ff0000;">"items"</span><span style="color: #000000; font-weight: bold;">&gt;</span></a></span><br />
&nbsp; &nbsp; {{ render(items) }}<br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/div&gt;</span></span></div>
</div>
</div>
<p></p>
<img src="http://feeds.feedburner.com/~r/xion/~4/tcKma1iCJTs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/17/self-replacing-script-blocks-for-dynamic-lists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Executors in Java</title>
		<link>http://xion.org.pl/2012/01/12/using-executors-in-java/</link>
		<comments>http://xion.org.pl/2012/01/12/using-executors-in-java/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 20:17:06 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[executors]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[threads]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3309</guid>
		<description><![CDATA[When thinking about concurrent programs, we are sometimes blinded by the notion of bare threads. We create them, start them, join them, and sometimes even interrupt them, all by operating directly on those tiny little abstractions over several paths of simultaneous execution. At the same time we might be extremely reluctant to directly use synchronization [...]]]></description>
			<content:encoded><![CDATA[<p>When thinking about concurrent programs, we are sometimes blinded by the notion of bare threads. We create them, start them, join them, and sometimes even <code>interrupt</code> them, all by operating directly on those tiny little abstractions over several paths of simultaneous execution. At the same time we might be extremely reluctant to directly use synchronization primitives (semaphores, mutexes, etc.), preferring more convenient and tailored solutions - such as thread-safe containers. And this is great, because synchronization is probably the most difficult aspect of concurrent programming. Any place where we can avoid it is therefore one less place to be infested by nasty bugs it could ensue.</p>
<p>So why we still cling to somewhat low-level <code>Thread</code>s for actual execution, while having no problems with specialized solutions for concurrent data exchange and synchronization?... Well, we can be simply unaware that "getting code to execute in parallel" is also something that can benefit from safety and clarity of more targeted approach. In Java, one such approach is oh-so-object-orientedly called <strong>executors</strong>.</p>
<p>As we might expect, an <code>Executor</code> is something that executes, i.e. runs, code. Pieces of those code are given it in a form of <code>Runnable</code>s, just like it would happen for regular <code>Thread</code>s:</p>
<div class="syntax_hilite">
<div id="java-88">
<div class="java">executor.<span style="color: #006600;">execute</span><span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> <a href="http://www.google.com/search?q=allinurl%3ARunnable+java.sun.com&amp;bntl=1"><span style="color: #aaaadd; font-weight: bold;">Runnable</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; @Override <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #993333;">void</span> run<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; calculatePiToDecimalPlaces<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">10000000</span><span style="color: #66cc66;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span><br />
<span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>;</div>
</div>
</div>
<p>
<code>Executor</code> itself is an abstract class, so it could be used without any knowledge about queuing policy, scheduling algorithms and any other details of the way it conducts execution of tasks. While this seems feasible in some real cases - such as servicing incoming network requests - executors are useful mainly because they are quite diverse in kind. Their complex and powerful variants are also relatively easy to use.</p>
<h3>Let's play in pool</h3>
<p>Simple functions for creating different types of executors are contained within the auxiliary <a href="http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executors.html"><code>Executors</code></a> class. Behind the scenes, most of them have a <strong>thread pool</strong> which they pull threads from when they are needed to process tasks. This pool may be of fixed or variable size, and can reuse a thread for more than one task,</p>
<p>Depending on how much load we expect and how many threads can we afford to create, the choice is usually between <code>newCachedThreadPool</code> and <code>newFixedThreadPool</code>. There is also peculiar (but useful) <code>newSingleThreadExecutor</code>, as well as time-based <code>newScheduledThreadPool</code> and <code>newSingleThreadScheduledExecutor</code>, allowing to specify delay for our <code>Runnable</code>s by passing them to <code>schedule</code> method instead of <code>execute</code>.</p>
<h3>Swapping them</h3>
<p>There is one case where the abstract nature of base <code>Executor</code> class comes handy: testing and performance tuning. A certain types of executors can serve as good approximation of some common concurrency scenarios.</p>
<p>Suppose that we are normally handling our tasks using a pool with fixed number of threads, but we are not sure whether it's actually the most optimal number. If our tasks appear to be mostly I/O-bound, it could be good idea to increase the thread count, seeing that threads waiting for I/O operations simply lay dormant for most of the time.<br />
To see if our assumptions have grounds, and how big the increase can be, we can temporarily switch to cached thread pool. By experimenting with different levels of throughput and observing the average execution time along with numbers of threads used by application, we can get a sense of optimal number of threads for our fixed pool.<br />
Similarly, we can adjust and possibly decrease this number for tasks that appear to be mostly CPU-bound. </p>
<p>Finally, it might be also sensible to use the single-threaded executor as a sort of "sanity check" for our complicated, parallel program. What we are checking this way is both <em>correctness</em> and <em>performance</em>, in rather simple and straightforward way.<br />
For starters, our program should still compute correct results. Failing to do so serves as indication that seemingly correct behavior in multi-threaded setting may actually be an accidental side effect of unspotted hazards. In other words, threads might "align just right" if there is more than one running, and this would hide some insidious race conditions which we failed to account for.</p>
<p>As for performance, we should expect the single-thread code to run for longer time than its multi-thread variant. This is somewhat obvious observation that we might carelessly take for granted and thus never verify explicitly - and that's a mistake. Indeed, it's not unheard of to have parallelized algorithms which are actually <strong>slower</strong> than their serial counterparts. Throwing some threads is not a magic bullet, unfortunately: concurrency is still hard.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/EIlvW5gIl4M" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/12/using-executors-in-java/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Curious Case of Letter Case</title>
		<link>http://xion.org.pl/2012/01/08/a-curious-case-of-letter-case/</link>
		<comments>http://xion.org.pl/2012/01/08/a-curious-case-of-letter-case/#comments</comments>
		<pubDate>Sun, 08 Jan 2012 17:32:33 +0000</pubDate>
		<dc:creator>Xion</dc:creator>
				<category><![CDATA[Computer Science & IT]]></category>
		<category><![CDATA[Culture]]></category>
		<category><![CDATA[languages]]></category>
		<category><![CDATA[lowercase]]></category>
		<category><![CDATA[strings]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[titlecase]]></category>
		<category><![CDATA[uppercase]]></category>

		<guid isPermaLink="false">http://xion.org.pl/?p=3300</guid>
		<description><![CDATA[An extremely common programming exercise - popping up usually as an interview question - is to write a function that turns all characters in a string into uppercase. As you may know or suspect, such task is not really about the problem stated explicitly. Instead, it's meant to probe if you can program at all, [...]]]></description>
			<content:encoded><![CDATA[<p>An extremely common programming exercise - popping up usually as an interview question - is to write a function that turns all characters in a string into uppercase. As you may know or suspect, such task is not really about the problem stated explicitly. Instead, it's meant to probe if you can program at all, and whether you remember about handling special subsets of input data. That's right: the actual problem is almost insignificant; it's all about the necessary plumbing. Without a need for it, the task becomes awfully easy, especially in certain kind of languages:</p>
<div class="syntax_hilite">
<div id="code-92">
<div class="code">toUpperCase :: <span style="">String</span> -&gt; <span style="">String</span><br />
toUpperCase s = map toUpper s</div>
</div>
</div>
<p>
This simplicity may be a cause of misconception that the whole problem of <strong>letter case</strong> is similarly trivial. Actually, I would not be surprised if the notion of having any sort of real 'problem' here is baffling to some. After all, every self-respecting language has those <code>toLowerCase</code>/<code>toUpperCase</code> functions built-in, right?...</p>
<p>Sure it has. But even assuming they work correctly, they are usually the only case-related transformations available out of the box. As it turns out, it's hardly uncommon to need something way more sophisticated.<br />
<span id="more-3300"></span></p>
<h3>How Unicode helps (sort of)</h3>
<p>Let's do a quick check first. Fire up a <abbr title="Read-Evan-Print Loop">REPL</abbr> for your favorite programming language (or editor with a compiler nearby) and try to uppercase all characters in any of the following strings:</p>
<ul>
<li>Falsches Üben von Xylophonmusik quält jeden größeren Zwerg.</li>
<li>Voix ambiguë d'un cœur qui au zéphyr préfère les jattes de kiwi.</li>
<li>Pchnąć w tę łódź jeża lub ośm skrzyń fig.</li>
<li>Экс-граф? Плюш изъят. Бьём чуждый цен хвощ!</li>
</ul>
<p>These are the <a href="http://en.wikipedia.org/wiki/List_of_pangrams">pangrams</a>: short and often nonsensical sentences which contain all letters from their respective languages. Since many of those letters are outside of the <em>a</em>-<em>z</em> range, their uppercase counterparts cannot be obtained by simply subtracting a magic number <code>32</code> from their ASCII code. Because they are scattered over much wider span of character table, this simple approach doesn't work.</p>
<p>But programming languages usually promise to handle all the complexities behind upper- and lowercasing - if not for strings, then at least for individual characters. Many of them, however, fail to deliver on that promise in their standard strings API. It's the Unicode, of course, that is supposed to deal with such issues. Using Unicode strings by default is rare among popular languages, though; Python 3.x, Java and Haskell come into mind as honorable exceptions. Others typically offer alternate string types (such as <code>std::wstring</code> in C++ or <code>unicode</code> in Python 2.x) and various encoding &#038; decoding mechanisms to cooperate with code that uses standard, ANSI-only strings.</p>
<h3>Case in point</h3>
<p>Still, we may consider this an appropriate abstraction when it comes to handling letter case of individual characters. However, it isn't nearly enough for actual <strong>texts</strong>. The issue of proper capitalization is grammatical one; and whenever grammar of natural languages is involved, we can be sure that any related computing problems will be notoriously difficult to solve.</p>
<p>I stumbled upon this very topic when <a href="http://xion.org.pl/2011/12/08/hello-there/">making the switch</a> to blogging in English. It turns out that English has quite amusing rules regarding capitalization of titles - or <strong>title case</strong> for short. Its simplest variant is to capitalize every single word:</p>
<div class="syntax_hilite">
<div id="python-93">
<div class="python"><span style="color: #ff7700;font-weight:bold;">def</span> title_case<span style="color: black;">&#40;</span>text<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; words = <span style="color: black;">&#40;</span>word.<span style="color: black;">capitalize</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> text.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>words<span style="color: black;">&#41;</span></div>
</div>
</div>
<p></p>
<div class="syntax_hilite">
<div id="code-94">
<div class="code">&gt;&gt;&gt; title_case<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">"This is a title"</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
<span style="color:#CC0000;">'This Is A Title'</span></div>
</div>
</div>
<p>
<a href="http://www.searchenginejournal.com/title-capitalization-in-the-english-language/4882/">Conventions</a> which are <a href="http://www.cumbrowski.com/CarstenC/articles/20070623_Title_Capitalization_in_the_English_Language.asp">actually used</a> are much more complex: they introduce several exceptions, and exceptions to exceptions, so it all contributes to rather fine mess. For example, it is common to leave certain "small" words lowercased: prepositions (<em>for</em>, <em>by</em>, <em>on</em>, ...), articles (<em>a</em>, <em>an</em>, <em>the</em>) and conjunctions (<em>and</em>, <em>or</em>, <em>but</em>, etc.). Nevertheless, the first word is always capitalized - and depending on particular convention, the last one may be too. Additionally, if the "small" word grows big enough, it might also warrant capitalizing; an excessively long preposition such as <em>without</em> could be a good example. Finally, punctuation may also play its role, forcing any word after certain marks (such as colon) to be capitalized as well.</p>
<p>That's rather convoluted, isn't it? Setting aside any concerns about how arbitrary these rules are, we can observe that even an undisputed standard built upon them is unlikely to be easy to implement as an algorithm. Anything that requires recognizing parts of speech is almost certain to be AI-hard. Although a simple exclusion list (<code>['a', 'an', 'but', ...]</code>) works correctly for many inputs, it fails for texts containing e.g. certain phrasal verbs. So in order to properly capitalize a title, one should know a semantic context of every word in it. That's easy for humans, but not so for computers.</p>
<h3>In any case...</h3>
<p>I hope I have successfully made a point that changing letter case may be almost as far from trivial as a computing problem can get. At the very least, we should be extremely wary of any related operations on texts in foreign languages - including, for example, a case-insensitive comparison. And even for English texts, the task may border on impossible if it's something just slightly more complicated than straightforward substitution of characters.</p>
<p>For more information about the phenomenon of letter case, it's probably best to consult the indispensable <a href="http://en.wikipedia.org/wiki/Letter_case">Wikipedia</a>.</p>
<img src="http://feeds.feedburner.com/~r/xion/~4/ONPSum8J3CE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://xion.org.pl/2012/01/08/a-curious-case-of-letter-case/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

