<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:georss="http://www.georss.org/georss" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-18359540</atom:id><lastBuildDate>Fri, 13 Nov 2009 01:58:14 +0000</lastBuildDate><title>Exploring Computation</title><description>A (deprecated) blog on the art, science, software and hardware of computation.</description><link>http://phunday.blogspot.com/</link><managingEditor>pramod.sub@gmail.com (Pramod)</managingEditor><generator>Blogger</generator><openSearch:totalResults>30</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/ExploringComputation" type="application/rss+xml" /><feedburner:browserFriendly></feedburner:browserFriendly><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-3275163031081481343</guid><pubDate>Thu, 22 May 2008 17:49:00 +0000</pubDate><atom:updated>2008-05-22T10:51:28.274-07:00</atom:updated><title>Alex Papadimoulis on the Mechanics of Quitting</title><description>&lt;a target="_blank" href="http://fundae.wordpress.com/2008/05/17/alex-papadimoulis-on-the-mechanics-of-quitting/"&gt;Head over to WordPress&lt;/a&gt; for the full story.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-3275163031081481343?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/05/alex-papadimoulis-on-mechanics-of.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-839603234841114203</guid><pubDate>Wed, 14 May 2008 18:11:00 +0000</pubDate><atom:updated>2008-05-14T11:18:33.838-07:00</atom:updated><title>Functional Programing 101: List Comprehensions and Closures</title><description>For a bunch of reasons &lt;a target="_blank" href="http://fundae.wordpress.com/2008/05/11/why-wordpress/"&gt;I've detailed here&lt;/a&gt;, I decided to shift my blog to WordPress. But don't worry, in order to serve you better, I will make sure I post a link here every time I make a new entry over there. :-)&lt;br /&gt;&lt;br /&gt;Anyway, the functional programming series continues and this time we will look at list comprehensions and a simple closure. &lt;a target="_blank" href="http://fundae.wordpress.com/2008/05/14/functional-programming-101-lists-and-closures/"&gt;Head over to WordPress for the action&lt;/a&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-839603234841114203?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/05/functional-programing-101-list.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-8504668076071395581</guid><pubDate>Mon, 05 May 2008 14:15:00 +0000</pubDate><atom:updated>2008-05-05T07:48:50.433-07:00</atom:updated><title>A Question of Two Exchanges</title><description>Let me interrupt the functional programming 101 series to pose an interesting question.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Question: &lt;/span&gt;Which of these two methods of swapping a list of integers is faster?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Method 1: Plain C Code&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;for&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(46, 139, 87);"&gt;&lt;b&gt;int&lt;/b&gt;&lt;/span&gt; i = &lt;span style="color: rgb(255, 0, 255);"&gt;0&lt;/span&gt;; i &lt;  array_size; i++)&lt;br /&gt;{&lt;br /&gt;&lt;span style="color: rgb(46, 139, 87);"&gt;&amp;nbsp; &amp;nbsp; &lt;b&gt;int&lt;/b&gt;&lt;/span&gt; temp = a[i];&lt;br /&gt;&amp;nbsp; &amp;nbsp; a[i] = b[i];&lt;br /&gt;&amp;nbsp; &amp;nbsp; b[i] = temp;&lt;br /&gt;}&lt;/pre&gt; &lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Method 2: Inline Assembly using the XCHG instruction.&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;__asm&lt;br /&gt;{&lt;br /&gt;&amp;nbsp; &amp;nbsp; mov ecx, array_size;&lt;br /&gt;&amp;nbsp; &amp;nbsp; mov esi, a;&lt;br /&gt;&amp;nbsp; &amp;nbsp; mov edi, b;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;loopstart&lt;/b&gt;&lt;/span&gt;:&lt;br /&gt;&amp;nbsp; &amp;nbsp; mov eax, [esi];&lt;br /&gt;&amp;nbsp; &amp;nbsp; xchg [edi], eax;&lt;br /&gt;&amp;nbsp; &amp;nbsp; mov [esi], eax;&lt;br /&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp; add esi, &lt;span style="color: rgb(255, 0, 255);"&gt;4&lt;/span&gt;;&lt;br /&gt;&amp;nbsp; &amp;nbsp; add edi, &lt;span style="color: rgb(255, 0, 255);"&gt;4&lt;/span&gt;;&lt;br /&gt;&amp;nbsp; &amp;nbsp; sub ecx, &lt;span style="color: rgb(255, 0, 255);"&gt;1&lt;/span&gt;;&lt;br /&gt;&amp;nbsp; &amp;nbsp; jnz loopstart;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;span style="font-weight: bold;"&gt;(Vague?) Hint:&lt;/span&gt; The Intel engineers who designed the XCHG instruction decided that it would primarily be used to implement synchronization primitives like semaphores and mutexes.&lt;br /&gt;&lt;br /&gt;Comment to let me know what you're thinking about the two snippets of code and why any one of the two should be faster or slower than the other. In my next entry I will post the answer and also dig deeper in the mechanics of the exchanges.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-8504668076071395581?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/05/question-of-two-exchanges.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-1670442649423025088</guid><pubDate>Sun, 13 Apr 2008 04:43:00 +0000</pubDate><atom:updated>2008-04-16T07:52:56.209-07:00</atom:updated><title>Functional Programming 101: Lambda Forms</title><description>I got into an argument with Ashwin the other day about lambdas and closures in &lt;a target="_blank" href="http://en.wikipedia.org/wiki/C%2B%2B0x"&gt;C++ 0x&lt;/a&gt;. Unfortunately, if you're jetlagged and feverish you tend not make a whole lot of sense, and that is what happened with me. At the moment, I am not very jetlagged and a little less feverish, so hopefully, the following makes a little more sense. [The good thing is even if it doesn't make any sense I already have a solid excuse :-)].&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;In the Beginning&lt;/span&gt;&lt;br /&gt;Lets use a simple example: I have a list of integers and I want to create another list that contains only those elements from this list that are greater than 5. The straightforward way to write this is like below:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;def&lt;/b&gt;&lt;/span&gt; &lt;span style="color: rgb(0, 128, 128);"&gt;filter_above_5&lt;/span&gt;(l):&lt;br /&gt;o = []        &lt;span style="color: rgb(51, 102, 255);"&gt;# initialize empty list&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt; for&lt;/b&gt;&lt;/span&gt; num &lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;in&lt;/b&gt;&lt;/span&gt; l: &lt;span style="color: rgb(51, 102, 255);"&gt;# iterate over list&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;    if&lt;/b&gt;&lt;/span&gt; num &amp;gt; 5: o.append(num)&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt; return&lt;/b&gt;&lt;/span&gt; o      &lt;span style="color: rgb(51, 102, 255);"&gt;# return filtered list&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;I wrote the snippet above in &lt;a target="_blank" href="http://www.python.org/"&gt;python&lt;/a&gt;, but hopefully, it is easy enough to read even if you haven't seen the language before. [1]&lt;br /&gt;&lt;br /&gt;Shown below is example of how you'd use the &lt;span style="font-family:courier new;"&gt;filter_above_5&lt;/span&gt; function.&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;original list:&lt;/span&gt;", l&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;filtered list:&lt;/span&gt;", filter_above_5(l)&lt;br /&gt;&lt;/pre&gt;Simple enough, isn't it? For reasons that we will come to later, people like to call this an &lt;span style="font-style: italic;"&gt;imperative implementation&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Lets Make This Functional&lt;/span&gt;&lt;br /&gt;If you think about it, &lt;span style="font-family:courier new;"&gt;filter_above_5&lt;/span&gt; is a special case of a generic filter function which iterates over an input list and produces an output list that contains all the elements from the input list that satisfy some condition.  [If you wanted to make things sound complicated, you'd use &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Predicate_%28computer_programming%29"&gt;predicate&lt;/a&gt; instead of "some condition" and sequence and sub-sequence instead of input and output list. :-)] Jokes apart, it would be cool if we could write a general function like this:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;def&lt;/b&gt;&lt;/span&gt; &lt;span style="color: rgb(0, 128, 128);"&gt;filter_list&lt;/span&gt;(predicate, l):&lt;br /&gt;o = []&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt; for&lt;/b&gt;&lt;/span&gt; element &lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;in&lt;/b&gt;&lt;/span&gt; l:&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;   if&lt;/b&gt;&lt;/span&gt; predicate(element): o.append(element)&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt; return&lt;/b&gt;&lt;/span&gt; o&lt;br /&gt;&lt;/pre&gt;&lt;span style="font-family:courier new;"&gt;filter_list&lt;/span&gt; is a function that takes a list and a function (our predicate) as parameters and iterates over each element of the input list, and adds only those elements to the output list that satisfy the predicate. Then you'd call &lt;span style="font-family:courier new;"&gt;filter_list&lt;/span&gt; like below:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;def&lt;/b&gt;&lt;/span&gt; &lt;span style="color: rgb(0, 128, 128);"&gt;greater_than_5&lt;/span&gt;(x): &lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;return&lt;/b&gt;&lt;/span&gt; x &amp;gt; 5&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;original list:&lt;/span&gt;", l&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;filtered list:&lt;/span&gt;", filter_list(greater_than_5, l)&lt;/pre&gt;It turns out that python already has a builtin function &lt;span style="font-family:courier new;"&gt;filter&lt;/span&gt; that does exactly what our &lt;span style="font-family:courier new;"&gt;filter_list&lt;/span&gt; function does. So our implementation just boils down to calling the that function with our custom predicate:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;def&lt;/b&gt;&lt;/span&gt; &lt;span style="color: rgb(0, 128, 128);"&gt;greater_than_5&lt;/span&gt;(x): &lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;return&lt;/b&gt;&lt;/span&gt; x &amp;gt; 5&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;original list:&lt;/span&gt;", l&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;filtered list:&lt;/span&gt;", &lt;span style="font-weight: bold;"&gt;filter&lt;/span&gt;(greater_than_5, l)&lt;/pre&gt;Notice the difference between our initial implementation and new one.  &lt;span style="font-family:courier new;"&gt;filter_above_5&lt;/span&gt; did not just tell the interpreter what to do, it also specified exactly how to do it using a series of commands. For that reason, we call it an &lt;span style="font-style: italic;"&gt;imperative&lt;/span&gt; implementation.&lt;br /&gt;&lt;br /&gt;Our newest implementation that uses the builtin &lt;span style="font-family:courier new;"&gt;filter &lt;/span&gt;function does &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; specify how the filtering should be done. For instance, with all this ado about multi-core, it is not far fetched to think of a version of &lt;span style="font-family:courier new;"&gt;filter&lt;/span&gt; that spawns a number of threads and splits up the work of filtering  among these threads. The cool thing is, if, say, Python 3.1 implemented such a multi-threaded filtering function, all our code which used the &lt;span style="font-family:courier new;"&gt;filter&lt;/span&gt; function would get the performance improvement for free.&lt;br /&gt;&lt;br /&gt;Plus, notice that this version contains much lesser code, and we're programming at a much higher level of abstraction, not to mention the fact that the new version is more elegant.&lt;br /&gt;&lt;br /&gt;All put together, I think it is fair to say, that the implementation that is based on &lt;span style="font-family:courier new;"&gt;filter&lt;/span&gt; is "better" than the imperative implementation.&lt;br /&gt;&lt;br /&gt;Notice an important thing about filter - we are passing a function as a parameter to another function. Programming languages which allow this kind of thing - passing and returning functions from other functions, construction of functions at runtime etc. - are said to have &lt;a target="_blank" href="http://en.wikipedia.org/wiki/First-class_function"&gt;&lt;span style="font-style: italic;"&gt;first class functions&lt;/span&gt;.&lt;/a&gt; Now, exactly what set of attributes are required for a language to be deemed to have first class function support is a controversial topic, and so we will side-step that issue here. [2] The important point is passing functions to other functions doing so is a key aspect of functional programming. And since we're doing this in our filtering snippet, we're going to call it a &lt;span style="font-style: italic;"&gt;functional implementation&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Enter the Lambda&lt;br /&gt;&lt;/span&gt;Nothing we've done so far can't be done from, say, C/C++. In C for instance, we could've created a filter function that operated on array of &lt;span style="font-family:courier new;"&gt;void*&lt;/span&gt;, and used a function pointer to pass the predicate around. [Similar in spirit to the &lt;a target="_blank" href="http://www.opengroup.org/onlinepubs/009695399/functions/qsort.html"&gt;qsort&lt;/a&gt; library function]. In C++, we could have created a templatized version of filter that was allegedly more elegant and worked in more or less the same way, but would claim to be "safer" because it would be type-checked. [3]&lt;br /&gt;&lt;br /&gt;Recognize that functions like &lt;span style="font-family:courier new;"&gt;greater_than_5&lt;/span&gt; are use and throw functions. They are created only for the purpose of being called from the filtering function, and then they are never used again. Unfortunately, despite the fact that these functions are going to be called from exactly one place, they will still be sitting in source code polluting our namespace. What would be cool is a way of creating the function &lt;span style="font-family:courier new;"&gt;greater_than_5&lt;/span&gt; specifically for the purpose of passing it to filter and then "forgetting about it".&lt;br /&gt;&lt;br /&gt;The solution to this problem is called a &lt;span style="font-weight: bold;"&gt;lambda&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; function&lt;/span&gt;. In python it looks like this:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;original list:&lt;/span&gt;", l&lt;br /&gt;&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;print&lt;/b&gt;&lt;/span&gt; "&lt;span style="color: rgb(255, 0, 255);"&gt;filtered list:&lt;/span&gt;", filter(&lt;span style="color: rgb(128, 64, 64);"&gt;&lt;b&gt;lambda&lt;/b&gt;&lt;/span&gt; x: x&amp;gt;5, l)&lt;/pre&gt;Pay attention to the first argument to &lt;span style="font-family:courier new;"&gt;filter&lt;/span&gt;: "&lt;span style="color: rgb(128, 64, 64);font-family:courier new;" &gt;&lt;b&gt;lambda&lt;/b&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; x: x&gt;5"&lt;/span&gt;. What it means is: create a function that takes one parameter as input - "x", and returns the value of the expression &lt;span style="font-family:courier new;"&gt;x&gt;5&lt;/span&gt; and then &lt;span style="font-style: italic;"&gt;return that function that just got created&lt;/span&gt;. Note that the created function is anonymous. The key benefit now is that if I had to call &lt;span style="font-family:courier new;"&gt;filter &lt;/span&gt;in a hundred different places with 50 different filtering predicates, I needn't create a 50 predicates similar to &lt;span style="font-family:courier new;"&gt;greater_than_5&lt;/span&gt; just to pass them to filter. I can create each of these functions at the point at which &lt;span style="font-family:courier new;"&gt;filter&lt;/span&gt; is called.&lt;br /&gt;&lt;br /&gt;This means my code will look much cleaner; there will be no namespace pollution and most importantly I will have to write lesser code.&lt;br /&gt;&lt;br /&gt;So, in summary &lt;span style="font-style: italic;"&gt;lambda forms&lt;/span&gt; allow the functional programmer to create anonymous functions that can be passed around to other functions. Or even be stored for later use [although this is somewhat uncommon].&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Isn't this just syntactic sugar?&lt;br /&gt;&lt;/span&gt;The thing is, besides the lack of namespace pollution, and some fewer keystrokes to type [not that these aren't important things, but lets play the devil's advocate here], we still haven't gained much by using the lambda. And the fact is, if these were the only advantages of the lambda, lambda's would just be cute syntactic sugar.&lt;br /&gt;&lt;br /&gt;But there is more - the key advantage of lambda's is when they are combined with a language that supports &lt;span style="font-weight: bold;"&gt;closures&lt;/span&gt;. We will look at closures in the next post.&lt;br /&gt;&lt;br /&gt;In the meantime, if any of the above didn't make sense, or you have some feedback, please comment below.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;hr /&gt;&lt;br /&gt;[1] If you still run into problems understanding the syntax, please do let me know using the comments.&lt;br /&gt;&lt;br /&gt;[2] Point is that it is is not correct to call python a functional programming language. If you want to be PC, it is a multi-paradigm language with some functional programming features. Anyway, lets not get into these religious wars and concentrate on the fun stuff instead. :-)&lt;br /&gt;&lt;br /&gt;[3] Ensuring const-correctness on your elegant function, making it behave like a STL function so that it works correctly for everything that implements forward_iterators and/or const_forward_iterators and fun times spent debugging the wonderfully cryptic error messages that compilers spit out for templates are but minor prices to pay for that elegance. [Sorry, I couldn't resist that crack :-)]&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;EDIT (16-Apr-07):&lt;/span&gt; Added missing argument to the call to filter. Thanks &lt;span style="font-weight: bold;"&gt;Psygote&lt;/span&gt; for pointing this out!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-1670442649423025088?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/04/functional-programming-101-lambda-forms.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-7472337632683244523</guid><pubDate>Sun, 13 Apr 2008 04:15:00 +0000</pubDate><atom:updated>2008-04-12T21:42:27.440-07:00</atom:updated><title>weblog.rename("Exploring Computation");</title><description>When I started off blogging, I wasn't sure where this was going and it seemed that I was going to be blogging about more or less random things. (Hence the lousy title :-) But as I blogged more and more, two things happened - (1) my own thoughts became a lot clearer about what I wanted to do and (2) the agenda for the blog itself became clearer.&lt;br /&gt;&lt;br /&gt;So, moral of the story: this new tagline for the   blog is that it will be about exploring the art, science, hardware and software of computation. For that to make any sense, I think it is important to define the meaning of the term computation. Here is what &lt;a href="http://en.wikipedia.org/wiki/Computation"&gt;wikipedia&lt;/a&gt; has to say:&lt;br /&gt;&lt;blockquote&gt;&lt;b&gt;Computation&lt;/b&gt; is a general term for any type of &lt;a href="http://en.wikipedia.org/wiki/Information_processing" title="Information processing"&gt;information processing&lt;/a&gt; that can be represented mathematically. This includes phenomena ranging from human thinking to calculations with a more narrow meaning. Computation is a process following a well-defined &lt;a href="http://en.wikipedia.org/wiki/Model_%28abstract%29" title="Model (abstract)"&gt;model&lt;/a&gt; that is understood and can be expressed in an &lt;a href="http://en.wikipedia.org/wiki/Algorithm" title="Algorithm"&gt;algorithm&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Protocol_%28computing%29" title="Protocol (computing)"&gt;protocol&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Network_topology" title="Network topology"&gt;network topology&lt;/a&gt;, etc.&lt;/blockquote&gt;The content of the blog is going to remain similar. The name change is an attempt to bring the documentation in sync with codebase, it is not a modification of the project scope. :-D&lt;br /&gt;&lt;br /&gt;I hope you like the name change. If you don't, please let me know why. (If you do, you can still take this opportunity to compliment my naming skills :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-7472337632683244523?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/04/weblogrenameexploring-computation.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-178226199905843003</guid><pubDate>Sun, 23 Mar 2008 09:56:00 +0000</pubDate><atom:updated>2008-03-23T03:12:27.714-07:00</atom:updated><title>Paul Graham on Programmers and Their Bosses</title><description>&lt;p&gt;At last, Mr. Graham has &lt;a target="_blank" href="http://paulgraham.com/boss.html"&gt;written something I agree with&lt;/a&gt;! :-)&lt;/p&gt;&lt;span style=";font-family:verdana;font-size:85%;"  &gt;&lt;blockquote&gt;&lt;span style="color: rgb(51, 51, 153);"&gt;The restrictiveness of big company jobs is particularly hard on programmers, because the essence of programming is to build new things.  Sales people make much the same pitches every day; support people answer much the same questions; but once you've written a piece of code you don't need to write it again.  So a programmer working as programmers are meant to is always making new things. And when you're part of an organization whose structure gives each person freedom in inverse proportion to the size of the tree, you're going to face resistance when you do something new.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 153);"&gt;This seems an inevitable consequence of bigness.  It's true even in the smartest companies.  I was talking recently to a founder who considered starting a startup right out of college, but went to work for Google instead because he thought he'd learn more there. He didn't learn as much as he expected.  Programmers learn by doing, and most of the things he wanted to do, he couldn't—sometimes because the company wouldn't let him, but often because the company's code wouldn't let him.  Between the drag of legacy code, the overhead of doing development in such a large organization, and the restrictions imposed by interfaces owned by other groups, he could only try a fraction of the things he would have liked to.  He said he has learned much more in his own startup, despite the fact that he has to do all the company's errands as well as programming, because at least when he's programming he can do whatever he wants.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 153);"&gt;An obstacle downstream propagates upstream.  If you're not allowed to implement new ideas, you stop having them.  And vice versa: when you can do whatever you want, you have more ideas about what to do. So working for yourself makes your brain more powerful in the same way a low-restriction exhaust system makes an engine more powerful.&lt;/span&gt;&lt;blockquote&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-178226199905843003?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/03/paul-graham-on-programmers-and-their.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-7242742074526146576</guid><pubDate>Tue, 18 Mar 2008 16:12:00 +0000</pubDate><atom:updated>2008-03-23T03:09:35.162-07:00</atom:updated><title>Intel Interesting</title><description>Apologies for the corny title, but I couldn't stop myself! :-)&lt;br /&gt;&lt;p&gt;The most immediate news is the 6-core Dunnington, expected to release in the second half of this year. It looks like it has a 3-way split 9MB L2 cache, 16 MB of L3 cache and micro-architecture based on the Penryn. The kicker is that it only has a TDP of 80W!&lt;br /&gt;&lt;/p&gt;&lt;p&gt;There is more and more information about the Nehalem now becoming available. We'd already heard about the integrated DDR3 memory controller and simultaneous multi-threading (the  successor to hyperthreading). The latest update is that the Nehalem will feature a &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect"&gt;QuickPath &lt;/a&gt;interconnect capable of transfer rates of upto 25.6 GBps. And it is not just more memory bandwidth and multiple cores that will improve performance, it turns out that the Nehalem will increase the OOO execution window to 128 uOps (from 96 on the Penryn) - so individual cores are also going to get faster!&lt;br /&gt;&lt;/p&gt;&lt;p&gt;In the longer term Sandy Bridge - the second of Intel's 32 nm processors - is going to have a 256-bit wide SSE execution unit. The only complaint I have about this is that it is 2 years away from shipping. :-)&lt;br /&gt;&lt;/p&gt;&lt;p&gt;See &lt;a target="_blank" href="http://www.theinquirer.net/gb/inquirer/news/2008/03/17/intel-puts-flesh-future-chip"&gt;here&lt;/a&gt;, &lt;a target="_blank" href="http://community.zdnet.co.uk/blog/0,1000000567,10007559o-2000331777b,00.htm"&gt;here&lt;/a&gt; and &lt;a target="_blank" href="http://www.itweek.co.uk/itweek/news/2212201/intel-details-roadmap-cores"&gt;here&lt;/a&gt; for more details.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-7242742074526146576?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/03/intel-interesting.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-6863330785456529755</guid><pubDate>Wed, 05 Mar 2008 17:10:00 +0000</pubDate><atom:updated>2008-03-05T09:15:10.463-08:00</atom:updated><title>Random Link Clearance</title><description>Links I've accumulated over the last week or so:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=UBvR3-16aEM"&gt;50 Cent vs Windows XP.&lt;/a&gt; In Da Club mixed with error sounds from XP. I like this better than the original!&lt;/li&gt;&lt;li&gt;Scott Meyer's wonderful comic with Basic Instructions on &lt;a href="http://www.basicinstructions.net/2007/09/how-to-seem-smart.html"&gt;How to Seem Smart.&lt;/a&gt;&lt;/li&gt;&lt;li&gt;On a slightly more serious note, Jeff Atwood &lt;a href="http://www.codinghorror.com/blog/archives/001058.html"&gt;explains how to hack your progress bars to make your code seem faster&lt;/a&gt;!&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-6863330785456529755?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/03/random-link-clearance.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-4834001121979136331</guid><pubDate>Sat, 01 Mar 2008 15:26:00 +0000</pubDate><atom:updated>2008-03-01T07:29:50.032-08:00</atom:updated><title>SSE Tutorial : Part III</title><description>After fighting with blogger for about half-an-hour to make it get the damn formatting right, I've given up. So I've published it on Google Docs instead.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://docs.google.com/Doc?id=dfpj2d8d_70cswxt6c3"&gt;Click here to see the document&lt;/a&gt;, and please leave your comments below.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-4834001121979136331?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/03/sse-tutorial-part-iii_6992.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-4405582356812449540</guid><pubDate>Thu, 21 Feb 2008 15:38:00 +0000</pubDate><atom:updated>2008-02-21T08:15:26.969-08:00</atom:updated><title>Finding Which Core Your Thread is Running On</title><description>How do you figure which core of a multi-core processor your thread in running on?&lt;br /&gt;&lt;br /&gt;At first, I figured that there might be a Win32 function to do this. So, I looked around unsuccessfully in MSDN hoping for a "get" equivalent of &lt;a href="http://msdn2.microsoft.com/en-us/library/ms686247%28VS.85%29.aspx"&gt;SetThreadAffinityMask&lt;/a&gt; or &lt;a href="http://msdn2.microsoft.com/en-us/library/ms686247%28VS.85%29.aspx"&gt;SetThreadIdealProcessor&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It turns out the correct way to find this information is to ask the processor. Seems logical if you think about it, but unfortunately the answer not easily googleable because of the weird terminology!&lt;br /&gt;&lt;br /&gt;After a little digging around, I found that the CPUID instruction has this interesting beast called an APIC ID. &lt;a href="http://softwarecommunity.intel.com/articles/eng/3025.htm"&gt;Intel documentation&lt;/a&gt;&lt;a href="http://softwarecommunity.intel.com/articles/eng/3025.htm"&gt; &lt;/a&gt;says:&lt;br /&gt;&lt;span style="color: rgb(51, 51, 153);"&gt;&lt;blockquote&gt;"Each logical processor has a unique APIC ID,which is initially assigned by the hardware at system reset and can be later reprogrammed by the BIOS or the operating system. On a processor that supports Hyper-Threading Technology, the CPUID instruction also provides the initial APIC ID for a logical processor prior to any changes by the BIOS or operating system."&lt;/blockquote&gt;&lt;/span&gt;Simplified, this just means that in a multi-processor/multi-core system, the APIC ID is a unique identifier for each logical processor.&lt;br /&gt;&lt;br /&gt;What's a logical processor? A hyperthreaded system capable of running two threads "in parallel" has two logical processors. A quad-core system has 4 logical processors. A simple way of looking at this is that you have as many logical processors as graphs in the performance tab of the task manager!&lt;br /&gt;&lt;br /&gt;So, all you have to do is use the CPUID function 0x0001 and look in bits 31-24 of the EBX register, which is where the APIC ID lives. This looks like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;int&lt;/span&gt; apic_id;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;__asm &lt;/span&gt;  &lt;br /&gt;{&lt;br /&gt;mov     eax, &lt;span style="color: rgb(102, 0, 0);"&gt;1&lt;/span&gt;;&lt;br /&gt;cpuid;&lt;br /&gt;shr     ebx, &lt;span style="color: rgb(51, 0, 0);"&gt;24&lt;/span&gt;;&lt;br /&gt;mov     apic_id, ebx;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So that's that!&lt;br /&gt;Let me know if you have questions/comments.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-4405582356812449540?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/02/finding-which-core-your-thread-is.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-6362856870152959571</guid><pubDate>Sun, 10 Feb 2008 14:08:00 +0000</pubDate><atom:updated>2008-02-10T06:15:09.258-08:00</atom:updated><title>Einstien Quote</title><description>Here is one of my favourite Albert Einstein quotes:&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);font-size:180%;"&gt;&lt;br /&gt;“&lt;/span&gt;&lt;br /&gt;The more a man is imbued with the ordered regularity of all events, the firmer becomes his conviction that there is no room left by the side of this ordered regularity for causes of a different nature.  For him neither the rule of human nor the rule of divine will exists as an independent cause of natural events.  To be sure, the doctrine of a personal God interfering with natural events could never be refuted, in the real sense, by science, for this doctrine can always take refuge in those domains in which scientific knowledge has not yet been able to set foot.  But I am persuaded that such behavior on the part of the representatives of religion would not only be unworthy but also fatal.  For a doctrine which is able to maintain itself not in clear light, but only in the dark, will of necessity lose its effect on mankind, with incalculable harm to human progress.  In their struggle for the ethical good, teachers of religion must have the stature to give up the doctrine of a personal God, that is, give up that source of fear and hope which in the past placed such vast powers in the hands of priests.  In their labors they will have to avail themselves of those forces which are capable of cultivating the Good, the True, and the Beautiful in humanity itself.  This is, to be sure, a more difficult but an incomparably more worthy task.&lt;br /&gt;&lt;div style="text-align: right;"&gt;&lt;span style="color: rgb(51, 51, 255);font-size:180%;" &gt;”&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-6362856870152959571?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/02/einstien-quote.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-3153234756527175513</guid><pubDate>Sat, 09 Feb 2008 13:05:00 +0000</pubDate><atom:updated>2008-02-09T05:46:14.851-08:00</atom:updated><title>Experiments with CUDA/CUBLAS on a Tesla C870</title><description>I've been playing around with the CUDA/CUBLAS/CUFFT libraries recently attempting to benchmark the kind of performance improvements possible with GPUs. I've found some surprising results along the way, and I thought I'd put them down here.&lt;br /&gt;&lt;br /&gt;Data transfer between GPU/CPU is still a big bottleneck. On my relatively old DDR2-533M bus with the Tesla sitting in PCIe-x16 slot I am seeing throughputs in the 1100-1300 MB/s range. Obviously, smaller data sizes don't see the full throughput. Note that pinned memory is about twice as fast as "regular" memory. I am told that some folks have seen over 3GB/s using pinned memory; presumably using faster buses than mine. Taken by itself, that is a very impressive number but even that does not make the transfer bottleneck go away.&lt;br /&gt;&lt;br /&gt;A corollary of the previous result is that to see a real speedup (lets arbitrarily define this as &gt;= 20X over single threaded CPU code) for single operations, the big-Oh complexity of operation you're doing on the GPU must be higher than that of the data transfer. Not sure if I put that too well, what I'm saying is: you will see a big speedup with matrix-matrix multiply because the operation is O(n^3) and the data transfer is O(n^2). On the other hand, matrix-vector multiply will not see as much of a speedup because both your data-transfer and operation are O(n^2). [1]&lt;br /&gt;&lt;br /&gt;There are sweet spots for CUBLAS kernels. Sgemm likes matrix sizes that multiples of 32, and Sgemv likes matrices with rows that are multiples of 16. Performance at the sweet spots is much much better.&lt;br /&gt;&lt;br /&gt;Sgemv throughput peaks at only 10 or so GFLOPs when the trans input is 't'. When trans is 'n', I'm seeing throughputs of over 35 GFLOPS.  Translated into English, this means that cublasSgemv is &lt;span style="font-style:italic;"&gt;much&lt;/span&gt; faster at doing y = Ax + y [trans='n'] than y = transpose(A)x + y [trans='t'].&lt;br /&gt;&lt;br /&gt;Sgemm gives me peak throughputs of 120+ GFLOPs. There are some variations for different inputs, but except for the multiple of 32 quirk (which can bring the 120 GLOPs number down to 100 GFLOPs), I haven't seen anything that affects performance significantly. &lt;br /&gt;&lt;br /&gt;Sgemm and Sgemv are &lt;span style="font-weight:bold;"&gt;much faster&lt;/span&gt; than Ssymm and Ssymv. My recommendation is to use Sgemm and Sgemv throughout - even for symmetric matrices.&lt;br /&gt;&lt;br /&gt;Another weird thing I'm noticing is high CPU usage. With almost no CPU code running [except the data-transfers and kernel-invocations] I am seeing CPU usages in the 50-70% range. I know for a fact that several NI device drivers can do DMA between main memory and modular instruments with less than 10% CPU usage. I'm no device driver expert but I figured I'd see numbers in the same ballpark here. I'd love to see the CPU usage number go down, so I can bang out even more FLOPs using the CPU.&lt;br /&gt;&lt;br /&gt;Overall, my experience with CUDA has been extremely positive. Despite the weird performance quirks, the technology seems very promising. The API is well designed and easy to program and debug. One minor complaint I have is the column-major array storage. This combined with the Sgemv trans='t' performance bug is causing me a few problems.&lt;br /&gt;&lt;br /&gt;Let me know if you have questions/comments.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Footnotes&lt;/span&gt;&lt;br /&gt;[1] The situation could be different if you multiplying vectors with a &lt;span style="font-style:italic;"&gt;constant matrix&lt;/span&gt; [so you don't need to push the matrix over the bus again and again], in that case Sgemv can give you big performance gains over CPU code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-3153234756527175513?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/02/experiments-with-cudacublas-on-tesla.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-8131353836942892945</guid><pubDate>Fri, 08 Feb 2008 18:39:00 +0000</pubDate><atom:updated>2008-02-20T07:54:18.231-08:00</atom:updated><title>On Relevant Research</title><description>Mahesh sent me &lt;a target="_blank" href="http://www.oreilly.com/catalog/opensources/book/appa.html"&gt;this fascinating newsgroup thread&lt;/a&gt; from back in 1992 where Andrew Tannenbaum (of the OS and networking books fame) flames Linus Torvalds and make some less than correct predictions about Linux! Highly recommended reading.&lt;br /&gt;&lt;br /&gt;Anyway, there's this thing has been bugging me for a long time now, but this newgroup thread and my recent visit to &lt;a target="_blank" href="http://www.ee.iitb.ac.in/~ncc2008/"&gt;NCC '08&lt;/a&gt; pushed these thoughts to fore and I figured I put them down here for public ridicule. :-) What I've been thinking about is fundamentally simple: what fraction of research is actually useful? [1]&lt;br /&gt;&lt;br /&gt;To me, the fundamental point of research (or for that matter anything at all) is that it should solve somebody's problem. Mathematically beautiful results, and elegant languages are all very well and good if you're making a living writing papers and textbooks. But, unless people are able to use them to do stuff they couldn't before, they are useless. When you put it down like this what I'm saying seems fairly obvious, but people just keep missing this damn stupid obvious fact all the time. Consider the Lisp fanboys led by Paul Graham and his ilk: they keep going on and on about how Lisp is a wonderfully elegant language that will expand your ways of thinking, blah blah blah, you should move to Lisp and you will be saved, blah blah blah. Granted, Mr. Graham actually made a ton of money selling his Lisp web-app to Yahoo, but I believe this is because he was a good programmer who wrote good code for a good application. IOW, it had nothing to do with Lisp and everything to do with the application and its programmers. For, if Lisp were the magic route to startup success Cleartrip.com wouldn't suck so much [2]. Not to mention that there'd be more guys trying to repeat the formula &lt;span style="font-style:italic;"&gt;and succeeding.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My theory is that people who consider themselves reasonably intelligent human beings [3] find it is extremely hard to resist the tendency to show off how smart they are. And so, they get caught up in intellectual games that don't achieve anything useful. As a consequence lots of research keeps falling into this trap of doing stuff not because it helps make something better, but rather as a means for the doer to show off how much smarter s/he is than the rest of us.&lt;br /&gt;&lt;br /&gt;Coming back to the point about Lisp : an elegant, theoretically beautiful language is  great, but what is really required is a reasonably productive language that is efficient, has great interoperability and will allow me to leverage existing code. [4] I don't claim to speak for anyone else, but efficiency is definitely a criteria for me - I am working on real applications that have real timing requirements that look very difficult to achieve from even relatively fast high-level languages like C#. And I can assure you that interoperability and the ability to leverage existing codebases is a huge deal for anybody. And so my response to the Lisp/ML/Haskell advocates is to get back to me when a compiler that can provide performance to within, say, 50% of C, and will work seamlessly with existing code is available.&lt;br /&gt;&lt;br /&gt;I went off on a rant about Lisp, but I should clarify that I am not against using Lisp or any other functional language. (In fact, I'm trying to teach myself F# right now) The point I am trying to make is that for a programming language to be really useful to everybody, there are more constraints than just elegance of expression. Similarly, like Lawrence C. Foard points out to Mr. Tannenbaum, theorizing about an OS's elegance and portability is bogus if you don't meet the first criterion - usability. [5] Generalizing, I believe this is a point that applies to all research, if something doesn't make something possible/easier/better/faster it isn't worth doing.&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;span style="font-weight:bold;"&gt;Footnotes:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;[1] I am using research and useful without defining them precisely, hopefully, you all get what I mean. My goal here is to explore a line of thought, not to get into vague semantic arguments. :-)&lt;br /&gt;&lt;br /&gt;[2] The last time I used it, I got sever errors on 3/8 page reloads! I find it ironic that for a Lisp startup they have an MBA for a CEO!&lt;br /&gt;&lt;br /&gt;[3] EDIT: There used to be a comment where which was being misinterpreted, so it doesn't exist any more! :-)&lt;br /&gt;&lt;br /&gt;[4] And may I say that Anders Hejlsberg, Jim Hugunin and team have done wonderful job achieving just that with the CLR and DLR.&lt;br /&gt;&lt;br /&gt;[5] By usability I mean being able to use the OS to run real applications.&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;span style="font-weight:bold;"&gt;Answer to the question in the SSE Tutorial Part I&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'd asked why "add eax, 1" would be preferred to "inc eax". It turns out inc eax is a 1-byte instruction, and so can screw up alignments for subsequent memory fetches. (Something along these lines, I will put a more detailed explanation shortly).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-8131353836942892945?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/02/on-relevant-research.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-6026063711560699053</guid><pubDate>Sun, 27 Jan 2008 14:20:00 +0000</pubDate><atom:updated>2008-01-27T08:27:49.148-08:00</atom:updated><title>SSE Tutorial : Part II</title><description>A note before I start: not all the features mentioned below appeared in the same revision of the SSE instruction set. (IOW, the instruction set is constantly evolving, so your mileage may vary depending on when you bought your processor).&lt;br /&gt;&lt;br /&gt;The key point to SSE is that AMD and Intel have put 8 new 128-bit registers on their processors. You can view the registers as:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;16 8-bit integers&lt;/li&gt;&lt;li&gt;8 16-bit integers&lt;/li&gt;&lt;li&gt;4 32-bit integers&lt;/li&gt;&lt;li&gt;4 32-bit single precision floating point numbers (float)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;2 64-bit double precision floating point numbers (double)&lt;/li&gt;&lt;/ul&gt;And you can do math on all the 16/8/4/2 elements "in parallel". The obvious benefit is that if you're doing the same operation on all the elements of an array, doing it 16 at a time or 4 at a time is faster than doing one at a time. There is also another benefit - it turns out that on modern processors and buses, reading memory elements that are aligned on 128-bit boundaries is much faster. So, optionally, SSE provides instructions to perform 128-bit aligned reads and writes from memory. (Of course, you can choose not to use them, because there are equivalent unaligned reads as well) There are also instructions provides closer control over caching behaviour - for instance the MOVNTDQ instruction writes a 128-bit value (Double Quadword) to memory whilst hinting to the processor to not store that value in the cache. The instruction PREFETCH, (obviously), prefetches values into the cache.&lt;br /&gt;&lt;br /&gt;One big feature (especially for C/C++ programmers) that came with SSE was faster double/float to integer conversion instructions. When coding using the x87 floating point stack, the code to convert a floating point number to an integer in C, would work roughly like this: (a) store the FPU rounding mode (b) Set the rounding mode to truncate [note that this is required by the C standard] (c) Convert double to integer (d) Restore old FPU rounding mode. The whole circus was rather expensive. SSE gives 2 sets of fast instructions to convert doubles/floats (and the tuples of 4 floats and 2 doubles) to integers. The instruction itself specifies the rounding mode to use (i.e. truncate/round) and so no messing with the FPU state is required. In fact, the so-called SSE acceleration that the VC++ 8 compiler pretends to do is mainly limited to using the faster convert instructions!&lt;br /&gt;&lt;br /&gt;Besides the fast math there are some other benefits. SSE finally got rid of the floating point register stack, so the new floating point code is easier to write and a little faster at times. There are also some pretty cool instructions aimed at specific problems. An example is the CRC calculation instruction that is available with SSE revision 4!&lt;br /&gt;&lt;br /&gt;Anyway, that's it for now. Next time, I promise more code and less gyan. Let me know if you have questions/comments.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-6026063711560699053?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/01/sse-tutorial-part-ii.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-6316444666276914132</guid><pubDate>Sun, 27 Jan 2008 06:55:00 +0000</pubDate><atom:updated>2008-01-27T08:20:18.421-08:00</atom:updated><title>Software Engineers aren't?</title><description>You all probably know that &lt;a href="http://ravimohan.blogspot.com/2008/01/ratcatchers-and-engineers.html"&gt;Ravi Mohan&lt;/a&gt; has started off this controversy (flame-war?) by saying something to the effect of "You can call yourself an engineer, but by my definition of an engineer, you're lying (to yourself). You're actually a clerk.".&lt;br /&gt;&lt;br /&gt;The who is an engineer and who isn't debate that he's started off reminds of one of these useless academic questions we got asked in VTU. [1] Whether you conform to what X's defines an engineer/clerk/manager/scientist to be, is completely irrelevant and useless. The designation you have is just a label - an aid for identification. If you claim to be a engineer, but you aren't - who cares? [2] The key goal is you should be happy doing what you're doing. To be fair, Ravi says as much (albeit in a provocative tone).&lt;br /&gt;&lt;br /&gt;The unfortunate part of this debate is that people are going to take offense because they perceive being called a clerk as an insult. They'd like to think they are better than clerks, and they don't like being told otherwise.&lt;br /&gt;&lt;br /&gt;My point is, people have all sorts of foolish and not-so-foolish delusions - and mostly they cause no harm. Sometimes, it is these so-called delusions that give you motivation during hard times. If your ECG technician fancies himself a cardiac specialist, good for him. Smile while he gives you advice on your heart and use your common sense on whether to follow it. Don't shout at him and call him a unqualified technician. Not following his advice is perfectly fine, but if you shout at him or even make fun of him privately, you're being arrogant.&lt;br /&gt;&lt;br /&gt;The only defense I can see for this whole series to have been posted on the internet is that it opens up an interesting line of thought. Exactly what does an engineer mean? Lets follow this to its logical end. Assume that we did arrive at a consensus on an exact definition for engineers. Now you can say with authority that X is an engineer and Y isn't. Now that we have this, I can't see a single useful benefit having it. I &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; see a lot of pissed off engineers/clerks (take your pick). And that is why I believe he should've kept his thoughts to himself.&lt;br /&gt;&lt;br /&gt;I also don't buy the don't-read-my-blog-if-you-don't-agree-with-me argument. It is a bit too similar to:&lt;br /&gt;&lt;br /&gt;Person A: You're an idiot. Did you know that?&lt;br /&gt;Person B: Why the hell are you insulting me?&lt;br /&gt;Person A: Don't listen to me if you don't like what I say.&lt;br /&gt;&lt;br /&gt;Yes, everybody has a right to their opinions, but with that right comes the responsibility of having to stand up for them. If you believe that X is true, you've got to take the shit that comes with it.&lt;br /&gt;&lt;br /&gt;Ravi, I suggest you do some introspection on whether this who is and isn't engineer thing actually did anybody any good. May I also point out that &lt;a href="http://www.blogger.com/comment.g?blogID=14853042&amp;amp;postID=6137338992807348287"&gt;insulting the &lt;span style="font-style: italic;"&gt;commenter&lt;/span&gt;&lt;/a&gt; who (rightly/wrongly) criticized your blog &lt;span style="font-style: italic;"&gt;entries&lt;/span&gt; was uncalled for.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;[1] Case in point: Differentiate between a for-loop and while loop.&lt;br /&gt;[2] There is also a bigger question here - can you even make a definition of an engineer that is objective? If not, and it seems like that to me, whose definition do you pick as the "right" one? Having said that, I believe that in this specific instance, since the definition itself is irrelevant, the question doesn't need any answering.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-6316444666276914132?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2008/01/software-engineers-arent.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-7596642285664663718</guid><pubDate>Tue, 25 Dec 2007 14:03:00 +0000</pubDate><atom:updated>2008-01-27T00:16:25.940-08:00</atom:updated><title>A Quick and Dirty Intro to SSE - Episode I</title><description>Since, I've done a little bit of assembly language coding using SSE, I thought it might be useful to write up a little introduction here.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What is SSE?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;SSE = Streaming SIMD Extensions. It is an instruction set for IA-32 processors that speeds up doing SIMD operations.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What the heck is SIMD anyway?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;SIMD = Single Instruction Multiple Data. Example: you want to add 1.0 to every element of a floating point array - so your instruction (Add 1.0) is the same for each element, but the data (the elements of the array) are different - hence &lt;span style="font-style: italic;"&gt;multiple data&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;single instruction&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why can't I just use a loop?&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Loops are slow. That's why.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                  mov     ecx, nElements&lt;br /&gt;addOneLoop:&lt;br /&gt;                  mov     eax, [esi]&lt;br /&gt;                  add     eax, 1&lt;br /&gt;                  mov     [esi], eax&lt;br /&gt;                  add     esi, 4&lt;br /&gt;                  sub     ecx, 1&lt;br /&gt;                  jnz     addOneLoop&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Assuming that all elements of the array are in L1 cache (which is a bogus assumption, but will do for our purposes help simplify the explanation), the processor will spend far more time doing the book-keeping for the loop - decrement the counter, increment the pointer, executing the branch, than actually doing the addition. In fact, the add eax, 1&lt;sup&gt;*&lt;/sup&gt;  is the fastest instruction in the loop because the immediate operand is a nice small constant, and the other operand is a register. In fact, if you looping condition is more complicated than the one shown, or if you have a branching construct inside of the loop, branch misprediction alone can slow your code by a factor of around 2/3!&lt;br /&gt;&lt;br /&gt;It turns out that this kind of loop is really common in certain applications - signal and image processing for instance, and so AMD and Intel support the SSE instruction set which accelerate these algorithms.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Must be complicated, right?&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Not really. It is quite simple, and we'll go into the details over the next few posts. Let me know if you have any questions/comments.&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;span style="font-weight:bold;"&gt;The Eager Student's Question Corner:&lt;/span&gt; (Or maybe lame section name corner :-)&lt;br /&gt;* Any guesses as to why I used "add eax, 1" instead of "inc eax"?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-7596642285664663718?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/12/quick-and-dirty-intro-to-sse-episode-i.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-2785503075640428910</guid><pubDate>Tue, 25 Dec 2007 13:29:00 +0000</pubDate><atom:updated>2007-12-25T06:02:50.859-08:00</atom:updated><title>Size Matters</title><description>&lt;a href="http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html"&gt;Steve Yegge latest post&lt;/a&gt; starts off talking about code size, meanders around to how much Java sucks and how Java programmer's are oblivious to it and ends with a plug for dynamic languages. Business as usual, right? :-)&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;Fortunately, &lt;a href="http://www.codinghorror.com/blog/archives/001025.html"&gt;Jeff Atwood&lt;/a&gt; has deciphered Mr. Yegge's hidden meaning and given us an executive summary:&lt;br /&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;If you personally write 500,000 lines of code in any language, you are so totally screwed. &lt;/li&gt;&lt;li&gt;If you personally rewrite 500,000 lines of static language code into 190,000 lines of dynamic language code, you are still pretty screwed. And you'll be out a year of your life, too. &lt;/li&gt;&lt;li&gt;If you're starting a new project, consider using a dynamic language like Ruby, JavaScript, or Python. You may find you can write less code that means more. A lot of incredibly smart people like Steve present a compelling case that the grass really &lt;i&gt;is&lt;/i&gt; greener on the dynamic side. At the very least, you'll learn how the other half lives, and maybe remove some blinders you didn't even know you were wearing. &lt;/li&gt;&lt;li&gt;If you're stuck using exclusively static languages, ask yourself this: why &lt;i&gt;do&lt;/i&gt; we have to write so much damn code to get anything done-- and how can this be changed? &lt;a href="http://en.wikiquote.org/wiki/Alan_Kay"&gt;Simple things should be simple, complex things should be possible&lt;/a&gt;. It's healthy to question authority, &lt;i&gt;particularly&lt;/i&gt; language authorities. &lt;/li&gt;&lt;/ul&gt;My two cents on the issue:&lt;ul&gt;&lt;li&gt;Code is like life, if you pay attention to detail when it is small, it will be easier to maintain when it grows. So, being scrupulous when you're writing code the first time round, as opposed to "making-it-work-now-and-cleaning-it-up-later" will really pay off in the long run.&lt;/li&gt;&lt;li&gt;Even if you can't write in a dynamic language, just programming in these languages a little will show you newer, cleaner ways of expression that you can take back to C++/Java.&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: left;"&gt;Both posts are highly recommended reading.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-2785503075640428910?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/12/size-matters.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-1791996272314678562</guid><pubDate>Sun, 02 Dec 2007 14:34:00 +0000</pubDate><atom:updated>2007-12-02T07:15:32.423-08:00</atom:updated><title>blog.restart();</title><description>A few months down the line, I've figured that completely shutting down the blog was a bad idea and so I am going to start blogging again.&lt;br /&gt;&lt;table border="1" frame="box" rules="none"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;pre&gt; &lt;span style="color:blue;"&gt;RantGenerator&lt;/span&gt; blog = Pramod.createRantObject();&lt;br /&gt; blog.restart();&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;I have a couple of interesting things I want to blog about. So stay tuned please!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-1791996272314678562?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/12/blogrestart.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-3261341343978891365</guid><pubDate>Sat, 06 Oct 2007 13:57:00 +0000</pubDate><atom:updated>2007-10-06T07:02:50.207-07:00</atom:updated><title>Shutting down blog</title><description>Hi everybody,&lt;br /&gt;&lt;br /&gt;I am not able to find enough time to do justice to this blog and so I've decided to stop blogging for now. I don't want to delete the blog, because at some point, I want to start again, and I kinda like the domain name! :-) But I think that is at least a few years into the future. &lt;br /&gt;&lt;br /&gt;So, for now, this blog is officially shutdown.&lt;br /&gt;&lt;br /&gt;When I started writing, I didn't really expect anybody to read this. But quite a few people did, and I really appreciate your support.&lt;br /&gt;&lt;br /&gt;Thanks!&lt;br /&gt;-- Pramod&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-3261341343978891365?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/10/shutting-down-blog.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-2597638468055720982</guid><pubDate>Mon, 23 Jul 2007 16:17:00 +0000</pubDate><atom:updated>2007-07-23T10:53:15.190-07:00</atom:updated><title>Is Assembly Language Still Worthwhile?</title><description>I've been spending the last few weekends coding in Assembly Language. &lt;br /&gt;&lt;br /&gt;The  interesting thing is, I've heard from a whole bunch of people (Ashwin, Anand and most recently Nigel) that modern compilers do a very good job of optimizing C/C++. And so, (a) it will very difficult to write code that is faster than the compiler (b) it is a not a worthwhile exercise writing assembly code. Now, I am always skeptical when people say that computers can do something better than humans! :-) And for that (and other reasons), I find that statement somewhat hard to digest.&lt;br /&gt;&lt;br /&gt;On a related note, there is a lively debate going on at the &lt;a href="http://www.pldesignline.com/"&gt;"Programmable Logic DesignLine Newsletter"&lt;/a&gt; about whether people still programming in assembly language. The general consensus seems to be: yes, maybe, sometimes, but watch your step! And that is  more or less how I feel about the issue! :-) &lt;br /&gt;&lt;br /&gt;Now on the compiler vs. human issue, I think a human can always hand tune compiler generated code to do better. I don't think this is always going to be worthwhile because of the amount of effort this involves, but if you want to be at the bleeding edge of performance, judicious coding in assembly can only do you good. I say this with confidence because as a human you will always know more about the  program than your compiler!&lt;br /&gt;&lt;br /&gt;Let me give you an example. Ashwin wrote this &lt;a href="http://en.wikipedia.org/wiki/Convolutional_code"&gt;convolutional coding&lt;/a&gt; function that looks somewhat like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;inline int8 calculateMaskedParity(int32 SRState, int32 mask)&lt;br /&gt;{&lt;br /&gt; int32 y = SRState&amp;mask;&lt;br /&gt; y ^= (y&gt;&gt;16);&lt;br /&gt; y ^= (y&gt;&gt;8);&lt;br /&gt; y ^= (y&gt;&gt;4);&lt;br /&gt; y ^= (y&gt;&gt;2);&lt;br /&gt; y ^= (y&gt;&gt;1);&lt;br /&gt; y &amp;= 1;&lt;br /&gt; return (int8)y;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;extern "C" CONVOLUTIONALENCODERASM_API &lt;br /&gt;void convolutionalEncodeC(LVInt32ArrayHandle hGenMatrix, LVInt8ArrayHandle inBitStream, LVInt8ArrayHandle outBitStream)&lt;br /&gt;{&lt;br /&gt; int8 *inBits   = (*inBitStream)-&gt;data;&lt;br /&gt; int8 *outBits  = (*outBitStream)-&gt;data;&lt;br /&gt; int32 numberOfBits = (*inBitStream)-&gt;size;&lt;br /&gt; int32* genMatrix = (*hGenMatrix)-&gt;data;&lt;br /&gt; int32 nMasks = (*hGenMatrix)-&gt;size;&lt;br /&gt;&lt;br /&gt; int32 srState = 0;&lt;br /&gt; int32 outIndex = 0;&lt;br /&gt; for(int i=0 ; i&amp;lt;numberOfBits ; ++i)&lt;br /&gt; {&lt;br /&gt;  srState &lt;&lt;= 1;&lt;br /&gt;  srState |= inBits[i];&lt;br /&gt;  &lt;br /&gt;  for(int j = 0; j &lt; nMasks; j++)&lt;br /&gt;   outBits[outIndex++] = calculateMaskedParity(srState, genMatrix[j]);&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Ignore all the structures prefixed with LV, they are basically ways of passing data between LabVIEW and C/C++. The meat of the code is inside that the for loop. And the for loop calls that neatly written function calculateMaskedParity which xor's all that bits in an integer in time proportional to log(n) where n is the number of bits in a integer. You'd think that this was a pretty good convolutional coding function and if you ask me, I'd say that it is! But I was able to write a function that does &lt;a href="http://en.wikipedia.org/wiki/Convolutional_code#Punctured_convolutional_codes"&gt;convolutional coding and puncturing&lt;/a&gt;, and with an option for different puncturing vectors operating on different parts of the input bits in assembly language that ran at the same speed as the plain coding function!&lt;br /&gt;&lt;br /&gt;I was able to do this because I knew how many times each loop was going to typically run. I could make decisions like this inner loop is going to be run 30/40 times, so I'd better keep everything this one needs in a register. This variable is only going to be used once every 100 or so bits, I don't really need to use a register for this. I could say, I need to do outBits[i], but I don't have room for a register for outBits. But stop, outBits is just [ebp+12], and the loop counter is eax, so I can use [ebp+eax+12] instead! No compiler can figure out these kind of "optimizations" - they can create graphs and re-order instructions, they can allocate registers "optimally". But if there are 4 variables and only 3 registers, the compiler can't make decisions like: now "i" needs a register, inPtr needs a register, but I am only going to be writing out if this bit is not going to be punctured, so I probably skip the register for outPtr and use it to keep track of my puncturing vector instead - &lt;span style="font-style:italic;"&gt;because compilers cannot understand programs!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now I am not saying that we should all go and write everything in assembly. (We should go write everything in &lt;a href="http://www.python.org"&gt;Python&lt;/a&gt;, not assembly :-)  what I am saying is, you can always "beat" the compiler if you put sufficient effort into it. Whether that effort will be worth your while depends on your needs and your application.&lt;br /&gt;&lt;br /&gt;Another reason you might want to use assembly language is if you want to take advantage of some special features of your architecture/instruction set. I will try and blog about this in over the next few days, so stay tuned. :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-2597638468055720982?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/07/is-assembly-language-still-worthwhile.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-3548995538071920288</guid><pubDate>Fri, 08 Jun 2007 16:04:00 +0000</pubDate><atom:updated>2007-06-08T09:09:54.395-07:00</atom:updated><title>Intel Taking GPU Threat Seriously</title><description>Ashwin found &lt;a target="_blank" href="http://www.beyond3d.com/content/articles/31/4"&gt;this Intel Presentation about the threat to CPUs from GPGPU&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;The presentation makes very interesting reading, and I think the multi-core "gpu-like" architecture that the presentation is recommending is far more scalable than the path Intel are going down on now. Also, programming one of those &lt;a target="_blank" href="http://en.wikipedia.org/wiki/SPMD"&gt;SPMD&lt;/a&gt; cores is going to be so much fun! :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-3548995538071920288?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/06/intel-taking-gpu-threat-seriously.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-7915506520294093714</guid><pubDate>Sun, 03 Jun 2007 04:44:00 +0000</pubDate><atom:updated>2007-06-08T10:38:05.153-07:00</atom:updated><title>Ant Colony Optimization</title><description>Ant Colony Optimization is a combinatorial optimization technique discovered by Marco Dorigo of the at the Université Libre de Bruxelles.&lt;br /&gt;&lt;br /&gt;Basically, the method tries to model how ants find their way to food. &lt;br /&gt;&lt;br /&gt;Initially, ants move in random directions seeking food. As an ant moves, it drops &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Pheromone"&gt;pheromone&lt;/a&gt; trails on the ground. This serves two purposes, once the ant finds food, it has a way to get back home. Secondly, another ant moving around knows that an ant passed through here and went in a certain direction. Thirdly, ants which encounter pheromone trails are likely to follow that trail. The probability of following a trial increases with the amount of pheromone dropped on that trail. Finally, the pheromone doesn't stay around forever, it evaporates.&lt;br /&gt;&lt;br /&gt;Picture this scenario: an ant finds its way to food and returns to the base. Around the same time, a few other ants stumble upon this trail. Some of them decide follow it. These ants will also find the food. In doing so they will lay out a pretty strong concentration of pheromone on the ground - causing even more ants to follow that trail. Pretty soon all the ants will have a route to food.&lt;br /&gt;&lt;br /&gt;The story doesn't end here. Say one of the ants is a rebel. It breaks out of the pheromone trail to the food and "discovers" a shorter path to the food. Because of the pheromone that the rebel laid out, some of the other ants will also follow the new trail. If the route is shorter, fewer ants will be required to achieve the same concentration of pheromone as the initial longer trail. Pretty soon, all the ants will switch to the shorter trail. Cool, huh?&lt;br /&gt;&lt;br /&gt;It gets better. Supposing an evil human drops a big fat book right on the route to the food. (Probably squashing a few ants, but that's not the point) The ants will now face one of two scenarios: (a) they have an alternative (and also what used to be sub-optimal) path to the food (b) they don't have a path to food. Option (a) presents no trouble at all. The ants will just simply switch to the other trail (because now there is only one trail to follow). Option (b) will mean the ants will now start from scratch again. After a while, a new path to the food will be found and the pheromone along the old path will have evaporated. And so, the ants will stop following the  path to that leads to a dead-end and switch to the new route.&lt;br /&gt;&lt;br /&gt;Ant Colony Optimization techniques are an attempt to use a similar algorithm to optimize a function. They work by exploring a search space using entities analogous to ants, and metrics that are analogous to pheromones. There are three key points here:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt; The algorithm is &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Stochastic"&gt;stochastic&lt;/a&gt;.&lt;br /&gt;&lt;li&gt; The algorithm is massively parallel - because you can put as many ants as you want on the problem and each ant's path can be computed independently of other ants. (And so are amenable to a parallel implementation)&lt;br /&gt;&lt;li&gt; The algorithm relies on communication via the environment (the "pheromones") rather than between the ants themselves. (And so reduce the evils of communication between agents in parallel machines)&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;Ant Colony algorithms have been applied to many different problems with considerable success. You should visit &lt;a target="_blank" href="http://www.aco-metaheuristic.org/"&gt;the official website&lt;/a&gt; for more information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-7915506520294093714?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/06/ant-colony-optimization.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-2537769708294399726</guid><pubDate>Fri, 01 Jun 2007 15:50:00 +0000</pubDate><atom:updated>2007-06-01T09:24:58.085-07:00</atom:updated><title>Excellent Video (and  a little philosophy)</title><description>My goal for this blog has been to keep the content (for want of a better term :-) technical. But why should my blog be technical? Well, I don't want a personal blog because I believe personal communication should not be done on "&lt;a target="_blank" href="http://en.wikipedia.org/wiki/Broadcasting_%28networks%29"&gt;broadcasts&lt;/a&gt;". I guess I'm a little old fashioned in that respect. I don't have anything to say about movies, music or any of that stuff. I don't think whining on a blog is going to make the world a better place - Looks like I'm unique in that respect though! :-) And, I don't want one of these "mind-dump" blogs - thank you very much. So, what all of that boils down to is my content filter usually lets in only technical posts!&lt;br /&gt;&lt;br /&gt;I said usually, and today I've got something unusual! Anand showed me &lt;a target="_blank"  href="http://www.youtube.com/watch?v=LU8DDYz68kM"&gt;this really great video on YouTube&lt;/a&gt;. If you've not seen it already, I am sure you will enjoy it. Just be patient and watch it till the end!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-2537769708294399726?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/06/excellent-video.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-6136955622459827993</guid><pubDate>Mon, 14 May 2007 15:48:00 +0000</pubDate><atom:updated>2007-12-03T08:02:43.211-08:00</atom:updated><title>Presentation on GPU's for Signal Processing</title><description>&lt;span style="font-weight:bold;"&gt;UPDATE:&lt;/span&gt; I have pulled the presentation down. I will soon put up a slightly modified version of the presentation. I apologize for the inconvenience.&lt;br /&gt;&lt;br /&gt;Ashwin, Nick and I recently presented a paper at NITech (a National Instruments internal technical conference) on how to use Graphics Processing Units for "general purpose" computation. Ashwin and I were specifically interested in using GPUs for Signal Processing (Part II of the presentation), and we ported some "standard" signal processing algorithms to the GPU and benchmarked against their &lt;a target="_blank" href="http://www.ni.com/labview"&gt;LabVIEW&lt;/a&gt; implementations.&lt;br /&gt;&lt;br /&gt;If you're interested &lt;a target="_blank" href="http://pramod.sub.googlepages.com/GPGPU.zip"&gt;the presentation is here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I think the best way to do this is to view the presentation on a dual monitor system with one monitor showing the slides so that you can see the animations and the other monitor set up so that you can read the notes as you go along. There are lots and lots of notes and a couple of hidden (and unfortunately incomplete) slides too, so it shouldn't be too hard to figure out what we're trying to say.&lt;br /&gt;&lt;br /&gt;And of course, you are most welcome to contact me with questions, comments and ideas at &lt;a href="mailto:pramod.subramanyan@gmail.com"&gt;pramod.subramanyan@gmail.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;What are you waiting for? &lt;a target="_blank" href="http://pramod.sub.googlepages.com/GPGPU.zip"&gt;Go click that link!&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-6136955622459827993?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/05/presentation-on-gpus-for-signal.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-18359540.post-2262006714285737127</guid><pubDate>Fri, 11 May 2007 22:17:00 +0000</pubDate><atom:updated>2007-06-13T10:25:45.626-07:00</atom:updated><title>Intel Performance Primitives and a Random Story</title><description>A colleague of mine recently found &lt;a href="http://www.intel.com/cd/software/products/asmo-na/eng/302910.htm"&gt;this&lt;/a&gt;. Looks like Intel have created optimized processor specific libraries for computationally intensive tasks.&lt;br /&gt;&lt;br /&gt;Cool.&lt;br /&gt;&lt;br /&gt;This reminds me of the last time I downloaded a library from Intel. &lt;a href="http://www.intel.com/cd/ids/developer/asmo-na/eng/209859.htm?page=1"&gt;This one&lt;/a&gt;. Let me tell you the whole story.&lt;br /&gt;&lt;br /&gt;It was about 4-5 months ago (probably Oct/Nov 2006), and I was trying to optimize a software FM-Demodulator. The demodulator had this tight loop - probably 4 or so computationally intensive statements inside and I wanted to find out which of those statements was the worst of the lot. So, because I am too lazy to download and install a profiler and figure out how to use it, I did the next best thing - I stuck a bunch of timing code around each statement using the Windows API QueryPerformanceXXX functions so I could measure how long each statement was taking to execute.&lt;br /&gt;&lt;br /&gt;In case you've never done this before, this is typically how its done:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;  LARGE_INTEGER freq, start, end;&lt;br /&gt;  QueryPerformanceCounter(&amp;amp;start);&lt;br /&gt;  {&lt;br /&gt;    &lt;font color="#0000ff"&gt;// do your stuff here;&lt;/font&gt;&lt;br /&gt;    &lt;font color="#0000ff"&gt;// and then some more&lt;/font&gt;&lt;br /&gt;    &lt;font color="#0000ff"&gt;// and more&lt;/font&gt;&lt;br /&gt;  }&lt;br /&gt;  QueryPerformanceCounter(&amp;amp;end);&lt;br /&gt;  QueryPerformanceFrequency(&amp;amp;freq);&lt;br /&gt;  &lt;font color="#2e8b57"&gt;&lt;b&gt;double&lt;/b&gt;&lt;/font&gt; time = (&lt;font color="#2e8b57"&gt;&lt;b&gt;double&lt;/b&gt;&lt;/font&gt;) (end.QuadPart - start.QuadPart)/freq.QuadPart;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So I had this timing code all around each statement inside the loop trying to figure out which statement was the worst of the lot. And then I ran it, and spit out the timing results. Found out that &lt;span style="font-style:italic;"&gt;all&lt;/span&gt; of the statements were taking almost exactly the same time to run! &lt;br /&gt;&lt;br /&gt;Huh? One statement was this complex arctan, and another was this simple FP subtraction, how can they both take the same time?&lt;br /&gt;&lt;br /&gt;Good question. It turns out that all the time was actually being spent inside the timing measurement functions! And because the time my code took to execute was so small, all I was measuring was how long it took for the QueryPerformanceCounter function to run! &lt;br /&gt;&lt;br /&gt;My guess is that some kind of hardware timer was being queried inside QueryPerformanceCounter, and since talking to hardware timer means I/O - and this in turn means all sorts of arbitration, interrupts and switching into kernel mode and a gazillion other overheads - moral of the story: QueryPeformanceCounter is slow!&lt;br /&gt;&lt;br /&gt;Good. I like the irony. A function that is supposed to help you make your code faster is itself slow.&lt;br /&gt;&lt;br /&gt;Anyway, once I'd figured this out, I was searching for other ways of making timing measurements. So I came across this &lt;a href="http://www.intel.com/cd/ids/developer/asmo-na/eng/209859.htm?page=1"&gt;enhanced timer thingy&lt;/a&gt; on Intel's website. So without further ado, I jumped to the last page and downloaded the damn thing and set about measuring my code timings with Intel's Enhanced Timer. &lt;br /&gt;&lt;br /&gt;Good stuff happened, I got exactly the same measurements.&lt;br /&gt;&lt;br /&gt;Then I dug a little bit deeper, by looking into the Enhanced Timer source files. Turns out the so-called Enhanced Timer was itself using QueryPerformanceCounter! Grrr! &lt;br /&gt;&lt;br /&gt;Anyway, the point of all this is:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;The so called Enhanced Timer is not enhanced at all. &lt;br /&gt;&lt;li&gt;Measuring code timings is only accurate for code that takes at least a couple of hundred milliseconds to run.&lt;br /&gt;&lt;li&gt;Don't ever trust a marketing flyer. :-)&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Update:&lt;/span&gt; I heard from Sastry that some NI customers have used IPP, and with excellent results. So I take back all the nasty things I said about Intel's software engineers. However, I still think Intel have a long way to go before they are up to speed on the software side of things.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18359540-2262006714285737127?l=phunday.blogspot.com'/&gt;&lt;/div&gt;</description><link>http://phunday.blogspot.com/2007/05/intel-performance-primitives-and-random.html</link><author>pramod.sub@gmail.com (Pramod)</author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></item></channel></rss>
