<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Igor Ostrovsky Blogging</title>
	
	<link>http://igoro.com</link>
	<description>On programming, technology, and random things of interest</description>
	<lastBuildDate>Wed, 03 Mar 2010 10:05:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/igoro" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="igoro" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">igoro</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Volatile keyword in C# – memory model explained</title>
		<link>http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/</link>
		<comments>http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 09:57:22 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=468</guid>
		<description><![CDATA[
div.mynote { border: black 1px solid; padding: 10px; background-color: #f0f0ff; margin-top: 20px; margin-bottom: 20px; margin-left: 10px; >
The memory model is a fascinating topic – it touches on hardware, concurrency, compiler optimizations, and even math.
The memory model defines what state a thread may see when it reads a memory location modified by other threads. For example, [...]]]></description>
			<content:encoded><![CDATA[<style>
div.mynote { border: black 1px solid; padding: 10px; background-color: #f0f0ff; margin-top: 20px; margin-bottom: 20px; margin-left: 10px; ></style>
<p>The <strong>memory model</strong> is a fascinating topic – it touches on hardware, concurrency, compiler optimizations, and even math.</p>
<p>The memory model defines what state a thread may see when it reads a memory location modified by other threads. For example, if one thread updates a regular non-volatile field, it is possible that another thread reading the field will <strong>never </strong>observe the new value. This program never terminates (in a release build):</p>
<pre class="code"><span style="color: blue">class </span><span style="color: #2b91af">Test
</span>{
    <span style="color: blue">private </span><span style="color: blue">bool </span>_loop = <span style="color: blue">true</span>;

    <span style="color: blue">public static void </span>Main()
    {
        <span style="color: #2b91af">Test </span>test1 = <span style="color: blue">new </span><span style="color: #2b91af">Test</span>();

        <span style="color: green">// Set _loop to false on another thread
        </span><span style="color: blue">new </span><span style="color: #2b91af">Thread</span>(() =&gt; { test1._loop = <span style="color: blue">false</span>;}).Start();

        <span style="color: green">// Poll the _loop field until it is set to false
        </span><span style="color: blue">while </span>(test1._loop == <span style="color: blue">true</span>) ;

        <span style="color: green">// The loop above will never terminate!</span>
    }
}</pre>
<p>There are two possible ways to get the while loop to terminate:</p>
<ol>
<li><strong>Use a lock </strong>to protect all accesses (reads and writes) to the _loop field </li>
<li><strong>Mark the _loop field as volatile</strong> </li>
</ol>
<p>There are two reasons why a read of a non-volatile field may observe a stale value: <strong>compiler optimizations</strong> and <strong>processor optimizations</strong>.</p>
<div class="mynote">
<p>In concurrent programming, threads can get interleaved in many different ways, resulting in possibly many different outcomes. But as the example with the infinite loop shows, threads do not just get interleaved – they potentially interact in more complex ways, unless you correctly use locks and volatile fields.</p>
</div>
<h3>Compiler optimizations</h3>
<p>The first reason why a non-volatile read may return a stale value has to do with compiler optimizations. In the infinite loop example, the JIT compiler optimizes the while loop from this:</p>
<pre class="code"><span style="color: blue">while </span>(test1._loop == <span style="color: blue">true</span>) ;</pre>
<p>To this:</p>
<pre class="code"><span style="color: blue">if </span>(test1._loop) { <span style="color: blue">while </span>(<span style="color: blue">true</span>); }</pre>
<p>This is an entirely reasonable transformation if only one thread accesses the _loop field. But, if another thread changes the value of the field, this optimization can prevent the reading thread from noticing the updated value.</p>
<p>If you mark the _loop field as volatile, the compiler will not hoist the read out of the loop. The compiler will know that other threads may be modifying the field, and so it will be careful to avoid optimizations that would result in a read of a stale value.</p>
<div class="mynote">
<p>The code transformation I showed is a close approximation of the optimization done by the CLR JIT compiler, but not completely exact.</p>
<p>The full story is that the assembly code emitted by the JIT compiler will store the value test1._loop in the EAX register. The loop condition will keep polling the register, and will read test1._loop from memory again. Even when the thread is pre-empted, the CPU registers get saved. Once the thread is again scheduled to run, the same stale EAX register value will be restored, and the loop never terminates.</p>
<p>The assembly code generated by the while loop looks as follows:</p>
<pre>00000068  test        eax,eax
0000006a  jne         00000068</pre>
<p>If you make the _loop field volatile, this code is generated instead:</p>
<pre>00000064  cmp         byte ptr [eax+4],0
00000068  jne         00000064</pre>
<p>If the _loop field is not volatile, the compiler will store _loop in the EAX register. If _loop is volatile, the compiler will instead keep the test1 variable in EAX, and the value of _loop will be re-fetched from memory on each access (by “ptr [eax+4]”).</p>
</div>
<div class="mynote">
<p>From my experience playing around with the current version of the CLR, I get the impression that these kinds of compiler optimizations are not terribly frequent. On x86 and x64, often the same assembly code will be generated regardless of whether a field is volatile or not. On IA64, the situation is a bit different &#8211; see the next section.</p>
</div>
<h3>Processor optimizations</h3>
<p>On some processors, not only must the compiler avoid certain optimizations on volatile reads and writes, it also has to use special instructions. On a multi-core machine, different cores have different caches. The processors may not bother to keep those caches coherent by default, and special instructions may be needed to flush and refresh the caches.</p>
<p>The mainstream <strong>x86 </strong>and <strong>x64 </strong>processors implement a strong memory model where memory access is effectively volatile. So, a volatile field forces the compiler to avoid some high-level optimizations like hoisting a read out of a loop, but otherwise results in the same assembly code as a non-volatile read.</p>
<p>The <strong>Itanium </strong>processor implements a weaker memory model. To target Itanium, the JIT compiler has to use special instructions for volatile memory accesses: <strong>LD.ACQ</strong> and <strong>ST.REL</strong>, instead of LD and ST. Instruction LD.ACQ effectively says, “refresh my cache and then read a value” and ST.REL says, “write a value to my cache and then flush the cache to main memory”. LD and ST on the other hand may just access the processor’s cache, which is not visible to other processors.</p>
<div class="mynote">
<p>For the reasons explained in this section and the previous sections, marking a field as volatile will often incur <strong>zero </strong>performance penalty on x86 and x64.</p>
</div>
<div class="mynote">
<p>The x86/x64 instruction set actually does contains three fence instructions: LFENCE, SFENCE, and MFENCE. LFENCE and SFENCE are apparently not needed on the current architecture, but MFENCE is useful to go around one particular issue: if a core reads a memory location it previously wrote, the read may be served from the store buffer, even though the write has not yet been written to memory. <a href="http://software.intel.com/en-us/forums/showthread.php?t=47577">[Source]</a> I don&#8217;t actually know whether the CLR JIT ever inserts MFENCE instructions.</p>
</div>
<h3>Volatile accesses in more depth</h3>
<p>To understand how volatile and non-volatile memory accesses work, you can imagine each thread as having its own cache. Consider a simple example with a <strong>non-volatile </strong>memory location (i.e. a field) <strong>u</strong>, and a <strong>volatile </strong>memory location <strong>v</strong>.</p>
<p>&#160;<img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/03/image.png" width="346" height="295" /></p>
<p>A <strong>non-volatile write</strong> could just update the value in the thread’s cache, and not the value in main memory:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/03/image1.png" width="346" height="295" />&#160; </p>
<p>However, in C# <strong>all writes are volatile</strong> (unlike say in Java), regardless of whether you write to a volatile or a non-volatile field. So, the above situation actually never happens in C#.</p>
<p>A <strong>volatile write </strong>updates the thread’s cache, and then flushes the entire cache to main memory. If we were to now set the volatile field v to 11, both values u and v would get flushed to main memory:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/03/image2.png" width="346" height="295" /></p>
<p>Since all C# writes are volatile, you can think of all writes as going straight to main memory.</p>
<p>A regular, <strong>non-volatile read</strong> can read the value from the thread’s cache, rather than from main memory. Despite the fact that thread 1 set u to 11, when thread 2 reads u, it will still see value 10:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/03/image3.png" width="346" height="295" /></p>
<p>When you read a non-volatile field in C#, a non-volatile read occurs, and you may see a stale value from the thread’s cache. Or, you may see the updated value. Whether you see the old or the new value depends on your compiler and your processor.</p>
<p>Finally, let’s take a look at an example of a <strong>volatile read</strong>. Thread 2 will read the volatile field v:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/03/image4.png" width="346" height="294" />&#160;</p>
<p>Before the volatile read, thread 2 refreshes its <strong>entire cache</strong>, and then reads the updated value of v: 11. So, it will observe the value&#160; that is really in main memory, and also refresh its cache as a bonus.</p>
<p>Note that the thread caches that I described are <strong>imaginary </strong>– there really is no such thing as a thread cache. Threads only appear to have these caches as an artifact of compiler and processor optimizations.</p>
<div class="mynote">
<p>One interesting point is that all writes in C# are volatile according to the memory model as documented <a href="http://msdn.microsoft.com/en-us/magazine/cc163715.aspx">here</a> and <a href="http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx">here</a>, and are also presumably implemented as such. The ECMA specification of the C# language actually defines a weaker model where writes are not volatile by default.</p>
</div>
<div class="mynote">
<p>You may find it surprising that a volatile read refreshes the <strong>entire </strong>cache, not just the&#160; read value. Similarly, a volatile write (i.e., every C# write) flushes the <strong>entire </strong>cache, not just the written value. These semantics are sometimes referred to as “strong volatile semantics”.</p>
<p>The original Java memory model designed in 1995 was based on weak volatile semantics, but was changed in 2004 to strong volatile. The weak volatile model is very inconvenient. One example of the problem is that the “safe publication” pattern is not safe. Consider this example:</p>
<pre class="code"><span style="color: blue">volatile string</span>[] _args = <span style="color: blue">null</span>;</pre>
<pre class="code"><span style="color: blue">public void </span>Write() {
    <span style="color: blue">string</span>[] a = <span style="color: blue">new string</span>[2];
    a[0] = <span style="color: #a31515">&quot;arg1&quot;</span>;
    a[1] = <span style="color: #a31515">&quot;arg2&quot;</span>;
    _args = a;
    ...
}

<span style="color: blue">public void </span>Read() {
    <span style="color: blue">if </span>(_args != <span style="color: blue">null</span>) {
        <span style="color: green">// Under weak volatile semantics, this assert could fail!</span>
        <span style="color: #2b91af">Debug</span>.Assert(_args[0] != <span style="color: blue">null</span>);
    }
}</pre>
<p><a href="http://11011.net/software/vspaste"></a><font face="Courier New"></font></p>
<p>Under strong volatile semantics (i.e., the .NET and C# volatile semantics), a non-null value in the _args field guarantees that the elements of _args are also not null. The safe publication pattern is very useful and commonly used in practice.</p>
</div>
<h3>Memory model and .NET operations</h3>
<p>Here is a table of how various .NET operations interact with the imaginary thread cache:</p>
<table border="1" cellspacing="0" cellpadding="2" width="689">
<tbody>
<tr>
<td valign="top" width="160"><strong>Construct</strong></td>
<td valign="top" width="120"><strong>Refreshes thread cache before?</strong></td>
<td valign="top" width="104"><strong>Flushes thread cache after?</strong></td>
<td valign="top" width="303"><strong>Notes</strong></td>
</tr>
<tr>
<td valign="top" width="160">Ordinary read</td>
<td valign="top" width="120">No</td>
<td valign="top" width="104">No</td>
<td valign="top" width="303">Read of a non-volatile field</td>
</tr>
<tr>
<td valign="top" width="160">Ordinary write</td>
<td valign="top" width="120">No</td>
<td valign="top" width="104"><strong>Yes</strong></td>
<td valign="top" width="303">Write of a non-volatile field</td>
</tr>
<tr>
<td valign="top" width="160">Volatile read</td>
<td valign="top" width="120"><strong>Yes</strong></td>
<td valign="top" width="104">No</td>
<td valign="top" width="303">Read of volatile field, or Thread.VolatileRead</td>
</tr>
<tr>
<td valign="top" width="160">Volatile write</td>
<td valign="top" width="120">No</td>
<td valign="top" width="104"><strong>Yes</strong></td>
<td valign="top" width="303">Write of a volatile field – same as non-volatile</td>
</tr>
<tr>
<td valign="top" width="160"><a href="http://msdn.microsoft.com/en-us/library/system.threading.thread.memorybarrier.aspx">Thread.MemoryBarrier</a></td>
<td valign="top" width="120"><strong>Yes</strong></td>
<td valign="top" width="104"><strong>Yes</strong></td>
<td valign="top" width="303">Special memory barrier method</td>
</tr>
<tr>
<td valign="top" width="160"><a href="http://msdn.microsoft.com/en-us/library/system.threading.interlocked.aspx">Interlocked</a> operations</td>
<td valign="top" width="120"><strong>Yes</strong></td>
<td valign="top" width="104"><strong>Yes</strong></td>
<td valign="top" width="303">Increment, Add, Exchange, etc.</td>
</tr>
<tr>
<td valign="top" width="160">Lock acquire</td>
<td valign="top" width="120"><strong>Yes</strong></td>
<td valign="top" width="104">No</td>
<td valign="top" width="303"><a href="http://msdn.microsoft.com/en-us/library/de0542zz.aspx">Monitor.Enter</a> or entering a lock {}&#160; region</td>
</tr>
<tr>
<td valign="top" width="160">Lock release</td>
<td valign="top" width="120">No</td>
<td valign="top" width="104"><strong>Yes</strong></td>
<td valign="top" width="303"><a href="http://msdn.microsoft.com/en-us/library/system.threading.monitor.exit.aspx">Monitor.Exit</a> or exiting a lock {} region</td>
</tr>
</tbody>
</table>
<p>&#160;</p>
<p>For each operation, the table shows two things:</p>
<ul>
<li>Is the entire imaginary thread cache <strong>refreshed </strong>from main memory <strong>before </strong>the operation? </li>
<li>Is the entire imaginary thread cache <strong>flushed </strong>to main memory <strong>after </strong>the operation? </li>
</ul>
<h3>Disclaimer and limitations of the model</h3>
<p>This blog post reflects my personal understanding of the .NET memory model, and is based purely on publicly available information.</p>
<p>I find the explanation based on imaginary thread caches more intuitive than the more commonly used explanation based on operation reordering. The thread cache model is also accurate for most intents and purposes.</p>
<p>To be even more accurate, you should assume that the thread caches can form an arbitrary large hierarchy, and so you cannot assume that a read is served only from two possible places – main memory or the thread’s cache. I think that you would have to construct a somewhat of a clever case in order for the cache hierarchy to make a difference, though. If anyone is aware of a case where the hierarchical thread cache model makes a prediction different from the reordering-based model, I would love to hear about it.</p>
<p>If you are interested in the .NET memory model, I encourage you to read <a href="http://msdn.microsoft.com/en-us/magazine/cc163715.aspx">Understand the Impact of Low-Lock Techniques in Multithreaded Apps</a> in the MSDN Magazine, and the <a href="http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx">Memory model</a> blog post from Chris Brumme.</p>
<div class="mynote">
<p>Read more of my articles:</p>
<p><a href="http://igoro.com/archive/gallery-of-processor-cache-effects/">Gallery of processor cache effects</a></p>
<p><a href="http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/">What really happens when you navigate to a URL</a></p>
<p><a href="http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/">Human heart is a Turing machine, research on XBox 360 shows. Wait, what?</a></p>
<p><a href="http://igoro.com/archive/7-tricks-to-simplify-your-programs-with-linq/">7 tricks to simplify your programs with LINQ</a></p>
<p>And if you like my blog, <a href="http://igoro.com/feed/">subscribe</a>!</p>
</div>
<img src="http://feeds.feedburner.com/~r/igoro/~4/bSxMouaL-Rs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>What really happens when you navigate to a URL</title>
		<link>http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/</link>
		<comments>http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 08:14:26 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=422</guid>
		<description><![CDATA[div.mynote { border: black 1px solid; padding: 10px; background-color: #f0f0ff; margin-top: 20px; margin-bottom: 20px; margin-left: 10px; >
As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on.
In this article, we will take a [...]]]></description>
			<content:encoded><![CDATA[<style>div.mynote { border: black 1px solid; padding: 10px; background-color: #f0f0ff; margin-top: 20px; margin-bottom: 20px; margin-left: 10px; ></style>
<p>As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on.</p>
<p>In this article, we will take a deeper look at the sequence of events that take place when you visit a URL.</p>
<h3>1. You enter a URL into the browser</h3>
<p>It all starts here:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image4.png" width="591" height="103" /> </p>
<p><strong></strong></p>
<h3>2. The browser looks up the IP address for the domain name</h3>
<p>&#160;<img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image13.png" width="228" height="96" /> </p>
<p>The first step in the navigation is to figure out the IP address for the visited domain. The DNS lookup proceeds as follows:</p>
<p> <span id="more-422"></span>
<ul>
<li><strong>Browser cache – </strong>The browser caches DNS records for some time. Interestingly, the OS does not tell the browser the time-to-live for each DNS record, and so the browser caches them for a fixed duration (varies between browsers, 2 – 30 minutes). </li>
<li><strong>OS cache</strong> – If the browser cache does not contain the desired record, the browser makes a system call (gethostbyname in Windows). The OS has its own cache. </li>
<li><strong>Router cache</strong> – The request continues on to your router, which typically has its own DNS cache. </li>
<li><strong>ISP DNS cache</strong> – The next place checked is the cache ISP’s DNS server. With a cache, naturally. </li>
<li><strong>Recursive search</strong> – Your ISP’s DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook’s nameserver. Normally, the DNS server will have names of the .com nameservers in cache, and so a hit to the root nameserver will not be necessary. </li>
</ul>
<p>Here is a diagram of what a recursive DNS search looks like:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="500px-An_example_of_theoretical_DNS_recursion_svg" border="0" alt="500px-An_example_of_theoretical_DNS_recursion_svg" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/500pxAn_example_of_theoretical_DNS_recursion_svg.png" width="500" height="178" />&#160;</p>
<div class="mynote">
<p>One worrying thing about DNS is that the entire domain like wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are ways of mitigating the bottleneck:</p>
<ul>
<li><strong>Round-robin DNS</strong> is a solution where the DNS lookup returns multiple IP addresses, rather than just one. For example, facebook.com actually maps to four IP addresses. </li>
<li><strong>Load-balancer</strong> is the piece of hardware that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers. </li>
<li><strong>Geographic DNS </strong>improves scalability by mapping a domain name to different IP addresses, depending on the client’s geographic location. This is great for hosting static content so that different servers don’t have to update shared state. </li>
<li><strong>Anycast</strong> is a routing technique where a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario. </li>
</ul></div>
<div class="mynote">
<p>Most of the DNS servers themselves use anycast to achieve high availability and low latency <strong>of the DNS lookups</strong>.</p>
</p></div>
<h3>3. The browser sends a HTTP request to the web server</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image22.png" width="216" height="95" /> </p>
<p>You can be pretty sure that Facebook’s homepage will not be served from the browser cache because dynamic pages expire either very quickly or immediately (expiry date set to past).</p>
<p>So, the browser will send this request to the Facebook server:</p>
<pre class="code">GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, <font style="color: lightblue">[...]</font>
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; <font style="color: lightblue">[...]</font>
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-<font style="color: lightblue">[...]</font>; locale=en_US; lsd=WW<font style="color: lightblue">[...]</font>; c_user=2101<font style="color: lightblue">[...]</font></pre>
<p>The GET request names the <strong>URL</strong> to fetch<strong>:</strong> “http://facebook.com/”. The browser identifies itself (<strong>User-Agent</strong> header), and states what types of responses it will accept (<strong>Accept</strong> and <strong>Accept-Encoding</strong> headers). The <strong>Connection</strong> header asks the server to keep the TCP connection open for further requests.</p>
<p>The request also contains the <strong>cookies</strong> that the browser has for this domain. As you probably already know, cookies are key-value pairs that track the state of a web site in between different page requests. And so the cookies store the name of the logged-in user, a secret number that was assigned to the user by the server, some of user’s settings, etc. The cookies will be stored in a text file on the client, and sent to the server with every request.</p>
<div class="mynote">
<p>There is a variety of tools that let you view the raw HTTP requests and corresponding responses. My favorite tool for viewing the raw HTTP traffic is <a href="http://www.fiddler2.com/fiddler2/">fiddler</a>, but there are many other tools (e.g., FireBug) These tools are a great help when optimizing a site.</p>
</div>
<div class="mynote">
<p>In addition to GET requests, another type of requests that you may be familiar with is a POST request, typically used to submit forms. A GET request sends its parameters via the URL (e.g.: http://robozzle.com/puzzle.aspx<strong>?id=85</strong>). A POST request sends its parameters in the request body, just under the headers.</p>
</div>
<div class="mynote">
<p>The trailing slash in the URL “http://facebook.com/” is important. In this case, the browser can safely add the slash. For URLs of the form http://example.com/folderOrFile, the browser cannot automatically add a slash, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser will visit the URL without the slash, and the server will respond with a redirect, resulting in an unnecessary roundtrip.</p>
</div>
<h3>4. The facebook server responds with a permanent redirect</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image8.png" width="214" height="87" /> </p>
<p>This is the response that the Facebook server sent back to the browser request:</p>
<pre class="code">HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
      pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP=&quot;DSP LAW&quot;
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
      path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0</pre>
<p>The server responded with a 301 Moved Permanently response to tell the browser to go to “http://www.facebook.com/” instead of “http://facebook.com/”. </p>
<div class="mynote">
<p>There are interesting reasons why the server insists on the redirect instead of immediately responding with the web page that the user wants to see.</p>
<p>One reason has to do with <strong>search engine rankings</strong>. See, if there are two URLs for the same page, say <a href="http://www.igoro.com/">http://www.igoro.com/</a> and <a href="http://igoro.com/">http://igoro.com/</a>, search engine may consider them to be two different sites, each with fewer incoming links and thus a lower ranking. Search engines understand permanent redirects (301), and will combine the incoming links from both sources into a single ranking.</p>
<p>Also, multiple URLs for the same content are not <strong>cache-friendly</strong>. When a piece of content has multiple names, it will potentially appear multiple times in caches.</p>
</div>
<h3>5. The browser follows the redirect</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image23.png" width="216" height="95" /></p>
<p>The browser now knows that “http://www.facebook.com/” is the correct URL to go to, and so it sends out another GET request:</p>
<pre class="code">GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, <font style="color: lightblue">[...]</font>
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; <font style="color: lightblue">[...]</font>
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW<font style="color: lightblue">[...]</font>; c_user=21<font style="color: lightblue">[...]</font>; x-referer=<font style="color: lightblue">[...]</font>
Host: www.facebook.com</pre>
<p>The meaning of the headers is the same as for the first request.</p>
<h3>6. The server ‘handles’ the request</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image9.png" width="93" height="78" /> </p>
<p>The server will receive the GET request, process it, and send back a response.</p>
<p>This may seem like a straightforward task, but in fact there is a lot of interesting stuff that happens here – even on a simple site like my blog, let alone on a massively scalable site like facebook.</p>
<ul>
<li><strong>Web server software<br /></strong>The web server software (e.g., IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle this request. A request handler is a program (in ASP.NET, PHP, Ruby, …) that reads the request and generates the HTML for the response.
<p>In the simplest case, the request handlers can be stored in a file hierarchy whose structure mirrors the URL structure, and so for example <a href="http://example.com/folder1/page1.aspx">http://example.com/folder1/page1.aspx</a> URL will map to file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are manually mapped to request handlers, and so the public URL of page1.aspx could be <a href="http://example.com/folder1/page1">http://example.com/folder1/page1</a>. </p>
</li>
<li><strong>Request handler<br /></strong>The request handler reads the request, its parameters, and cookies. It will read and possibly update some data stored on the server. Then, the request handler will generate a HTML response. </li>
</ul>
<div class="mynote">
<p>One interesting difficulty that every dynamic website faces is how to store data. Smaller sites will often have a single SQL database to store their data, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and usage of simplified databases with weakened consistency semantics. </p>
</div>
<div class="mynote">
<p>One technique to keep data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the newsfeed in a timely fashion, but the data backing the “People you may know” feature may only need to be updated nightly (my guess, I don’t actually know how they implement this feature). Batch job updates result in staleness of some less important data, but can make data updates much faster and simpler.</p>
</div>
<h3>7. The server sends back a HTML response</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image10.png" width="214" height="87" /> </p>
<p>Here is the response that the server generated and sent back:</p>
<pre class="code">HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP=&quot;DSP LAW&quot;
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3��������T�n�@����<font style="color: lightblue">[...]</font></pre>
<p>The entire response is 36 kB, the bulk of them in the byte blob at the end that I trimmed.</p>
<p>The <strong>Content-Encoding</strong> header tells the browser that the response body is compressed using the gzip algorithm. After decompressing the blob, you’ll see the HTML you’d expect:</p>
<pre>&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot;&#160;&#160;&#160;
      &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; xml:lang=&quot;en&quot;
      lang=&quot;en&quot; id=&quot;facebook&quot; class=&quot; no_js&quot;&gt;
&lt;head&gt;
&lt;meta http-equiv=&quot;Content-type&quot; content=&quot;text/html; charset=utf-8&quot; /&gt;
&lt;meta http-equiv=&quot;Content-language&quot; content=&quot;en&quot; /&gt;
...</pre>
<p>In addition to compression, headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, etc.</p>
<div class="mynote">
<p>Notice the header that sets <strong>Content-Type</strong> to <strong>text/html</strong>. The header instructs the browser to render the response content as HTML, instead of say downloading it as a file. The browser will use the header to decide how to interpret the response, but will consider other factors as well, such as the extension of the URL.</p>
</div>
<h3>8. The browser begins rendering the HTML</h3>
<p>Even before the browser has received the entire HTML document, it begins rendering the website:</p>
<p>&#160;<img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image6.png" width="244" height="137" /> </p>
<h3>9. The browser sends requests for objects embedded in HTML</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image11.png" width="214" height="126" /> </p>
<p>As the browser renders the HTML, it will notice tags that require fetching of other URLs. The browser will send a GET request to retrieve each of these files.</p>
<p>Here are a few URLs that my visit to facebook.com retrieved:</p>
<ul>
<li><strong>Images<br /></strong>http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif<br />http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif<br />… </li>
<li><strong>CSS style sheets<br /></strong>http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css<br />http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css<br />… </li>
<li><strong>JavaScript files</strong><br />http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js<br />http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js<br />…</li>
</ul>
<p>Each of these URLs will go through process a similar to what the HTML page went through. So, the browser will look up the domain name in DNS, send a request to the URL, follow redirects, etc.</p>
<p>However, static files &#8211; unlike dynamic pages &#8211; allow the browser to cache them. Some of the files may be served up from cache, without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained an Expires header. Additionally, each response may also contain an ETag header that works like a version number – if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.</p>
<div class="mynote">
<p>Can you guess what <strong>“fbcdn.net”</strong> in the URLs stands for? A safe bet is that it means “Facebook content delivery network”. Facebook uses a content delivery network (CDN) to distribute static content – images, style sheets, and JavaScript files. So, the files will be copied to many machines across the globe.</p>
<p>Static content often represents the bulk of the bandwidth of a site, and can be easily replicated across a CDN. Often, sites will use a third-party CDN provider, instead of operating a CND themselves. For example, Facebook’s static files are hosted by Akamai, the largest CDN provider.</p>
<p>As a demonstration, when you try to ping static.ak.fbcdn.net, you will get a response from an akamai.net server. Also, interestingly, if you ping the URL a couple of times, may get responses from different servers, which demonstrates the load-balancing that happens behind the scenes.</p>
</div>
<h3>10. The browser sends further asynchronous (AJAX) requests</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image12.png" width="214" height="120" /> </p>
<p>In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.</p>
<p>For example, Facebook chat will continue to update the list of your logged in friends as they come and go. To update the list of your logged-in friends, the JavaScript executing in your browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.</p>
<p>This pattern is sometimes referred to as “AJAX”, which stands for “Asynchronous JavaScript And XML”, even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.</p>
<div class="mynote">
<p>Among other things, the fiddler tool lets you view the asynchronous requests sent by your browser. In fact, not only you can observe the requests passively, but you can also modify and resend them. The fact that it is this easy to “spoof” AJAX requests causes a lot of grief to developers of online games with scoreboards. (Obviously, please don’t cheat that way.)</p>
</div>
<div class="mynote">
<p>Facebook chat provides an example of an interesting problem with AJAX: pushing data from server to client. Since HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see if any new messages arrived.</p>
<p><a href="http://en.wikipedia.org/wiki/Push_technology#Long_polling">Long polling</a> is an interesting technique to decrease the load on the server in these types of scenarios. If the server does not have any new messages when polled, it simply does not send a response back. And, if a message for this client is received within the timeout period, the server will find the outstanding request and return the message with the response.</p>
</div>
<h3>Conclusion</h3>
<p>Hopefully this gives you a better idea of how the different web pieces work together.</p>
<div class="mynote">
<p>Read more of my articles:</p>
<p><a href="http://igoro.com/archive/gallery-of-processor-cache-effects/">Gallery of processor cache effects</a></p>
<p><a href="http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/">Human heart is a Turing machine, research on XBox 360 shows. Wait, what?</a></p>
<p><a href="http://igoro.com/archive/self-printing-game-of-life-in-c/">Self-printing Game of Life in C#</a></p>
<p><a href="http://igoro.com/archive/skip-lists-are-fascinating/">Skip lists are fascinating!</a></p>
<p>And if you like my blog, <a href="http://igoro.com/feed/">subscribe</a>!</p>
</div>
<img src="http://feeds.feedburner.com/~r/igoro/~4/4Sh82ReaibE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/feed/</wfw:commentRss>
		<slash:comments>67</slash:comments>
		</item>
		<item>
		<title>Gallery of Processor Cache Effects</title>
		<link>http://igoro.com/archive/gallery-of-processor-cache-effects/</link>
		<comments>http://igoro.com/archive/gallery-of-processor-cache-effects/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 10:28:11 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=366</guid>
		<description><![CDATA[Most of my readers will understand that cache is a fast but small type of memory that stores recently accessed memory locations.  This description is reasonably accurate, but the &#8220;boring&#8221; details of how processor caches work can help a lot when trying to understand program performance.
In this blog post, I will use code samples to [...]]]></description>
			<content:encoded><![CDATA[<p>Most of my readers will understand that cache is a fast but small type of memory that stores recently accessed memory locations.  This description is reasonably accurate, but the &#8220;boring&#8221; details of how processor caches work can help a lot when trying to understand program performance.</p>
<p>In this blog post, I will use code samples to illustrate various aspects of how caches work, and what is the impact on the performance of real-world programs.</p>
<p>The examples are in C#, but the language choice has little impact on the performance scores and the conclusions they lead to.</p>
<h3>Example 1: Memory accesses and performance</h3>
<p>How much faster do you expect Loop 2 to run, compared Loop 1?</p>
<pre class="code"><span style="color: blue;">int</span>[] arr = <span style="color: blue;">new int</span>[64 * 1024 * 1024];
<span id="more-366"></span>
<span style="color: green;">// Loop 1</span>
<span style="color: blue;">for </span>(<span style="color: blue;">int </span>i = 0; i &lt; arr.Length; i++) arr[i] *= 3;

<span style="color: green;">// Loop 2</span>
<span style="color: blue;">for </span>(<span style="color: blue;">int </span>i = 0; i &lt; arr.Length; i += 16) arr[i] *= 3;</pre>
<p><!--More--></p>
<p>The first loop multiplies every value in the array by 3, and the second loop multiplies only every 16-th. The second loop only does about <strong>6% of the work</strong> of the first loop, but on modern machines, the two for-loops take about the same time<strong>:</strong> <strong>80</strong> and <strong>78</strong> <strong>ms</strong> respectively on my machine.</p>
<p>The reason why the loops take the same amount of time has to do with memory. The running time of these loops is dominated by the memory accesses to the array, not by the integer multiplications. And, as I’ll explain on Example 2, the hardware will perform the same main memory accesses for the two loops.</p>
<h3>Example 2: Impact of cache lines</h3>
<p>Let’s explore this example deeper. We will try other step values, not just 1 and 16:</p>
<pre class="code"><span style="color: blue;">for </span>(<span style="color: blue;">int </span>i = 0; i &lt; arr.Length; i += K) arr[i] *= 3;</pre>
<p>Here are the running times of this loop for different step values (K):</p>
<p><img style="display: inline; border-width: 0px;" title="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/01/image6.png" border="0" alt="image" width="483" height="291" /></p>
<p>Notice that while step is in the range from 1 to 16, the running time of the for-loop hardly changes. But from 16 onwards, the running time is halved each time we double the step.</p>
<p>The reason behind this is that today’s CPUs do not access memory byte by byte. Instead, they fetch memory in chunks of (typically) 64 bytes, called <em>cache lines</em>. When you read a particular memory location, the entire cache line is fetched from the main memory into the cache. And, accessing other values from the same cache line is cheap!</p>
<p>Since 16 ints take up 64 bytes (one cache line), for-loops with a step between 1 and 16 have to touch the same number of cache lines: all of the cache lines in the array. But once the step is 32, we’ll only touch roughly every other cache line, and once it is 64, only every fourth.</p>
<p>Understanding of cache lines can be important for certain types of program optimizations. For example, alignment of data may determine whether an operation touches one or two cache lines. As we saw in the example above, this can easily mean that in the misaligned case, the operation will be twice slower.</p>
<h3>Example 3: L1 and L2 cache sizes</h3>
<p>Today’s computers come with two or three levels of caches, usually called L1, L2 and possibly L3. If you want to know the sizes of the different caches, you can use the <a href="http://technet.microsoft.com/en-us/sysinternals/cc835722.aspx">CoreInfo</a> SysInternals tool, or use the <a href="http://msdn.microsoft.com/en-us/library/ms683194(VS.85).aspx">GetLogicalProcessorInfo</a> Windows API call. Both methods will also tell you the cache line sizes, in addition to the cache sizes.</p>
<p>On my machine, CoreInfo reports that I have a 32kB L1 data cache, a 32kB L1 instruction cache, and a 4MB L2 data cache. The L1 caches are per-core, and the L2 caches are shared between pairs of cores:</p>
<pre class="code">Logical Processor to Cache Map:
*---  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
*---  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
-*--  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
-*--  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
**--  Unified Cache       0, Level 2,    4 MB, Assoc  16, LineSize  64
--*-  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
--*-  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
---*  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
---*  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
--**  Unified Cache       1, Level 2,    4 MB, Assoc  16, LineSize  64</pre>
<p>Let’s verify these numbers by an experiment. To do that, we’ll step over an array incrementing every 16th integer – a cheap way to modify every cache line. When we reach the last value, we loop back to the beginning. We’ll experiment with different array sizes, and we should see drops in the performance at the array sizes where the array spills out of one cache level.</p>
<p>Here is the program:</p>
<pre class="code"><span style="color: blue;">int </span>steps = 64 * 1024 * 1024; <span style="color: green;">// Arbitrary number of steps</span>
<span style="color: blue;">int </span>lengthMod = arr.Length - 1;
<span style="color: blue;">for </span>(<span style="color: blue;">int </span>i = 0; i &lt; steps; i++)
{
    arr[(i * 16) &amp; lengthMod]++; <span style="color: green;">// (x &amp; lengthMod) is equal to (x % arr.Length)
</span>}</pre>
<p>And here are the timings:</p>
<p> <img style="display: inline; border-width: 0px;" title="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image.png" border="0" alt="image" width="483" height="291" /></p>
<p>You can see distinct drops after 32kB and 4MB – the sizes of L1 and L2 caches on my machine.</p>
<h3>Example 4: Instruction-level parallelism</h3>
<p>Now, let’s take a look at something different. Out of these two loops, which one would you expect to be faster?</p>
<pre class="code"><span style="color: blue;">int </span>steps = 256 * 1024 * 1024;
<span style="color: blue;">int</span>[] a = <span style="color: blue;">new int</span>[2];

<span style="color: green;">// Loop 1
</span><span style="color: blue;">for </span>(<span style="color: blue;">int </span>i=0; i&lt;steps; i++) { a[0]++; a[0]++; }

<span style="color: green;">// Loop 2
</span><span style="color: blue;">for </span>(<span style="color: blue;">int </span>i=0; i&lt;steps; i++) { a[0]++; a[1]++; }</pre>
<p>It turns out that the second loop is about twice faster than the first loop, at least on all of the machines I tested. Why? This has to do with the dependencies between operations in the two loop bodies.</p>
<p>In the body of the first loop, operations depend on each other as follows:</p>
<p><img style="display: inline; border-width: 0px;" title="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/01/image.png" border="0" alt="image" width="613" height="25" /></p>
<p>But in the second example, we only have these dependencies:</p>
<p><img style="display: inline; border-width: 0px;" title="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image2.png" border="0" alt="image" width="289" height="73" /></p>
<p>The modern processor has various parts that have a little bit of parallelism in them: it can access two memory locations in L1 at the same time, or perform two simple arithmetic operations. In the first loop, the processor cannot exploit this instruction-level parallelism, but in the second loop, it can.</p>
<p><strong>[UPDATE]</strong>: Many people on reddit are asking about compiler optimizations, and whether { a[0]++; a[0]++; } would just get optimized to { a[0]+=2; }. In fact, the C# compiler and CLR JIT will not do this optimization – not when array accesses are involved. I built all of the tests in release mode (i.e. with optimizations), but I looked at the JIT-ted assembly to verify that optimizations aren’t skewing the results.</p>
<h3>Example 5: Cache associativity</h3>
<p>One key decision in cache design is whether each chunk of main memory can be stored in any cache slot, or in just some of them.</p>
<p>There are three possible approaches to mapping cache slots to memory chunks:</p>
<ol>
<li><strong>Direct mapped cache</strong>
<p>Each memory chunk can only be stored only in one particular slot in the cache. One simple solution is to map the chunk with index chunk_index to cache slot (chunk_index % cache_slots). Two memory chunks that map to the same slot cannot be stored simultaneously in the cache.</li>
<li><strong>N-way set associative cache</strong>
<p>Each memory chunk can be stored in any one of N particular slots in the cache. As an example, in a 16-way cache, each memory chunk can be stored in 16 different cache slots. Commonly, chunks with indices with the same lowest order bits will all share 16 slots.</li>
<li><strong>Fully associative cache</strong>
<p>Each memory chunk can be stored in any slot in the cache. Effectively, the cache operates like a hash table.</li>
</ol>
<p>Direct mapped caches can suffer from conflicts &#8211; when multiple values compete for the same slot in the cache, they keep evicting each other out, and the hit rate plummets. On the other hand, fully associative caches are complicated and costly to implement in the hardware. N-way set associative caches are the typical solution for processor caches, as they make a good trade off between implementation simplicity and good hit rate.</p>
<p>For example, the 4MB L2 cache on my machine is 16-way associative. All 64-byte memory chunks are partitioned into sets (based on the lowest order bits of the chunk index), and chunks in the same set compete for 16 slots in the L2 cache.</p>
<p>Since the L2 cache has 65,536 slots, and each set will need 16 slots in the cache, we will have 4,096 sets. So, the lowest 12 bits of the chunk index will determine which set the chunk belongs to (2<sup>12</sup> = 4,096). As a result, cache lines at addresses that differ by a multiple of 262,144 bytes (4096 * 64) will compete for the same slot in the cache. The cache on my machine can hold at most 16 such cache lines.</p>
<p>In order for the effects of cache associativity to become apparent, I need to repeatedly access more than  16 elements from the same set. I will demonstrate this using the following method:</p>
<pre class="code"><span style="color: blue;">public static long </span>UpdateEveryKthByte(<span style="color: blue;">byte</span>[] arr, <span style="color: blue;">int </span>K)
{
    <span style="color: #2b91af;">Stopwatch </span>sw = <span style="color: #2b91af;">Stopwatch</span>.StartNew();
    <span style="color: blue;">const int </span>rep = 1024*1024; <span style="color: green;">// Number of iterations – arbitrary</span>

<span style="color: green;">    </span><span style="color: blue;">int </span>p = 0;
    <span style="color: blue;">for </span>(<span style="color: blue;">int </span>i = 0; i &lt; rep; i++)
    {
        arr[p]++;
        p += K;
        <span style="color: blue;">if </span>(p &gt;= arr.Length) p = 0;
    }

    sw.Stop();
    <span style="color: blue;">return </span>sw.ElapsedMilliseconds;
}</pre>
<p>This method increments every K-th value in the array. Once the it reaches the end of the array, it starts again from the beginning. After running sufficiently long (2^20 steps), the loop stops.</p>
<p>I ran UpdateEveryKthByte() with different array sizes (in 1MB increments) and different step sizes. Here is a plot of the results, with blue representing long running time, and white representing short:</p>
<p> <a href="http://igoro.com/wordpress/wp-content/uploads/2010/02/image3.png"><img style="display: inline; border-width: 0px;" title="image" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/image_thumb1_opt.png" border="0" alt="image" width="582" height="299" /></a></p>
<p>The blue areas (long running times) are cases where the updated values <strong>could not be simultaneously held in the cache</strong> as we repeatedly iterated over them. The bright blue areas correspond to running times of ~80 ms, and the nearly white areas to ~10 ms.</p>
<p>Let’s explain the blue parts of the chart:</p>
<ol>
<li><strong>Why the vertical lines?</strong>The vertical lines show the step values that touch too many memory locations (&gt;16) from the same set. For those steps, we cannot simultaneously hold all touched values in the 16-way associative cache on my machine.
<p>Some bad step values are powers of two: 256 and 512. As an example, consider step 512 on an 8MB array. An 8MB cache line contains 32 values that are spaced by 262,144 bytes apart. All of those values will be updated by each pass of our loop, because 512 divides 262,144.</p>
<p>And since 32 &gt; 16, those 32 values will keep competing for the same 16 slots in the cache.</p>
<p>Some values that are not powers of two are simply unfortunate, and will end up visiting disproportionately many values from the same set. Those step values will also show up as as blue lines.</li>
<li><strong>Why do the vertical lines stop at 4MB array length?</strong>On arrays of 4MB or less, a 16-way associative cache is just as good as a fully associative one.
<p>A 16-way associative cache can hold at most 16 cache lines that are a multiple of 262,144 bytes apart. There is <strong>no set</strong> of 17 or more cache lines all aligned on 262,144-byte boundaries within 4MB, because 16 * 262,144 = 4,194,304.</li>
<li><strong>Why the blue triangle in upper left?</strong>In the triangle area, we cannot hold all necessary data in cache simultaneously … not due to the associativity, but simply because of the L2 cache size limit.
<p>For example, consider the array length 16MB with step 128. We are repeatedly updating every 128th byte in the array, which means that we touch every other 64-byte memory chunk. To store every other cache line of a 16MB array, we’d need 8MB cache. But, my machine only has 4MB of cache.</p>
<p>Even if the 4MB cache on my machine was fully associative, it still wouldn’t be able to hold 8MB of data.</li>
<li><strong>Why does the triangle fade out in the left?</strong>Notice that the gradient goes from 0 to 64 bytes – one cache line! As explained in examples 1 and 2, additional accesses to same cache line are nearly free. For example, when stepping by 16 bytes, it will take 4 steps to get to the next cache line. So, we get four memory accesses for the price of one.
<p>Since the number of steps is the same for all cases, a cheaper step results in a shorter running time.</li>
</ol>
<p>These patterns continue to hold as you extend the chart:</p>
<p><a href="http://igoro.com/wordpress/wp-content/uploads/2010/02/assoc_big1.png"><img style="display: inline; border-width: 0px;" title="assoc_big" src="http://igoro.com/wordpress/wp-content/uploads/2010/02/assoc_big_thumb1_opt.png" border="0" alt="assoc_big" width="582" height="299" /></a></p>
<p>Cache associativity is interesting to understand and can certainly be demonstrated, but it tends to be less of a problem compared to the other issues discussed in this article. It is certainly not something that should be at the forefront of your mind as you write programs.</p>
<h3>Example 6: False cache line sharing</h3>
<p>On multi-core machines, caches encounter another problem – consistency. Different cores have fully or partly separate caches. On my machine, L1 caches are separate (as is common), and there are two pairs of processors, each pair sharing an L2 cache. While the details vary, a modern multi-core machine will have a multi-level cache hierarchy, where the faster and smaller caches belong to individual processors.</p>
<p>When one processor modifies a value in its cache, other processors cannot use the old value anymore. That memory location will be invalidated in all of the caches. Furthermore, since caches operate on the granularity of cache lines and not individual bytes, the <strong>entire cache line</strong> will be invalidated in all caches!</p>
<p>To demonstrate this issue, consider this example:</p>
<pre class="code"><span style="color: blue;">private static int</span>[] s_counter = <span style="color: blue;">new int</span>[1024];
<span style="color: blue;">private void </span>UpdateCounter(<span style="color: blue;">int </span>position)
{
    <span style="color: blue;">for </span>(<span style="color: blue;">int </span>j = 0; j &lt; 100000000; j++)
    {
        s_counter[position] = s_counter[position] + 3;
    }
}</pre>
<p>On my quad-core machine, if I call UpdateCounter with parameters 0,1,2,3 from four different threads, it will take <strong>4.3 seconds </strong>until all threads are done.</p>
<p>On the other hand, if I call UpdateCounter with parameters 16,32,48,64 the operation will be done in <strong>0.28 seconds</strong>!</p>
<p>Why? In the first case, all four values are very likely to end up on the same cache line. Each time a core increments the counter, it invalidates the cache line that holds all four counters. All other cores will suffer a cache miss the next time they access their own counters. This kind of thread behavior effectively disables caches, <strong>crippling</strong> the program’s performance.</p>
<h3>Example 7: Hardware complexities</h3>
<p>Even when you know the basics of how caches work, the hardware will still sometimes surprise you. Different processors differ in optimizations, heuristics, and subtle details of how they do things.</p>
<p>On some processors, L1 cache can process two accesses in parallel if they access cache lines from different banks, and serially if they belong to the same bank. Also, processors can surprise you with clever optimizations. For example, the false-sharing example that I’ve used on several machines in the past did not work well on my machine without tweaks – my home machine can optimize the execution in the simplest cases to reduce the cache invalidations.</p>
<p>Here is one odd example of “hardware weirdness”:</p>
<pre class="code"><span style="color: blue;">private static int </span>A, B, C, D, E, F, G;
<span style="color: blue;">private static void </span>Weirdness()
{
    <span style="color: blue;">for </span>(<span style="color: blue;">int </span>i = 0; i &lt; 200000000; i++)
    {
        &lt;something&gt;
    }
}</pre>
<p>When I substitute three different blocks for “&lt;something&gt;”, I get these timings:</p>
<table border="0" cellspacing="0" cellpadding="2" width="362">
<tbody>
<tr>
<td width="199" valign="top"><strong>&lt;something&gt;</strong></td>
<td width="161" valign="top"><strong>Time</strong></td>
</tr>
<tr>
<td width="220" valign="top">A++; B++; C++; D++;</td>
<td width="169" valign="top">719 ms</td>
</tr>
<tr>
<td width="225" valign="top">A++; C++; E++; G++;</td>
<td width="172" valign="top">448 ms</td>
</tr>
<tr>
<td width="225" valign="top">A++; C++;</td>
<td width="174" valign="top">518 ms</td>
</tr>
</tbody>
</table>
<p>Incrementing fields A,B,C,D takes longer than incrementing fields A,C,E,G. And what’s even weirder, incrementing just A and C takes <strong>longer </strong>than increment A and C <strong>and</strong> E and G!</p>
<p>I don’t know for sure what is the reason behind these numbers, but I suspect it is related to memory banks. If someone can explain these numbers, I’d be very curious to hear about it.</p>
<p>The lesson of this example is that can be difficult to fully predict hardware performance. There is a lot that you <strong>can</strong> predict, but ultimately, it is very important to measure and verify your assumptions.</p>
<div style="background-color: #f6f6ff; margin-top: 20px; width: 600px; margin-bottom: 20px; margin-left: 10px; border: black 1px solid; padding: 4px;">
<p>Read more of my articles:</p>
<p><a href="http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/">Human heart is a Turing machine, research on XBox 360 shows. Wait, what?</a></p>
<p><a href="http://igoro.com/archive/self-printing-game-of-life-in-c/">Self-printing Game of Life in C#</a></p>
<p><a href="http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/">Efficient auto-complete with a ternary search tree</a></p>
<p><a href="http://igoro.com/archive/numbers-that-cannot-be-computed/">Numbers that cannot be computed</a></p>
<p>And if you like my blog, <a href="http://igoro.com/feed/">subscribe</a>!</p>
</div>
<h3>Conclusion</h3>
<p>Hopefully all of this helps you understand how caches work, and apply that knowledge when tuning your programs.</p>
<img src="http://feeds.feedburner.com/~r/igoro/~4/5dtBX13VqlA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/gallery-of-processor-cache-effects/feed/</wfw:commentRss>
		<slash:comments>43</slash:comments>
		</item>
		<item>
		<title>Video of my PLINQ session at PDC 2009</title>
		<link>http://igoro.com/archive/video-of-my-plinq-session-at-pdc-2009/</link>
		<comments>http://igoro.com/archive/video-of-my-plinq-session-at-pdc-2009/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 05:25:21 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Concurrency]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=353</guid>
		<description><![CDATA[PDC 2009 was an exciting event, with announcements about Azure, Silverlight 4, and Office 2010 popping up one after another. For me, there was another reason why this year&#8217;s PDC was exciting &#8211; it was my first chance to present in a major conference.
 
 
The video of my session is available as Silverlight video, [...]]]></description>
			<content:encoded><![CDATA[<p>PDC 2009 was an exciting event, with announcements about Azure, Silverlight 4, and Office 2010 popping up one after another. For me, there was another reason why this year&#8217;s PDC was exciting &#8211; it was my first chance to present in a major conference.</p>
<p><a href="http://microsoftpdc.com/Sessions/FT21"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/11/image.png" width="648" height="370" /></a> </p>
<p> <span id="more-353"></span>
<p>The video of my session is available as <a href="http://microsoftpdc.com/Sessions/FT21">Silverlight video</a>, and also as <a href="http://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/FT21.wmv">high-quality WMV</a> and <a href="http://ecn.channel9.msdn.com/o9/pdc09/wmv/FT21.wmv">standard-quality WMV</a>. Here are the <a href="http://ecn.channel9.msdn.com/o9/pdc09/ppt/FT21.pptx">slides</a>. Check it out and let me know how I did!</p>
<p>And, here are videos of all <a href="http://blogs.msdn.com/pfxteam/archive/2009/11/21/9926691.aspx">PDC sessions related to parallel programming</a>.</p>
<p>Never having presented at PDC before, I didn’t know how many people to expect at my session. I was happy with the turn out – this is a view of my session from the back of the room:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="SNC00100" border="0" alt="SNC00100" src="http://igoro.com/wordpress/wp-content/uploads/2009/11/SNC00100.jpg" width="512" height="384" /></p>
<img src="http://feeds.feedburner.com/~r/igoro/~4/-593umDPCCI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/video-of-my-plinq-session-at-pdc-2009/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
<enclosure url="http://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/FT21.wmv" length="720490561" type="video/x-ms-wmv" />
<enclosure url="http://ecn.channel9.msdn.com/o9/pdc09/wmv/FT21.wmv" length="105056918" type="video/x-ms-wmv" />
		</item>
		<item>
		<title>RoboZZle hacked, and 100+ sites are still compromised</title>
		<link>http://igoro.com/archive/robozzle-hacked-and-100-sites-are-still-compromised/</link>
		<comments>http://igoro.com/archive/robozzle-hacked-and-100-sites-are-still-compromised/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 09:59:31 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=328</guid>
		<description><![CDATA[
 Around two weeks ago, I found this email in my inbox, with the subject “Complaint about Robozzle”:
Hi Igor
Robozzle is really cute, I like it, but why on earth is it polluted with hundreds of invisible links to porn sites? From a guy like you I don&#8217;t expect to do such dirty things. pls remove [...]]]></description>
			<content:encoded><![CDATA[<p><!-- blockquote { border: 1px solid #d0d0d0 } --></p>
<p><img style="border-right-width: 0px; margin: 0px 0px 0px 10px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="j0434750" src="http://igoro.com/wordpress/wp-content/uploads/2009/10/j0434750.png" border="0" alt="j0434750" width="36" height="36" align="right" /> Around two weeks ago, I found this email in my inbox, with the subject “Complaint about Robozzle”:</p>
<blockquote><p>Hi Igor</p>
<p>Robozzle is really cute, I like it, but why on earth is it polluted with hundreds of invisible links to porn sites? From a guy like you I don&#8217;t expect to do such dirty things. pls remove them.</p>
<p>David</p></blockquote>
<p><span id="more-328"></span></p>
<p>Hoping for a simple explanation, I looked at the source of the <a href="http://robozzle.com/">RoboZZle</a> front page. And my heart sunk as I saw <strong>hundreds</strong> of spam links on the bottom:</p>
<blockquote><p>&lt;a href=&#8221;&lt;a hijacked site&gt;.com/files/cms/7/pornhud.com-20076.html&#8221;&gt;porn hud.com&lt;/a&gt;<br />
&lt;a href=&#8221;&lt;a hijacked site&gt;.com/files/cms/7/www.tube.com8-5852.html&#8221;&gt;www.tube.com 8&lt;/a&gt;<br />
…</p></blockquote>
<p>The links were inside a &lt;p&gt; tag with visibility set to “hidden”, so the links were only visible to search engines, not to human visitors.</p>
<p>I wondered if this affected how my site shows up in search engines. So, I searched for RoboZZle on Google (results in Bing or Yahoo! were not affected), and here is what I got:</p>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="robozzle_bad" src="http://igoro.com/wordpress/wp-content/uploads/2009/10/robozzle_bad1.png" border="0" alt="robozzle_bad" width="563" height="103" /></p>
<p>Oh crap. This sucks. How did it happen?</p>
<h3>Site infestation</h3>
<p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="j0438025" src="http://igoro.com/wordpress/wp-content/uploads/2009/10/j0438025.png" border="0" alt="j0438025" width="144" height="144" align="right" /></p>
<p>So what did the hackers do to my poor game? I looked around to find what has changed:</p>
<ol>
<li><strong>Planted links<br />
</strong>Multiple pages on my site have been modified to import a planted config.ini file. The config.ini file contained hundreds of fake links, and was updated every couple hours with a new set of links. The links all pointed to spammy content planted on another hijacked site.<br />
 </li>
<li><strong>Planted content</strong><br />
My site also contained thousands of small planted HTML files, similar to the spammy stuff that the planted links pointed to. An odd folder (something like http://robozzle.com/old/old2/) contained the HTML files, all with links to suspicious content, and all infested with spammy keywords.<br />
 </li>
<li><strong>Backdoor PHP scripts</strong><br />
And finally, my site also contained a couple of PHP backdoor scripts. If you visited a particular URL on the robozzle.com domain, you’d get a file manager that lets you upload, delete, and manage files on my site, no passwords needed. As I found out later, the hackers actually knew my FTP password so they didn’t need the backdoors, but they left the backdoors so that they can get in once I change my password.</li>
</ol>
<p>I carefully checked my site and removed all of this stuff. Even after I removed the PHP backdoors, the config.ini file continued to get updated until I changed my FTP password. From that point, I was pretty sure that the attackers somehow got my FTP password.</p>
<h3>Looking for other hijacked sites</h3>
<p>After cleaning up my site, one of the first things I did was notify the other compromised site, which hosted the planted content linked from my site.</p>
<p>I also wondered if more sites have been hacked in a similar way. I tried various online tools to find other sites that contain the same links that have been planted on my site.</p>
<p>I found a handful of hacked sites right away. The fact that the links change every few hours made the task more difficult, though. The newest links generally point to content that has not yet been crawled by search engines. But, there is a simple solution. By looking at older versions of several compromised sites in a service like Google Cache, I eventually found older links that lead me to many more hijacked sites.</p>
<p>After repeating the process for a couple hours, I ended up with a list of over a <strong>150 infected sites</strong>, including some fairly major sites. The larger sites generally removed the infestation quickly, though. Here are a few sites that haven’t cleaned up, despite the fact that I alerted them at least two weeks ago:</p>
<ul>
<li>bayonnenj.org – City of Bayonne, NJ</li>
<li>steinercollege.edu &#8211; Rudolf Steiner College</li>
<li>dillard.senategop.org &#8211; GOP Senator Kirk Dillard</li>
<li>egnc-ibm.gov.eg &#8211; Egypt-IBM Nanotechnology Research Center</li>
</ul>
<p style="color: #800000"><strong>Don’t go to these sites unless you know what you are doing. At the time of writing this post, these sites appear to be compromised, so they may well contain viruses or malware.</strong></p>
<p><strong>UPDATE:</strong> Most of the sites are suddenly not showing the links. My guess is that the hacker group distributed an empty config.ini file after this story became popular on reddit. I don&#8217;t believe that so many obscure sites on my list would be fixed over night when they have been infected for months before. Some sites still contain the planted links, but those seem to be the ones that haven&#8217;t been updating the links regularly. These are probably the sites that the hackers only have partial control over at this point (e.g., FTP password changed, but there is still a backdoor on the site somewhere). You should be able to view the planted links on all infected sites by looking at Google Cache.</p>
<p>I did my best to contact owners of as many hijacked sites as I could. Looking for contact information on that many sites – most of them not in English &#8211; is a time-consuming endeavor, though.</p>
<h3>Tools I used</h3>
<p>I found these tools useful when tracking down other sites that have been hijacked:</p>
<table border="0" cellspacing="0" cellpadding="2" width="667">
<tbody>
<tr>
<td width="161" valign="top"><a href="http://siteexplorer.search.yahoo.com/">Yahoo! Site Explorer</a></td>
<td width="504" valign="top">This is the best tool I found to search for sites that link to a particular URL. At least for my purposes, it worked much better than “link:” queries in Google.<br />
 </td>
</tr>
<tr>
<td width="161" valign="top"><a href="http://proxify.org/">Proxify</a></td>
<td width="504" valign="top">When visiting sites controlled by hackers, it is worthwhile to be cautious, since the site may be infested by malware. Proxify sends the request on your behalf, and in the Source mode, sends you the HTTP response as text. So, the infested site won’t see your IP address, and any HTML it sends back will not be rendered, just shown in text format.<br />
 </td>
</tr>
<tr>
<td width="161" valign="top">Web caches</td>
<td width="504" valign="top">As you probably know, all major search engines (Google, Bing, Yahoo!) let you view the version of the page cached by the service. This gives you a version of the page as it was a few days or weeks ago.<br />
 </td>
</tr>
</tbody>
</table>
<p> </p>
<h3>And how did I get hacked in the first place?</h3>
<p>Of course, I have been wondering about how the hackers got my FTP password in the first place. It wasn’t really guessable or discoverable by brute force, and I didn’t use the same password on other sites.</p>
<p>And then I got an email from my webhost, notifying me that FTP passwords <strong>may have</strong> been stolen due to a vulnerability. They tracked down the issue to a particular software package they use.</p>
<p>So, I assume that my password got stolen this way. If not, it is also possible that I logged into my site on a computer with a particular virus. <a href="http://blog.tigertech.net/posts/ftp-password-viruses/">Apparently</a>, that’s how many FTP passwords get stolen.</p>
<blockquote><p>If you liked this article, check out these ones:</p>
<p><a href="http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/">Human heart is a Turing machine, research on XBox 360 shows. Wait, what?</a></p>
<p><a href="http://igoro.com/archive/numbers-that-cannot-be-computed/">Numbers that cannot be computed</a></p></blockquote>
<img src="http://feeds.feedburner.com/~r/igoro/~4/V83D6pgIH80" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/robozzle-hacked-and-100-sites-are-still-compromised/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Human heart is a Turing machine, research on XBox 360 shows. Wait, what?</title>
		<link>http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/</link>
		<comments>http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 08:13:31 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Cool Stuff]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=285</guid>
		<description><![CDATA[Did you ever see one of those auto-generated random “academic papers” like this one? When I first saw the following title, my first thought was that it is a randomly-generated “paper”:
Implications of the Turing completeness of reaction-diffusion models, informed by GPGPU simulations on an XBox 360: Cardiac arrhythmias, re-entry and the Halting problem [PDF]
Turing completeness, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://commons.wikimedia.org/wiki/File:CoeurHumain.svg"><img style="margin: 0px 0px 10px 10px; display: inline; border: 0px;" title="211px-CoeurHumain_svg" src="http://robozzle.com/igoro/211px-CoeurHumain_svg.gif" border="0" alt="211px-CoeurHumain_svg" width="150" align="right" /></a>Did you ever see one of those auto-generated random “academic papers” like <a href="http://pdos.csail.mit.edu/scigen/rooter.pdf">this one</a>? When I first saw the following title, my first thought was that it is a randomly-generated “paper”:</p>
<p><a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;_udi=B73G2-4WH2KVG-1&amp;_user=10&amp;_rdoc=1&amp;_fmt=&amp;_orig=search&amp;_sort=d&amp;_docanchor=&amp;view=c&amp;_searchStrId=1022400504&amp;_rerunOrigin=scholar.google&amp;_acct=C000050221&amp;_version=1&amp;_urlVersion=0&amp;_userid=10&amp;md5=979bc3e20f7e915c6cb32be391bc370d">Implications of the Turing completeness of reaction-diffusion models, informed by GPGPU simulations on an XBox 360: Cardiac arrhythmias, re-entry and the Halting problem</a> [<a href="http://research.microsoft.com/pubs/79271/turing.pdf">PDF</a>]</p>
<p>Turing completeness, cardiac arrhythmias, XBox 360… those things don’t seem to have much in common. But, I had my interest piqued. I looked up the paper and read through it. And, it turns out that not only is the paper serious, what it has to say is also quite interesting.</p>
<p><span id="more-285"></span></p>
<h3>Heart and logic circuits</h3>
<p>I had to take author’s word on the medical aspects of the article, since I know nothing about cardiology. Apparently, understanding of electrical signals between cells in a human heart is important for research into heart arrhythmias. Makes sense.</p>
<p>The important insight of the article is that a logic NOR gate can be simulated using electrical signals between heart cells. Constructing a NOR gate is a powerful result, because similarly to NAND, NOR is a universal logic gate. That means that all other logic gates like AND, OR and NOT can be built out of NORs:</p>
<p><span style="font-family: Courier New;"><span style="color: #0080ff;">NOT</span>(A)    = <span style="color: #0080ff;">NOR</span>(A, A)<br />
<span style="color: #0080ff;">OR</span>(A, B)  = <span style="color: #0080ff;">NOR</span>(<span style="color: #0080ff;">NOR</span>(A, B), <span style="color: #0080ff;">NOR</span>(A, B))<br />
<span style="color: #0080ff;">AND</span>(A, B) = </span><span style="font-family: Courier New;"><span style="color: #0080ff;">NOR</span>(<span style="color: #0080ff;">NOR</span>(A, A), <span style="color: #0080ff;">NOR</span>(B, B))</span></p>
<p>So, since you can simulate a NOR gate using cardiac cells, you can simulate an arbitrary logic circuit in heart tissue.</p>
<h3>Heart and Turing machines</h3>
<p>But, there has to be more to the story. The paper title mentioned Turing machines, and logic circuits are not Turing-complete. Halting problem doesn’t even make sense when applied to boolean expressions!</p>
<p>The missing part is the passage of time. See, a logic circuit is not Turing-complete. But, if you take a logic circuit with multiple inputs, the same number of outputs as inputs, and repeatedly apply the circuit over the results of the previous iteration, you get a Turing-complete system. This type of a system can be modeled with behavior of a heart tissue over a period of time.</p>
<p>The most intuitive explanation that I can think of is via <a href="http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life">Game of Life</a>. In the Game of Life, each cell dies, becomes alive or stays alive depending on how many live neighbors it had in the previous generation.  Here is one example of a Game of Life board in action:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Gospers_glider_gun.gif"><img style="display: inline;" title="Gospers_glider_gun" src="http://robozzle.com/igoro/Gospers_glider_gun.gif" alt="Gospers_glider_gun" width="250" height="180" /></a> </p>
<p>Now, the important observation is that the Game of Life rules can be encoded using a logic circuit. For example, if the eight neighbors of a particular cell are represented as A, B … H, then the rule that the cell becomes alive if it has exactly three neighbors can be encoded as  ((A and B and C and not(D) and … and not(H)) or (A and B and not(C) and D and not(E) … and not(H)) or …). This expression will be combined with an OR together with the situations under which the cell remains alive, rather than becoming alive. This will be a pretty ugly logic circuit, but it should be clear that it can be constructed.</p>
<p>Since Game of Life is known to be Turing-complete, then an “iterated logic circuit” is also Turing-complete, and the behavior of cardiac tissue is … Turing-complete.</p>
<div style="border: black 1px solid; padding-bottom: 4px; background-color: #f6f6ff; padding-left: 4px; width: 600px; padding-right: 4px; margin-left: 10px;padding-top: 4px">Also read about <a href="http://igoro.com/archive/numbers-that-cannot-be-computed/">Numbers that cannot be computed</a> and&nbsp; <a title="Link to Self-printing Game of Life in C#" href="http://igoro.com/archive/self-printing-game-of-life-in-c/">Self-printing Game of Life in C#</a>.</div>
<h3>Why is this useful?</h3>
<p>Proving that a particular behavior of cardiac tissue is Turing-complete is a useful result because it shows that the cardiac tissue is in a certain sense “unpredictable”.</p>
<p>For example, since Game of Life is Turing-complete, the Halting problem applies. So, it is a proven fact that there is no general algorithm that can look at a particular Game of Life board and decide whether the movement will eventually stop or continue forever.</p>
<p>Similarly, there is no algorithm that can decide any of these properties for all Game-of-Life boards:</p>
<ul>
<li>Whether the board will ever reach a particular configuration</li>
<li>Whether the number of live cells will ever exceed X</li>
<li>Whether the game will ever enter a cycle</li>
<li>etc.</li>
</ul>
<p>Since the studied behavior of cardiac tissue is also Turing-complete, there is no general algorithm that can look at the state of cardiac tissue and decide whether the activity will ever stop, enter a regular pattern, achieve a particular configuration, etc. That is certainly a worthwhile result!</p>
<h3>What about the XBox?</h3>
<p><a href="http://commons.wikimedia.org/wiki/File:Xbox_360.jpg"><img style="margin: 0px 0px 10px 10px; display: inline; border: 0px;" title="367px-Xbox_360" src="http://robozzle.com/igoro/367pxXbox_360.jpg" border="0" alt="367px-Xbox_360" width="147" height="240" align="right" /></a>Constructing a NOR gate out of cardiac cells is computationally intensive, and the researcher used a GPU in an XBox 360 for that task.</p>
<p>However, the paper doesn’t conclusively show that using an XBox was a real benefit. The paper says that a C++ implementation that was originally “designed more for ease of expansion […] than for speed” ran slower on an XBox 360 CPU than an “unoptimized” shader-based implementation on an XBox 360 GPU. Comparing implementations not designed for speed is unconvincing. If the goal was to speed up the computation, why not first try to optimize the original code instead of porting it to shaders, which is undoubtedly a much more difficult task?</p>
<p>Also, the article doesn’t say how did XBox 360 CPU compare against an ordinary x86 machine, or against say a CUDA-based implementation on a common NVidia card. So, the paper doesn’t come close to showing that the researchers gained much by coding against the XBox 360 GPU, rather than following the current state-of-the-art approaches.</p>
<p>But, it is still a cool paper, and adding XBox 360 into the picture certainly attracted attention. The paper was reported in press, with titles such as these:</p>
<ul>
<li><a href="http://www.time.com/time/health/article/0,8599,1925332,00.html">How Xbox Can Help Fight Heart Disease</a> [time.com].</li>
<li><a href="http://www.techshout.com/science/2009/16/study-parallel-processor-computing-of-xbox-chip-could-save-thousands/">Parallel processor computing of XBox chip could save thousands</a> [techshout.com].</li>
</ul>
<p>And, if it weren’t for the sensational articles, I wouldn’t have found out about the paper at all, so I guess I shouldn’t complain.</p>
<img src="http://feeds.feedburner.com/~r/igoro/~4/wwnFShSaqow" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/human-heart-is-a-turing-machine-research-on-xbox-360-shows-wait-what/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Program like a Quake developer</title>
		<link>http://igoro.com/archive/program-like-a-quake-developer/</link>
		<comments>http://igoro.com/archive/program-like-a-quake-developer/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 06:22:52 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=192</guid>
		<description><![CDATA[Not many developers have the insights of Michael Abrash. He is a game developer with decades of experience building commercial games, including a game you may recognize as &#34;Quake&#34;. His Graphics Programming Black Book is years old, but much of it is just as interesting as it was at the time of writing. And, the [...]]]></description>
			<content:encoded><![CDATA[<p>Not many developers have the insights of Michael Abrash. He is a game developer with decades of experience building commercial games, including a game you may recognize as &quot;Quake&quot;. His Graphics Programming Black Book is years old, but much of it is just as interesting as it was at the time of writing. And, the entire book is available <a href="http://www.gamedev.net/reference/articles/article1698.asp">online</a>, for free.</p>
<p>The book consists of 70 chapters on optimization, graphics, and assembly programming. The entire book is insightful and interesting, but my favorite chapters are these:</p>
<ul>
<li><a href="http://downloads.gamedev.net/pdf/gpbb/gpbb1.pdf">The Best Optimizer Is between Your Ears</a>      </li>
<li><a href="http://downloads.gamedev.net/pdf/gpbb/gpbb16.pdf">There Ain’t No Such Thing as the Fastest Code</a>      </li>
<li>Optimizing for <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb11.pdf">286 and 386</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb12.pdf">486</a> (<a href="http://downloads.gamedev.net/pdf/gpbb/gpbb13.pdf">continued</a>), and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb19.pdf">Pentium</a> (continued <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb20.pdf">here</a> and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb21.pdf">here</a>).       </li>
<li>Algorithms for fast drawing of <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb35.pdf">lines</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb42.pdf">anti-aliased lines</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb38.pdf">polygons</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb39.pdf">fast convex polygons</a>, and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb50.pdf">3d objects</a> (<a href="http://downloads.gamedev.net/pdf/gpbb/gpbb51.pdf">continued</a>)      </li>
<li>Quake’s <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb64.pdf">visible-surface determination</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb65.pdf">3D clipping</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb66.pdf">hidden-surface removal</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb67.pdf">span sorting</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb68.pdf">lighting</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb69.pdf">surface caching</a> and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb70.pdf">post-mortem</a>.</li>
</ul>
<p>Enjoy! Again, the entire book is accessible here: <a href="http://www.gamedev.net/reference/articles/article1698.asp">Graphics Programming Black Book</a>.</p>
<img src="http://feeds.feedburner.com/~r/igoro/~4/bRhvhwUIG50" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/program-like-a-quake-developer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Efficient auto-complete with a ternary search tree</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/</link>
		<comments>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 00:27:54 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=177</guid>
		<description><![CDATA[
	pre.code {
		margin-left: 40px;
	}
Over the past couple of years, auto-complete has popped up all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn and lots of other websites all try to complete your phrase as soon as you start typing.
Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. [...]]]></description>
			<content:encoded><![CDATA[<style>
<p>	pre.code {
		margin-left: 40px;
	}</style>
<p>Over the past couple of years, auto-complete has popped up all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn and lots of other websites all try to complete your phrase as soon as you start typing.</p>
<p>Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. In many cases, an efficient implementation requires the use of interesting algorithms and data structures. In this blog post, I will describe one simple data structure that can be used to implement auto-complete: a ternary search tree.</p>
<h3>Trie: simple but space-inefficient</h3>
<p> <span id="more-177"></span>
<p>Before discussing ternary search trees, let&#8217;s take a look at a simple data structure that supports a fast auto-complete lookup but needs too much memory: a <em>trie</em>. A trie is a tree-like data structure in which each node contains an array of pointers, one pointer for each character in the alphabet. Starting at the root node, we can trace a word by following pointers corresponding to the letters in the target word.</p>
<p>Each node could be implemented like this in C#:</p>
<pre class="code"><span style="color: blue">class </span><span style="color: #2b91af">TrieNode
</span>{
    <span style="color: blue">public const int </span>ALPHABET_SIZE = 26;
    <span style="color: blue">public </span><span style="color: #2b91af">TrieNode</span>[] m_pointers = <span style="color: blue">new </span><span style="color: #2b91af">TrieNode</span>[ALPHABET_SIZE];
    <span style="color: blue">public bool </span>m_endsString = <span style="color: blue">false</span>;
}</pre>
<p>Here is a trie that stores words AB, ABBA, ABCD, and BCD. Nodes that terminate words are marked yellow:</p>
<p>&#160;</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="545" alt="gif_1" src="http://igoro.com/wordpress/wp-content/uploads/2009/08/gif-1.gif" width="577" border="0" /></p>
<p>&#160;</p>
<p>Implementing auto complete using a trie is easy. We simply trace pointers to get to a node that represents the string the user entered. By exploring the trie from that node down, we can enumerate all strings that complete user&#8217;s input.</p>
<p>But, a trie has a major problem that you can see in the diagram above. The diagram only fits on the page because the trie only supports four letters {A,B,C,D}. If we needed to support all 26 English letters, each node would have to store 26 pointers. And, if we need to support international characters, punctuation, or distinguish between lowercase and uppercase characters, the memory usage grows becomes untenable.</p>
<p>Our problem has to do with the memory taken up by all the null pointers stored in the node arrays. We could consider using a different data structure in each node, such as a hash map. However, managing thousands and thousands of hash maps is generally not a good idea, so let’s take a look at a better solution.</p>
<h3>Ternary search tree to the rescue</h3>
<p>A ternary tree is a data structure that solves the memory problem of tries in a more clever way. To avoid the memory occupied by unnecessary pointers, each trie node is represented as a tree-within-a-tree rather than as an array. Each non-null pointer in the trie node gets its own node in a ternary search tree.</p>
<p>For example, the trie from the example above would be represented in the following way as a ternary search tree:</p>
<p><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="375" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image.png" width="183" border="0" /></p>
<p>The ternary search tree contains three types of arrows. First, there are arrows that correspond to arrows in the corresponding trie, shown as dashed down-arrows. Traversing a down-arrow corresponds to “matching” the character from which the arrow starts. The left- and right- arrow are traversed when the current character does not match the desired character at the current position. We take the left-arrow if the character we are looking for is alphabetically before the character in the current node, and the right-arrow in the opposite case.</p>
<p>For example, green arrows show how we’d confirm that the ternary tree contains string ABBA:</p>
<p>&#160;<img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="375" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image1.png" width="183" border="0" /></p>
<p>And this is how we’d find that the ternary string does not contain string ABD:</p>
<p><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="375" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image2.png" width="183" border="0" />&#160;</p>
<h3>Ternary search tree on a server</h3>
<p>On the web, a significant chunk of the auto-complete work has to be done by the server. Often, the set of possible completions is large, so it is usually not a good idea to download all of it to the client. Instead, the ternary tree is stored on the server, and the client will send prefix queries to the server.</p>
<p>The client will send a query for words starting with “bin” to the server:</p>
<p>&#160; <img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="218" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image3.png" width="639" border="0" /></p>
<p>And the server responds with a list of possible words:</p>
<p><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="218" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image4.png" width="452" border="0" />&#160;</p>
<h3>Implementation</h3>
<p>Here is a simple ternary search tree implementation in C#:</p>
<pre class="code"><span style="color: blue">public </span><span style="color: blue">class </span><span style="color: #2b91af">TernaryTree
</span>{
    <span style="color: blue">private </span><span style="color: #2b91af">Node </span>m_root = <span style="color: blue">null</span>;

    <span style="color: blue">private void </span>Add(<span style="color: blue">string </span>s, <span style="color: blue">int </span>pos, <span style="color: blue">ref </span><span style="color: #2b91af">Node </span>node)
    {
        <span style="color: blue">if </span>(node == <span style="color: blue">null</span>) { node = <span style="color: blue">new </span><span style="color: #2b91af">Node</span>(s[pos], <span style="color: blue">false</span>); }

        <span style="color: blue">if </span>(s[pos] &lt; node.m_char) { Add(s, pos, <span style="color: blue">ref </span>node.m_left); }
        <span style="color: blue">else if </span>(s[pos] &gt; node.m_char) { Add(s, pos, <span style="color: blue">ref </span>node.m_right); }
        <span style="color: blue">else
        </span>{
            <span style="color: blue">if </span>(pos + 1 == s.Length) { node.m_wordEnd = <span style="color: blue">true</span>; }
            <span style="color: blue">else </span>{ Add(s, pos + 1, <span style="color: blue">ref </span>node.m_center); }
        }
    }

    <span style="color: blue">public void </span>Add(<span style="color: blue">string </span>s)
    {
        <span style="color: blue">if </span>(s == <span style="color: blue">null </span>|| s == <span style="color: #a31515">&quot;&quot;</span>) <span style="color: blue">throw new </span><span style="color: #2b91af">ArgumentException</span>();

        Add(s, 0, <span style="color: blue">ref </span>m_root);
    }

    <span style="color: blue">public bool </span>Contains(<span style="color: blue">string </span>s)
    {
        <span style="color: blue">if </span>(s == <span style="color: blue">null </span>|| s == <span style="color: #a31515">&quot;&quot;</span>) <span style="color: blue">throw new </span><span style="color: #2b91af">ArgumentException</span>();

        <span style="color: blue">int </span>pos = 0;
        <span style="color: #2b91af">Node </span>node = m_root;
        <span style="color: blue">while </span>(node != <span style="color: blue">null</span>)
        {
            <span style="color: blue">int </span>cmp = s[pos] - node.m_char;
            <span style="color: blue">if </span>(s[pos] &lt; node.m_char) { node = node.m_left; }
            <span style="color: blue">else if </span>(s[pos] &gt; node.m_char) { node = node.m_right; }
            <span style="color: blue">else
            </span>{
                <span style="color: blue">if </span>(++pos == s.Length) <span style="color: blue">return </span>node.m_wordEnd;
                node = node.m_center;
            }
        }

        <span style="color: blue">return false</span>;
    }
}</pre>
<p><a href="http://11011.net/software/vspaste"></a></p>
<p><a href="http://11011.net/software/vspaste"></a></p>
<p><strong></strong></p>
<p>And here is the Node class:</p>
<pre class="code"><span style="color: blue">class </span><span style="color: #2b91af">Node
</span>{
    <span style="color: blue">internal char </span>m_char;
    <span style="color: blue">internal </span><span style="color: #2b91af">Node </span>m_left, m_center, m_right;
    <span style="color: blue">internal bool </span>m_wordEnd;

    <span style="color: blue">public </span>Node(<span style="color: blue">char </span>ch, <span style="color: blue">bool </span>wordEnd)
    {
        m_char = ch;
        m_wordEnd = wordEnd;
    }
}</pre>
<p><a href="http://11011.net/software/vspaste"></a></p>
<h3>Remarks</h3>
<p>For best performance, strings should be inserted into the ternary tree in a random order. In particular, do not insert strings in the alphabetical order. Each mini-tree that corresponds to a single trie node would degenerate into a linked list, significantly increasing the cost of lookups. Of course, more complex self-balancing ternary trees can be implemented as well.</p>
<p>And, don’t use a fancier data structure than you have to. If you only have a relatively small set of candidate words (say on the order of hundreds) a brute-force search should be fast enough.</p>
<h3>Further reading</h3>
<p>Another article on tries is available on DDJ (careful, their implementation assumes that no word is a prefix of another):</p>
<p><a title="http://www.ddj.com/windows/184410528" href="http://www.ddj.com/windows/184410528">http://www.ddj.com/windows/184410528</a></p>
<p>If you like this article, also check out these posts on my blog:</p>
<ul>
<li><a href="http://igoro.com/archive/skip-lists-are-fascinating/">Skip lists are fascinating!</a> </li>
<li><a href="http://igoro.com/archive/numbers-that-cannot-be-computed/">Numbers that cannot be computed</a> </li>
<li><a href="http://igoro.com/archive/quicksort-killer/">Quicksort killer</a> </li>
</ul>
<img src="http://feeds.feedburner.com/~r/igoro/~4/il6UsbF_unM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>7 tips for extending browser functionality to Silverlight apps</title>
		<link>http://igoro.com/archive/7-tips-for-extending-browser-functionality-to-silverlight-apps/</link>
		<comments>http://igoro.com/archive/7-tips-for-extending-browser-functionality-to-silverlight-apps/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 07:52:06 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[Silverlight]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=164</guid>
		<description><![CDATA[Silverlight 2 is an awesome platform for development of rich web applications. One issue to be aware of, though, is that some browser features do not extend into Silverlight apps. For example, the Back and Forward browser buttons do not always work as the user may expect with Silverlight apps. The set of browser features [...]]]></description>
			<content:encoded><![CDATA[<p>Silverlight 2 is an awesome platform for development of rich web applications. One issue to be aware of, though, is that some browser features do not extend into Silverlight apps. For example, the Back and Forward browser buttons do not always work as the user may expect with Silverlight apps. The set of browser features you&#8217;ll miss in Silverlight apps is pretty much identical to what you&#8217;d miss in Flash or AJAX, and some of them are about to be addressed in the Silverlight 3 release.</p>
<p>Let&#8217;s go over the affected browser features one by one, and discuss ways of getting them working in Silverlight apps.</p>
<h3>Feature 1: Bookmarking and deep linking</h3>
<p><span id="more-164"></span>Users like to be able to copy the link from the address bar, email it to a friend, bookmark it, or post it on Twitter or Facebook.</p>
<p>A Silverlight app will typically transition between different views without redirecting the user to a different URL. This makes the user experience smooth and polished. As an unfortunate consequence, a user that copies the URL from the address bar or bookmarks the page will not get a link to the current active view, but instead a link to the initial view of the app.</p>
<p>Thankfully, there is a solution in Silverlight. A Silverlight app can modify the &#8220;bookmark&#8221; section of the URL, which is the part of the URL following the # character. The app can update the URL each time the user transitions to a different view, and read the bookmark on startup. See <a href="http://silverlight.net/blogs/msnow/archive/2008/09/16/silverlight-tip-of-the-day-41-using-bookmarks-in-your-silverlight-application.aspx">this article</a> for details on how to implement this.</p>
<p>The <a href="http://www.silverlightshow.net/items/The-Silverlight-3-Navigation-Framework.aspx">Navigation Framework</a> feature in the upcoming Silverlight 3 release will provide an easy-to-use solution to this problem.</p>
<h3>Feature 2: Search engine discoverability</h3>
<p>If your Silverlight app links to another page, that link is invisible to search engines. As a result, pages on your site that cannot be reached via traditional HTML links will likely not get listed in search results.</p>
<p>The solution is simple: list all pages on your site in an XML document called a Sitemap. Host the Sitemap on your site, and submit the Sitemap link to search engines. Once you do that, the search engines should include all pages on your site in their indexes.</p>
<p>For details, see the <a href="http://en.wikipedia.org/wiki/Google_Sitemaps">Wikipedia article on Sitemaps</a>.</p>
<h3>Feature 3: Back and Forward browser buttons</h3>
<p>The Back and Forward browser buttons do not work in Silverlight apps. Switching to a different view in a Silverlight app does not insert a bookmark into the browser history, so the user can&#8217;t go to the previous view by clicking the Back button.</p>
<p>The problem is similar to the bookmarking issue (Feature 1), except that the solution is more complicated. If the Silverlight app modifies the bookmark part of the URL, the previous bookmark does not get recorded in the browser history, so the user still cannot use the Back button to return to the previous view.</p>
<p>It is in fact possible to get the Back and Forward buttons working. As of SP1 3.5, ASP.NET exposes a History Control that simplifies the solution significantly (see <a href="http://blog.webjak.net/2008/09/08/control-silverlight-by-using-browser-back-and-foward-buttons/">Control Silverlight by Using Browser Back and Forward Buttons</a>). There is a variety of solutions available online in case you are not using ASP.NET 3.5 SP1, but beware that they typically involve hacks that don&#8217;t necessarily work in all browsers.</p>
<p>Or, you can wait until Silverlight 3, because browser history is properly handled in the Navigation Framework.</p>
<h3>Feature 4: Support of external tools (translation, accessibility, &#8230;)</h3>
<p>Automated translation, special browsers for people with disabilities, automated RSS aggregating&#8230; these are just some examples of useful tools that consume web content. These tools typically process HTML, but don&#8217;t peek inside Silverlight applications.</p>
<p>Since HTML is the standard representation of online documents, consider whether it would make sense to expose a simplified version of your content as a simple HTML page, in addition to your Silverlight application.</p>
<h3>Feature 5: Printing</h3>
<p>Silverlight does not currently have much support for printing. In Internet Explorer 7, Silverlight applications show up on the printout, but in Firefox or Safary they do not.</p>
<p>So, a Firefox or Safari user will not be able to print out the information displayed by your Silverlight application (at least not without taking screenshots, etc).</p>
<p>If you need to support printing, you can try two different solution approaches. If you only need to print content that can be represented using HTML (e.g., no Silverlight-generated charts and such), see <a href="http://jonas.follesoe.no/PrintingInSilverlight2UsingCSSAndASPNETAJAX4.aspx">this solution</a>. An alternate approach is to <a href="http://blog.galasoft.ch/archive/2008/10/10/converting-and-customizing-xaml-to-png-with-server-side-wpf.aspx">convert XAML to an image format on the server</a>.</p>
<h3>Feature 6: Open link in new tab</h3>
<p>There doesn&#8217;t seem to be a very good way to implement the &#8216;Open link in new tab&#8217; feature in Silverlight. You can use a HyperlinkButton with a Target set to _blank, which seems to open a new tab in Firefox and a new Window in IE 7. But that still won&#8217;t allow the user to choose whether to open the page in the same or different tab/window.</p>
<p>If you are committed enough, you can probably implement a HyperlinkButton with a context menu allowing the user to open the link in a new tab/window. Right-clicking is not directly supported in Silverlight, but <a href="http://silverlight.net/blogs/msnow/archive/2008/07/01/tip-of-the-day-14-how-to-right-click-on-a-silverlight-application.aspx">workarounds do exist</a>.</p>
<p>Or, use hyperlinks sparingly in Silverlight apps.</p>
<h3>Feature 7: Find on this page</h3>
<p>I use the &#8216;Find on this page&#8217; feature (CTRL+F) extensively when browsing the web. But, content inside Silverlight apps is not searched by the browser.</p>
<p>A mitigation is to carefully design the app UI in a way that minimizes the user&#8217;s need to search. Data should be sortable and searchable in various ways, and important information should be shown prominently (e.g., the user&#8217;s own position on a scoreboard should be highlighted).</p>
<p>These are all good practices regardless of your development platform, but unavailability of the &#8216;Find on this page&#8217; feature makes them even more important.</p>
<h3>Conclusion</h3>
<p>Hopefully you&#8217;ll find these points helpful when designing your next Silverlight application. If you have other tips on extending browser features into Silverlight apps, let me know in the comments!</p>
<img src="http://feeds.feedburner.com/~r/igoro/~4/lyYbeDhNFKs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/7-tips-for-extending-browser-functionality-to-silverlight-apps/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>My YouTube debut: a RoboZZle demo video</title>
		<link>http://igoro.com/archive/my-youtube-debut-a-robozzle-demo-video/</link>
		<comments>http://igoro.com/archive/my-youtube-debut-a-robozzle-demo-video/#comments</comments>
		<pubDate>Fri, 01 May 2009 08:53:39 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[RoboZZle]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=155</guid>
		<description><![CDATA[It took significantly longer than I expected, but the RoboZZle demo video is finally on YouTube.
The video made the reddit front page, so you can read the usual mixture of insightful, funny, and outright insane comments on the reddit page and the YouTube page.
My favorite funny comment is this one:

And finally, this is the video:

]]></description>
			<content:encoded><![CDATA[<p>It took significantly longer than I expected, but the RoboZZle demo video is finally on YouTube.</p>
<p>The video made the reddit front page, so you can read the usual mixture of insightful, funny, and outright insane comments on the <a href="http://www.reddit.com/r/programming/comments/8gs5j/demo_video_of_robot_programming_game_with_user/">reddit page</a> and the <a href="http://www.youtube.com/watch?v=MmqBVWi_Pc0">YouTube page</a>.</p>
<p>My favorite funny comment is this one:<br />
<img class="alignnone size-full wp-image-159" style="border: 2px solid gray" title="silverlight_tinfoilhat1" src="http://igoro.com/wordpress/wp-content/uploads/2009/05/silverlight_tinfoilhat1.png" alt="silverlight_tinfoilhat1" width="630" height="159" /></p>
<p>And finally, this is the video:<br />
<object width="640" height="505" data="http://www.youtube.com/v/MmqBVWi_Pc0&amp;hl=en&amp;fs=1&amp;rel=0" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/MmqBVWi_Pc0&amp;hl=en&amp;fs=1&amp;rel=0" /><param name="allowfullscreen" value="true" /></object></p>
<img src="http://feeds.feedburner.com/~r/igoro/~4/D0JOcGRpUlo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/my-youtube-debut-a-robozzle-demo-video/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
