<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>Vineet Gupta</title> <link>http://www.vineetgupta.com</link> <description>Cloud computing, distributed systems, high scalable systems, mobile-web, real-time web, functional programming, concurrency, algorithms and data structures</description> <lastBuildDate>Thu, 13 Oct 2011 21:30:29 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.0.1</generator> <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/VineetGupta" /><feedburner:info uri="vineetgupta" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><geo:lat>12.56</geo:lat><geo:long>77.38</geo:long><creativeCommons:license>http://creativecommons.org/licenses/by-sa/3.0/</creativeCommons:license><image><link>http://creativecommons.org/licenses/by-sa/3.0/</link><url>http://creativecommons.org/images/public/somerights20.gif</url><title>Some Rights Reserved</title></image><feedburner:emailServiceId>VineetGupta</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><feedburner:feedFlare href="http://add.my.yahoo.com/rss?url=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://us.i1.yimg.com/us.yimg.com/i/us/my/addtomyyahoo4.gif">Subscribe with My Yahoo!</feedburner:feedFlare><feedburner:feedFlare href="http://www.newsgator.com/ngs/subscriber/subext.aspx?url=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://www.newsgator.com/images/ngsub1.gif">Subscribe with NewsGator</feedburner:feedFlare><feedburner:feedFlare href="http://feeds.my.aol.com/add.jsp?url=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://o.aolcdn.com/favorites.my.aol.com/webmaster/ffclient/webroot/locale/en-US/images/myAOLButtonSmall.gif">Subscribe with My AOL</feedburner:feedFlare><feedburner:feedFlare href="http://www.bloglines.com/sub/http://feeds.feedburner.com/VineetGupta" src="http://www.bloglines.com/images/sub_modern11.gif">Subscribe with Bloglines</feedburner:feedFlare><feedburner:feedFlare href="http://www.netvibes.com/subscribe.php?url=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://www.netvibes.com/img/add2netvibes.gif">Subscribe with Netvibes</feedburner:feedFlare><feedburner:feedFlare href="http://fusion.google.com/add?feedurl=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://buttons.googlesyndication.com/fusion/add.gif">Subscribe with Google</feedburner:feedFlare><feedburner:feedFlare href="http://www.pageflakes.com/subscribe.aspx?url=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://www.pageflakes.com/ImageFile.ashx?instanceId=Static_4&amp;fileName=ATP_blu_91x17.gif">Subscribe with Pageflakes</feedburner:feedFlare><feedburner:feedFlare href="http://www.plusmo.com/add?url=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://plusmo.com/res/graphics/fbplusmo.gif">Subscribe with Plusmo</feedburner:feedFlare><feedburner:feedFlare href="http://www.live.com/?add=http%3A%2F%2Ffeeds.feedburner.com%2FVineetGupta" src="http://tkfiles.storage.msn.com/x1piYkpqHC_35nIp1gLE68-wvzLZO8iXl_JMledmJQXP-XTBOLfmQv4zhj4MhcWEJh_GtoBIiAl1Mjh-ndp9k47If7hTaFno0mxW9_i3p_5qQw">Subscribe with Live.com</feedburner:feedFlare><item><title>We all owe big to Dennis Ritchie</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/ER3LQLrIUtQ/</link> <comments>http://www.vineetgupta.com/2011/10/we-all-owe-big-to-dennis-ritchie/#comments</comments> <pubDate>Thu, 13 Oct 2011 21:27:11 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Personal]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10108</guid> <description>I just came back from IIT Guwahati and discovered that Dennis Ritchie has passed away Most people would know of Dennis Ritchie as the person who invented C and Unix. What most folks do not realize is how profound C and Unix were when they were created and what amazing impact they have had on &lt;a
href='http://www.vineetgupta.com/2011/10/we-all-owe-big-to-dennis-ritchie/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>I just came back from IIT Guwahati and discovered that Dennis Ritchie has passed away</p><p>Most people would know of Dennis Ritchie as the person who invented C and Unix. What most folks do not realize is how profound C and Unix were when they were created and what amazing impact they have had on the industry.</p><p>Consider this &#8211; before C, you had to program directly to the instruction set of the machine you were programming on. Today when most hardware happens to be x86 or ARM, this seems simple, but back then, the variety of hardware was far more. And this hardware was different not just in terms of instruction sets, but also in terms of information representation, memory addressing, etc. Sure enough there were languages like Fortran and COBOL which could do math and data, but there was no general purpose language which could produce programs that could compete in terms of performance with custom code written for the hardware. Writing a portable program, much less a portable operating system was un-imaginable.</p><p>And yet, Dennis Ritchie imagined just such a world and invented a highly performant, high level programming language and then used that to write an operating system which could be ported to any hardware. This one change meant that programmers did not have to create a new OS for every new piece of hardware that got thrown at them &#8211; they could take for granted a set of tools, calls and programs, as being available and could spend time going up the value chain and doing more interesting things than just re-inventing the wheel. The result was a spurt of innovation and growth which allowed the industry to grow exponentially into a myriad different directions leading to the world we are in.</p><p>If you are a programmer, you owe a lot to Ritchie, whether you have ever programmed in C or worked on Unix or not.</p><p>[Edit: Bjarne Stroustrup expresses the same sentiment way more eloquently: <a
href="http://herbsutter.com/2011/10/12/dennis-ritchie/">http://herbsutter.com/2011/10/12/dennis-ritchie/</a></p><p>Most of us are happy to settle for incremental change. People like Ritchie don’t, and that’s what sets them apart.]</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ER3LQLrIUtQ:_DBykPKFuT4:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ER3LQLrIUtQ:_DBykPKFuT4:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=ER3LQLrIUtQ:_DBykPKFuT4:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ER3LQLrIUtQ:_DBykPKFuT4:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=ER3LQLrIUtQ:_DBykPKFuT4:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ER3LQLrIUtQ:_DBykPKFuT4:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=ER3LQLrIUtQ:_DBykPKFuT4:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ER3LQLrIUtQ:_DBykPKFuT4:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/ER3LQLrIUtQ" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2011/10/we-all-owe-big-to-dennis-ritchie/feed/</wfw:commentRss> <slash:comments>8</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2011/10/we-all-owe-big-to-dennis-ritchie/</feedburner:origLink></item> <item><title>The True Legacy of Steve Jobs</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/-yIeHSX_4nY/</link> <comments>http://www.vineetgupta.com/2011/10/the-true-legacy-of-steve-jobs/#comments</comments> <pubDate>Thu, 06 Oct 2011 03:46:25 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Personal]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10102</guid> <description>Steve Jobs passed away today &amp;#8211; I woke up and read the news on my iPhone, like much of the world. I never thought I would be so affected by the passing away of someone I never knew personally. But if you are in the world of tech and developing consumer products, it is not &lt;a
href='http://www.vineetgupta.com/2011/10/the-true-legacy-of-steve-jobs/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>Steve Jobs passed away today &#8211; I woke up and read the news on my iPhone, like much of the world.</p><p>I never thought I would be so affected by the passing away of someone I never knew personally. But if you are in the world of tech and developing consumer products, it is not possible to have not thought about life as Steve Jobs &#8211; how his brain functions, how he figures out what should be built, how he sets the bar, how he hires and build teams, how he delves upon every aspect of the product, &#8230; you wish you had 0.1% of his ability, you wish you could be more like him, since being him means creating profound impact, and that is what you are really here for.</p><p>And this I believe is his true legacy. The products he helped create would have a shelf life of maybe a decade, Apple as a company would be perhaps super successful for another 2-3 decades, but what would inspire generations of current and future entrepreneurs, programmers and designers is the way he set about building products:</p><ul><li>leapfrogging the current state of the art, not incremental change</li><li>relentless focus on the user</li><li>not settling for less than perfection</li></ul><p>This approach to building products is his true legacy, and for that, the world would be a better place.</p><div></div> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=-yIeHSX_4nY:lm3HinE0fls:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=-yIeHSX_4nY:lm3HinE0fls:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=-yIeHSX_4nY:lm3HinE0fls:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=-yIeHSX_4nY:lm3HinE0fls:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=-yIeHSX_4nY:lm3HinE0fls:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=-yIeHSX_4nY:lm3HinE0fls:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=-yIeHSX_4nY:lm3HinE0fls:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=-yIeHSX_4nY:lm3HinE0fls:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/-yIeHSX_4nY" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2011/10/the-true-legacy-of-steve-jobs/feed/</wfw:commentRss> <slash:comments>4</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2011/10/the-true-legacy-of-steve-jobs/</feedburner:origLink></item> <item><title>Deciphering Complex C Declarations</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/SoclXVUdbk4/</link> <comments>http://www.vineetgupta.com/2011/03/deciphering-complex-c-declarations/#comments</comments> <pubDate>Thu, 17 Mar 2011 12:47:40 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Languages]]></category> <category><![CDATA[C declarations]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10094</guid> <description>As I have mentioned in earlier posts, we are building a x-platform mobile chat client. One of the platforms is iPhone and that means that the team is having to learn Objective C, and therefore C. Last couple of days, I have been spending time  helping my team understand and appreciate C better. Yesterday, we &lt;a
href='http://www.vineetgupta.com/2011/03/deciphering-complex-c-declarations/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>As I have mentioned in <a
href="http://www.vineetgupta.com/2011/03/mobile-platforms-part-1-android/"> earlier</a> <a
href="http://www.vineetgupta.com/2011/03/mobile-platforms-part-2-blackberry/"> posts</a>, we are building a x-platform mobile chat client. One of the platforms  is iPhone and that means that the team is having to learn Objective C, and  therefore C. Last couple of days, I have been spending time  helping my team understand and appreciate C better. Yesterday, we  got really deep on figuring out how to read a C declaration and it was a lot of  fun taking people thru the rules and examples of how to read some complex C  declarations. I am summarizing the key aspects over here.</p><p>PS: Could not cross link to references since there is no authoritative version  of the <a
href="http://en.wikipedia.org/wiki/C99">C-99</a> standard in an HTML  form &#8211; the <a
href="http://www.open-std.org/jtc1/sc22/wg14/www/standards">standard docs</a> are all in <a
href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf"> PDF</a>.</p><h3>C Declaration Grammar</h3><p>Understanding a C declaration requires that we understand what elements can a  C declaration have. While I will describe the overall syntax over here, I will  not go into semantics:</p><pre><em>declaration:
	declaration-specifiers init-declarator-list<sub>opt</sub> ;
declaration-specifiers:
	storage-class-specifier declaration-specifiers<sub>opt</sub>
	type-specifier declaration-specifiers<sub>opt</sub>
	type-qualifier declaration-specifiers<sub>opt</sub>
	function-specifier declaration-specifiers<sub>opt</sub>
init-declarator-list:
	init-declarator
	init-declarator-list , init-declarator
init-declarator:
	declarator
	declarator = initializer</em></pre><p>This is what the above grammar is saying:</p><ul><li>A declaration is:<ul><li>declaration-specifiers followed by an optional list of  		init-declarators</li></ul></li><li>The declaration-specifiers are:<ul><li>A storage-class specifier, followed by other optional declaration  		specifiers</li><li>A type-specifier, followed by other optional declaration specifiers</li><li>A type-qualifier, followed by other optional declaration specifiers</li><li>A function-specifier, followed by other optional declaration  		specifiers</li></ul></li><li>List of init-declarators is:<ul><li>An init-declarator</li><li>An init-declarator list, followed by a comma, followed by an  		init-declarator</li></ul></li><li>An init-declarator is:<ul><li>A declarator</li><li>A declarator followed by an assignment operator (=),  followed  		by an initializer</li></ul></li></ul><h3>Declaration Specifiers</h3><p>As per the grammar defined  above, a declaration starts with a declaration specifier. A  declaration-specifier can be:</p><ul><li><strong>Storage-class specifier:</strong> These are: <code>auto</code>, <code>static</code>, <code>extern</code>, <code>register</code> and <code>typedef</code>. Default is <code>auto</code>, so if nothing is specified, <code>auto</code> is assumed.</li><li><strong>Type-specifier:</strong> A type-specifier is one of the following. Default is <code>int</code><ul><li><code>void</code></li><li><code>char</code></li><li><code>signed char</code></li><li><code>unsigned char</code></li><li><code>short</code>, <code>signed short</code>, <code>short int</code>, or <code>signed short int</code></li><li><code>unsigned short</code> or <code>unsigned short int</code></li><li><code>int</code>, <code>signed</code> or <code>signed int</code></li><li><code>unsigned</code> or <code>unsigned int</code></li><li><code>long</code>, <code>signed long</code>, <code>long int</code> or <code>signed long int</code></li><li><code>unsigned long</code> or <code>unsigned long int</code></li><li><code>long long</code>, <code>signed long long</code>, <code>long long int</code>, or <code>signed long long int</code></li><li><code>unsigned long long</code> or <code>unsigned long long int</code></li><li><code>float</code></li><li><code>double</code></li><li><code>long double</code></li><li><code>_Bool</code></li><li><code>float _Complex</code></li><li><code>double _Complex</code></li><li><code>long double _Complex</code></li><li><em>struct-or-union-specifier</em></li><li><em>enum-specifier</em></li><li><em>typedef-name</em></li></ul></li><li><strong>Type-Qualifer:</strong> Applicable only to <a
href="http://en.wikipedia.org/wiki/Value_(computer_science)">l-values</a>: <code>const</code>, <code>volatile</code>, <code>restrict</code></li><li><strong>Function-specifier:</strong> There is only one function specifier: <code>inline</code></li></ul><h3>Declarators</h3><p>What follows a declaration-specifier is a init-declarator-list &#8211; a comma  separated sequence of declarators that may optionally be initialized. A declarator [2] is defined as:</p><pre><em>declarator:
	pointer<sub>opt</sub> direct-declarator
direct-declarator:
	identifier
	( declarator )
	direct-declarator [ type-qualifier-list<sub>opt</sub> assignment-expression<sub>opt</sub> ]
	direct-declarator [ static type-qualifier-list<sub>opt</sub> assignment-expression ]
	direct-declarator [ type-qualifier-list static assignment-expression ]
	direct-declarator [ type-qualifier-list<sub>opt</sub> * ]
	direct-declarator ( parameter-type-list )
	direct-declarator ( identifier-list<sub>opt</sub> )
pointer:
	* type-qualifier-list<sub>opt</sub>
	* type-qualifier-list<sub>opt</sub> pointer
type-qualifier-list:
	type-qualifier
	type-qualifier-list type-qualifier
parameter-type-list:
	parameter-list
	parameter-list , ...
parameter-list:
	parameter-declaration
	parameter-list , parameter-declaration
parameter-declaration:
	declaration-specifiers declarator
	declaration-specifiers abstract-declarator<sub>opt</sub>
identifier-list:
	identifier
	identifier-list , identifier</em></pre><p>We can break down the above grammar the same way we did for the grammar of a  declaration, but I&#8217;m gonna skip that and give some examples of declarators, just  to give an idea of what declarators are supposed to be:</p><ul><li><code>i</code> &#8211; An identifier (and hence a direct-declarator)</li><li><code>i, j</code> &#8211; A list of identifiers</li><li><code>i = 10</code> &#8211; An Identifer followed by an assignment expression</li><li><code>*p</code> &#8211; A pointer declarator</li><li><code>* const p</code> &#8211; A pointer declarator with a type-qualifer (<code>const</code>)</li><li><code>a[10]</code> &#8211; An array declarator with a constant size</li><li><code>a[*]</code> &#8211; An array declarator with a variable size</li><li><code>a[]</code> &#8211; An array declarator with an unspecified size &#8211; the size will need to be defined somewhere else</li><li><code>f()</code> &#8211; A function declarator</li><li><code>f(void)</code> &#8211; A function declarator with no parameters</li><li><code>f(int i)</code> &#8211; A function declarator with a parameter</li><li><code>f(int i, int j)</code> &#8211; A function declarator with a parameter-list</li></ul><h3>Structure of a Declaration</h3><p>Now that we have developed an idea of declaration-specifiers and declarators,  we can see that the overall structure of a declaration is of the following form:</p><ol><li>One or more declaration-specifiers followed by</li><li>One or more declarators (separated by commas)</li></ol><p>However, not all combinations are valid. The following are not allowed:</p><pre>f()[] // function can't return an array
f()() // function can't return a function
a[]() // array can't hold a function</pre><p>The following, however, are allowed:</p><pre>(* f())[] // function returning pointer to an array
(* f())() // function returning pointer to a function
(* a[])() // array holding pointers to functions</pre><p>If you are flummoxed by the last three examples of combining pointers,  functions and arrays, you are not alone. Such declarations can look scary till  you develop a technique to start deciphering them. In order to do that however,  we need to know the basic order of precedence</p><h3>Operator Precedence</h3><ol><li>Parentheses grouping together parts of a declaration</li><li>Postfix operators: parentheses (for a function), square brackets (for an  	array)</li><li>Prefix operator: Asterisk (for a pointer)</li></ol><p>Here are some examples of how this order applies:</p><table
style="width: 100%;"><tbody><tr><td>Declaration</td><td>Same as</td><td>Not Same as</td></tr><tr><td><code>int * f();</code></td><td><code>int * (f());</code></td><td><code>int (* f());</code></td></tr><tr><td><code>int * a[];</code></td><td><code>int * (a[]);</code></td><td><code>int (* a[]);</code></td></tr></tbody></table><p>Now the way to use these rules is as follows:</p><ol><li>Read from left to right</li><li>For the first identifier you encounter (or if there is no identifier,  	then look for the inner-most construct), look to the immediate right<ul><li>If there is nothing, or if you have a closing parenthesis, go to 3</li><li>Otherwise you have a function declarator indicated by () or an array  		declarator indicated by [] to the right of the identifier</li><li>Read left to right &#8211; you will have a &#8220;function returning&#8221; or &#8220;array  		of&#8221;</li><li>This would typically end with a right parenthesis, or the end of the  		declarator (semi-colon or assignment)</li></ul></li><li>Look to the left<ul><li>If you find nothing on the left, or if you find an opening  		parenthesis, go to 4</li><li>Otherwise you have a pointer declarator indicated by an asterisk to  		the left of the identifier. Read right to left &#8211; you will get a &#8220;pointer  		to&#8221;</li><li>This would end with a left parenthesis, or start of declarator</li></ul></li><li>At this point you have either<ul><li>A complete declarator &#8211; then you are done</li><li>Or an expression in parenthesis &#8211; go back to step 2.</li></ul></li></ol><p>If you did not realize this on reading the above description, what we are  really doing in steps 2 and 3 is to convert the C declaration into a postfix  expression by taking into account the lower predence of the asterisk. This same  approach is used in the <a
href="http://www.cdecl.org/">cdecl</a> program given in K&amp;R.</p><p>This will become clear with a few examples:</p><pre>int * f();</pre><ul><li>Start with identifer &#8211; f</li><li>Parenthesis to the right &#8211; &#8220;function returning&#8221;</li><li>Asterisk to the left &#8211; &#8220;pointer to&#8221;</li><li>int</li></ul><p>The postfix expression is:</p><pre>f () * int // "f is" "a function returning" "pointer to int"</pre><p>Here are a few more examples:</p><pre>int * a[10];
// postfix expression: a [10] * int
// "a is" "array of 10" "pointers to int"</pre><pre>int (* a)[10]
// postfix: a * [10] int
// "a is" "pointer to" "array of 10" "int"</pre><pre>int **p;
// postfix: p * * int
// "p is" "pointer to" "pointer to" int</pre><pre>int **p[10];
// postfix: p [10] * * int
// "p is" "array of 10" "pointer to" "pointer to" int</pre><pre>int *f();
// postfix: f () * int
// "f is" "function returning" "pointer to" int</pre><pre>int (*f)();
// postfix: f * () int
// "f is" "a pointer to" "function returning" int</pre><pre>int (* vtable[])();
// postfix: vtable [] * () int
// vtable is an array of pointer to function returning int</pre><pre>int (* a[])(int, int);
// postfix: a [] * (int, int) int
// a is an array of pointer to function returning int
// and taking (int, int) as parameters</pre><h3>Applying Type-Qualifiers</h3><p>The way you apply a type-qualifier (<code>const</code>, <code>volatile</code>) is that:</p><ol><li>If next to a type-specifer, it applies to the type-specifier</li><li>Otherwise applies to the asterisk (pointer) on its immediate left</li></ol><p>Examples:</p><pre>const int * p;
// postfix: p * const int
// p is a pointer to a const int</pre><pre>int const * p;
// postfix: p * const int
// p is a pointer to a const int</pre><pre>int * const p;
// postfix: p const * int
// p is a const pointer to int</pre><pre>int * const * p;
// postfix: p * const * int
// p is a pointer to a const pointer to int</pre><pre>int * const * (* p)();
// postfix: p * () * const * int
// p is a pointer to a function returning a
// pointer to const-pointer-to-int</pre><h3>Abstract Declarators</h3><p>An abstract declarator is a declarator without an identifer. However, lack of  an identifier does not mean that we can&#8217;t interpret such declarators, since the  missing identifier&#8217;s position can be determined by the placement of (), [] and *  Some examples will help clarify this:</p><pre>int *
// pointer to int</pre><pre>int * [10]
// array of 10 pointers to int</pre><pre>int * (*)
// function returing a pointer to int and taking no arguments</pre><pre>int (*) [10]
// pointer to array of 10 int</pre><pre>int (*) (*)
// pointer to a function returning an int and taking no arguments</pre><p>With this background, lets tackle this gem from Andrew Koenig&#8217;s C Traps and  Pitfalls (<a
href="http://www.literateprogramming.com/ctraps.pdf">PDF</a>):</p><pre>(*(void(*)())0)();</pre><p>Clearly, we are casting 0 to some type here. The type being:</p><pre>void(*)()</pre><p>We know this type:</p><pre>// postifx: * () void
// pointer to function returning void</pre><p>In the next step, this pointer is de-referenced and then called. So what is  happening here is that zero is being cast to a pointer to a function returning  void, then dereferenced and called!</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=SoclXVUdbk4:sflFZcBsr4g:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=SoclXVUdbk4:sflFZcBsr4g:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=SoclXVUdbk4:sflFZcBsr4g:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=SoclXVUdbk4:sflFZcBsr4g:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=SoclXVUdbk4:sflFZcBsr4g:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=SoclXVUdbk4:sflFZcBsr4g:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=SoclXVUdbk4:sflFZcBsr4g:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=SoclXVUdbk4:sflFZcBsr4g:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/SoclXVUdbk4" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2011/03/deciphering-complex-c-declarations/feed/</wfw:commentRss> <slash:comments>4</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2011/03/deciphering-complex-c-declarations/</feedburner:origLink></item> <item><title>Mobile Platforms – Part 2 – Blackberry</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/q3t068xNfHc/</link> <comments>http://www.vineetgupta.com/2011/03/mobile-platforms-part-2-blackberry/#comments</comments> <pubDate>Sat, 12 Mar 2011 13:56:34 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Mobile]]></category> <category><![CDATA[Blackberry Java Development]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10081</guid> <description>As I mentioned on my previous post, we are building a x-platform native mobile chat client for Android, iPhone and Blackberry. The earlier post was about the Android platform, this one is on Blackberry. After having gone thru the Android documentation and videos from Google IO, I found the documentation on Blackberry to be somewhat &lt;a
href='http://www.vineetgupta.com/2011/03/mobile-platforms-part-2-blackberry/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>As I mentioned on my <a
href="http://www.vineetgupta.com/2011/03/mobile-platforms-part-1-android/"> previous post</a>, we are building a x-platform native mobile chat client  for Android, iPhone and Blackberry. The earlier post was about the Android  platform, this one is on Blackberry.</p><p>After having gone thru the Android  documentation and videos from Google IO, I found the <a
href="http://us.blackberry.com/developers">documentation</a> on Blackberry to  be somewhat disorganized &#8211; it is not easy to get the big picture and grasp key  concepts. You need to go thru a lot of documentation to figure out how things works  together as a system. I also tried to access the materials from <a
href="http://www.blackberrydevcon.com/2010-on-demand"> http://www.blackberrydevcon.com/2010-on-demand</a> but it turned out that it would  require a registration for $ 20. That was a bit of a WTF moment &#8230; I mean, you  want to make money by selling conference materials? Who does that?? Anyway I  proceeded to register on <a
href="https://www.eventreg.com/pls/pg_rim/reg_rim.reg_login?icpid=&amp;ipr_code=ONDNON"> https://www.eventreg.com/pls/pg_rim/reg_rim.reg_login?icpid=&amp;ipr_code=ONDNON</a> and after filling two lengthy forms, was unable to make a payment despite  getting a registration id. I wrote to <a
href="mailto:BlackBerryDEVCON@eventreg.com">BlackBerryDEVCON@eventreg.com</a> with my registration id (this was on 3rd Mar),  but am yet to hear from them. Rather discourgaing for somene starting off on a  new platform.</p><h2>The Blackberry Platform</h2><p>The Blackberry platform consists of the following key components:</p><ul><li><a
href="http://us.blackberry.com/developers/started/">Phone</a>:  	Blackberry phones support two kind of applications:<ul><li><a
href="http://us.blackberry.com/developers/javaappdev/">Java Apps</a>:  		These are native Blackberry apps which can take full advantages of all  		the features available in a Blackberry phone, for example the <a
href="http://us.blackberry.com/developers/blackberrymessenger/">BBM  		API</a>.</li><li><a
href="http://us.blackberry.com/developers/browserdev/index.jsp"> WebWorks Apps</a>: Browser based apps built using HTML, CSS, JS and capable  		of accessing native APIs thru the <a
href="http://us.blackberry.com/developers/browserdev/widgetsdk.jsp"> WebWorks SDK</a>.</li></ul></li><li><a
href="http://us.blackberry.com/developers/tablet/">Tablet</a>:  	One can build playbook tablet applications in two ways:<ul><li><a
href="http://us.blackberry.com/developers/tablet/adobe.jsp">Adobe  		AIR apps</a>: Based on Action Script 3 with a bunch of Blackberry  		specific APIs. Use this to build apps with a <a
href="http://docs.blackberry.com/en/developers/deliverables/24075/UI_APIs_for_Tablet_OS_look_and_feel_1354896_11.jsp"> native look and feel</a>.</li><li><a
href="http://us.blackberry.com/developers/tablet/webworks.jsp"> WebWorks apps</a>: Similar to WebWorks apps for the Blackberry phone.</li></ul></li><li><a
href="http://us.blackberry.com/developers/platform/">Platform Services</a>:  	Bunch of services provided by RIM including <a
href="http://us.blackberry.com/developers/platform/pushapi.jsp">Push</a>, <a
href="http://us.blackberry.com/developers/platform/locateservice/"> Location</a>, <a
href="http://us.blackberry.com/developers/platform/adservices/"> Advertising</a> and <a
href="http://us.blackberry.com/developers/platform/paymentservice.jsp"> Payment</a> services</li><li>Connectivity Services: RIM also provides a couple of connectivity  	services:<ul><li><a
href="http://us.blackberry.com/apps-software/server/">Blackberry  		Enterprise Server (BES)</a>: Enterprises can make their mail  		infrastructure available over the Blackberry network using BES which  		acts as a relay between the mail server and Blackberry&#8217;s NOC co-located  		with a carrier providing Blackberry services. One of the components of  		BES called the <a
href="http://docs.blackberry.com/en/admin/deliverables/7335/BB_MDS_267706_11.jsp"> Mobile Data System (MDS)</a> provides  HTTP and TCP proxies for a  		Blackberry application. This allows a Blackberry device to communicate  		with app / web-servers behind a corporate firewall without using VPN.</li><li><a
href="http://us.blackberry.com/support/software/internet.jsp"> Blackberry Internet Services (BIS)</a>: For individuals, the alternative  		to BES is the BIS. This provides the ability to access mail over POP and  		IMAP email without going thru BES. The service is provided in  		partnership with a carrier.</li></ul></li><li><a
href="http://us.blackberry.com/developers/appworld/distribution.jsp"> AppWorld</a>: The blackberry application store front, similar to the iPhone  	App Store</li></ul><p>This post is specific to Java development for the Blackberry phones.  Moreover, it is specific to Blackberry 5 since <a
href="http://us.blackberry.com/developers/choosingtargetos.jsp">75-85% of app  downloads from the App World are from Blackberry 5 devices</a>.</p><h2>Java ME</h2><p>Native apps on the Blackberry phone are Java apps that run on RIM&#8217;s JVM  implementation. This Java API is based on Java ME with Blackberry adding its own  extensions. If you are new to Java ME like I am, here is a quick review:</p><p>Java ME (Micro Edition) is Java targeted for small devices. It is a  collection of specs and technologies to address the needs of providing a  programming interface to small devices. Since there is a large variety in the  types of devices and their capabilities, it is quite likely that a device  manufacturer would implement some specs and not others, leading to API  fragmentation and hence non-portable code. To address this, Java ME defines  capabilities for devices which are organized into  Configurations, Profiles and Optional APIs:</p><p><img
src="http://developers.sun.com/mobility/getstart/articles/survey/fig1.jpg" alt="Organization of the Java ME technologies" /></p><p>(Organization of the Java ME technologies &#8211; source: <a
href="http://developers.sun.com/mobility/getstart/articles/survey"> http://developers.sun.com/mobility/getstart/articles/survey/</a>)</p><ul><li><strong>Configurations</strong>: A configuration is a specification for  	a virtual machine + some base APIs designed for a specific kind of device,  	based on memory constraints and processor power.<ul><li><a
href="http://jcp.org/en/jsr/detail?id=139">Connected Limited   		Device Configuration (CLDC)</a>: The CLDC is meant for small wireless  		devices with intermittent network connections, like pagers, mobile  		phones and PDAs</li><li><a
href="http://jcp.org/en/jsr/detail?id=218">Connected Device  		Configuration (CDC)</a>:  The CDC is a superset of the CLDC meant for larger devices with  		robust connections, like Set-top boxes and internet appliances.</li></ul></li><li><strong>Profiles:</strong> A profile is more specific than a  	configuration. A profile is based on a configuration and adds APIs for UI,  	Storage, or whatever else is required to develop apps. For CLDC, a profile called <a
href="http://jcp.org/en/jsr/detail?id=118">Mobile Information  		Device Profile (MIDP)</a> is specified. For CDC, there is a hierarchy of profiles:<ul><li><a
href="http://jcp.org/en/jsr/detail?id=219">Foundation Profile</a></li><li><a
href="http://jcp.org/en/jsr/detail?id=217">Personal Basis Profile</a></li><li><a
href="http://jcp.org/en/jsr/detail?id=215">Personal Profile</a></li><li><a
href="http://jcp.org/en/jsr/detail?id=228">Information Module Profile</a></li></ul></li><li><strong>Optional APIs:</strong> A device implementing Java ME may define  	additional optional APIs besides the ones that are defined as a part of a  	Java ME profile.</li></ul><p>Based on the configurations, profiles and optional APIs, Java ME is able to  address a wide variety of small devices including Handsets, Smart-cards and  embedded devices:</p><p><img
src="http://developers.sun.com/mobility/getstart/articles/survey/fig2.jpg" alt="Java ME technologies by device type" width="600" height="242" /></p><p>(Map of the Java ME Universe &#8211; source: <a
href="http://developers.sun.com/mobility/getstart/articles/survey/"> http://developers.sun.com/mobility/getstart/articles/survey/</a>)</p><p>A device implements a stack consisting of a Configuration, Profile and  Optional APIs. For wireless devices like mobile phones, the stack is based on  CLDC, MIDP and some optional APIs. An application targeting the MIDP profile is  called a MIDlet.</p><p><img
src="http://www.oracle.com/ocom/groups/public/@otn/documents/digitalasset/148676.jpg" alt="Stack for Wireless devices" width="358" height="277" /></p><p>(Stack for wireless devices. Source: <a
href="http://www.oracle.com/technetwork/java/javame/tech/technology-139316.html"> http://www.oracle.com/technetwork/java/javame/tech/technology-139316.html</a>)</p><p>As one can imagine, despite the CLDC and MIDP being defined, a lot of  commonly required functionality and the way it is provided remains un-defined,  thus introducing scope for API fragmentation and un-portable code. This was  addressed thru <a
href="http://jcp.org/en/jsr/detail?id=185">JSR 185 &#8211; Java  Technology for the Wireless Industry</a> &#8211; a specification that addresses  fragmentation by defining key APIs and code portability issues by clarifying  spces and providing a suite of compliance tests. JSR 185 consists of CLDC 1.0 or  1.1, MIDP 2.0 and WMA (Windows Messaging API &#8211; <a
href="http://jcp.org/en/jsr/detail?id=120">JSR 120</a> and <a
href="http://jcp.org/en/jsr/detail?id=205">JSR 205</a>), with support for  MMAPI (Mobile Media API &#8211; <a
href="http://jcp.org/en/jsr/detail?id=135">JSR 135</a>)  being optional:</p><p><img
src="http://developers.sun.com/mobility/getstart/images/stack2.jpg" alt="JSR 185 - JTWI Stack" width="330" height="208" /></p><p>(JSR 185 &#8211; JTWI Stack. Source: <a
href="http://developers.sun.com/mobility/getstart/"> http://developers.sun.com/mobility/getstart/</a>)</p><p>This was superseded by the <a
href="http://developers.sun.com/mobility/midp/articles/msaintro/">MSA (Mobile  Services Architecture) specification</a> which defined two stacks: a full stack  of 16 JSRs and a subset of 8 JSRs:</p><p><img
src="http://developers.sun.com/mobility/midp/articles/msaintro/fig1.gif" alt="MSA Stacks" width="359" height="320" /></p><p>(MSA Stacks. Source: <a
href="http://developers.sun.com/mobility/midp/articles/msaintro/"> http://developers.sun.com/mobility/midp/articles/msaintro/</a>)</p><p>Blackberry 5 <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/Java_ME_and_Java_APIs_for_BB_446980_11.jsp"> is based on</a> <a
href="http://jcp.org/en/jsr/detail?id=139">CLDC 1.1</a>, <a
href="http://jcp.org/en/jsr/detail?id=118">MIDP 2.0</a> and is JSR-185 compliant. It also supports the full MSA stack except for JSR 184  (3-D graphics), JSR 180 (SIP) and JSR 229 (Payment). For other JSRs supported by  Blackberry 5, see <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/Support_for_standard_Java_APIs_446981_11.jsp"> http://docs.blackberry.com/en/developers/deliverables/9095/Support_for_standard_Java_APIs_446981_11.jsp</a></p><h2>Blackberry Applications</h2><p>Blackberry applications are of two types:</p><ul><li> <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/MIDlet_applications_446987_11.jsp">MIDlet  	Application</a>: A MIDlet is an application targeting MIDP  	(javax.microedition namespace). Such an app can run on other MIDP 2.0  	compliant phones from other vendors</li><li> <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/BlackBerry_API_applications_647766_11.jsp">Blackberry Java Application</a>:  	A Blackberry Java Application (also called a RIMlet) targets the CLDC, but  	uses Blackberry specific APIs (from the net.rim namespace). Such an app can  	run only on a Blackberry device.</li></ul><p>Unless your code is targeting mulitple Java ME platforms, it makes sense to  build Blackberry Java applications since they tend to be a lot <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/BlackBerry_API_applications_647766_11.jsp">richer</a> than  MIDlets. Specifically, if you want the look and feel of applications that RIM  ships with a Blackberry (also called &#8220;native apps&#8221;), then you should build a  Blackberry Java application by using <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/package-summary.html"> net.rim.device.api.ui</a> and not use <a
href="http://www.blackberry.com/developers/docs/5.0.0api/javax/microedition/lcdui/package-summary.html"> javax.microedition.lcdui</a>. More differences on the UI capabilities of both  frameworks are given on <a
href="http://www.blackberry.com/developers/docs/5.0.0api/UI-summary.html"> http://www.blackberry.com/developers/docs/5.0.0api/UI-summary.html</a>.</p><p>Note that all APIs are always available to all apps. So if your app is using  the Blackberry UI, you can always invoke MIDP APIs. If your app is using the MIDP UI, you can always  invoke Blackberry APIs, but of course doing so would render the application  non-portable. The only exception to this is on mixing UI. The MIDP UI model  allows concurrent access while the Blackberry UI model does not and hence mixing  UI library calls is unsupported.</p><p>In the rest of this post, I would be focussing mostly on the Blackberry Java  Application.</p><h3>Startup</h3><p>A Blackberry application can start in one of the following ways:</p><ol><li>Use clicking an icon on the Home screen</li><li>By another app</li><li>OS starting an app automatically when the device starts</li><li>OS starting an app at a scheduled time</li></ol><p>This startup is managed by an entity called the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html"> Application Manager</a>. An running application can obtain a reference to the  Application Manager by calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#getApplicationManager()"> getApplicationManager()</a>, and then, can use this to:</p><ul><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#runApplication(net.rim.device.api.system.ApplicationDescriptor, boolean)"> Run</a> or <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#launchApplication(java.lang.String)"> Launch</a> another application</li><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationDescriptor.html"> Schedule</a> an application to run at a specific time</li><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#postGlobalEvent(int, long, int, int, java.lang.Object, java.lang.Object)"> Post</a> a global event to the OS</li><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#lockSystem(boolean)"> Lock</a> and <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#unlockSystem()"> Unlock</a> the system</li></ul><p>To start an application, the App Manager creates a new process and a thread  within that process and then starts the main() function on that thread with a  set of input parameters. The parameters are specified in the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationDescriptor.html"> application descriptor</a>. It is possible to create multiple descriptors for  the same app and associate them with different icons on the screen, This allows  the <a
href="http://www.blackberry.com/knowledgecenterpublic/livelink.exe/fetch/2000/348583/800901/How_To_-_Setup_an_alternate_entry_point_for_my_application.html?nodeid=800820&amp;vernum=0"> creation of multiple entry points</a> for the same app.</p><h3>Event Processing</h3><p>Once started, an application can choose to do a series of operations and then  quit, never showing a UI. However, most applications want to stay around and  respond to system events, including UI events. For this, one of the threads of  the application needs to acquire what is called the Event Lock by calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#getEventLock()"> Application.getEventLock()</a>. Only one thread in the application can acquire  this lock &#8211; this thread is then called the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#isEventThread()"> Event Thread</a>. Typically, the first thread acquires this lock by calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#enterEventDispatcher()"> enterEventDispatcher()</a>. Once this method is called, the application starts  receiving system events. Note that this does not mean that the application is  showing a UI &#8211; just that is it receiving all system events, including UI events.  So as of now, the application is still running in the background. An application  can control whether it wants to continue processing systems events by toggling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#setAcceptEvents(boolean)"> setAcceptEvents()</a>.</p><h3>Showing a UI</h3><p>All the visible elements in a Blackberry Java Application are of three types:</p><ul><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html"> Fields</a>: A building block element that is used to work with a particular  	kind of data for which it draws the UI and if required handles user-input.  	It typically corresponds to a control like a <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/component/ButtonField.html"> button</a> / <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/component/TextField.html"> text-field</a>, etc. You can create custom Fields by sub-classing the Field  	class, implementing the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html#layout(int, int)"> layout()</a> and <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html#paint(net.rim.device.api.ui.Graphics)"> paint()</a> abstract methods and optionally overriding <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html#getPreferredHeight()"> getPreferredHeight()</a> and <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html#getPreferredWidth()"> getPreferredWidth()</a> methods.</li><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Manager.html"> Managers</a>: Responsible for arranging fields. Each field must belong to  	only one manager. Since a manager is also a field, managers are composable,  	making it possible to build complex, rich, UI layouts. There are several  	types of managers provided: <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/HorizontalFieldManager.html"> Horizontal</a>, <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/VerticalFieldManager.html"> Vertical</a>, <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/GridFieldManager.html"> Grid</a>, <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/FlowFieldManager.html"> Flow</a>, <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/AbsoluteFieldManager.html"> Absolute</a>, etc., and you can also build your own by sub-classing the  	Manager class and implementing <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Manager.html#sublayout(int, int)"> sublayout()</a>, <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html#getPreferredHeight()"> getPreferredHeight()</a> and <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Field.html#getPreferredWidth()"> getPreferredWidth()</a> methods and optionally overriding <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Manager.html#subpaint(net.rim.device.api.ui.Graphics)"> subpaint()</a> method.</li><li> <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Screen.html"> Screens</a>: A screen is a window that is active at any given time. There  	are two types of screens: <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/PopupScreen.html"> PopupScreen</a> and a <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/FullScreen.html"> FullScreen</a>.</li></ul><p><img
src="http://www.blackberry.com/developers/docs/5.0.0api/images/UI_BB.jpg" alt="Blackberry UI Framework" width="564" height="315" /></p><p>(Blackberry UI Framework. Source: <a
href="http://www.blackberry.com/developers/docs/5.0.0api/UI-summary.html"> http://www.blackberry.com/developers/docs/5.0.0api/UI-summary.html</a>)</p><p>The way the system works is that the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/UiEngine.html"> UI Engine</a> maintains a stack of Screen objects. As new screens are created,  they are <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/UiApplication.html#pushScreen(net.rim.device.api.ui.Screen)"> pushed</a> on the display stack. Only the screen at the top receives input  events. When a screen is <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/UiApplication.html#popScreen(net.rim.device.api.ui.Screen)"> popped</a> from the stack, the underlying screens are drawn as necessary. Note  that the Blackberry display stack is not the same as the Android back-stack. The  Android back-stack is a system-managed structure, scoped to a task with  acitivities from different processes getting interleaved. The Blackberry display  stack is a per-process application-managed stack of Screens.</p><p>To show a UI, the application extends the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/package-summary.html"> net.rim.device.api.ui.UiApplication</a> class (which implements <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/UiEngine.html"> UiEngine</a> and extends the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html"> Application</a> class), creates an instance of <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/MainScreen.html"> MainScreen</a> (which is of type <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/container/FullScreen.html"> FullScreen</a>) and then pushes this instance on the display stack.</p><pre>
class MyApp extends UiApplication {
	MyApp() {
		MyAppScreen myAppScreen = new MyAppScreen();
		pushScreen(myAppScreen);
	}

	public static void main(String[] args) {
		MyApp app = new MyApp();
		app.enterEventDispatcher();
	}
}

class MyAppScreen extends MainScreen {
	MyAppScreen() {
		// add fields etc.
	}
}
</pre><h3>Termination</h3><p>An application terminates when <a
href="http://www.blackberry.com/developers/docs/5.0.0api/java/lang/System.html#exit(int)"> System.exit()</a> is called. This method causes the Blackberry JVM to terminate  the process of the caller. For applications that show a UI, the typical way of  quitting is that when the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/ui/Screen.html#close()"> close()</a> method is called on a screen, it pops the screen from the display  stack, and if the display stack is empty, calls <a
href="http://www.blackberry.com/developers/docs/5.0.0api/java/lang/System.html#exit(int)"> System.exit()</a>. So if the user keeps pressing back and hits the first screen  of your application, pressing back once again would by default exit the app. To  avoid this, one needs to override the close() method.</p><h2>Doing Long Running Work</h2><p>As described in the Event-Processing section above, the Blackberry UI model  is a single threaded one. That means that you cannot do long-running work on it,  and instead need to launch a worker thread. This worker thread can update the UI  in two ways:</p><ol><li><strong>Acquire and synchronize on the event lock:</strong> To do this, the  	worker  	thread invokes <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#getEventLock()"> Application.getEventLock()</a> and then synchronizes this object to access  	the UI. While this is being done the Event Thread is paused, so the lock  	should be held for a really short period of time.<pre>// on the background thread
synchronized(Application.getEventLock()) {
    // update the UI
}</pre></li><li><strong>Inject an event in the UI message queue:</strong> To inject an event, you model it  	as a class that implements <a
href="http://www.blackberry.com/developers/docs/5.0.0api/java/lang/Runnable.html"> Runnable</a> and inject it into the Event Thread message queue by calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#invokeAndWait(java.lang.Runnable)"> invokeAndWait()</a> or <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#invokeLater(java.lang.Runnable)"> invokeLater()</a>. The event thread would then call the runnable&#8217;s <a
href="http://www.blackberry.com/developers/docs/5.0.0api/java/lang/Runnable.html#run()"> run()</a> method at the next available opportunity. While for  	invokeAndWait(), the worker thread would block till run() returns,  	invokeLater does not block and returns immediately.<pre>// on the background thread:
UiApplication.getUiApplication().invokeLater (new  Runnable() {
    public void run()
    {
        // update the UI - runs on the Event thread
    }
});</pre></li></ol><h2>Interprocess Communication</h2><p>There are two ways in which Blackberry applications can communicate with each  other:</p><ol><li>Using the <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/RuntimeStore.html">RuntimeStore</a>:  	The RuntimeStore is a system-wide volatile storage where applications can  	place data to share with other applications.</li><li>Using Global Events: Use <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/Application.html#addGlobalEventListener(net.rim.device.api.system.GlobalEventListener)"> Application.addGlobalEventListener()</a> to add a <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/GlobalEventListener.html"> GlobalEventListener</a> to listen to global events. Post a Global Event by  	calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/system/ApplicationManager.html#postGlobalEvent(long, int, int)"> ApplicationManager.postGlobalEvent()</a>.</li></ol><h2>Security Model</h2><p>Blackberry has a complex security model, owing to its roots in enterprise  software. When businesses issue their employees Blackberry devices, they also  expect to able to exercise control over what the user is able to do on the  device. This is made possible thru a <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/Application_control_447165_11.jsp"> policy</a> which is pushed to the device using BES or the Blackberry Desktop  Manager. Such policies can limit the resources available to an  application, and to handle that gracefully, applications should:</p><ol><li>Find out what permissions they have: By calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/applicationcontrol/ApplicationPermissionsManager.html#getApplicationPermissions()"> ApplicationPermissionsManager.getApplicationPermissions()</a>. This returns  	an instance of <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/applicationcontrol/ApplicationPermissions.html"> ApplicationPermissions</a> class which gives all the permissions that are  	set.</li><li>Ask for permissions they need: By calling <a
href="http://www.blackberry.com/developers/docs/5.0.0api/net/rim/device/api/applicationcontrol/ApplicationPermissionsManager.html#invokePermissionsRequest(net.rim.device.api.applicationcontrol.ApplicationPermissions)"> ApplicationPermissionsManager.invokePermissionsRequest()</a> and passing it  	an instance of ApplicationPermissions with the requested permissions set.</li></ol><p>Besides this, RIM has marked a <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/Java_APIs_with_controlled_access_447163_11.jsp"> number of APIs</a> as <a
href="http://docs.blackberry.com/en/developers/deliverables/9095/Controlled_APIs_and_code_signing_447162_11.jsp"> controlled APIs</a>. To access these APIs on the device, you must sign your  application using a key or signature obtained from RIM. For details, see <a
href="http://us.blackberry.com/developers/javaappdev/codekeys.jsp"> http://us.blackberry.com/developers/javaappdev/codekeys.jsp</a>. On the  simulators, these APIs are accessible without code-signing.</p><h2>Packaging</h2><p>When you build your app, you create a JAR, alongwith a manifest. Java ME <a
href="http://developers.sun.com/mobility/midp/ttips/getAppProperty/index.html"> defines certain mandatory and optional attributes</a> that applications should  include in the JAR manifest, besides any other app-specific info that maybe  present.</p><p>However, a Blackberry device cannot directly execute a JAR. It first needs to  be converted into a proprietary format file called a <a
href="http://drbolsen.wordpress.com/2006/07/26/blackberry-cod-file-format/"> .COD</a> file which is essentially RIM&#8217;s own version of the JAR. So if you want  to run a MIDlet on a Blackberry device, you first need to convert the JAR into a  COD file which can be done using the Eclipse plugin or by using the <a
href="http://supportforums.blackberry.com/t5/Java-Development/Use-the-RAPC-compiler/ta-p/444879"> RAPC compiler</a> on the command line. Some more details on using the RAPC  compiler are given <a
href="http://codeforfun.wordpress.com/2008/09/09/how-to-use-rapc-from-rim-dirty-details/"> here</a>.</p><p>However, the max size of a COD file is 128 kb with a 64 kb limit on compiled  code and 64-kb limit on resource data. In case the compiled code produced is  larger than these limits, the compiler breaks the code into mulitple COD files,  naming them in increasing numerical order like this:</p><ul><li>HelloWorld.cod</li><li>HelloWorld-1.cod</li><li>HelloWorld-2.cod</li><li>HelloWorld-3.cod</li><li>&#8230;</li></ul><p>The compiler then takes the numbered files (called sibling COD files), zips  them and names the zip as a COD file (HelloWorld.cod above) &#8211; this is called the  main COD file. There is a max limit to the main COD file too &#8211; it can contain up  to 127 sibling COD files, thus imposing a theoretical size limit. For more  details, see <a
href="http://supportforums.blackberry.com/t5/Java-Development/The-maximum-size-of-a-BlackBerry-smartphone-application/ta-p/502534"> http://supportforums.blackberry.com/t5/Java-Development/The-maximum-size-of-a-BlackBerry-smartphone-application/ta-p/502534</a>.</p><p>Packaging the application for Wireless or Over-the-Air (OTA) distribution requires more work. For this, Java ME defines an additional manifest called  a JAD (Java Application Descriptor) that Java ME applications should provide for  OTA distribution. The main purpose of a JAD is for locating a JAR and figuring  out its size. Besides the JAR URL and JAR size, the JAD must also contain  certain other attributes that are also required in a JAR manifest and may  contain any attribute that is in a JAR manifest. For details, see <a
href="http://developers.sun.com/mobility/midp/ttips/getAppProperty/index.html"> here</a>. Blackberry specific attributes for a JAD are given <a
href="http://docs.blackberry.com/en/developers/deliverables/7693/Attributes_for_jad_files_513047_11.jsp"> here</a>. Also, for OTA distribution, you need to extract the sibling COD files  from the main COD file. For details see <a
href="http://docs.blackberry.com/en/developers/deliverables/7693/Extract_sibling_cod_files_512523_11.jsp"> here</a>.</p><h2>Distribution</h2><p>Blackberry applications can be distributed in two ways:</p><ol><li> <a
href="http://docs.blackberry.com/en/developers/deliverables/7693/App_distribution_through_a_computer_connection_447181_11.jsp"> Direct Connection</a>: Connect the device to your computer and install the  	app. There are two possibilities here:<ul><li> <a
href="http://docs.blackberry.com/en/developers/deliverables/7693/Distribute_an_application_from_a_computer_447182_11.jsp"> Using the Blackberry Desktop Manager</a>: The application needs to have  		a COD file alongwith an additional manifest file called the<a
href="http://docs.blackberry.com/en/developers/deliverables/7693/Elements_and_attributes_for_alx_files_513046_11.jsp"> ALX file</a>.</li><li> <a
href="http://docs.blackberry.com/en/developers/deliverables/7693/Distribute_an_application_from_a_web_page_447183_11.jsp"> Using a Web Page</a>: For this, you need to put up an ActiveX control  		called a  <a
href="http://docs.blackberry.com/en/developers/subcategories/?userType=21&amp;category=BlackBerry+Application+Web+Loader"> Web Loader</a> on the web-page from where you want the installation to  		happen. The user browses to a web-page on his desktop using IE and the  		control starts an installer. In case the user visits the page using a  		Blackberry device, the user is prompted to connect the device to a  		desktop.</li></ul></li><li> <a
href="http://docs.blackberry.com/en/developers/deliverables/7693/Distributing_BB_Java_Apps_over_wireless_network_512513_11.jsp"> OTA</a>: There are two ways in which OTA distribution may happen:<ul><li><strong>BES Push:</strong> An organization with a Blackberry  		Enterprise server deployment can push notifications to Blackberry  		devices for applications based on policies defined by an administrator.</li><li> <a
href="http://docs.blackberry.com/en/developers/subcategories/?userType=21&amp;category=BlackBerry+App+World+storefront"> Blackberry App-World</a>: The process is described <a
href="http://docs.blackberry.com/en/developers/deliverables/24398/Process_flow_title_647894_11.jsp"> here</a>.</li></ul></li></ol> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=q3t068xNfHc:gyFLeT9eaE0:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=q3t068xNfHc:gyFLeT9eaE0:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=q3t068xNfHc:gyFLeT9eaE0:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=q3t068xNfHc:gyFLeT9eaE0:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=q3t068xNfHc:gyFLeT9eaE0:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=q3t068xNfHc:gyFLeT9eaE0:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=q3t068xNfHc:gyFLeT9eaE0:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=q3t068xNfHc:gyFLeT9eaE0:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/q3t068xNfHc" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2011/03/mobile-platforms-part-2-blackberry/feed/</wfw:commentRss> <slash:comments>5</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2011/03/mobile-platforms-part-2-blackberry/</feedburner:origLink></item> <item><title>Mobile Platforms – Part 1 – Android</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/mQvRIRIZH1U/</link> <comments>http://www.vineetgupta.com/2011/03/mobile-platforms-part-1-android/#comments</comments> <pubDate>Sun, 06 Mar 2011 19:36:38 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Mobile]]></category> <category><![CDATA[Android]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10071</guid> <description>One of the products we are building at Directi is a x-platform mobile chat client. Developing it has been arduous &amp;#8211; we experimented with Phonegap and then Titanium and found both lacking (see this Quora thread for details). So now we are building this as three native apps, targeting iPhone, Android and Blackberry. In this &lt;a
href='http://www.vineetgupta.com/2011/03/mobile-platforms-part-1-android/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>One of the products we are building at Directi is a x-platform mobile chat client.     Developing it has been arduous &#8211; we experimented with Phonegap and then Titanium     and found both lacking (see this <a
href="http://www.quora.com/How-do-you-decide-whether-to-go-with-a-native-iPhone-Android-app-or-a-web-app-that-pretends-to-be-an-iPhone-Android-app/answer/Vineet-Gupta?srid=Ysz"> Quora thread</a> for details). So now we are building this as three native apps,     targeting iPhone, Android and Blackberry. In this post I am going to cover the Android platform, and would follow up with iOS and Blackberry.</p><h2>Android Architecture</h2><p>The following diagram (from <a
href="http://developer.android.com/guide/basics/what-is-android.html"> http://developer.android.com/guide/basics/what-is-android.html</a>) gives the     overall architecture of the Android OS:</p><p><img
src="http://developer.android.com/images/system-architecture.jpg" alt="Android Architecture" /></p><p>The Android OS is basically a Linux 2.6.4 kernel fork. The Linux kernel is used     for memory management, process management, security, driver model, shared lib model.     Google has added Android specific enhancements: Power management, Kernel debugger,     Logger, Low Memory killer, Shared Memory Driver, IPC Driver, etc.</p><p>Above the kernel, in the user space, sits the Hardware abstraction layer &#8211; this     deals with stuff like Radio, Wifi, GPS, Graphics, Audio, Camera, Bluetooth, etc.     Basically this defines a set of interfaces that drivers need to implement. These     libaries are loaded by the system at runtime on a need basis.</p><p>Above the HAL, sit a bunch of libraries &#8211; the most important of course is libc &#8211;     Google has written a port called Bionic libc which is optimized for embedded use     &#8211; in terms of being small in size (since it is loaded in every process) and fast     (since CPU is limited), but also because libc is GPL and Google wanted to protect     apps from being GPLed. Besides there is webkit of course and SQLite. Google also     provides a Media Framework, OpenGL library, and libraries for UI rendering and audio.</p><p>The Android programming environment is based on Java. For this, the OS ships with     Dalvik &#8211; a custom implementation of the JVM optimized for the embedded environment.     Dalivk does not run the Java bytecode, but its own custom bytecode called Dex &#8211;     Dalvix Executable. It uses runtime memory very efficiently, the bytecode interpreter     is highly CPU optimized, and because of this low footprint, the VM can be run in     multiple processes &#8211; basically each app gets its own Linux process with its own     instance of a Dalvik VM. The VM is exposed by a set of core Java APIs which provide     the familiar Java library.</p><p>All of the above is wrapped in a set of platform services called the Android Application     Framework &#8211; these services work behind the scenes to provide abstractions like Activities,     Packages, Windows, Resources, Content Providers, View System, hardware access, etc.</p><h2>Startup</h2><p>Like a typical Linux system, in Android, the bootstrapper loads the Linux kernel     and starts the init process, which in turn starts various daemons like Android Debug     Bridge (adbd), Radio Interface Layer Daemon (rild), etc. After this, init starts     what is called the Zygote process. The purpose of this process is to kick off the     rest of the Android system and later help in the instantiation of apps. The Zygote     process initializes a Dalvik VM instance and loads a bunch of libraries and starts     listening on a socket for requests to spawn more processes (with VM instances).     As requests come in, it forks to create new processes with VM instances. Copy-on-write     is used to maximize reuse and minimize footprint.</p><p><img
class="alignnone" alt="Android Startup Sequence" src="http://farm6.static.flickr.com/5291/5502863911_40f78966b5_z.jpg" /></p><p>(Android Startup Sequence &#8211; source: <a
href="http://sites.google.com/site/io/anatomy--physiology-of-an-android"> http://sites.google.com/site/io/anatomy&#8211;physiology-of-an-android</a>)</p><p>Once Zygote is in place, the init process starts what is called the Runtime process.     This basically does two things:</p><ol><li>Start the Service Manager &#8211; this is responsible for managing IPC and all services         are required to register with it. It acts like a local DNS providing a way to bind         to a service given its name.</li><li>Ask Zygote to fork what is called the System Server. This is the first process (besides         Zygote) that has a running VM instance. The System server starts services for display         (Surface Flinger) and audio (Audio Flinger). These services register with the service         manager like all services are expected to. Now other apps can start using display         and audio. After this, the System server starts up all the core platform services         &#8211; Window manager, Telephony manager, Power manager, Activity manager, etc. Each         one of these services also registers with the Service Manager, so that apps can         use these services.</li></ol><p>At this stage, we have the following processes in place:</p><ol><li>Init &#8211; the original init process started by the bootstrapper</li><li>Daemons &#8211; started by init</li><li>Runtime &#8211; also started by init</li><li>Zygote &#8211; the original zygote which will continually get forked as new apps get launched</li><li>System Server &#8211; the first managed process which contains all the core services and         platform components.</li></ol><p>After this, the Home Screen or the Idle screen is launched &#8211; basically the Activity     Manager (which is inside the System Server) sends a message to Zygote to start the     &#8220;Home&#8221; Activity, which causes Zygote to fork into a new process with a     Dalvik VM and the Home activity. Now depending on the action carried out by the     user, the appropriate app would be launched with the Zygote forlking each time to     create a new VM instance inside a new process. So for example, if the user starts     the Contacts app, the apt process would be setup:</p><p><img
class="alignnone" alt="Android Startup Processes" src="http://farm6.static.flickr.com/5171/5503452198_4f8d649c05_z.jpg" /></p><p>(Startup processes in Android -  source: <a
href="http://sites.google.com/site/io/anatomy--physiology-of-an-android"> http://sites.google.com/site/io/anatomy&#8211;physiology-of-an-android</a>)</p><p>Each new app gets a unique user Id which is opaque to the app, and the system sets     permissions for all files in the app so that only the user id assigned to the app     can access those files.This makes the system secure since the app only has access     to the components that it needs to do its work and nothing else. However,      apps may need to communicate with each other and the system services, which are     all running in separate processes. This is accomplished thru Inter Process Communication</p><h2>IPC</h2><p>With its emphasis on isolating apps and services in process boundaries, it is clear     that Android requries a lightweight IPC. The IPC mechanism in Android is called     Binder and is based on shared memory. Recall that when a process starts it registers     itself with the Service Manager. This happens behind the scenes, and on registeration,     each process gets what is called a <a
href="http://developer.android.com/reference/android/content/Context.html"> Context</a> object &#8211; a reference to the Service manager. Now lets say app A     needs to communicate with service B, and the two are running in two separate processes.     To do this, app A asks the Context for the Service B by passing the well known name     of the service. The Context returns a reference to the service to A, on which A     can call a method &#8211; say foo. This method call is intercepted by the Binder driver.     The driver marshals the object and passes a reference of it to the receiver &#8211; B.     Note that this is passing by reference, not passing by value, in which the object     is serialized. On the side of the service B, the Binder maintains a thread-pool     (transparent to the service). One of the threads in this pool receives the incoming     call, locates the actual object in the service B and make the call. The return value     is then similarly passed back to the caller. The following diagram (from <a
href="http://sites.google.com/site/io/anatomy--physiology-of-an-android"> http://sites.google.com/site/io/anatomy&#8211;physiology-of-an-android</a>) illustrates     this:</p><p><img
alt="Android IPC - Binder" src="http://farm6.static.flickr.com/5015/5503451932_08abdba708_z.jpg" /></p><p>The entire operation is synchronous, and the objects are reference counted so that     when they are no longer in use, they can be deleted. Since there is no serialization     / de-serialization, there is no overhead and therefore this model can also be used     to communicate with services that run in your own process without any penalty.</p><h2>Applications</h2><p>An application in Android is a collection of components. There are four types of     components:</p><ul><li><a
href="http://developer.android.com/guide/topics/fundamentals/activities.html"> Activity</a>: An Activity is a UI component corresponding to one screen with which         a user interacts in order to do something.</li><li><a
href="http://developer.android.com/guide/topics/fundamentals/services.html">Service</a>:         A Service is an application component without a UI used to perform long running         operations in the background.</li><li><a
href="http://developer.android.com/reference/android/content/BroadcastReceiver.html"> Broadcast Receiver</a>: A broadcast receiver is a component that responds to system-wide         broadcasts (for example screen turned off, battery low, etc.)</li><li><a
href="http://developer.android.com/guide/topics/providers/content-providers.html"> Content Provider</a>: A content provider is a component that stores and retrieves         data and makes it available to all applications. There are different types of content         providers: audio, video, contacts, etc. and you can create your own custom provider         also.</li></ul><p>As mentioned earlier, each application runs in its own process. By default, all     components used by that app also run in the same process, and on the main thread.     However, it is possible to make a component of your app run in a separate process     thru the manifest file. Thus, an application in Android may span multiple processes.</p><h2>Application Startup</h2><p>Android follows a fairly unique model in that there is no single entry point for     an application &#8211; there is no main() function. Instead, a component in one application     can start another application&#8217;s component, thus bringing the application to life.     This communication across apps happens thru the IPC mechanism descibed above. So     while an Activity is owned by an application, it is possible for another application     to start it (if the owning application allows it). An example is clicking on a hyperlink     in an app opening up the browser. This applies for not just Activities, but other     types of components also. In order for this to happen, there are two steps required:</p><ol><li>In case the application is not already running, the Android system would bring the         application to life in a new process forked from Zygote.</li><li>The desired component inside the app would need to be activated.</li></ol><p>Note that in case the application is already running, the new component would be     by default instantiated in the same process.</p><p>As mentioned above, IPC happens thru a <a
href="http://developer.android.com/reference/android/content/Context.html"> Context</a> object. So when a component A inside one app needs to activate another     component B in a different app, or give it something new to do, it basically uses     the Context object to send a message to the other component. In the case of an Activity,     Service or a BroadcastReceiver, this takes the form of a what is called an <a
href="http://developer.android.com/guide/topics/intents/intents-filters.html"> Intent</a> &#8211; a passive data structure that defines the operation to be performed     for an Activity and a Service, and for a Broadcast Receiver, a defintion of the     announcement being broadcast.</p><p>Content Providers however are not activated thru Intents. Instead, activation happens     on a request from a <a
href="http://developer.android.com/reference/android/content/ContentResolver.html"> ContentResolver</a>, which acts as a mediator between the requesting component     and the Content Provider.</p><h2>The Activity Back-Stack</h2><p>Consider the following scenario:</p><ol><li>You are on the Home Screen. This is Activity 1</li><li>You click on the Mail app icon, and that activates the main activity in the Mail         app which comes to the foreground. This is Activity 2</li><li>You now click on Compose and that activates the Compose activity in the Mail app.         This is Activity 3</li><li>You decide to cancel out of composing a new message and press the back button. You         come back to Activity 2</li></ol><p>Here is what happens in the background:</p><ul><li>When one activity starts another activity, it stops and its state is saved. So when         Activity 1 starts activity 2, activity 1 is stopped, its state saved and so on</li><li>The system maintains a stack (called a back-stack) with the latest activity on top         and the oldest activity at the bottom.</li><li>When the user presses the back-key, Activity 3 is popped and Activity 2 is started         from its saved state</li></ul><p><img
src="http://developer.android.com/images/fundamentals/diagram_backstack.png" alt="Activity Back Stack" width="700" height="300" /></p><p>(The Activity Back Stack &#8211; source: <a
href="http://developer.android.com/guide/topics/fundamentals/tasks-and-back-stack.html"> http://developer.android.com/guide/topics/fundamentals/tasks-and-back-stack.html</a>)</p><p>This approach allows Android to seamlessly transition from one app to the other     in a consistent way.</p><h2>Tasks</h2><p>In the scenario described above, assume that when the user was in Compose mail (Activity     3), he decided to call someone and pressed the Home key. This would not unwind the     back-stack, but start a new stack. In order to do this, the collection of Activities     in the first stack needs to go into the background. This is achieved thru the notion     of a Task &#8211; a cohesive unit of Acitivities. When a task moves to the background,     all activities in the Task are stopped but the back-stack for the task remains intact,     so that when the user returns back to the Task, she can resume where she left off.     However, to save memory, the back-stack for a background Task is not retained for     a very long period and if the user does not go back to the task, the back-stack     is cleared except for the root activity.</p><p><img
src="http://developer.android.com/images/fundamentals/diagram_multitasking.png" alt="Tasks" width="331" height="202" /></p><p>(Background and Foreground Tasks &#8211; source: <a
href="http://developer.android.com/guide/topics/fundamentals/tasks-and-back-stack.html"> http://developer.android.com/guide/topics/fundamentals/tasks-and-back-stack.html</a>)</p><h2>Activity Life Cycle</h2><p>The lifecycle of an activity is affected by its association with other activities,     its task and its back-stack. There are three states in which an activity can exist:</p><ul><li><strong>Resumed / Running:</strong> Activity is in the foreground and has user focus.         Such an activity is never killed by the system,</li><li><strong>Paused:</strong> Activity is visible, but another activity is in the foreground         and has user focus. This could happen if the other activity is at the top, but is         translucent, or because it does not cover the entire screen. In the paused state,         the Activity object is retained in memory, maintains all state and member info and         remains attached to the Window manager. However it can be killed in extremely low         memory conditions.</li><li><strong>Stopped:</strong> The Activity is not visible to the user.While the Activity         object is retained in memory, maintains all state and member info, it does not remain         attached to the Window manager. The activity can be killed by the system to reclaim         memory if required.</li></ul><p><img
src="http://developer.android.com/images/activity_lifecycle.png" alt="Activity State Transitions" width="545" height="711" /></p><p>(Activity Lifecycle &#8211; source: <a
href="http://developer.android.com/guide/topics/fundamentals/activities.html#Lifecycle"> http://developer.android.com/guide/topics/fundamentals/activities.html#Lifecycle</a>)</p><h2>Saving Activity State</h2><p>Note that it is entirely possible that the system, in order to reclaim memory, may     destroy an Activity, or even the process in which the Activity was running. However,     when the user comes back to the activity (thru the back-stack), you still want to     resume at the point the user left. To do this, an activity must save its state.     This happens thru the <a
href="http://developer.android.com/reference/android/app/Activity.html#onSaveInstanceState(android.os.Bundle)"> Activity.onSaveInstanceState()</a> method.</p><p><img
src="http://developer.android.com/images/fundamentals/restore_instance.png" alt="Saving Activity State" width="612" height="400" /></p><p>(Saving and restoring Activity State &#8211; source: <a
href="http://developer.android.com/guide/topics/fundamentals/activities.html#SavingActivityState"> http://developer.android.com/guide/topics/fundamentals/activities.html#SavingActivityState</a>)</p><h2>Process Lifecycle</h2><p>The Android system may need to kill a process in order to reclaim memory. To ensure     that this creates minimal impact on the user expereince, Android ranks processes     in a priority order:</p><ul><li><strong>Foreground Process</strong>: A process that is required for what the user         is currently doing. Such a process is killed only as a last resort</li><li><strong>Visible Process</strong>: This process is not in the foreground but can         affect what the user is seeing on the screen. For example it may host a paused Activity.         Such a process is not killed unless doing so is required to keep all foreground         processes running</li><li><strong>Service Procress</strong>: A process that is running a service and is not         one of the two types above. For example the service may be playing music or downloading         something. The system would keep such a process running unless there is not enough         memory to keep Foreground and Visible processes running.</li><li><strong>Background Process</strong>: A process that is holding an activity not currently         visible to the user (the activity is stopped). Such a process has no impact on the         user experience (if the activity lifecycle is correctly implemented and the activity         state is being properly saved and restored) The system can kill such a process any         time. Typically there are mulitple background processes running and the system maintains         a LRU list which is used to kill such processes.</li><li><strong>Empty Process</strong>: An empty process does not hold any active component.         The only reason such a process is kept alive in the first place is for caching and         to improve startup time. The system often kills such processes to maintain the balance         between process caches and underlying kernel caches.</li></ul><p>Of course it can so happen that a higher priority process is dependent on a lower     priority process. In such a case the ranking of the serving process is increased     to the same level as that of the dependent process.</p><p>A situation where this ranking has a direct impact is this:</p><ul><li>Your app needs to download something big which may take time and the user is likely         to move out of the app&#8217;s activity to something else.</li><li>If you spawn a worker thread for this task and the user moves out, the process would         become a background process.</li><li>However, if you instead spawn a service, the process would be a service process         and be less likely to be killed</li></ul><h2>Main / UI Thread</h2><p>When an application is launched and a new process created to host it, the application     gets a single thread called the main thread or the UI thread since this thread is     responsible for servicing the UI. The way this works is mostly the same for almost     any UI implementation &#8211; be it desktop operating systems or mobile OS. Basically,     the UI thread runs an infinite loop which checks a queue to see if there are any     pending UI events. In the case of Android, this concept is formalized in the form     of a <a
href="http://developer.android.com/reference/android/os/Looper.html">Looper</a>.      The looper loops over a <a
href="http://developer.android.com/reference/android/os/MessageQueue.html"> MessageQueue</a> which contains the <a
href="http://developer.android.com/reference/android/os/Message.html"> messages</a> to be dispatched. The actual task of managing the queue is     done by a <a
href="http://developer.android.com/reference/android/os/Handler.html">Handler</a> which is responsible for handling (adding, removing, dispatching) messages in the     message queue.</p><p><img
src="http://farm6.static.flickr.com/5295/5502982451_1f78830c9e_z.jpg" alt="Android Threading"  /></p><p>(Android Threading &#8211; source: <a
href="http://sites.google.com/site/io/inside-the-android-application-framework">http://sites.google.com/site/io/inside-the-android-application-framework</a>)</p><p>For the main thread, a Looper is setup by the system. However, you can also associate     a Looper with your own thread. Note that the Looper can be associated only with     one thread and that association cannot be changed. The way Android ensures this     is by putting the Looper on the <a
href="http://developer.android.com/reference/java/lang/ThreadLocal.html"> thread local storage</a> of the thread and not exposing a constructor for the     Looper. You get a Looper by calling the static <a
href="http://developer.android.com/reference/android/os/Looper.html#prepare"> prepare</a> method which first checks the TLS to see if there is already a Looper,     and if not creates one. You then call the <a
href="http://developer.android.com/reference/android/os/Looper.html#loop"> loop</a> method on it to start pumping messages. Unlike a Looper, mulitple handlers     can be associated with a message queue.</p><p>So quite obviously, one should not run a blocking operation on the UI thread. If     you do, messages would stop getting processed from the Message Queue and your app     would become unresponsive. This is what leads to the famous <a
href="http://developer.android.com/guide/practices/design/responsiveness.html"> Application Not Responding (ANR) dialog</a>. One solution is decsribed <a
href="http://developer.android.com/guide/appendix/faq/commontasks.html#threading"> here</a>: create a handler, spawn a worker thread, post results back from     the worker thread to the handler, update the UI. But this is a bit cumbersome to     do. A simpler solution is to use a <a
href="http://developer.android.com/reference/android/os/AsyncTask.html"> AysncTask</a> which takes care of the underlying plumbing.</p><h2>Service</h2><p>As mentioned earlier, a Service is a type of component that is used to carry out     a background operation, and does not provide a UI. An app component can start a     service, and even if the user switches out of the application, the service would     keep running. Typical use cases include playing music and downloading a file. Android     ensures that a process running a Service is not killed as far as possible (see Process     lifecycle above).</p><p>Note that a service by default runs on the main UI thread in the same process. This     means that if it carries out a blocking operation, your UI would become unresponsive     since the thread would not be available to run the Activity. To circumvent this,     you should launch a worker thread inside the service.</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=mQvRIRIZH1U:X1QMVJUfXwM:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=mQvRIRIZH1U:X1QMVJUfXwM:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=mQvRIRIZH1U:X1QMVJUfXwM:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=mQvRIRIZH1U:X1QMVJUfXwM:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=mQvRIRIZH1U:X1QMVJUfXwM:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=mQvRIRIZH1U:X1QMVJUfXwM:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=mQvRIRIZH1U:X1QMVJUfXwM:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=mQvRIRIZH1U:X1QMVJUfXwM:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/mQvRIRIZH1U" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2011/03/mobile-platforms-part-1-android/feed/</wfw:commentRss> <slash:comments>10</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2011/03/mobile-platforms-part-1-android/</feedburner:origLink></item> <item><title>How Browsers Work – Part 1 – Architecture</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/fQrhrL-Kr-U/</link> <comments>http://www.vineetgupta.com/2010/11/how-browsers-work-part-1-architecture/#comments</comments> <pubDate>Sun, 07 Nov 2010 19:23:55 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Web]]></category> <category><![CDATA[Browser Architecture]]></category> <category><![CDATA[Browser Internals]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10065</guid> <description>The web-browser is not an easy platform to program against. I realized this in Dec 2008 when we decided to change the architecture of one of our products at Directi to a pure JavaScript client talking to a REST API, and even more so in Apr 2009 when we decided to build our desktop product &lt;a
href='http://www.vineetgupta.com/2010/11/how-browsers-work-part-1-architecture/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>The web-browser is not an easy platform to program against. I realized this in Dec 2008 when we decided to change the architecture of one of our products at Directi to a pure JavaScript client talking to a REST API, and even more so in Apr 2009 when we decided to build our <a
href="http://Chat.pw">desktop product</a> on webkit. However, despite all the issues, the decision of betting the client on web technologies has paid off handsomely and since then our investments in using web-technologies on the client have increased manifold. Over the last two years, multiple teams at Directi have discovered the pains and joy of programming the browser, and almost everyday we learn something interesting about how browsers behave. As we kept learning stuff, I kept notes and thought it would be a good idea to annotate them and share them more widely. This is the first in a series of posts, in which I intend to cover (not necessarily in this order or with this breakup):</p><ul><li>Architecture</li><li>HTTP</li><li>Security Model</li><li>Content and Rendering</li><li>JavaScript</li><li>Apps</li></ul><p>Having said that, I have <a
href="http://www.vineetgupta.com/2009/06/notes-on-xmpp-part-i-fundamentals/">a</a> <a
href="http://www.vineetgupta.com/2008/12/usability-part-1-research-findings/">few</a> <a
href="http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape">posts</a> lying around where I have never gotten around to writing part 2. Let&#8217;s hope it does not happen in this case <img
src='http://www.vineetgupta.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p><h3>1. Rendering Process</h3><p>The job of a browser is to fetch and display a web-page. At a high level, most modern browsers carry out the following steps to render an HTML page:</p><p><img
title="Rendering Process" src="https://developer.mozilla.org/@api/deki/files/236/=Gecko_Overview_9.png" alt="" /></p><p>(Source: <a
href="https://developer.mozilla.org/en/Introduction_to_Layout_in_Mozilla">https://developer.mozilla.org/en/Introduction_to_Layout_in_Mozilla</a>)</p><ul><li>Load the HTML</li><li>Parse it</li><li>Apply styles</li><li>Build frames</li><li>Layout the frames (flow)</li><li>Paint the frames</li></ul><ol><li><strong>Load:</strong> The browser tries to fetch the page from the specified location. Typically this would be thru a HTTP client. However, a HTML page may also be loaded from a filesystem. Irrespective, the loader fetches the HTML page from its location. The super important concept of Browser Cache comes into play over here &#8211; but more on this later. The way the HTML page gets loaded is different from the way the resources get loaded. In WebKit there are two different pipelines &#8211; one for loading the page and another for loading resources:</li><p>(source: <a
href="http://webkit.org/blog/1188/how-webkit-loads-a-web-page/">http://webkit.org/blog/1188/how-webkit-loads-a-web-page/</a>)</p><li><strong>Parse:</strong> As the stream comes thru from the loader, an HTML parser starts building the DOM (also called a &#8220;Content Tree&#8221;) &#8211; each node here is an HTML element. Now a lot of HTML on the net is broken, and each browser has had to implement its own quirks to parse HTML leading to subtle incompatibilities. HTML 5 however specifies the <a
href="http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html">parsing algorithm</a>. As this gets adopted, the x-browser incompatibilities because of parsing should go away.	While parsing, the engine may come across resources (JS, CSS, images, fonts, etc.) When that happens the particular resource is queued for loading and parsing continues. Again, there is more to this, which we&#8217;ll tackle later.</li><li><strong>Compute Styles:</strong> The browser provides a default stylesheet. Often the HTML page also has a set of styles specified. These styles need to be applied to the Content Tree. For this purpose, a &#8220;Rendering Tree&#8221; is built &#8211; this essentially consists of elements that are to be rendered. For example an element with display set to none would not appear in this tree (nor would its descendants). Nor would elements like HEAD and SCRIPT. Nodes in the Render Tree represent style information: CSS box model, z-order, opacity are all specified here</li><li><strong>Construct Frames:</strong> Most render-able elements follow the <a
href="http://www.w3.org/TR/CSS21/box.html">CSS box model</a>: They have height, width, border, spacing, padding, margin and position. For these objects, a rectangular box &#8211; called a Frame &#8211; is  created. Not all objects have a frame &#8211; for example the SVG image above does not have a frame. It is put inside an iframe, which has a frame. A frame has all the information on how the object itself is going to be rendered. What is not known however, is how is the element going to be placed w.r.t other elements.</li><li><strong>Compute Flow:</strong> Flow Computation or Layout Computation is about how elements are placed w.r.t each other and is mostly controlled by the <a
href="http://www.w3.org/TR/CSS21/visuren.html">CSS Visual Rendering Model</a>. This is typically a recursive process from the root of the tree to leafs. Also, this is typically a lazy process &#8211; it is done on a need-basis. Basically when the layout engine determines that an element needs to be laid out (for example a newly added Node), it marks it as such by setting a dirty bit. The actual layout is done only when some method is called which requires the new information. A visual representation of the layout process can be seen in these videos:<ul><li><a
href="http://video.google.com/videoplay?docid=-1471976166301235697">Gecko reflow for Google homepage</a></li><li><a
href="http://video.google.com/videoplay?docid=-5863446593724321515">Gecko reflow for Wikipedia homepage</a></li><li><a
href="http://video.google.com/videoplay?docid=1020647662203348823">Gecko reflow for Mozilla.org homepage</a></li></ul><p>Most browsers do flow calculation at a higher resolution than what any display would have. This is to support zooming &#8211; when the user zooms in or out, the objects can be drawn correctly on the screen without requiring any extra steps other than mapping the coordinates to real pixels.</li><li><strong>Paint:</strong> Once the engine knows exactly where the objects need to be drawn, comes the process of actually rendering the objects on the screen. This process &#8211; called Painting &#8211; is described in agonizing detail in <a
href="http://www.w3.org/TR/CSS21/zindex.html">Appendix E of the CSS 2.1 Spec</a>. This is basically a Tree walk from the root of the Rendering tree, where each node is asked to paint itself. The actual rendering is abstracted out thru a Graphics Engine which is responsible for actually turning on the pixels and things like hardware acceleration.</li></ol><h3>2. Rendering Modes</h3><p>The actual execution of the rendering process described above can change completely based upon the rendering mode the browser decides to use for a particular page. The reason browsers have different rendering modes is because of the history of the web, and understanding rendering modes is very important to understanding how browsers behave. However, I would not touch upon it here since <a
href="http://hsivonen.iki.fi/doctype/">http://hsivonen.iki.fi/doctype/</a> does an excellent job of capturing all the details. If you are just interested in the background, read <a
href="http://en.wikipedia.org/wiki/Quirks_mode">http://en.wikipedia.org/wiki/Quirks_mode</a>.</p><h3>3. Dynamic Pages</h3><p>Pages can change because of JavaScript or because of user interaction which triggers parts of the rendering process:</p><p><img
title="Incremental Rendering" src="https://developer.mozilla.org/@api/deki/files/235/=Gecko_Overview_34.png" alt="" /></p><p>(Source: <a
href="https://developer.mozilla.org/en/Introduction_to_Layout_in_Mozilla">https://developer.mozilla.org/en/Introduction_to_Layout_in_Mozilla</a>)</p><ul><li>If DOM elements are added or removed, the typical response of the browser is to follow the rendering process described earlier in almost serial order</li><li>If the Style attribute on an element is changed, the style for the element needs to be recomputed, the page re-flown and re-painted<ul><li>Browsers may optimize this by batching style re-computes by queuing them</li><li>However, scripts often read back changes that they have just made which requires the re-styling queue to be flushed</li><li>For better performance, make style changes as a batch and then read them in a batch so that the queue is flushed less frequently</li></ul></li><li>Some style changes are cheaper:<ul><li>Changing size / location would not require style re-compute but only re-flowing and re-painting</li><li>Color change does not require re-flowing, but only re-painting</li><li>Scrolling also does not require re-computation, but only re-painting &#8211; this is typically done incrementally and may not even require full repainting (but things like fixed background images would necessitate full repainting). So moving elements by scrolling programmatically can be faster than moving elements by modifying their style attribute</li></ul></li><li>Re-Flow &#8211; because of position or size changes &#8211; is typically recursive (root to leafs)<ul><li>Some attribute changes in a child can trigger changes in the entire ancestry all the way up to the root. Example: Height changes</li><li>Some attribute changes in a parent can trigger changes in all the descendants right down to leaves. Example: Width changes</li><li>Browsers can detect that only a section of the tree may change and do re-flow only on that sub-tree</li></ul></li></ul><h3>4. Resource Loading</h3><p>As the Parser goes over the Content Tree, it may see an element referring an external resource (image, CSS, JS, font, etc), which needs to be loaded. This loading happens as follows:</p><h4>Order of Loading</h4><p>While typically resources get loaded in the order of appearance in the document, browsers optimize this by prioritizing stylesheets and JavaScript files ahead of images. It is also recommended that stylesheets be put at the top. This is because:</p><ul><li>Stylesheets are required to build the Rendering Tree, but have no impact on Content Tree, so HTML parsing and JS execution can continue while CSS is downloaded and loaded.</li><li>A script could ask for style information even as the stylesheet is being downloaded and the rendering tree is being built. If this happens, you get an error. So you want to load styles before JS starts executing</li></ul><h4>Parallel Loading</h4><p>Modern browsers maintain multiple persistent connections to a server. This allows parallel loading. Parallel loading is a good thing because it reduces the overall latency of the page getting delivered to the end-user. However, out of consideration for the load factor on web-servers, The HTTP 1.1 RFC <a
href="http://tools.ietf.org/html/rfc2616#section-8.1.4">recommends</a> that &#8220;Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy&#8221;.</p><p>Note that there is a trade-off here between the overhead of number of open sockets vs. the overhead of opening new sockets, and its impact on latency. With the number of external resources being fetched by a page going up, it makes sense to optimize for reducing the number of times a new connection has to be setup, to reduce the latency and improve the user experience. Indeed, most browsers these days allow more than 2 simultaneous connections per host. Steve Souders summarizes the current situation nicely in his <a
href="http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections/">Roundup on Parallel Connections</a>.</p><h4>Blocking Loads</h4><p>Since a script can call document.write(), parsing can&#8217;t proceed before the script is fully loaded, executed (if there is any inline script in the script block) and document.write() has been inserted. This means that a script load blocks parsing, and that means that further loading is blocked, preventing the parallelism mentioned above from being exploited. Modern browsers do help a bit. For example, in WebKit, when the main parser gets blocked because of a script load, it starts a side parser that figures out other resources to load in the rest of the HTML. However, that is WebKit &#8211; for other browsers, there are a couple of ways out:</p><ol><li>Put script blocks at the end &#8211; that way they do not pause any further parsing</li><li>Use a hack to download scripts asynchronously &#8211; Souders sums these up in his post on <a
href="http://www.stevesouders.com/blog/2009/04/27/loading-scripts-without-blocking/">Loading Scripts Without Blocking</a></li><li>HTML 5 specifies the <a
href="http://dev.w3.org/html5/spec/Overview.html#attr-script-async">async attribute on the script tag</a> which tells the browser that the script does not require synchronous execution and the parser can continue. WebKit <a
href="http://webkit.org/blog/1395/running-scripts-in-webkit/">recently</a> started supporting this attribute and Firefox has supported this since 3.6.</li></ol><h3>5. Physical Architecture</h3><p>Web browsers started off with a single process, single thread model. This was acceptable since web-pages were just documents that had to be rendered. However, the web has evolved from being document-centric to becoming application-centric &#8211; a lot of sites these days are applications, with a lot of active code, a far-cry from the static content browsers were designed to render. This gives rise to problems of stability, performance and security. To address these, most browsers have moved (or are in the process of moving) to a multi-process architecture. There are three drivers behind this trend:</p><ul><li><strong>Performance:</strong> Multiple processes exploit multiple cores</li><li><strong>Security:</strong> The browser can spin up a new process in a lower privilege mode, reducing / removing the impact of malicious code</li><li><strong>Stability:</strong> A badly behaved page / script / plugin does not impact others since it is isolated in a process.</li></ul><h4>Firefox</h4><p>Firefox uses a single thread, single process model. This means that in Firefox <a
href="http://www.mail-archive.com/mozilla-layout@mozilla.org/msg03579.html">a single UI thread is shared by all windows</a>. The reason for that apparently is to allow X-DOM blocking calls from diff pages of the same origin. More details on <a
href="http://www.mail-archive.com/mozilla-layout@mozilla.org/msg03580.html">http://www.mail-archive.com/mozilla-layout@mozilla.org/msg03580.html</a>. Network calls and web-worker requests are handled on different threads.</p><p>To provide better isolation and reliablility Firefox will move to a multi-process model with its <a
href="https://wiki.mozilla.org/Electrolysis">Electrolysis</a> project. However, this seems to be for plugins alone and pages would continue to be served from a single process.</p><h4>IE</h4><p>The first browser to ship with multiple process support was IE 7, with each browser window running in its own process:</p><p><img
title="IE 7 Process Model" src="http://ieblog.members.winisp.net/images/IE.Process.Model3.png" alt="" /></p><p>(source: <a
href="http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx">http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx</a>)</p><p>IE 8 improved upon this model by putting each tab in its own process, but moving the frame and the broker into a common process for improving startup time. Microsoft calls this architecture Loosely Coupled Internet Explorer (LCIE):</p><p><img
title="IE 8 Process Model" src="http://ieblog.members.winisp.net/images/IE8.Process.Model2.png" alt="" /></p><p>(source: <a
href="http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx">http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx</a>)</p><p>The actual model is however more sophisticated than what the diagram above suggests since IE 8 tries to balance the benefits of more processes with the extra overhead, without compromising on security. The actual process model is:</p><ul><li><strong>Protected Mode processes:</strong> Irrespective of memory overhead, sites with different levels of configured security open in different processes. This approach called <a
href="http://msdn.microsoft.com/en-us/library/bb250462(VS.85).aspx">Protected Mode</a> is based on <a
href="http://en.wikipedia.org/wiki/Mandatory_Integrity_Control">Mandatory Integrity Control</a></li><li><strong>Context-based tab-processes:</strong> The decision on whether to create a new tab-process or not is made depending upon the amount of memory available</li><li><strong>Max tab-processes:</strong> A specific value of maximum tab processes that can be created for a single isolated session at specific MIC</li></ul><p>More details: <a
href="http://blogs.msdn.com/b/askie/archive/2009/03/09/opening-a-new-tab-may-launch-a-new-process-with-internet-explorer-8-0.aspx">http://blogs.msdn.com/b/askie/archive/2009/03/09/opening-a-new-tab-may-launch-a-new-process-with-internet-explorer-8-0.aspx</a></p><h4>Chrome</h4><p>Chrome follows an approach similar to that of IE 8 &#8211; the host process for a tab is called a Renderer and the broker process is called Browser:</p><p><img
title="Google Chrome Process Model" src="http://dev.chromium.org/_/rsrc/1220197832277/developers/design-documents/multi-process-architecture/arch.png" alt="" /></p><p>(source: <a
href="http://dev.chromium.org/developers/design-documents/multi-process-architecture">http://dev.chromium.org/developers/design-documents/multi-process-architecture</a>)</p><p>Chrome supports four <a
href="http://dev.chromium.org/developers/design-documents/process-models">process models</a>:</p><ul><li><strong>Process per site-instance:</strong> Different visits to a site are in separate processes. Provides the highest level of isolation but also creates more overhead.</li><li><strong>Process per site:</strong> Different sites are isolated from each other, but visits to the same site run in the same process. Reduces overall memory overhead, but if you have several pages from a site open, the size of a single Renderer would be quite large, perhaps slowing it down.</li><li><strong>Process per tab:</strong> While the previous models consider source of origin, the process per tab model is based on the choice a user makes. One process is used for rendering one tab, and if in the same tab you switch to a different site, the process would continue.</li><li><strong>Single process:</strong> This is the simplest process with no isolation.</li></ul><p>In both Chrome and IE, a frame runs in the same process as its parent page. Also, separate process may prevent legal interactions between two pages from the same origin. Chrome&#8217;s solution to this is to not permit a x-process call even if it is legal. What IE does is to proxy these specific calls and convert them behind the scenes into some sort of IPC. This may also be supported by Chrome at some later stage.</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=fQrhrL-Kr-U:0T0DsTdlxm4:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=fQrhrL-Kr-U:0T0DsTdlxm4:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=fQrhrL-Kr-U:0T0DsTdlxm4:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=fQrhrL-Kr-U:0T0DsTdlxm4:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=fQrhrL-Kr-U:0T0DsTdlxm4:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=fQrhrL-Kr-U:0T0DsTdlxm4:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=fQrhrL-Kr-U:0T0DsTdlxm4:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=fQrhrL-Kr-U:0T0DsTdlxm4:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/fQrhrL-Kr-U" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2010/11/how-browsers-work-part-1-architecture/feed/</wfw:commentRss> <slash:comments>21</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2010/11/how-browsers-work-part-1-architecture/</feedburner:origLink></item> <item><title>PDC 2010 Announcements</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/394TwO3UZdo/</link> <comments>http://www.vineetgupta.com/2010/11/pdc-2010-announcements/#comments</comments> <pubDate>Tue, 02 Nov 2010 17:35:14 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[.Net]]></category> <category><![CDATA[Cloud]]></category> <category><![CDATA[Mobile]]></category> <category><![CDATA[Web]]></category> <category><![CDATA[Azure]]></category> <category><![CDATA[IE 9]]></category> <category><![CDATA[PDC 2010]]></category> <category><![CDATA[Windows Phone 7]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10061</guid> <description>The PDC 2010 got over last weekend and there seems to be some interesting stuff that has come out of Redmond. While most of the big changes are on the Cloud side, the roadmap on the web and mobile platform is much clearer now with some important changes in strategy. Cloud Ok, I&amp;#8217;m gonna stick &lt;a
href='http://www.vineetgupta.com/2010/11/pdc-2010-announcements/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<h1><span
style="font-weight: normal; font-size: 13px;">The <a
href="http://www.microsoftpdc.com">PDC 2010</a> got over last weekend and there seems to be some interesting stuff that has come out of Redmond. While most of the big changes are on the Cloud side, the roadmap on the web and mobile platform is  much clearer now with some important changes in strategy.</span></h1><h2>Cloud</h2><p>Ok, I&#8217;m gonna stick my neck out on this &#8211; IMHO, post PDC 2010, Microsoft has the most comprehensive offering among all the Cloud vendors when it comes to Windows. They are doing what everyone else is doing, and more. Besides, they are making it far simpler to move existing apps to the cloud and integrating your on-premise apps + infra with the cloud. The big problem is that they are only doing this for Windows, and there is nothing on the Linux side. So if you are a pure Microsoft shop, choosing Azure for the Cloud is a no-brainer. However, if you have a mixed environment, you would need to host your Linux workloads somewhere else, which may / may not work for you.</p><p><strong>VM Role:</strong> Azure now offers a <a
href="http://www.microsoft.com/windowsazure/compute/default.aspx">hypervisor in the sky</a> to host your Windows VMs. This is interesting since with the VM Role, Azure now provides both models &#8211; a auto-scalable, compute service exposed thru CLR and Win32, that competes with Google App Engine&#8217;s Python / JVM fabric, as well as the your-own-box-in-the-sky VPS model that competes with Amazon EC2, Rackspace and others. Jargon hungry analysts call the former PaaS (Platform as a Service) and the latter IaaS (Infrastructure as a Service). Both are useful &#8211; the former model works where you need agility of deployment, infinite scalability, and where your app and deployment architecture can be accommodated on top of the primitives provided by the service. The latter is useful if you need more flexibility in your architecture, or if you want to do a quick migration of an existing app without modifying stuff. The VM role would initially be available thru a invite-only beta program starting next year.</p><p><strong>Enhancements to Windows Azure:</strong> While the VM role certainly creates more flexibility, the PaaS version has become more flexible too:</p><ul><li><strong>Full IIS and Elevated Privilege:</strong> So far an Azure deployment could have a web-role, a worker-role or both, with the web-role mapping to a IIS website. What changes is that the web-role would now start mapping to an IIS instance, allowing you to run multiple websites under a single web-role, not to mention the ability to install IIS modules (or configure the APS.net pipeline). Also, the web and worker roles in Azure can now run in an elevated privilege mode. This would again help people migrate their existing apps.</li><li><strong>Azure Connect:</strong> A <a
href="http://www.microsoft.com/windowsazure/virtualnetwork/default.aspx">virtual network</a> that enables direct IP-based connectivity of app instances to and from on-premise infra. Basically sets up a IpSec based tunnel. Very helpful for sysads and devs who need to bring cloud instances under the org AD, or where cloud apps need to access on-premise data.</li><li><strong>Remote Desktop:</strong> Right now the way to manage your Azure deployment is to use the <a
href="http://code.msdn.microsoft.com/azurecmdlets">Powershell Cmdlets for Azure</a>. Most people like them since they are bandwidth friendly and scriptable. But now there is help for the command-line challenged. You can now &#8220;see&#8221; your application host over remote desktop. Not sure if this exposes any new functionality beyond what was already available thru the <a
href="http://msdn.microsoft.com/en-us/library/ee460799.aspx">Management API</a>.</li><li><strong>Better Java Support:</strong> I am not sure if people are aware of this, but it has been possible to run Java apps on Azure since around an year, including the ability to run Tomcat (thru the <a
href="http://code.msdn.microsoft.com/winazuretomcat">Tomcat Acclerator</a>), ability to access Windows Azure services thru the open source <a
href="http://www.windowsazure4j.org/">Azure SDK for Java</a> and a JDBC driver for SQL Azure. What&#8217;s new is an <a
href="http://www.windowsazure4e.org/">Eclipse plugin</a> and <a
href="http://www.jdotnetservices.com/">App Fabric SDK</a> to access App-Fabric (see below) from Java.</li><li><strong>Better PHP Support:</strong> Similar to Java, it is possible to run PHP apps on Azure. And similarly, there is a <a
href="http://phpazure.codeplex.com/">Windows Azure SDK</a>, an <a
href="http://www.windowsazure4e.org/">Eclipse plugin</a> and a PHP driver to talk to SQL Azure. Some extras: <a
href="http://azurephptools.codeplex.com">command-line tools for PHP</a> and a <a
href="http://www.interoperabilitybridges.com/projects/windows-azure-companion">Windows Azure Companion</a> to make it easy to deploy, diagnose and monitor PHP applications.</li></ul><p><strong>Improved App Fabric:</strong> The <a
href="http://www.microsoft.com/en-us/appfabric/azure/">App Fabric</a> is one of the strongest differentiators in the Azure stack. This is some very sophisticated middleware that I think is unique among all the Cloud vendors today:</p><ul><li><strong>Access Control:</strong> Identity integration with AD, Live Id, Facebook, Google and Yahoo. Authorization thru declarative rules that transform incoming security claims into claims understood by app. REST API.</li><li><strong>Service bus:</strong> Allows message exchange across firewalls, NAT gateways, etc. with the same programming model as for WCF or using REST, Pub-Sub system with multiple topics, Full-duplex bi-directional communication, including P2P with NAT traversal. It is now possible to do durable messaging, which increases delivery assurance to mobile devices / 3rd parties. If you have ever tried to do cross network messaging, you know how difficult firewall traversal and NAT traversal can get, and how troublesome DoS attacks can get. The Service Bus addresses all these issues and more.</li><li><strong>Integration:</strong> BizTalk in the sky &#8211; pipelines, transforms, adaptors, BAM, Rules, self-service trading partner community portal and provisioning of B2B pipelines. What, you do not know about BizTalk? Well, think of it as a higher degree of abstraction for the messaging layer, and a lot of pre-built patterns and functionality. If you are running multiple apps in your IT, sooner or later you need something like this. App Fabric Integration brings it to the sky so that you can integrate your on-premise apps and cloud apps using the same model. Not available yet though &#8211; the invite only CTP starts in 2011.</li><li><strong>Caching:</strong> Memcached like distributed in-memory cache, available on the local network. No limits on object size, pre-built ASP.net providers for session state and page output caching (so no app code changes required). Additionally, this can be secured thru Access Control (mentioned above). However, there is no HA yet. Available thru a invite-only CTP.</li><li><a
href="http://www.microsoft.com/en-us/appfabric/azure/composite-apps.aspx">Composite Apps</a>: So you have got your app all nicely written with access control defined, endpoints exposed using Service Bus, message transformation, etc. being handled by the Integration service and frequently used data cached. Is this enough? What about lifetime management? Scale out/scale-in? Deployment automation? Monitoring? Mission critical systems need all this and more. This is where the Composite Apps service comes in. This essentially provides a multi-tenant runtime and tools for handling all of the above thru a declarative model. Again, CTP would be available in 2011</li></ul><p><strong>Improved CDN:</strong> So far Azure CDN was only able to pull content from Azure Storage. Now it can also cache data returned from the app (dynamic caching). Also the CDN is available over SSL/TLS. Besides, they have put up POPs in Middle East and improved the pipe.</p><p><strong>Improved SQL Azure:</strong> SQL Azure would start providing Reporting Services (thru an invite-only CTP to begin with), a Web-based management console (called Database Manager) and ability to <a
href="http://www.microsoft.com/en-us/SQLAzure/datasync/default.aspx">Sync</a> entire databases or specific tables between on-premises SQL Server and SQL Azure, or between SQL Azure datacenters. The data to be synced is declarative defined, and the sync can be scheduled, with conflict handling and logging + monitoring support.</p><p><strong>Azure Marketplace:</strong> The <a
href="http://www.microsoft.com/windowsazure/marketplace/">Azure Marketplace</a> &#8220;is an online marketplace for developers to share, find, buy and sell building block components, training, service templates, premium data sets plus finished services and applications needed to build Windows Azure platform applications.&#8221; The data part (which I think is more interesting) is already available at the <a
href="https://datamarket.azure.com/">Data Market</a>. This includes both public domain data-sets as well as commercial data-sets on a wide variety of topics &#8211; &#8220;everything from demographics, navigation, crime, media, to web services and algorithms for data cleansing.&#8221; Cool stuff!</p><h2>Web and Mobile</h2><p>One good thing that happened at the PDC was that Microsoft came clean on its UI platform strategy, and the strategy is a pretty simple, logical one:</p><ul><li>Web Apps thru HTML 5</li><li>Native apps on Desktop thru WPF</li><li>Native apps on Mobile thru Silverlight</li></ul><p>There is a lot of noise in the media about how Microsoft has dumped Silverlight and is repositioning it for the mobile, etc. There are three aspects to this discussion:</p><p><strong>Why focus on HTML 5 for the Web</strong></p><p>This is a no-brainer:</p><ul><li>HTML 5 already provides most capabilities that required plugins like Flash / SL not so long back, so SL is not adding a ton of value over HTML 5 anymore.</li><li>HTML 5 enjoys tremendous momentum with web-devs</li><li>Firefox and Webkit have better support for HTML 5 than IE, and hence more web-dev mind-share</li><li>IE is losing ground to Chrome, Firefox and Safari</li></ul><p>The choice for Microsoft is obvious &#8211; embrace HTML 5 completely and take a leadership role. And now with this commitment coming right from the top, with SteveB himself vaxing eloquent about HTML 5, you can bet that the IE team will produce the goods. This may not happen in the IE 9 timeframe though, but then this is a long-term play.</p><p><strong>What about SL on the Web</strong></p><p>There are lot of people who think that SL is about to disappear from the web, but that is incorrect. There are lots of things that are still very hard to do in HTML 5, for which developers will continue to rely on plugins like Flash and Silverlight. However, there is no doubt that SL will move into a niche role, being utilized more to fill HTML 5 gaps, at least when it comes to the consumer web. On the enterprise side, SL may enjoy a more mainstream role since x-platform, x-browser and plugin deployment issues are more controllable there, and development on SL is any day simpler than on HTML 5, especially if you have a background in imperative languages.</p><p><strong>Why SL for Mobile</strong></p><p>The vision for CoreCLR &#8211; the version of CLR that powers Silverlight &#8211; has always been to provide the .Net experience on all low footprint environments. If my memory serves me right, a lot of SL core came from .Net Compact Framework which was the version of .Net shipped on Windows Mobile back in 2002-2003. And since SL 2, a team in Hyderabad has been working on making SL available on the mobile. So SL on Mobile is nothing new. It also makes sense as the mainstream platform for the Mobile, since SL is a proper subset of WPF and therefore you can share a lot of code between desktop and mobile. So why not WPF itself? I&#8217;m not very sure, but certainly the lighter footprint of SL as compared to WPF is more suitable for mobile. Also, this would allow Microsoft to optimize SL for the mobile, focussing on issues like reducing power consumption and CPU utilization. Last, but not the least, the developer momentum behind Silverlight is certainly bigger than that behind WPF, and therefore this allows Microsoft to accelerate the app development on Windows Mobile.</p><p><strong>IE 9 HTML 5 Support</strong></p><p>Things are looking better than ever on the <a
href="http://msdn.microsoft.com/en-us/ie/ff468705.aspx">HTML 5 support</a> front. The first <a
href="http://test.w3.org/html/tests/reporting/report.htm">Official HTML5 Test Suite Conformance Results</a> from W3C show that IE 9 is passing most tests, however the coverage of the test itself seems fairly small &#8211; only a few features are being tested. More illuminating are the following comparisons on wikipedia:</p><ul><li><a
href="http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(ECMAScript)">JavaScript</a></li><li><a
href="http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(Cascading_Style_Sheets)">CSS</a></li><li><a
href="http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(Document_Object_Model)">DOM</a></li><li><a
href="http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(HTML5)">HTML 5</a></li><li><a
href="http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(HTML5_Canvas)">HTML 5 Canvas</a></li><li><a
href="http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(HTML5_Media)">HTML 5 Media</a></li></ul><p>After going thru these comparisons, my take is that all browsers are at varying degrees of HTML 5 support and IE 9 is not too far behind. How quickly the various browsers start supporting the major features would be a function of resources deployed and how quickly the release cycles go.</p><p>What should also be obvious from the above comparisons is that it is utterly foolish to do browser detection since the feature set of each browser is going to change very rapidly over the next couple of years. What we should do instead is feature detection and fail gracefully. This is of course done by all the major JS libraries already, the risk is in your own custom code. If you are doing browser detection, you should get rid of it right now to keep it forward compatible.</p><p><strong>Windows Phone 7</strong></p><p>The development model for Windows Phone is clear enough &#8211; Silverlight for apps and XNA for games. Both make sense and provide a ton of power. What Microsoft has done is to provide a bunch of templates, tooling and APIs to make it simpler to do Windows Phone apps. Moreover, this can be done using VS Express, so there is no cost! However, for developing apps, you need to get familiar with the UI, its nothing like iOS or Android. If you do not have a Windows Phone, the simplest way is to run the emulator that ships with VS Express and figure your way around. Finally there is also the <a
href="http://marketplace.windowsphone.com/Default.aspx">Windows Phone Marketplace</a> &#8211; same concept as the App-Store, including an approval process and $ 100 fees for submitting an app. But none of this is really new &#8211; all of this has been in place for a while.</p><p>While we are on this topic, I do want to mention that I am surprised that Microsoft is taking the Apple approach to the marketplace. The real value of a marketplace is in providing consumers and developers a single point for app publication, discovery and access. The approval process and the submission fee are not good for anyone in the long run, including the platform vendor. Especially for Microsoft, who need to steal share from iOS and Android, I think a more open marketplace would have helped:</p><ul><li>Reduced costs for the developer, and therefore for the end consumer</li><li>Less process overhead for the developer</li><li>Freedom of expression &#8211; this counts for a lot in spirit, if not in fact</li><li>Mature content can be marked as such by reviewers or community, same as what happens on sites like YouTube / Facebook</li><li>There is the problem of crapware, but then a rating / popularity / recommendation system would easily address that</li><li>There would be some upfront loss of revenue, but I think right now the real problem is increasing adoption for which you need to drive up the platform value by having more content + functionality available on it.</li></ul><p>A marketplace that requires content approval and a submission fees discourages the developer from building native apps and instead focus on building mobile apps. No platform vendor likes that, since that reduces the tie-in to the platform. Microsoft understands this well enough and despite that we have an Apple like marketplace. Very baffling.</p><h2>Languages and Tools</h2><p>The focus of the next version of C# and VB would be asynchronous programming. There are two motivations for this:</p><ul><li><strong>UI responsiveness:</strong> As apps get more and more connected, the data often comes over a network call, which must not block the UI thread</li><li><strong>Scalability:</strong> Use the thread for doing some useful work and not keep it blocked</li></ul><p>Now at this stage, there is a chance that one would confuse asynchrony with parallelism. That is not the case &#8211; asynchrony is about yielding control while results are awaited, whereas concurrency is about doing two things in parallel. The former is necessarily cooperative, while the latter maybe achieved thru multiple hardware threads or pre-emptive / co-operative scheduling of multiple threds.</p><p>Just to be sure, .Net Framework has provided async support since version 1 via the begin-end pattern. So to appreciate the value of the async support being provided requires us to go a little deeper in terms of semantics, what happens behind the scenes and why the feature has been designed the way it is. These are all involved topics, and I would try and cover async support in a different post. Meanwhile, you can find more details on <a
href="http://msdn.com/vstudio/async">http://msdn.com/vstudio/async</a>.</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=394TwO3UZdo:mliIsq_WyB4:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=394TwO3UZdo:mliIsq_WyB4:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=394TwO3UZdo:mliIsq_WyB4:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=394TwO3UZdo:mliIsq_WyB4:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=394TwO3UZdo:mliIsq_WyB4:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=394TwO3UZdo:mliIsq_WyB4:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=394TwO3UZdo:mliIsq_WyB4:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=394TwO3UZdo:mliIsq_WyB4:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/394TwO3UZdo" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2010/11/pdc-2010-announcements/feed/</wfw:commentRss> <slash:comments>1</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2010/11/pdc-2010-announcements/</feedburner:origLink></item> <item><title>Facebook is the new Microsoft</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/ALDrwlOV11M/</link> <comments>http://www.vineetgupta.com/2010/10/facebook-is-the-new-microsoft/#comments</comments> <pubDate>Sat, 30 Oct 2010 07:47:57 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Web]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10056</guid> <description>Facebook these days reminds me a lot of Microsoft of yore. For one, they are taking a platform approach exactly like Microsoft &amp;#8211;  Facebook Connect is the most powerful and widely used identity platform on the Internet today &amp;#8211; as enticing to developers and businesses as Windows was in its heydays. They have succeeded where &lt;a
href='http://www.vineetgupta.com/2010/10/facebook-is-the-new-microsoft/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>Facebook these days reminds me a lot of Microsoft of yore.</p><p>For one, they are taking a platform approach exactly like Microsoft &#8211;  Facebook Connect is the most powerful and widely used identity platform on the Internet today &#8211; as enticing to developers and businesses as Windows was in its heydays. They have succeeded where Microsoft failed (Hailstorm, Passport) as well as the open source community (OpenId). Apparently, <a
href="http://mashable.com/2010/10/26/10000-websites-integrate-with-facebook-every-day/">10,000 websites integrate with Facebook everyday</a> and people are even taking the <a
href="http://techcrunch.com/2010/10/18/likify-qr-code/">Facebook experience offline</a>! This means that as other companies innovative and create value, Facebook gets richer. Exactly like Windows. IIRC, one of the metrics at Microsoft used to be the ratio of ISV revenue to Windows revenue, the idea being that this ratio should keep going up. Facebook today is in a similar position and benefits similarly from their platform.</p><p>Second, they are changing the rules of the game, much like Microsoft did. The model before Microsoft was a single integrated stack and customers had to pay big money for the iron, the software and the services &#8211; all from the same vendor &#8211; IBM. Microsoft changed this by commoditizing software and opening up the vertically integrated stack. In a similar vein, Facebook changes the game by offering an alternative to Google Search, hitherto the de-facto gateway to the web. A lot of content discovery today happens thru the Facebook News-feed. This means that publishers once again need to target their audience and not just the search engine, which is biased for more frequently changing content and can be, and has been gamed by publishers. This helps the small, niche publisher who would otherwise be lost in the long-tail, never landing on the first page of a Google Search. This is not just about stuff you read, but also about what you buy. My wife&#8217;s latest acquisition &#8211; a DSLR camera &#8211; was entirely based on recommendations received from friends and family, mostly on Facebook. An year back, she would have been searching and going thru review sites. Not that we did not search or read reviews, but the decision was mostly based on recommendations.</p><p>Finally, they are using this tremendous leverage to move into markets they find interesting, and much like Microsoft when they decide to move into that market, the incumbents are under threat to becoming irrelevant. This is what happened when they announced the beta of their Q&amp;A system, with Quora and Aardvark coming under threat. This is again happening with Facebook Places and this time Foursquare is under threat. Now one could argue that this is bound to happen anytime the 800 pound gorilla in any industry forays into a market, but this is different. This is about leverage. Google / Apple, despite being larger, would never create the kind of threat that Facebook creates simply because Facebook owns the user like no one else does. Exactly the same as Microsoft owning the PC like no one else. Of course Microsoft was prudent about it &#8211; they focussed only on the software bit (where the margins were highest) and deliberately kept away from the hardware and the services businesses to let the ecosystem thrive and drive the platform forward. Facebook would also need to figure out where they want to pursue opportunities and where they would not go.</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ALDrwlOV11M:nyAwQ_HESB0:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ALDrwlOV11M:nyAwQ_HESB0:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=ALDrwlOV11M:nyAwQ_HESB0:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ALDrwlOV11M:nyAwQ_HESB0:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=ALDrwlOV11M:nyAwQ_HESB0:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ALDrwlOV11M:nyAwQ_HESB0:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=ALDrwlOV11M:nyAwQ_HESB0:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=ALDrwlOV11M:nyAwQ_HESB0:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/ALDrwlOV11M" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2010/10/facebook-is-the-new-microsoft/feed/</wfw:commentRss> <slash:comments>1</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2010/10/facebook-is-the-new-microsoft/</feedburner:origLink></item> <item><title>Moved to WordPress</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/sSVcFj1o9dY/</link> <comments>http://www.vineetgupta.com/2010/10/moved-to-wordpress/#comments</comments> <pubDate>Sat, 30 Oct 2010 04:20:41 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Personal]]></category><guid isPermaLink="false">http://www.vineetgupta.com/?p=10052</guid> <description>Last year, when I moved to Blogger, my decision was influenced by: Staying with a cloud provider for the blog service Ability to install plugins Now WordPress.com does not allow you to install plugins unless you are really big, and therefore Blogger made sense. However, I have found Blogger to be very limiting and every time &lt;a
href='http://www.vineetgupta.com/2010/10/moved-to-wordpress/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>Last year, when I moved to Blogger, my decision was influenced by:</p><ul><li>Staying with a cloud provider for the blog service</li><li>Ability to install plugins</li></ul><p>Now WordPress.com does not allow you to install plugins unless you are <a
href="http://vip.wordpress.com/hosting/">really big</a>, and therefore Blogger made sense. However, I have found Blogger to be very limiting and every time I looked at the WordPress plugin ecosystem, I found myself yearning for all that richness. Finally, yesterday evening, I set up a box in the sky, installed WordPress, added the plugins I wanted and imported my blog from Blogger.</p><p>What went well:</p><ul><li>Installation was a breeze &#8211; took me less than 10 minutes to get WP up and running on a bare Ubuntu box (and I am a newbie to LAMP)</li><li>Plugins &#8211; you are spoilt for choice here, and the time taken was mostly in trying out various plugins and selecting which one to use for a specific functionality.</li><li>Import went off without a hitch, except that it kept throwing an error for one of the posts (the migration from Live to Blogger one)</li><li>Bonus was that I was able to bring back all the comments I had lost on my older Live Space blog &#8211; they could not be imported to Blogger, but WordPress worked like a charm.</li></ul><p>The only thing that did not work was that I could not find any simple way of retaining the permalinks &#8211; Blogger appends a random number after each post and short of entering that manually for each post, I can&#8217;t think of any other mechanism to retaining them. This is bad because it will break people&#8217;s bookmarks, but with the new scheme, I don&#8217;t think this will ever need changing hence.</p> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=sSVcFj1o9dY:8j6INd094dQ:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=sSVcFj1o9dY:8j6INd094dQ:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=sSVcFj1o9dY:8j6INd094dQ:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=sSVcFj1o9dY:8j6INd094dQ:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=sSVcFj1o9dY:8j6INd094dQ:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=sSVcFj1o9dY:8j6INd094dQ:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=sSVcFj1o9dY:8j6INd094dQ:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=sSVcFj1o9dY:8j6INd094dQ:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/sSVcFj1o9dY" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2010/10/moved-to-wordpress/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2010/10/moved-to-wordpress/</feedburner:origLink></item> <item><title>NoSql Databases – Part 1 – Landscape</title><link>http://feedproxy.google.com/~r/VineetGupta/~3/FBNYJyxLyMg/</link> <comments>http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape/#comments</comments> <pubDate>Mon, 04 Jan 2010 23:59:00 +0000</pubDate> <dc:creator>Vineet Gupta</dc:creator> <category><![CDATA[Databases]]></category><guid isPermaLink="false">http://guptavineet.wordpress.com/2010/01/05/nosql-databases-%e2%80%93-part-1-landscape</guid> <description>At Directi, we are taking a hard look at the way our applications need to store and retrieve data, and whether we really need to use a traditional RDBMS for all scenarios. This does not mean that we will eschew relational systems altogether. What it means is that we will use the best tool for &lt;a
href='http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape/'&gt;[...]&lt;/a&gt;</description> <content:encoded><![CDATA[<p>At Directi, we are taking a hard look at the way our applications need to store and retrieve data, and whether we really need to use a traditional RDBMS for all scenarios. This does not mean that we will eschew relational systems altogether. What it means is that we will use the best tool for the job – we will use non-relational options wherever needed and not throw everything at a relational database with a mindless one-size-fits-all approach.<br
/> This post covers the current landscape of the NoSQL space. In a subsequent post, I intend to cover in more detail the various problem areas addressed by NoSQL systems and the specific algorithms used.<br
/> Caveat: though I have tried quite hard to understand each database, the fact is that most of the systems discussed are quite comprehensive and summarizing their capabilities in a few bullets cannot do them any justice. And it is certainly possible that my interpretation maybe wrong in some cases since I haven’t actually used these systems as yet.</p><h1>Summary</h1><p>The need to look at Non SQL systems arises out of scalability issues with relational databases, which are a function of the fact that relational databases were not designed to be distributed (which is key to write scalability), and could thus afford to provide abstractions like ACID transactions and a rich high-level query model.<br
/> All NoSQL databases try and address the scalability issue in many ways – by being distributed, by providing a simpler data / query model, by relaxing consistency requirements, etc. I find it useful to bucket them as follows:</p><p><strong>Distributed vs. Not-distributed:</strong> Distributed databases take the responsibility of data partitioning (for scalability) and replication (for availability) and do not leave that to the client.</p><table
style="width: 500px;" border="1" cellspacing="0" cellpadding="2"><tbody><tr><th
width="212" valign="top">Distributed</th><th
width="288" valign="top">Not Distributed (responsibility on client)</th></tr><tr><td
width="212" valign="top"><ul><li>Amazon Dynamo</li><li>Amazon S3</li><li>Scalaris</li><li>Voldemort</li><li>CouchDb (thru Lounge)</li><li>Riak</li><li>MongoDb (in alpha)</li><li>BigTable</li><li>Cassandra</li><li>HyperTable</li><li>HBase</li></ul></td><td
width="288" valign="top"><ul><li>Redis</li><li>Tokyo Tyrant</li><li>MemcacheDb</li><li>Amazon SimpleDb</li></ul></td></tr></tbody></table><p>Over here, the model of distribution which seems most compelling is the one used by Dynamo. Indeed, it is copied by Voldemort, Riak and Cassandra – three very different kinds of stores. The reason it is most compelling is because it gives simple knobs to an application to tune its expectations of durability, read performance, consistency, write performance, etc. This makes it very general purpose. The other reason this model is good is because it allows heterogeneous hardware to be used in an efficient way (however Cassandra departs from this).</p><p><strong>Data Model richness</strong>: The other key distinction is in terms of richness of data model:</p><table
style="width: 500px;" border="1" cellspacing="0" cellpadding="2"><tbody><tr><th
width="166" valign="top">Key-Value store</th><th
width="166" valign="top">Document store</th><th
width="166" valign="top">Column-Store</th></tr><tr><td
width="166" valign="top"><ul><li>Amazon Dynamo</li><li>Amazon S3</li><li>Redis</li><li>Scalaris</li><li>Voldemort</li></ul></td><td
width="166" valign="top"><ul><li>Amazon SimpleDb</li><li>Apache Couchdb</li><li>MongoDb</li><li>Riak</li></ul></td><td
width="166" valign="top"><ul><li>Cassandra</li><li>Google BigTable</li><li>HBase</li><li>Hyperbase</li></ul></td></tr></tbody></table><p>On one end of the spectrum are the simple key-value stores: Dynamo, S3, Redis, Scalaris, Voldemort, etc. At the other end of the spectrum are the column-stores which provide a very rich model: BigTable, Cassandra, Hypertable and HBase fall into this bucket. This richness comes at a price – the model is not simple, and you need to think data modeling grounds up. Somewhere in between are the document stores – a sort of schema free cousins of relational databases: CouchDb, Riak, MongoDb, etc. Simple to understand and richer than plain key-value stores.</p><p><strong>Disk vs. Memory:</strong> A third useful dimension is whether the database is memory-driven or disk-driven. This is important since in the latter case you need an explicit cache, while in the former case you are not durable:</p><table
style="width: 500px;" border="1" cellspacing="0" cellpadding="2"><tbody><tr><th
width="166" valign="top">Memory</th><th
width="166" valign="top">Configurable</th><th
width="166" valign="top">Disk</th></tr><tr><td
width="166" valign="top"><ul><li>Scalaris</li><li>Redis</li></ul></td><td
width="166" valign="top"><ul><li>BigTable</li><li>Cassandra</li><li>Hbase</li><li>HyperTable</li></ul></td><td
width="166" valign="top"><ul><li>CouchDb</li><li>MongoDb</li><li>Riak</li><li>Voldemort</li></ul></td></tr></tbody></table><p>On one end of the spectrum is Scalaris which is entirely memory-driven, and Redis which is primarily memory oriented (you can do background snapshots). Cassandra, BigTable, Hypertable, Hbase allow configuring how large the Memtable can get, so that provides a lot of control. The document stores – CouchDb, MongoDb and Riak – all seem to be using on-disk B+ trees, and Voldemort uses BDB and MySQL. Having a pluggable storage engine which is in-memory, does not make it configurable since the in that scenario you are entirely memory driven with no durability at all!</p><p>So which one wins? Well, I am biased towards the database solving the scalability issue by taking over the responsibility of partitioning instead of leaving that problem with me. So theoretically, Cassandra seems to combine the best of both worlds: the sophisticated distribution model of Dynamo with the richness of a column-store. Also, it is in heavy production use despite still being in beta. However, I believe the learning curve here would be higher for modeling data, and it may make sense to opt for the simplicity of a Voldemort for most cases. Ultimately it would depend on your app requirements and development team.</p><h1>1. Why Non-Relational</h1><p>Some of the stuff here may not resonate with you if you are an enterprise developer since enterprise apps don’t have to deal with the kind of gigantic scale that (some) consumer web applications deal with. However, given the rate at which data is growing and the number of users who are using IT systems, these issues are only going to become more and more common – for smaller consumer apps, as well as for enterprise apps. In fact, even today, irrespective of the scale at which your app operates, if you want to take advantage of a Cloud platform like Google App Engine or Microsoft Azure or Amazon Web Services, you would perhaps need to think of some of the issues below, because at the infra level these platforms do have to bother about high scale and may impose constraints on the application / data model to help them scale.</p><h2>1.1 Relational databases are hard to scale</h2><h3>1.1.1 Replication &#8211; scaling by duplication</h3><ul><li>Master-Slave:<ul><li>Each write results in N x writes where N is the number of slaves. This leads to diminishing returns beyond a point, thus imposing a limit</li><li>While reads would get faster (since you can now read from N nodes), writes are still bottle-necked to one node</li><li>Critical reads still need to go the master since the write may not have propagated to all nodes. This logic needs to be built into the app.</li><li>High volumes of data pose a problem since you need to duplicate the data N times. This also leads to limiting how much you can scale with<br
/> this approach.</li></ul></li><li>Multi-Master<ul><li>Can improve write scaling by adding more masters, but this may lead to conflicts</li><li>Conflict resolution happens at O(N<sup>3</sup>) or O(N<sup>2</sup>) &#8211; <a
title="http://research.microsoft.com/~gray/replicas.ps" href="http://research.microsoft.com/~gray/replicas.ps">http://research.microsoft.com/~gray/replicas.ps</a></li></ul></li></ul><h3>1.1.2 Partitioning (sharding) – scaling by division:<strong> </strong></h3><ul><li>Scales reads as well as writes</li><li>Not transparent to the application. The application needs to be partition aware.</li><li>The value of an RDBMS is in relations. Once partitioned, these relations get broken – you cannot do a join across shards – this now needs to be done in the app layer.</li><li>In general, manual sharding in relational databases is <a
href="http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx">not simple</a>.</li></ul><h2>1.2 Don’t need some features</h2><h3>1.2.1 UPDATEs and DELETEs</h3><ul><li>Typically not used since that leads to loss of information<ul><li>May need the record for auditing, or for re-activation</li><li>Typically, the info is never really “deleted” from a domain perspective anyway<ul><li>A user “leaves” a community – his posts would not be removed</li><li>Employees “leave” companies – their employment record is maintained</li><li>The canonical ACID transactions example: debits from a bank account – this is not a DELETE, but an INSERT</li></ul></li><li>Nor is info just “updated”<ul><li>Salary gets “incremented” – the previous record is maintained</li><li>Your bank account balance is not updated – there are “debits” and “credits”</li></ul></li></ul></li><li>So one can typically model an UPDATE / DELETE as an INSERT and version the record.<ul><li>When data gets too large, archive inactive parts of data</li></ul></li><li>Two problems that arise when you go for an INSERT-only system:<ul><li>The database cannot help you with cascades thru triggers &#8211; this needs to be done explicitly in the app layer<ul><li>The cascades are actually far more complex than propagating a DELETE / UPDATE – this is a domain requirement:<ul><li>When an employee leaves, you need to update the payroll system so that full and final compensation can be carried out</li><li>Everytime a bank account gets debited, checks need to be made on standing instructions, minimum account balance, etc.</li></ul></li></ul></li><li>Queries need to filter out the inactive records<ul><li>Can lead to dirty looking code – addressed using views</li><li>There would be some perf penalty that can be addressed by archival</li></ul></li></ul></li></ul><h3>1.2.2 JOINs</h3><ul><li>Why avoid<ul><li>Joins are expensive when data volumes are high since the database server has to perform complex set operations over large volumes of data</li><li>Do not work across partitions</li><li>Techniques like Materialized / Indexed Views not supported by all databases</li></ul></li><li>How to avoid? De-normalize!<ul><li>Purpose of normalization<ul><li>Make it easier to have consistent data by keeping just one copy</li><li>Reduce the amount of storage</li></ul></li><li>With De-normalization<ul><li>Burden of consistency shifts from the database to the application layer</li><li>Easier if you only do INSERTs and no UPDATEs / DELETEs</li><li>Would lead to data bloat – can be significant for large volumes, but storage is cheap and you can archive inactive data</li></ul></li></ul></li></ul><h3>1.2.3 ACID Transactions</h3><ul><li><strong>Atomic</strong> – do not need atomicity on modification of more than one record. Single key atomicity is enough</li><li><strong>Consistency</strong> – <a
href="http://portal.acm.org/citation.cfm?doid=564585.564601">CAP theorem</a> – can get any two of Consistency, Availability, Partition tolerance – not all three. (Also see <a
href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1915">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1915</a>)<ul><li>Most systems need partition tolerance and availability ahead of consistency.<ul><li>Customer wants to place an order – you will accept the order, not return the money saying the system is unavailable – availability is important</li><li>Inventory would be checked asynchronously</li><li>Order details would be checked asynchronously</li><li>… would be done asynchronously</li><li>All this while data would be in an inconsistent state</li><li>This is ok – businesses are like that. They do not operate on a single version of truth. Reconciliation happens all the time.</li></ul></li><li>Therefore our data model need not be strongly / strictly consistent. We can do with <a
href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html">Eventual Consistency</a>.</li><li>In most scenarios we need Read-Your-Writes Consistency and Monotonic Reads Consistency (as defined by Vogels in the paper above)</li><li>Strong consistency relies upon conflict resolution at write time to keep read complexity simpler. This <a
href="http://research.microsoft.com/apps/pubs/default.aspx?id=68247">does not scale</a>.</li></ul></li><li><strong>Isolation</strong> – do not need isolation beyond Read-Committed. Easy with single key atomicity (above)</li><li><strong>Durability</strong> – need durability till the time that RAM becomes cheap enough that one can afford many peer replicated nodes holding data in memory so that data is available even with node failures.</li></ul><h3>1.2.4 Fixed Schema</h3><ul><li>In an RDBMS you have to define the schema before you can start using data (somewhat like declaring types in statically typed languages)<ul><li>Define each entity (table), its attributes (columns) and relations between entities</li><li>Define usage patterns (indexes)</li></ul></li><li>Modifying schemas is essential<ul><li>Intense competition and rapid growth necessitate adding new features / tweaking existing features rapidly</li><li>Changes usually require modifying the data model, thus precipitating schema changes</li></ul></li><li>Modifying the schema is hard<ul><li>Adding / Deleting / Modifying a column may lock the rows (imagine doing this for several million rows)</li><li>Adding / Removing an index may lock up the table</li></ul></li></ul><h2>1.3 Don’t get some features</h2><ul><li>Hard to <a
href="http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures">model hierarchical data</a></li><li>Hard to model graphs</li><li>Don’t rely primarily on main memory<ul><li>Preferable to avoid going to disk as far as possible and serve out of main memory to get faster response time</li><li>Most relational systems are not memory oriented, but disk-oriented. Even with large main memory, relational databases end up going to disk for most queries – they are not aggressive about serving data from main memory and avoiding going to disk.</li><li>Vendors are trying to address this by acquiring / building in-memory database technology, but this is far from mainstream</li></ul></li></ul><h1>2. Desired Characteristics</h1><p>The environment expected is that the system would be spread over 100s to 1000s of commodity machines (hence called nodes) with different capacities. In a system like this failure is expected, tolerated and recovered from, with no loss of data, and without affecting the overall a<br
/> vailability and scalability of the system at large. The following characteristics are explicitly required to address these requirements:</p><h2>2.1 High Scalability</h2><ul><li>Ability to add nodes incrementally to support more users and data</li><li>Achieved via partitioning</li><li>Increasing number of nodes should not result in a diminishing return beyond a threshold</li></ul><h2>2.2 High Availability</h2><ul><li>No single point of failure</li><li>Data should be available even as some nodes go down</li><li>Achieved via replication since data is duplicated</li><li>Also achieved via partitioning since at least some data continues to be available</li></ul><h2>2.3 High Performance</h2><ul><li>Operations should return fast</li><li>Achieved via being main-memory oriented instead of being disk-oriented, using non-blocking writes, lower complexity algorithms, etc.</li></ul><h2>2.4 Atomicity</h2><ul><li>Individual writes need to be atomic – should not expose in-between state to a read operation</li><li>Batching of multiple writes into a single atomic unit not required</li></ul><h2>2.5 Consistency</h2><ul><li>Do not need strong consistency</li><li>Ok to have eventual consistency (read-your-writes consistency)</li></ul><h2>2.6 Durability</h2><ul><li>Data should be persisted to disk and not just kept in volatile memory</li></ul><h2>2.7 Deployment Flexibility</h2><ul><li>Addition / removal of nodes should distribute data and load automatically without requiring manual intervention</li><li>Should not impose constraints like requiring a distributed file system or a shared storage</li><li>Should not require specialized hardware</li><li>Should be able to work with heterogeneous hardware – identical hardware should not be required to use the system optimally (otherwise you need to upgrade all nodes to get max efficiency which is not practical in a system running on 1000s of nodes)</li><li>Application should be able to control the degree of consistency, durability and read / write performance it requires without being aware of the deployment model.</li></ul><h2>2.8 Modeling flexibility</h2><p>Should be able to model various types of data in a simple way:</p><ul><li>Key-Value pairs</li><li>Hierarchical data</li><li>Graphs</li></ul><h2>2.9 Query Flexibility</h2><ul><li>Multi-Gets (retrieve a bunch of values, for the set of keys provided, in one call)</li><li>Range queries (retrieve data based on the specified range of the key)</li></ul><h1>3. Database Managers</h1><p>A <a
href="http://en.wikipedia.org/wiki/Dbm">Database Manager</a> is a software library that is loaded in-process to provide a high performance database. Its focus is typically restricted to the job of maintaining on-disk (or in-memory) data structures. These libraries are often used by distributed databases as a storage backend for managing the CRUD operations.</p><h2>3.1 Berkley DB</h2><ul><li><a
href="http://www.oracle.com/database/berkeley-db/index.html">http://www.oracle.com/database/berkeley-db/index.html</a></li><li>Written in C. Language bindings: C++, Java</li><li>Data Model: <a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg/JAVA/accessmethods.html">BTree, Hash, Queue (fixed length) and Logical Record Number</a></li><li>API model: <a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg/JAVA/javadplconcepts.html#key-data">Key-Value</a>. Value is opaque</li><li>Concurrency support<ul><li>Multi-process and multi-threaded access</li><li><a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_txn/JAVA/lockingsubsystem.html">Locking subsystem</a>.</li></ul></li><li>Optional ACID transactions<ul><li>Three <a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_txn/JAVA/isolation.html">isolation levels</a></li><li>Support for <a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_txn/JAVA/usingtxns.html#nodurabletxn">non-durable transactions</a></li></ul></li><li>Optional Replication<ul><li><a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_db_rep/JAVA/introduction.html#overview">Master-slave</a> replication (single master). Master selection based on <a
href="http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf">Paxos</a> type <a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_db_rep/JAVA/elections.html">elections</a>.</li><li><a
href="http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_db_rep/JAVA/apioverview.html#repapioverview">Pluggable</a> communication layer</li></ul></li></ul><h2>3.2 Tokyo Cabinet</h2><ul><li><a
href="http://1978th.net/tokyocabinet/">http://1978th.net/tokyocabinet/</a></li><li>Written in C. Language Bindings: <a
href="http://1978th.net/tokyocabinet/javadoc/">Java</a>, <a
href="http://1978th.net/tokyocabinet/rubydoc/">Ruby</a>, <a
href="http://1978th.net/tokyocabinet/perldoc/">Perl</a>, <a
href="http://1978th.net/tokyocabinet/luadoc/">Lua</a></li><li>Data Model: <a
href="http://1978th.net/tokyocabinet/spex-en.html#tchdbapi">Hash</a>, <a
href="http://1978th.net/tokyocabinet/spex-en.html#tcbdbapi">B+ Tree</a>, <a
href="http://1978th.net/tokyocabinet/spex-en.html#tcfdbapi">Fixed length</a> and <a
href="http://1978th.net/tokyocabinet/spex-en.html#tctdbapi">Table</a></li><li>Concurrency:<ul><li>Re-entrant API functions.</li><li>Writers block</li><li>Locking granularity is per record for hash and fixed length databases, Per file for others</li></ul></li><li>Transactions:<ul><li>Atomic operations</li><li>Two isolation levels: serializable and read uncommitted</li><li>Durability thru write ahead logging and shadow paging</li></ul></li></ul><h3></h3><h1>4. Key Value Stores</h1><p>Key-Value stores provide the simplest possible data model. However, this comes at a cost. Range queries are not straightforward (unless the database provides explicit support), and in general modeling applications in general on top of a key-value store <a
href="http://seattleweb.intel-research.net/people/lamarca/pubs/paper-ChaRam.pdf">can get complicated</a>.</p><h2>4.1 Amazon Dynamo</h2><ul><li><a
href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html</a> – terrific paper, must read!</li><li>Distributed key-value store. Values are opaque to system. Targets objects typically &lt; 1 MB</li><li>Internal to Amazon, not available for direct consumption externally</li><li>Partitioning<ul><li>Uses a variant of <a
href="http://www.akamai.com/dl/technical_publications/ConsistenHashingandRandomTreesDistributedCachingprotocolsforrelievingHotSpotsontheworldwideweb.pdf">consistent hashing</a> that addresses<ul><li>Non-uniform data and load distribution</li><li>Heterogeneity in capacity / performance of each node</li></ul></li><li>The hash ring is divided into V partitions – each being serviced by a virtual node. There are P physical servers, with each physical server being typically responsible for Q/P virtual nodes, however this is decided based on the capacity of each physical node. Number of physical nodes is much less than number of virtual nodes.</li><li>When a physical node starts for the first time, it chooses its set of tokens (virtual nodes) and maps nodes to their tokens. This mapping is stored on disk and initially contains only the local node and token set.</li><li>Mappings stored at different nodes are communicated to peers using a <a
href="http://en.wikipedia.org/wiki/Gossip_protocol">Gossip</a> based protocol.</li><li>Eventually consistent view of mappings – each node connects to a randomly chosen peer every second and efficiently reconciles mappings.</li></ul></li><li>Replication – data is duplicated on multiple hosts.<ul><li>Each key is associated with a<br
/> “Preference List” -  a list of nodes to which it is propagated. This consists of N nodes where replicas are stored, and some more nodes for failure handling (see below)</li><li>Also, since number of physical nodes &lt; virtual nodes, the preference list is built to ensure that the list contains only distinct physical nodes (by skipping positions in the hash ring). Since each node is aware of the token ranges handled by its peers (because of mappings getting propagated), this allows each node to forward a key’s Read / Write operation to the correct set of nodes directly</li><li>There are three configurable values:<ul><li>N = number of nodes that store replicas of data</li><li>W = Min no. of nodes that must acknowledge the receipt of a Write before a Write completes.</li><li>R = Min of nodes that are contacted when a data item is to be read</li><li>If R +W &gt; N, then write and read set overlap, giving <a
href="http://en.wikipedia.org/wiki/Quorum_(Distributed_Systems)">Quorum</a>.</li></ul></li><li>Both Read and Write requests are handled by nodes called coordinators. A coordinator node is one of the top nodes in the Preference List for the key. The selection of a coordinator node is done by either using a partition aware client, or by using a load balancer.<ul><li>Writes: The Write (Put) request is sent to the coordinator. The coordinator stores the value locally and then sends it to N highest ranking nodes in the Preference List that are reachable (healthy). If at least W-1 nodes respond with success, the Write is considered successful.</li><li>Reads: A Read (Get) request is received by the coordinator. The coordinator asks for all existing versions of that key from the top N ranking nodes in the Preference List that are reachable. If at least R nodes respond, the Read is considered a success.</li><li>So performance would be governed by the slowest node in a R/W operation.</li></ul></li><li>This model of configurable (N, R, W) values gives applications control on the desired performance, availability, durability and consistency<ul><li>Increasing W would ensure that data is replicated many times increasing durability. However, since W nodes would also be required to have a successful write, availability would reduce as W increases. Performance would also reduce since the chance that one of the nodes is a low thruput node increases. Also increasing W would potentially increase number of divergent versions (see below) creating reconciliation overhead.</li><li>Increasing R would ensure that consistency is high, but again availability would be reduced and performance may also reduce.</li><li>Reducing R would improve read performance. So for a read-intensive, low updates application, (N, R, W) can be set to (3, 1, 3)</li><li>Typical values of (N, R, W) for Amazon Apps are (3, 2, 2)</li></ul></li></ul></li><li>Consistency – Since updates propagate to replicas asynchronously, the model is eventually consistent<ul><li>Uses object versioning (each update results in a new immutable object) using <a
href="http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf">Vector Clocks</a>.</li><li>Consistency protocol: As mentioned above, a successful read requires that at least R nodes respond. Each response includes all versions of the object. If the coordinator node ends up with multiple versions of data, it does the following:<ul><li>Returns all versions it deems to be causally un-related</li><li>Divergent versions are reconciled</li><li>Reconciled version superseding the current version is written back.</li></ul></li></ul></li><li>Handling Temporary Failures:<ul><li>As mentioned above, the Read and Write propagation considers the first N healthy nodes, not the first N nodes in the Preference List.</li><li>However, when a node that is not in the top N nodes (but in the top N Healthy nodes) receives a Write request, it is also given extra metadata to indicate which node was it actually meant for. This is called a Hinted Replica.</li><li>Each node periodically scans its hinted replica set and tries to connect to the original nodes for which the replica was meant.</li><li>If it is able to connect, the node tries to deliver the replica to the original node and if delivered, may delete the replica</li><li>This process is called Hinted Handoff</li></ul></li><li>Handling Permanent Failures:<ul><li>Permanent failures can arise in case a hinted replica becomes unavailable or some other risk</li><li>To reduce chances of permanent failures, nodes sync replicas thru <a
href="http://en.wikipedia.org/wiki/Hash_tree">Merkel Trees</a></li><li>Each node maintains a Merkel tree for the key range hosted by it</li><li>To sync, two nodes exchange the root of the tree common to the key ranges hosted by them, and in case there are any divergences, the apt sync action is taken.</li></ul></li><li>Ring membership:<ul><li>Addition / removal of node from ring is relatively expensive because of the need to rebalance partition assignment</li><li>Therefore addition of a node to a ring and its departure are explicit and not based on non-availability since non-availability can be because of transient conditions (say network)</li><li>Membership propagation and reconciliation happens during the same communication that is used for token mapping propagation and reconciliation (see partitioning above)</li></ul></li><li>Failure Detection:<ul><li>The need to detect failure is to avoid trying to reach unhealthy (unreachable) nodes.</li><li>When a node A fails to reach another node B, it propagates that information thru Gossip. Checks periodically to see if B comes back. Info on arrival of node also propagates using Gossip.</li></ul></li><li>Persistence:<ul><li>Pluggable model &#8211; Berkley DB Transactional store, MySQL, In-memory buffer with persistent backing store.</li><li>Apps choose the kind of storage they need.</li></ul></li></ul><h2>4.2 Amazon S3</h2><ul><li><a
href="http://aws.amazon.com/s3/">http://aws.amazon.com/s3/</a></li><li>Distributed key-object store. Objects are opaque to system.</li><li>Size of an object can range from 1 byte to 5 GB.</li><li>No limits on number of objects – unlimited storage.</li><li>Geo-aware: You decide the Geo where your object is, and the object would stay within Amazon data-centers in that Geo.</li><li>Consistency: Read your writes on new PUTs, Eventual consistency on overwrite PUTs and DELETEs. Depends on Geo. See <a
title="http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ" href="http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ">http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ</a>.</li><li>Query model<ul><li>Iterable lists: <a
title="http://docs.amazonwebservices.com/AmazonS3/2006-03-01/ListingKeys.html" href="http://docs.amazonwebservices.com/AmazonS3/2006-03-01/ListingKeys.html">http://docs.amazonwebservices.com/AmazonS3/2006-03-01/ListingKeys.html</a></li><li>Retrieve objects using HTTP GET – supports chunked downloads. See <a
title="http://docs.amazonwebservices.com/AmazonS3/2006-03-01/UsingGettingObjects.html" href="http://docs.amazonwebservices.com/AmazonS3/2006-03-01/UsingGettingObjects.html">http://docs.amazonwebservices.com/AmazonS3/2006-03-01/UsingGettingObjects.html</a></li></ul></li><li>Should be useful if objects to be stored are large (files, blobs, persisted graphs, etc.)</li></ul><h2>4.3 Project Voldemort</h2><ul><li><a
href="http://project-voldemort.com/">http://project-voldemort.com/</a></li><li>Distributed <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/Store.html">key-value store</a> written in Java. Inspired from Amazon Dynamo.</li><li>Partitioning and replication – same as Dynamo: Consist<br
/> ent Hashing, virtual nodes, preference lists for <a
href="http://project-voldemort.com/javadoc/all/voldemort/routing/RoutingStrategy.html">routing</a>, Gossip based state propagation, etc.</li><li><a
href="http://project-voldemort.com/javadoc/all/voldemort/client/rebalance/RebalanceClient.html">Cluster Rebalancing</a> – add node to cluster, node steals a partition from another node, pull-based gossip to let other nodes know. Donor nodes handles GETs while new node is transferring partitions.</li><li><a
href="http://project-voldemort.com/javadoc/all/voldemort/versioning/package-summary.html">Versioning</a> – Based on Vector Clocks. Same as Dynamo again</li><li>Consistency – <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/routed/ReadRepairer.html">Read-repair</a>. Write inconsistent versions across nodes, detect conflict during reads and fix them</li><li><a
href="http://project-voldemort.com/javadoc/all/voldemort/store/StorageEngine.html">Pluggable storage</a> using <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/bdb/BdbStorageEngine.html">Berkley DB</a>, <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/memory/InMemoryStorageEngine.html">In-memory</a>, <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/mysql/MysqlStorageEngine.html">MySQL</a></li><li><a
href="http://project-voldemort.com/javadoc/all/voldemort/serialization/package-summary.html">Pluggable serialization</a> – supports <a
href="http://project-voldemort.com/javadoc/all/voldemort/serialization/json/package-summary.html">JSON</a>, <a
href="http://project-voldemort.com/javadoc/all/voldemort/serialization/thrift/package-summary.html">Thrift</a>, <a
href="http://project-voldemort.com/javadoc/all/voldemort/serialization/StringSerializer.html">raw strings</a>, <a
href="http://project-voldemort.com/javadoc/all/voldemort/serialization/ObjectSerializer.html">Java</a>, <a
href="http://project-voldemort.com/javadoc/all/voldemort/serialization/protobuf/package-summary.html">Google Protocol Buffers</a></li><li>Offline index rebuilding (using map-reduce, etc.) and then push to live server, transparently swap</li><li>Support for <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/readonly/package-summary.html">read-only nodes</a> (for offline build and swap)</li><li>Pluggable <a
href="http://project-voldemort.com/javadoc/all/voldemort/server/niosocket/package-summary.html">NIO socket</a> implementation</li><li>Support for <a
href="http://project-voldemort.com/javadoc/all/voldemort/store/compress/package-summary.html">data compression</a></li><li><a
href="http://project-voldemort.com/performance.php">Performance</a> – 10 to 20k reqs / sec<ul><li>Running at: LinkedIn</li></ul></li><li>License: <a
href="http://github.com/voldemort/voldemort/blob/master/LICENSE">Apache</a></li></ul><h2>4.4 Redis</h2><ul><li><a
href="http://code.google.com/p/redis/">http://code.google.com/p/redis/</a></li><li>Key-Value store written in ANSI C<ul><li>Values can be of <a
href="http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes">multiple types</a>: strings, lists, sets, ordered sets</li></ul></li><li>Memory-driven – loads the entire data in RAM, but data can be persisted:<ul><li>Asynchronously saved snapshot – policies: time since last save, updates since last save</li><li><a
href="http://code.google.com/p/redis/wiki/AppendOnlyFileHowto">Append-only mode</a> (commands logged to a file). Three options: on every command, every second, upto OS</li></ul></li><li>Single Process, no threads</li><li>Memcache like <a
href="http://code.google.com/p/redis/wiki/ProtocolSpecification">protocol</a>. Multiple <a
href="http://code.google.com/p/redis/wiki/SupportedLanguages">client libraries</a> across several languages</li><li>Some of the more interesting <a
href="http://code.google.com/p/redis/wiki/CommandReference">Supported operations</a>:<ul><li>Strings: <a
href="http://code.google.com/p/redis/wiki/IncrCommand">Increments and Decrements</a>, <a
href="http://code.google.com/p/redis/wiki/MsetCommand">atomic multi-set</a></li><li>Lists: <a
href="http://code.google.com/p/redis/wiki/RpushCommand">Push</a> and <a
href="http://code.google.com/p/redis/wiki/LpopCommand">Pop</a>, <a
href="http://code.google.com/p/redis/wiki/RpoplpushCommand">atomic R-pop, L-push</a> (safe queues), <a
href="http://code.google.com/p/redis/wiki/LrangeCommand">Ranged Get</a></li><li>Sets: <a
href="http://code.google.com/p/redis/wiki/SinterCommand">Intersection</a>, <a
href="http://code.google.com/p/redis/wiki/SunionCommand">Union</a> and <a
href="http://code.google.com/p/redis/wiki/SdiffCommand">Diff</a>, and their corresponding operations that store the result against a key</li><li>Sorting:<ul><li><a
href="http://code.google.com/p/redis/wiki/SortCommand">Sort</a> operation that can be applied on a list or a set.</li><li>Sets when ordered return Ordered Sets (implemented using hash tables and skip lists). This means O(Log N) writes and O(1) reads. Ordered sets support Range operations.</li></ul></li></ul></li><li><a
href="http://code.google.com/p/redis/wiki/ReplicationHowto">Replication</a>: Master-Slave</li><li>Partitioning: Up to the client library. Currently supported by<ul><li><a
title="http://github.com/nrk/predis/" href="http://github.com/nrk/predis/">http://github.com/nrk/predis/</a> (PHP)</li><li><a
title="http://github.com/jdp/redisent" href="http://github.com/jdp/redisent">http://github.com/jdp/redisent</a> (PHP)</li><li><a
title="http://github.com/ezmobius/redis-rb" href="http://github.com/ezmobius/redis-rb">http://github.com/ezmobius/redis-rb</a> (Ruby)</li><li><a
title="http://github.com/ezmobius/redis-rb" href="http://github.com/ezmobius/redis-rb">http://github.com/ezmobius/redis-rb</a> (Scala)</li></ul></li><li>Performance:<ul><li><a
href="http://code.google.com/p/redis/wiki/Benchmarks">http://code.google.com/p/redis/wiki/Benchmarks</a> reports 110k sets / sec and 81k gets / sec. Lots of other people have reported results as well</li><li>Replication &#8211; <a
href="http://groups.google.com/group/redis-db/browse_thread/thread/3ab1c8b2126f1b8/29bdb6c5973f0388">http://groups.google.com/group/redis-db/browse_thread/thread/3ab1c8b2126f1b8/29bdb6c5973f0388</a> mentions 10 M keys replicated in 12 seconds on EC2</li><li>Connections on Redis are cheap – no auth, etc. just a TCP handshake. <a
href="http://code.google.com/p/redis/wiki/Pipelining">Pipelining</a> makes it even faster</li></ul></li><li>License: New BSD</li><li>In use at: Github, Engine Yard, VideoWiki, and few others</li></ul><h2>4.5 Scalaris</h2><ul><li><a
href="http://code.google.com/p/scalaris/">http://code.google.com/p/scalaris/</a></li><li>Distributed key-value store based on written in Erlang</li><li>No persistence, entirely memory driven. Can use a backend like <a
href="http://code.google.com/p/scalaris/wiki/Tokyocabinet">Tokyo Cabinet</a> for handling database size &gt; RAM + Swap.</li><li>Partitioning – Uses a modified version of <a
href="http://pdos.csail.mit.edu/chord/">Chord</a> called <a
href="http://www.mps.com.mx/CSR/Publications/2006-schuett-gp2pc.pdf">Chord#</a> that<ul><li>Preserves key order (allowing range queries)</li><li>Does routing in node space</li><li>Delivers proven O(Log N) performance (proof in the paper linked above)</li></ul></li><li>Replication – Essential for being fault tolerant since there is no persistence. Uses a peer to peer replication approach called <a
href="http://www.sics.se/~ali/publications/replication.pdf">Symmetric Replication</a> that reads and writes to a majority of nodes participating in replication.</li><li>Consistency: Strong consistency &#8211; uses a <a
href="http://www.zib.de/CSR/Publications/2007-moser-coregrid.pdf">modified version</a> of <a
href="http://en.wikipedia.org/wiki/Paxos_algorithm">Paxos</a> that provides atomic transactions<br
/> : non-blocking writes that execute in parallel with strong consistency.</li><li>Language support: <a
href="http://code.google.com/p/scalaris/wiki/ErlangAPI">Erlang</a>, <a
href="http://code.google.com/p/scalaris/wiki/APIs">Java, JSON RPC</a></li><li>License: Apache 2.0</li><li>Production Use: None mentioned</li></ul><h2>4.6 Others</h2><ul><li>Tokyo Tyrant<ul><li><a
href="http://1978th.net/tokyotyrant/">http://1978th.net/tokyotyrant/</a></li><li>Network interface to Tokyo Cabinet. Acts as a database server</li><li>Written in C. Language bindings: <a
href="http://1978th.net/tokyotyrant/rubydoc/">Ruby</a>, <a
href="http://1978th.net/tokyotyrant/perldoc/">Perl</a>.</li><li>Basically it is a process that embeds Tokyo Cabinet and then provides an implementation for concurrent and remote access to the database.<ul><li>Uses a thread pool using epoll / kqueue</li><li><a
href="http://1978th.net/tokyotyrant/spex.html#protocol">Protocols</a>: Binary over TCP/IP, Memcache, HTTP</li></ul></li><li>Partitioning: Up to client</li></ul></li><li>MemcacheDb<ul><li><a
href="http://memcachedb.org/">http://memcachedb.org/</a></li><li>Key-Value store (uses <a
href="http://en.wikipedia.org/wiki/Berkeley_DB">BerkleyDB</a>)</li><li><a
href="http://memcachedb.googlecode.com/svn/trunk/doc/protocol.txt">Memcache protocol</a></li><li>Master-slave replication</li><li>Supports <a
href="http://memcachedb.googlecode.com/svn/trunk/doc/rget.txt">range queries</a></li></ul></li><li>Chordless<ul><li><a
href="http://sourceforge.net/projects/chordless/">http://sourceforge.net/projects/chordless/</a></li><li>Java implementation of <a
href="http://pdos.csail.mit.edu/chord/">http://pdos.csail.mit.edu/chord/</a></li><li>Provides ACID transactions</li></ul></li><li>MNesia<ul><li><a
href="http://www.erlang.org/doc/apps/mnesia/">http://www.erlang.org/doc/apps/mnesia/</a></li><li>Written in Erlang, part of OTP</li><li>Erlang specific – tables contain instances of Erlang records and records are represented as Erlang tuples</li></ul></li></ul><h1>5. Document Stores</h1><p>Document stores are a step further from key-value stores in the sense that a value associated with a key is a full blown record (document) which is not opaque to the database, but exposes a structure which allows the database to perform some operations on it. For example, the document  can be a JSON serialized object.<br
/> Note that this approach is different from a relational database where the schema is defined upfront in the form of a table. In a document store, each document can have a different schema. Despite this, it is possible to describe relationships between records, just the way you do in relational databases:</p><ul><li>One to Many:<ul><li>Embed key of foreign entity in the document. Then the foreign entity can be retrieved.</li><li>Embed foreign entity itself in the document. Does not scale, but is faster</li></ul></li><li>Many to Many<ul><li>Embed keys of one type of entities in the other type of entity</li><li>Maintain another entity that embeds the keys of the related entities.</li></ul></li></ul><p>As should be obvious – document stores are conceptually very similar to relational databases except that schema definition is not upfront.</p><h2>5.1 Amazon SimpleDB</h2><ul><li><a
href="http://aws.amazon.com/simpledb/">http://aws.amazon.com/simpledb/</a></li><li>Written in Erlang</li><li>Data is <a
href="http://docs.amazonwebservices.com/AmazonSimpleDB/2009-04-15/DeveloperGuide/index.html?DataModel.html">modeled</a> in terms of<ul><li>Domain (a container for holding related data together)</li><li>Item (a entity that goes into a domain)</li><li>Attribute and its Value (a properties of an item)</li></ul></li><li>So basically each item is a dictionary to which you can add / remove attributes (keys) at any time. Related items go into a Domain.</li><li>When an item is added, it is indexed based on its attributes (and the attributes of other items in the domain).</li><li>You can issue <a
href="http://docs.amazonwebservices.com/AmazonSimpleDB/2009-04-15/DeveloperGuide/index.html?UsingSelect.html">SELECT statements</a> much the way you would issue to a relational system, with the concept of a table being replaced by a Domain.</li><li><a
href="http://docs.amazonwebservices.com/AmazonSimpleDB/2009-04-15/DeveloperGuide/index.html?EventualConsistencySummary.html">Eventually consistent</a></li><li>Partitioning: The domain-item-attribute model imposes <a
href="http://docs.amazonwebservices.com/AmazonSimpleDB/2009-04-15/DeveloperGuide/index.html?SDBLimits.html">limits</a> on size of each domain, number of attributes / domain and number of domains, so in order to scale, you need to <a
href="http://docs.amazonwebservices.com/AmazonSimpleDB/2009-04-15/DeveloperGuide/index.html?SDBLimits.html">partition the data manually</a>.</li></ul><h2>5.2 Apache CouchDB</h2><ul><li><a
title="http://couchdb.apache.org/" href="http://couchdb.apache.org/">http://couchdb.apache.org/</a></li><li>Written in Erlang</li><li>Data and Storage Model<ul><li>Records serialized in the form of JSON strings (documents)</li><li>Each document has a unique DocId</li><li><a
href="http://books.couchdb.org/relax/appendix/btrees">Uses B+ Trees</a>: Both for main data as well as for Indexes</li><li><a
href="http://wiki.apache.org/couchdb/Document_revisions">Append only operations</a> (Every update generates a new Sequence Id for the DocId). Committed data is never over-written<ul><li>Every write operation also results in index updates which are also append only, so a new node is added to the B+ Tree which recursively leads to Log(n) updates to the B+ tree.</li><li>During <a
href="http://wiki.apache.org/couchdb/Compaction">Compaction</a> older versions are removed</li></ul></li><li>Append only model means that Reads are done using <a
href="http://portal.acm.org/citation.cfm?id=319998">Multi-Version Concurrency Control</a>. A client can hold on to the older root of the B+ tree and get a consistent view while the data is being modified continuously.</li><li>Writes are serialized, except for blobs which can be concurrently written.</li><li>An update to a document is atomically committed with data and index updates getting synchronously flushed to disk, and the database header is updated.</li><li>When resuming from a crash, no recovery is required (except for recovering the database header). Partial updates are simply truncated.</li></ul></li><li>Views<ul><li>Required to “add structure back to unstructured and semi-structured data.”</li><li>Can have multiple views for the same data</li><li>Defined using a JavaScript function that maps keys to values in a special type of document called a Design Document.</li><li>View functions are run when your query a view: The system takes the source code and runs it against every document in the database. There are basically two kinds of functions:<ul><li>Map function: takes a single parameter – the document – a single document in the system, does some operation on it (can’t modify the document) and returns a result (mapping an input to a result).</li><li>Reduce function: takes a list of key-values and returns a single result after performing some aggregate operation on it (reducing multiple inputs to a single result)</li></ul></li><li>View results are stored in a B+ tree and unless the document changes or the view definition changes, the results are not re-computed and fetched straight from the B+ tree.</li></ul></li><li><a
href="http://wiki.apache.org/couchdb/Replication">Replication</a><ul><li>Master-slave with multi-master support</li><li>Only latest revision replicated</li></ul></li><li>Conflict Handling in Multi-master<br
/> /&gt;</p><ul><li>CouchDb detects conflicts and uses a deterministic algorithm for picking a “winner”. Deterministic means that all replicas end up winning the same version without talking to each other.</li><li>This winner may or may not be the one your app expected. This will have to fixed by the app itself.</li></ul></li><li>Partitioning and Load Balancing &#8211; Provided thru <a
href="http://tilgovi.github.com/couchdb-lounge/">CouchDb Lounge</a><ul><li>Uses consistent hashing to partition the data</li><li>Dumbproxy used for all requests except View requests. Implemented as an Nginx module</li><li>Smartproxy used for View requests only<ul><li>Twisted python based daemon</li><li>Sends view requests to all shards</li><li>Merges responses before sending them back</li><li>Merges happen in constant memory space</li></ul></li></ul></li><li>License: Apache 2.0</li><li><a
href="http://wiki.apache.org/couchdb/CouchDB_in_the_wild">Production Use</a>: BBC, PylonsHQ, GPirate, and several others</li></ul><h2>5.3 Riak</h2><ul><li><a
href="http://riak.basho.com/">http://riak.basho.com/</a></li><li>Written in Erlang. Uses the same <a
href="http://riak.basho.com/arch.html">architecture</a> and algorithms as Amazon Dynamo.</li><li>Client libraries<ul><li>Erlang (JSON) – <a
href="http://riak.basho.com/edoc/jiak_client.html">Jiak Client</a></li><li>Erlang (raw) – <a
href="http://riak.basho.com/edoc/riak_client.html">Riak Client</a></li><li><a
href="http://hg.basho.com/riak/src/tip/client_lib/jiak.py">Python</a>, <a
href="http://hg.basho.com/riak/src/tip/client_lib/jiak.php">PHP</a>, <a
href="http://hg.basho.com/riak/src/tip/client_lib/jiak.rb">Ruby</a>, <a
href="http://riak.basho.com/java_client_api/index.html">Java</a>, <a
href="http://hg.basho.com/riak/src/tip/client_lib/jiak.js">JavaScript</a></li></ul></li><li>Data Model<ul><li>Key-Value. Value is a document.</li></ul></li><li>Riak nodes go on a ordered consistent hash <a
href="http://riak.basho.com/edoc/riak_ring.html">ring</a><ul><li>160-bit binary hash of key-value pair, maps to a position on the ring.</li><li>Ring divided into partitions. Each partition is designated an owner called <a
href="http://riak.basho.com/edoc/riak_vnode.html">v-node</a>. The v-node is said to <a
href="http://riak.basho.com/edoc/riak_claim.html">claim</a> a partition.</li><li>Nodes try and run equal number of v-nodes. So each node is responsible for 1/(number of nodes) or (number of partitions) / (number of v-nodes). So if 2 nodes define a 16-partition cluster, each node would run 8 v-nodes.</li></ul></li><li><a
href="http://riak.basho.com/edoc/riak_put_fsm.html">Writes (Puts)</a><ul><li>Any node may be chosen as coordinator for the Write request</li><li>Coordinator node figures out the v-node responsible for the partition where the key needs to go.</li><li>Coordinator sends Write request to that v-node as well as next N-1 partitions in the ring, where N is a configurable parameter that defines how many copies of the value need to be created.</li><li>Write request may also specify that at least W (=&lt; N) of the v-nodes respond with success, and that DW (=&lt; W) of the W nodes respond with success only after durably storing the state.</li></ul></li><li><a
href="http://riak.basho.com/edoc/riak_get_fsm.html">Reads (Gets)</a><ul><li>Any node may be chosen as a coordinator for the Read request</li><li>Coordinator node figures out the v-node responsible for the partition for the key</li><li>Sends request to v-node, as well as next N-1 nodes</li><li>Request can also specify R (=&lt; N) nodes that must reply before a response is returned.</li></ul></li><li>Ring state propagation<ul><li>Consists of things like arrival and departure of nodes on ring</li><li>Done using Gossip protocol</li></ul></li><li>Eventual Consistency<ul><li>When a object is added it is tagged with a <a
href="http://riak.basho.com/edoc/vclock.html">vector clock</a> specifying its initial version</li><li>For each update, the clock is updated in such a way that the system can compare two versions of an object and determine how the two objects are related (direct descendant, common parents, unrelated)</li><li>To compare object versions, a <a
href="http://riak.basho.com/edoc/merkerl.html">Merkle tree</a> is used</li></ul></li><li>Failure handling is done using the same hinted replicas and hinted hand-offs approach as described in the Dynamo paper.</li><li>Storage model – pluggable:<ul><li><a
href="http://riak.basho.com/edoc/riak_fs_backend.html">File based</a> in a on-disk nested directory structure</li><li>Erlang <a
href="http://riak.basho.com/edoc/riak_gb_trees_backend.html">gb_trees based</a> (<a
href="http://www.erlang.org/doc/man/gb_trees.html">http://www.erlang.org/doc/man/gb_trees.html</a>)</li><li>Volatile <a
href="http://riak.basho.com/edoc/riak_ets_backend.html">ETS tables</a> (hash based data storage and access – see <a
href="http://www.erlang.org/doc/man/ets.html">http://www.erlang.org/doc/man/ets.html</a>)</li><li>On-disk <a
href="http://riak.basho.com/edoc/riak_dets_backend.html">DETS tables</a> (see <a
href="http://www.erlang.org/doc/man/dets.html">http://www.erlang.org/doc/man/dets.html</a>)</li><li><a
href="http://riak.basho.com/edoc/riak_osmos_backend.html">Osmos tables</a> – uses the <a
href="http://code.google.com/p/osmos/">Osmos</a> file storage system. Useful for write-intensive systems.</li></ul></li><li>Queries<ul><li>Map-reduce model</li><li>Map operations run local to the data on the hash ring – can be run in parallel</li><li>Reduce operations take inputs from the preceding phase and reduce (typically perform an aggregate operation) the input. This cannot be parallelized</li></ul></li><li><a
href="http://hg.basho.com/riak/src/tip/LICENSE">License</a>: Apache 2.0</li><li>Production use: No names mentioned, except that the <a
href="http://riak.basho.com/faq.html">FAQ</a> says that “Riak has robustly served the needs of multiple revenue-generating applications for nearly two years now.”</li></ul><h2>5.4 MongoDB</h2><ul><li><a
title="http://www.mongodb.org/" href="http://www.mongodb.org/">http://www.mongodb.org/</a></li><li>Written in C++</li><li>Language bindings (<a
href="http://www.mongodb.org/display/DOCS/Drivers">drivers</a>): C, C++, Java, JavaScript, Perl, PHP, Python, Ruby. Also community provided support for C#, F#, Erlang, Groovy, etc.</li><li>Data Model<ul><li><a
href="http://www.mongodb.org/display/DOCS/Collections">Collections</a> – similar to Domains in Amazon SimpleDb – a bucket to hold together similar documents</li><li>Key-Value, with value being binary serialized JSON (<a
href="http://www.mongodb.org/display/DOCS/BSON">BSON</a>)<ul><li>4 MB limit on a BSON object</li><li>Large object support via <a
href="http://www.mongodb.org/display/DOCS/GridFS+Specification">GridFS</a></li></ul></li><li>B-Trees used for indexes. You specify the fields on which to index using a function called <a
href="http://www.mongodb.org/display/DOCS/Indexes">db.things.ensureIndex()</a> that takes the field as a parameter.</li></ul></li><li>Storage: <a
href="http://www.mongodb.org/display/DOCS/Caching">Uses memory mapped files</a>. So caching is controlled by the OS VMM</li><li>Writes<ul><li>In-place <a
href="http://www.mongodb.org/display/DOCS/Updating">updates</a></li><li>Provides partial updates – you can update the value without sending a full row update.</li><li>Single document <a
href="http://www.mongodb.org/display/DOCS/Atomic+Operations">Atomic updates</a>. Atomic batch updates possible thru <a
href="http://www.mongodb.org/display/DOCS/Server-side+Code+Execution">db.eval()</a> but may not be supported with partitioned data</li></ul><p>&lt;<br
/> br /&gt;</li><li>Queries<ul><li>Provides a <a
href="http://www.mongodb.org/display/DOCS/Dot+Notation+(Reaching+into+Objects)">JSON style based syntax</a> to pick values from inside documents</li><li><a
href="http://www.mongodb.org/display/DOCS/Advanced+Queries">Support</a> for conditional operators, regular expressions, etc.</li><li>Supports forward-only <a
href="http://www.mongodb.org/display/DOCS/Queries+and+Cursors">cursors</a></li><li><a
href="http://www.mongodb.org/display/DOCS/Query+Optimizer">Query optimizer</a> is not cost-based, instead multiple query plans are tried and the one that works best is picked.</li></ul></li><li>Batch Operations<ul><li>Supports <a
href="http://www.mongodb.org/display/DOCS/MapReduce">Map-Reduce</a> over a Collection</li></ul></li><li>Replication:<ul><li><a
href="http://www.mongodb.org/display/DOCS/Master+Slave">Master-slave</a></li><li><a
href="http://www.mongodb.org/display/DOCS/Replica+Pairs">Replica pairs</a> – automatically figure out master and slave roles</li><li><a
href="http://www.mongodb.org/display/DOCS/Master+Master+Replication">Master-Master</a>: Not recommended. Eventually consistent (how?)</li></ul></li><li>Partitioning:<ul><li><a
href="http://www.mongodb.org/display/DOCS/Sharding">Partitioning</a> In <a
href="http://www.mongodb.org/display/DOCS/Sharding+Limits">alpha</a> stage.</li><li>Data sharded by Collection with order preservation</li><li>A shard consists of one or more servers replicating using an enhanced version of replica pairs</li><li>Shards deal with data in units called <a
href="http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-Chunks">Chunks</a>.<ul><li>A chunk is a contiguous range of data from a collection with a max limit of 50 mb. After that a chunk splits into two chunks</li><li>Data <a
href="http://www.mongodb.org/display/DOCS/Shard+Ownership">migrates</a> from one shard to another in chunks (in case of shard having excess data, or having nearby shards)</li></ul></li></ul></li><li>License: <a
href="http://github.com/mongodb/mongo/blob/master/GNU-AGPL-3.0.txt">AGPL 3.0</a></li><li>In <a
href="http://www.mongodb.org/display/DOCS/Production+Deployments">Production use</a> at: Sourceforge, Github, EA, and several others</li></ul><h1>6. Column-Family Stores</h1><h2>6.1 BigTable</h2><ul><li><a
title="http://labs.google.com/papers/bigtable.html" href="http://labs.google.com/papers/bigtable.html">http://labs.google.com/papers/bigtable.html</a></li><li>Internally used at Google. Exposed to general public thru <a
href="http://googleappengine.blogspot.com/2009/02/back-to-future-for-data-storage.html">Google App Engine</a></li><li>Building Blocks<ul><li><a
href="http://labs.google.com/papers/gfs.html">Google File System</a> – a distributed file system. Provides raw storage<ul><li>Files broken into <strong>chunks</strong> (typically 64 mb)</li><li>Each chunk is replicated across 3 machines for safety.<ul><li>A heuristic here is that one of the replicas tries be on the same rack as the original and the other two replicas somewhere else</li></ul></li><li><strong>Chunk Servers</strong> – responsible for storing chunks</li><li>Data transfer happens directly between clients and chunk servers</li><li>Master – responsible for metadata in the file system</li></ul></li><li>The storage file format is called <strong>SSTable </strong>(sorted strings table)<ul><li>Persistent, ordered, immutable map of keys to values with both keys and values being arbitrary byte strings</li><li>Operations to look up the value specified with a key and to iterate over key/value pairs in a given key range</li><li>Optionally, an SSTable can be completely mapped into memory</li><li>Immutability of SSTables:<ul><li>Means no synchronization required during file access.</li><li>Makes removing permanently deleted data a garbage collection exercise (happens in a major compaction – see below)</li></ul></li></ul></li><li>Scheduler<ul><li>Schedules jobs on machines</li><li>Watches for failures and re-schedules if required</li></ul></li><li><a
href="http://labs.google.com/papers/chubby.html">Chubby Service</a> – Distributed lock / file / name manager<ul><li>Coarse grained lock – can store small amount of data in lock</li><li>5 replicas, 1 master. Need majority vote to be active. Uses Paxos to keep replicas in sync</li><li>Provides a namespace that consists of directories and small files. Each file / directory can be used as a lock, and reads and writes to a file are atomic.</li><li>Each chubby client maintains a session with a chubby service. There is a lease time which needs to renewed.</li><li>Clients can register callbacks on Chubby files and directories for notification of changes / session expiration</li></ul></li><li>Optional Services used by BigTable applications (but not used by BigTable itself)<ul><li><a
href="http://labs.google.com/papers/mapreduce.html">Map-Reduce</a>: Batch processing – clients often use this to read / write BigTable data</li><li><a
href="http://labs.google.com/papers/sawzall.html">Sawzall</a>: to support execution of client provided scripts in the server space for transformation, filtering and summarization</li></ul></li></ul></li><li>Data Model<ul><li>Multi-dimensional map (map of maps)</li><li>Data organized into tables. A table is an entity that holds related data<ul><li>Each record is identified by a row key</li><li>Each row has columns. Columns here should be thought of as attributes of a row. Not all rows would have the same columns</li><li>Intersection of a row and a column creates a cell</li><li>A cell has versioned data based on timestamp. So a single cell may holds multiple versions.</li><li>It is convenient to organize attributes. Thus columns are grouped into <strong>column families</strong>. A column family can be thought of as a namespace for a set of related attributes.</li></ul></li><li>So the dimensions of the multi-dimensional map are:<ul><li>Row key: Uniquely identifies a datum (arbitrary byte string)</li><li>Column-family: A grouping of attributes – a namespace for a set of related attributes.</li><li>Column: A single attribute.</li><li>Timestamp: Denoting the time when the value was changed.<ul><li>Different versions are stored in decreasing order of timestamp, so most recent versions can be read first</li><li>Client can specify that only last n-revisions of cell should be kept, or only values written in the last n-days should be kept</li></ul></li></ul></li><li>Transactional access possible to any column within a single row<ul><li>Can perform atomic read-modify-write sequences on data stored under a single row key</li><li>Transactions across row keys not supported</li><li>Batch operations across row keys are possible but are not atomic</li></ul></li></ul></li><li><strong>Tablets</strong><ul><li>Data is sorted lexicographically by Row key</li><li>Row range for a table is dynamically partitioned.</li><li>A row range is called a Tablet. So you can describe a tablet in terms of a start row key and an end row key</li><li>One tablet = 100 to 200 mb<ul><li>Tablets can be split based on tablet size or load</li></ul></li><li>A <strong>tablet server</strong> may end up serving anywhere from 10 to 1000 tablets (typically hundreds).<ul><li>While each tablet consists of contiguous data, a tablet server would pick up tablets spread across the entire data set and not data that is related. This helps allevi<br
/> ate hot spots since load gets spread evenly</li><li>The high ratio of tablets to tablet servers gives very fast recovery since if a machine that was serving 100 tablets fails, 100 machines can each pick up 1 tablet from the failed machine making recovery highly parallelized</li><li>Also gives fine-grained load balancing – you can migrate tablets away from an overloaded machine.</li></ul></li><li>Servers are further grouped into cells<ul><li>A cell is a collection of servers. Most cells have 10-20 servers</li><li>Some cells have as many as thousands of servers managing several hundred TB of data</li></ul></li></ul></li><li>A typical cluster consists of<ul><li>Commodity Intel based PC hardware with each node running<ul><li>Google’s version of Linux</li><li>GFS Chunk server – for raw reads and writes</li><li>Scheduler slave process – runs jobs</li><li>Tablet server – serves BigTable read / write requests</li><li>Some nodes may also run a BigTable master which is responsible for meta-data requests (create a new table) and load monitoring / balancing</li></ul></li><li>GFS Master</li><li>Cluster scheduling master – talks to the scheduler slave process on each node to startup jobs</li><li>Lock service – used for electing BigTable masters and for bootstrapping the tablet lookup (below)</li></ul></li><li>Locating Tablets<ul><li>To figure out which server is serving which tablet (row key range), this mapping info is kept in special metadata tables (which in turn can split into tablets)</li><li>Three level hierarchical lookup with bootstrap provided by Chubby<ul><li>Chubby file points to a root metadata tablet – this tablet is never split</li><li>This tablet points to other metadata tablets</li><li>Each 2nd level metadata tablet points to app data tablets</li><li>Each metadata row is an encoding of tablet’s table id and end row key, and takes approx 1 kb</li><li>With a 128 mb limit on metadata tables, this scheme can address upto 2^34 tablets (or 2^61 bytes in 128 mb tablets)</li></ul></li><li>The lookup is aggressively pre-fetched and cached by clients so most queries end up going straight to servers<ul><li>Empty cache: There are 3 network round trips including one read from Chubby</li><li>Stale cache: could take up to  round trips since stale entries are discovered via misses</li></ul></li></ul></li><li>Tablet Assignment<ul><li>Each tablet assigned to one server at a time. Master keeps track</li><li>When a tablet gets unassigned, the Master asks a Tablet server with available capacity to take over the tablet</li><li>Tablet assignment uses Chubby<ul><li>On starting, a tablet server creates and acquires an exclusive lock on a uniquely named file in a specific Chubby directory called Servers</li><li>Master monitors this directory to discover tablet servers</li><li>If a tablet server loses its exclusive lock it stops serving the tablets</li><li>Tablet server may try and re-acquire the lock if the file still exists, however if the file is gone, the tablet server kills itself</li></ul></li><li>Master asks each tablet server periodically for the status of its lock<ul><li>If tablet server reports a loss, or if master fails to connect to tablet server, the master tries to acquire an exclusive lock on the tablet server’s file</li><li>If master is able to acquire lock, it means Chubby is alive. So master deletes the file so that the tablet server can never serve again, and the master moves all tablets assigned to the tablet server to the set of unassigned tablets</li><li>If master is unable to reach Chubby, it kills itself</li></ul></li></ul></li><li>Writes<ul><li>When a write request goes to a tablet server, it checks it for well formedness, and authorization (Chubby maintains a list of permitted writers)</li><li>Valid mutations are written to a commit log (grouped for small mutations)</li><li>After the write has been committed, its contents are inserted into a sorted in-memory buffer called <strong>memtable</strong>. From here, updates keep going to a sequence of SSTables as they get older</li></ul></li><li>Reads<ul><li>Similar checks as for a write request – well formedness and auth</li><li>Read op executed on a merged view of SSTable sequence and memtable</li></ul></li><li>Tablet Recovery<ul><li>Maybe required if a Tablet server dies</li><li>Tablet server reads its metadata from the MetaData table – list of SSTables and set of redo points which point to commit logs</li><li>Server reads indices of SSTables into memory and reconstructs memtable by applying all updates that have been committed since redo points</li></ul></li><li>Compactions<ul><li>Minor Compactions:<ul><li>As writes occur, memtable grows till a threshold. Is frozen. New memtable created. Frozen memtable converted to SSTable and written to GFS.</li><li>Shrinks memory usage of tablet server</li><li>Reduces amount of data that needs to be read from Commit log during recovery</li></ul></li><li>SSTable Compactions<ul><li>As minor compactions occur, new SSTables get created. This can lead to a Read operation slowing down as it may need to merge data from multiple SSTables</li><li>So periodically a merging compaction runs which reads a few SSTables and the Memtable and writes out a new SSTable</li><li>This does not remove special deletion entries (used to suppress deleted data)</li></ul></li><li>Major Compactions<ul><li>From time to time BigTable runs a compaction that rewrites all SSTables into exactly one table is. This is called a Major compaction</li><li>During this compaction deleted data / deletion entries are removed.</li><li>Required for claiming resources</li></ul></li></ul></li><li>Refinements<ul><li>Locality Groups: Are used to group together parts of data that have similar usage characteristics.<ul><li>Client can group multiple column families into a locality group</li><li>Separate SSTable generated for each locality group</li><li>Locality group can be marked as being in-memory (lazy loaded). BigTable itself uses it for the Location column-family in metadata table.</li></ul></li><li>Compression: Clients can specify whether or not SSTables for a locality group are compressed or not, and which compression algorithm to use</li><li>Caching: Tablet servers use two levels of caching:<ul><li>Scan cache – higher level. Caches key-value pairs returned by SSTable interface to Tablet server code. Useful for apps that tend to read the same data repeatedly</li><li>Block cache – lower level. Caches SSTable blocks reads from GFS. Useful for apps that read data close to data they recently read</li></ul></li></ul></li></ul><h2>6.2 Cassandra</h2><ul><li><a
href="http://incubator.apache.org/cassandra/">http://incubator.apache.org/cassandra/</a></li><li>Combines the distributed architecture of Dynamo with BigTable’s column-family data model</li><li>Written in Java. Thrift based interface. <a
href="http://wiki.apache.org/cassandra/ClientExamples">High-level libraries</a> available on Ruby, Perl, Python, Scala and Java</li><li>Data Model:<ul><li>Multi-dimensional map indexed by a key like BigTable.<ul><li>Each application creates its own key-space (table in BigTable)</li><li>Key – a name for a record. Can be an arbitrarily long string. Indexed by Cassandra</li><li>Column – an attribute of a record. Smallest datum you can get hold of. This is timestamped.</li><li>Column-family: Grouping of columns. Similar to a relational table.</li><li>Super-Colum<br
/> ns: A list of columns</li><li>A column-family can either contain columns or super-columns, but not both. Coumn family containing super-columns is called super-column-family</li><li>The way to get to a piece of data is: KeySpace.ColumnFamily.Key.[SuperColumn].Column</li></ul></li><li>Sorting<ul><li>Data is sorted at write time (unlike relational)</li><li>Columns are sorted within their row by column-name. The sorting provider is pluggable</li></ul></li></ul></li><li>Partitioning: Similar to Dynamo<ul><li>Consistent hashing using an order preserving hash function.</li><li>However uses the <a
href="http://portal.acm.org/citation.cfm?id=383071">Chord</a> approach to load balancing rather than Dynamo’s v-node approach</li></ul></li><li>Replication<ul><li>Same concept of coordinator nodes and preference lists as Dynamo</li><li>Various replication policies: Datacenter aware, Rack-aware, Rack-unaware</li><li>Rack-unaware approach is same as Dynamo (replicate to N-1 nodes)</li><li>Rack aware and Datacenter aware use a more intricate algorithm which involves a leader (chosen using <a
href="http://portal.acm.org/citation.cfm?id=1582716.1582721">Zookeeper</a>) that maintains the N-1 replicas invariant. This is apparently not there in Apache Cassandra.</li></ul></li><li>Membership: Membership based on <a
href="http://portal.acm.org/citation.cfm?id=1529974.1529983">Scuttlebutt</a> – anti-entropy <a
href="http://wiki.apache.org/cassandra/ArchitectureGossip">Gossip</a> based mechanism. Same concept of explicit membership changes as Dynamo</li><li>Failure detection: Uses a modified version of <a
href="http://www.computer.org/portal/web/csdl/doi/10.1109/RELDIS.2004.1353004">Φ Accrual Failure Detector</a> which gives a probabilistic value for each monitored node. This apparently is fairly accurate and more representative of network conditions.</li><li>Failure Handling: Same concept of hinted replicas and hinted hand offs as in Dynamo</li><li>Persistence:<ul><li>Relies on local file system (unlike BigTable that uses GFS)</li><li>Storage structure and approach is similar to BigTable: <a
href="http://wiki.apache.org/cassandra/ArchitectureSSTable">SSTables</a>, Memtables, <a
href="http://wiki.apache.org/cassandra/ArchitectureCommitLog">Commit logs</a>, Compaction, Bloom filters, etc.</li></ul></li><li>Writes<ul><li>Similar to the BigTable approach of writing to commit log for durability and recoverability, followed by update to Memtable</li><li>Dedicated disk on each machine for commit log makes the writes sequential and thus maximizes thruput</li><li>When memtable crosses a threshold (out of space / too many keys / client provided time duration), it gets written to two files:<ul><li>Data file – SSTable</li><li>Index File – key/offset pair</li></ul></li><li>No seeks – always sequential, thus quite fast</li><li>Atomic within Columnfamily</li></ul></li><li>Reads<ul><li>Similar approach as Dynamo for figuring out which nodes will serve, read-repair, etc.</li><li>At a storage level, approach is similar to BigTable: read from memtable + SSTable sequence</li></ul></li><li>License: Apache 2</li><li>In Production at: <a
href="http://www.new.facebook.com/note.php?note_id=24413138919">Facebook</a>, <a
href="http://about.digg.com/blog/looking-future-cassandra">Digg</a>, Rackspace</li></ul> <div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/VineetGupta?a=FBNYJyxLyMg:30jQVdaB8Kw:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=FBNYJyxLyMg:30jQVdaB8Kw:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=FBNYJyxLyMg:30jQVdaB8Kw:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=FBNYJyxLyMg:30jQVdaB8Kw:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=FBNYJyxLyMg:30jQVdaB8Kw:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=FBNYJyxLyMg:30jQVdaB8Kw:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/VineetGupta?i=FBNYJyxLyMg:30jQVdaB8Kw:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/VineetGupta?a=FBNYJyxLyMg:30jQVdaB8Kw:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/VineetGupta?d=qj6IDK7rITs" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/VineetGupta/~4/FBNYJyxLyMg" height="1" width="1"/>]]></content:encoded> <wfw:commentRss>http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape/feed/</wfw:commentRss> <slash:comments>46</slash:comments> <feedburner:origLink>http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape/</feedburner:origLink></item> </channel> </rss><!-- Performance optimized by W3 Total Cache. Learn more: http://www.w3-edge.com/wordpress-plugins/

Minified using apc
Page Caching using apc
Database Caching 2/55 queries in 0.120 seconds using apc
Object Caching 693/866 objects using apc

Served from: www.vineetgupta.com @ 2012-02-21 23:09:56 -->

