<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Deepak Bala</title>
	<atom:link href="http://deepakbala.me/feed/" rel="self" type="application/rss+xml" />
	<link>http://deepakbala.me</link>
	<description>Technology + Startups</description>
	<lastBuildDate>Thu, 27 Jun 2013 18:33:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>But&#8230; I could have written that in a week !</title>
		<link>http://deepakbala.me/2013/06/27/but-i-could-have-written-that-in-a-week/</link>
		<comments>http://deepakbala.me/2013/06/27/but-i-could-have-written-that-in-a-week/#comments</comments>
		<pubDate>Thu, 27 Jun 2013 17:58:40 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[demach]]></category>
		<category><![CDATA[hacker]]></category>
		<category><![CDATA[phonon]]></category>
		<category><![CDATA[startup]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=182</guid>
		<description><![CDATA[One of the things I do as a co-founder of my company is make a conscious effort to &#8220;get out of the building&#8221;. As the book &#8216;The Lean Startup&#8217; suggests, no one knows what you are building unless you meet them and tell them. I get to meet many interesting entrepreneurs and [...]]]></description>
				<content:encoded><![CDATA[<p>One of the things I do as a co-founder of my company is make a conscious effort to &#8220;get out of the building&#8221;. As the book &#8216;The Lean Startup&#8217; suggests, no one knows what you are building unless you meet them and tell them. I get to meet many interesting entrepreneurs and learn about the products they make. The same goes for meeting your customers.</p>
<p>The Microsoft Bizspark challenge is one of the events my company participated in. We made it into a <a href="http://www.microsoft.com/india/bizsparkstartupchallenge/finalists.aspx">regional final</a> but didn&#8217;t progress beyond that (the <strong>why</strong> part of it probably warrants another blog post). We presented our product and watched many other startups take their turn telling everyone what they had built. I&#8217;d like to talk about one of the companies that presented at the challenge. To keep things anonymous, let&#8217;s call that company X.</p>
<p>Company X has a simple solution. They take a process that is performed manually today and automate it through their web portal. All that the portal does is upload / download data in some form. Something that took 1-2 weeks can now be done in 2-3 days. They are cash positive. Whenever I talk about a solution like this with fellow hackers, some of them go &#8211; &#8216;Really !? I could have done that in under a week !&#8217;. They&#8217;re missing the point.</p>
<p>Read other HN posts like the <a href="http://www.trevormckendrick.com/my-first-year-in-the-app-store/">Spanish Bible app</a> that made 70k in revenue and the 7 minute workout app that clocked nearly 100k downloads, and you know that it&#8217;s not about how much time you spent working on a problem or how complicated the problem was. It is how you execute and pull other strings that differentiates you from the crowd.</p>
<p>So, dear fellow hacker &#8211; just focus on making something that solves a problem, and execute on the idea with good design and simplicity. That is a good first step. The envy behind &#8216;I could have written that in X days&#8217; usually gets you nowhere.</p>
<p><a href="https://news.ycombinator.com/item?id=5953727">Discuss on HN</a></p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2013/06/27/but-i-could-have-written-that-in-a-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dimensions of recommendations &#8211; What&#8217;s missing ?</title>
		<link>http://deepakbala.me/2013/03/31/dimensions-of-recommendations-whats-missing/</link>
		<comments>http://deepakbala.me/2013/03/31/dimensions-of-recommendations-whats-missing/#comments</comments>
		<pubDate>Sun, 31 Mar 2013 17:50:27 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[recommendation]]></category>
		<category><![CDATA[social]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=163</guid>
		<description><![CDATA[The average internet user is bombarded with recommendations on e-commerce sites. The variables used to make these recommendations are pretty interesting. Amazon provides so many dimensions to a recommendation. This is what my landing page looks like. Book recommendations based on my entire browsing history I&#8217;ve already read one of these books and wrote a [...]]]></description>
				<content:encoded><![CDATA[<p>The average internet user is bombarded with recommendations on e-commerce sites. The variables used to make these recommendations are pretty interesting. Amazon provides so many dimensions to a recommendation. This is what my landing page looks like.</p>
<p><strong>Book recommendations based on my entire browsing history</strong></p>
<p>I&#8217;ve already read one of these books and wrote a review for it, but not on Amazon. There is a good probability that I&#8217;ll buy some of the books recommended to me.</p>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/03/book.png"><img class="aligncenter size-full wp-image-170" alt="book" src="http://deepakbala.me/wp-content/uploads/2013/03/book.png" width="330" height="252" /></a></p>
<p><strong>Spot recommendations based on what you browsed over the last X minutes</strong></p>
<p>I made the mistake of clicking on a link that led to a page containing a 5 kilo Nutella jar for sale. Now my landing page is full of Nutella products. No, I&#8217;m not buying Nutella anytime soon.</p>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/03/nutella.png"><img class="aligncenter size-full wp-image-169" alt="nutella" src="http://deepakbala.me/wp-content/uploads/2013/03/nutella.png" width="850" height="255" /></a></p>
<p><strong>Hit or miss &#8211; Recommending products to me based on inferred preferences.</strong></p>
<p>I&#8217;ve personally found item-based recommendations to be more effective at getting me to buy something. User-profile-based recommendations by Amazon have never worked on me, and there is a good reason why. We&#8217;ll get to that in a minute. The video games and watches were recommended to me based on my profile (Amazon probably thought &#8211; likes programming and is male &#8211; let&#8217;s show video games and watches).</p>
<p>It&#8217;s probably been a decade since I played a PC game. I rarely buy watches.</p>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/03/games.png"><img class="aligncenter size-full wp-image-166" alt="games" src="http://deepakbala.me/wp-content/uploads/2013/03/games.png" width="272" height="323" /></a> <a href="http://deepakbala.me/wp-content/uploads/2013/03/watch.png"><img class="aligncenter size-full wp-image-165" alt="watch" src="http://deepakbala.me/wp-content/uploads/2013/03/watch.png" width="258" height="225" /></a></p>
<p><strong>If all else fails, show the user something hip that drives sales for your store</strong></p>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/03/customer.png"><img class="aligncenter size-full wp-image-164" alt="customer" src="http://deepakbala.me/wp-content/uploads/2013/03/customer.png" width="673" height="291" /></a></p>
<p>What is this recommendation page missing ? &#8211; Personalized recommendations based on what your friends / colleagues are doing. You see, Amazon does not know enough about me to stop up-selling PC video games and watches. Unlike Google / Facebook, the engine does not have enough social context to make user recommendations. The reason why the acquisition of <a href="http://arstechnica.com/business/2013/03/in-further-effort-to-conquer-the-literary-world-amazon-buys-goodreads/" target="_blank">GoodReads</a> is a major win for Amazon is the additional social context it gets to make personalized recommendations (at least for books). With the integration of GoodReads on Kindle, the combo is pretty powerful. This adds a new dimension to recommendations.</p>
<p>Looking at the bigger picture, products like Wanelo / Fancy encourage crowd-curated niche marketplaces that are pretty social. If you can successfully build an engaged social audience for a popular e-commerce category (fashion / books / jewellery), it may well interest Amazon. The last time I checked, MyHabit (Amazon&#8217;s fashion site) did not use anything other than your gender and location to personalize content. These are good opportunities to fill that gap.</p>
<p><a href="https://news.ycombinator.com/item?id=5469674" target="_blank">Discuss on Hacker News</a>.</p>
<p style="text-align: left;">Like this post ? Follow me on twitter.</p>
<p><a class="twitter-follow-button" href="https://twitter.com/deepakbsub" data-show-count="false">Follow @deepakbsub</a></p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2013/03/31/dimensions-of-recommendations-whats-missing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The value of ideas</title>
		<link>http://deepakbala.me/2013/03/07/the-value-of-ideas/</link>
		<comments>http://deepakbala.me/2013/03/07/the-value-of-ideas/#comments</comments>
		<pubDate>Thu, 07 Mar 2013 12:34:48 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[idea]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=151</guid>
		<description><![CDATA[It&#8217;s 1 AM and I get an SMS on my phone asking if I&#8217;m awake. It&#8217;s from a friend and he wants to discuss something with me. After sharing the usual pleasantries he asks me &#8211; &#8216;Do you know how to integrate a Google map in an Android app ?&#8217;. I didn&#8217;t, but seeing as it was a trivial use case [...]]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s 1 AM and I get an SMS on my phone asking if I&#8217;m awake. It&#8217;s from a friend and he wants to discuss something with me. After sharing the usual pleasantries he asks me &#8211; &#8216;Do you know how to integrate a Google map in an Android app ?&#8217;. I didn&#8217;t, but seeing as it was a trivial use case I told him I should be able to do it within the hour. The conversation started to get interesting at this point. I asked him what this was about and he was reluctant to answer. It was clear that he had an idea but didn&#8217;t want to share what it was.</p>
<p>After much prodding he finally told me what the idea was and concluded his sentence with &#8216;.. yes and this is <strong>my</strong> idea btw&#8217;. That obviously meant he was going to try to kill me if I implemented it without his knowledge <img src='http://deepakbala.me/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> His proposition was that I write the code that implements the idea and give it to him. He wanted to keep his idea a secret.</p>
<p>My friend is not alone. He does not have a background in programming but has one in management. This idea sprouted in his head a while back and he kept it locked away in the hope that he could pay a programmer to get it implemented some day. There is just one problem. His idea alone is worth zilch / nada / nil. The minute you have an idea, 10 other people have probably had the same one. Building products for companies, and nurturing my own from scratch, has taught me that if you have an idea, its closest representation in the physical world would probably look like this.</p>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/03/udiamond.jpg"><img class="aligncenter size-full wp-image-152" alt="" src="http://deepakbala.me/wp-content/uploads/2013/03/udiamond.jpg" width="299" height="364" /></a></p>
<p>That is an uncut diamond. In its raw form it has great potential. A diamond cutter with skill can fashion a diamond that can sell for thousands of dollars. He can also botch it up into a worthless rock. Ideas are like that. They need refinement / cutting / polishing before they can turn into something useful. Executing and refining the idea is infinitely more difficult than coming up with one. You never make money just because you have this brilliant idea that no one else thought of. You make money because you executed your idea better than someone else. Case in point &#8211; Google and Facebook. There were search engines before Google. There were social networks before Facebook (heck, Google launched <a href="http://en.wikipedia.org/wiki/Orkut" target="_blank">Orkut</a> at around the same time).</p>
<p>As for my friend&#8217;s idea&#8230;</p>
<ul>
<li>It was a location based app which was already implemented (Not that competition is a bad thing).</li>
<li>His idea had many holes in it, which we discovered while discussing it. The holes would have remained hidden had he kept it to himself.</li>
</ul>
<p><strong> PS:</strong> Yes, I&#8217;m aware that uncut diamonds still have value, but that is the closest analogy I could come up with <img src='http://deepakbala.me/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2013/03/07/the-value-of-ideas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pragmatic tips and tricks for the Android programmer</title>
		<link>http://deepakbala.me/2013/02/27/pragmatic-tips-and-tricks-for-the-android-programmer/</link>
		<comments>http://deepakbala.me/2013/02/27/pragmatic-tips-and-tricks-for-the-android-programmer/#comments</comments>
		<pubDate>Wed, 27 Feb 2013 06:53:45 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[android]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=128</guid>
		<description><![CDATA[A few days back I deployed a couple of Android apps to the Play Store. Here are some things I wish I had known before deploying to the store. They could have saved me several hours of productivity. Test your app on the phone and not the emulator Time &#8211; The emulator is slow. It takes [...]]]></description>
				<content:encoded><![CDATA[<p>A few days back I deployed a couple of Android apps to the Play Store. Here are some things I wish I had known before deploying to the store. They could have saved me several hours of productivity.</p>
<h1>Test your app on the phone and not the emulator</h1>
<ul>
<li><strong>Time</strong> &#8211; The emulator is slow. It takes 2-5 minutes to load the app on the first run and it can take 40+ seconds to load it from a snapshot.</li>
<li><strong>Resources</strong> &#8211; The emulator is bulky and can slow your computer down. Not what you want in any situation.</li>
<li><strong>Hardware</strong> &#8211; Features such as recording audio need access to hardware on the phone. While there are options to enable certain hardware access on the emulator, they never work consistently. Some of our code would work seamlessly on the phone but fail on the emulator.</li>
</ul>
<p>There are solutions out there that attempt to provide better / faster emulation. From personal experience I&#8217;ve found that nothing beats the real thing.</p>
<h1>Parsing data structures ? Watch out</h1>
<h2>Memory</h2>
<p><span style="font-size: 16px;">It is common to come across this code snippet when parsing structured data (XML, JSON etc.):</span></p><pre class="crayon-plain-tag">// Reads the whole stream into one String before parsing,
// so memory use grows with the size of the payload.
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
StringBuilder sb = new StringBuilder();
String line;
while ((line = reader.readLine()) != null)
{
    sb.append(line).append(&quot;\n&quot;);
}</pre>
<p>While this works in most cases, there are phones out there whose memory limit cannot handle very large strings. Even when data structures are reasonably short, the application can force close. Use a <a href="http://developer.android.com/reference/android/util/JsonReader.html" target="_blank">JsonReader</a> instead. It parses the JSON as a stream and occupies less memory.</p>
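<p>The same principle can be tried out off-device in plain Java. The sketch below is an assumption for illustration rather than JsonReader itself: the hypothetical <code>StreamingCount</code> class reads newline-delimited records and keeps only a running count, so only one line is held in memory at a time. JsonReader applies that idea to individual JSON tokens.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Android: counts matching records
// while reading instead of appending every line to a StringBuilder.
final class StreamingCount {

    // Returns how many lines contain the marker. Only the current line
    // is held in memory, however large the whole payload is.
    static int countLinesContaining(Reader source, String marker) {
        BufferedReader reader = new BufferedReader(source);
        int matches = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(marker)) {
                    matches++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample payload: one small JSON object per line.
        String payload = "{\"type\":\"voice\"}\n{\"type\":\"text\"}\n{\"type\":\"voice\"}\n";
        System.out.println(countLinesContaining(new StringReader(payload), "\"voice\""));
    }
}
```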
<h1>GSON</h1>
<h2>Packages</h2>
<p>Have you packaged Gson with your build ? Change the <a href="https://sites.google.com/site/gson/gson-on-android" target="_blank">package name of the Gson jar</a> to ensure it parses structures correctly at runtime. Some HTC phones ship with old versions of Gson pre-packaged. Parsing JSON on these phones will fail otherwise.</p>
<h2>Proguard</h2>
<p>Gson and ProGuard do not work well with each other unless you follow the <a href="https://code.google.com/p/google-gson/source/browse/trunk/examples/android-proguard-example/proguard.cfg?r=878" target="_blank">recommended rules set out in the google-gson project</a>.</p>
<h1>Exception handling</h1>
<p>Handle exceptions aggressively. No one likes to see an &#8216;Application force closed&#8217; message. Use frameworks like <a href="http://code.google.com/p/acra/" target="_blank">ACRA</a> to report errors back to you. Fix them quickly. Whatever you do, do not use a Google spreadsheet for bugs reported by ACRA. The spreadsheet will eventually stop you from exporting data in any format. ACRA has also been contacted by Google to <a href="https://github.com/ACRA/acra/wiki/Notice-on-Google-Form-Spreadsheet-usage" target="_blank">stop using spreadsheets</a> as the default method to report bugs. Use another provider instead or write your own.</p>
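<p>Crash reporters generally build on Java&#8217;s uncaught exception hook. What follows is a minimal plain-Java sketch of that mechanism, not ACRA&#8217;s API: <code>CrashReporter</code> and <code>lastReport()</code> are hypothetical names, and a real reporter would send the report to a backend rather than keep it in memory.</p>

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch of the hook that crash reporters build on.
final class CrashReporter implements Thread.UncaughtExceptionHandler {

    private volatile String lastReport;

    // Invoked on the dying thread when an exception escapes its run method.
    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace, true));
        // A real reporter would ship this to a server; here it is only stored.
        lastReport = "thread=" + thread.getName() + "\n" + trace;
    }

    String lastReport() {
        return lastReport;
    }

    public static void main(String[] args) throws InterruptedException {
        CrashReporter reporter = new CrashReporter();
        Thread.setDefaultUncaughtExceptionHandler(reporter);

        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                throw new IllegalStateException("boom");
            }
        }, "worker-1");
        worker.start();
        worker.join();

        System.out.println(reporter.lastReport());
    }
}
```

<p>Installing the handler once, early in startup, means every thread without its own handler reports through the same place.</p>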
<h1>Debugging</h1>
<h2>Service</h2>
<p>Services cannot be debugged by default. You must explicitly tell the device that you wish to connect to the debugger <a href="http://stackoverflow.com/questions/4008081/debugging-a-service" target="_blank">by writing a line of code</a>. I lost an hour on this one alone !</p>
<h2>Memory leaks</h2>
<p>Attaching a debugger to your program can make it seem like there are memory leaks. Quoting the <a href="http://developer.android.com/tools/debugging/index.html" target="_blank">Android documentation</a> on this:</p>
<blockquote><p>The debugger and garbage collector are currently loosely integrated. The VM guarantees that any object the debugger is aware of is not garbage collected until after the debugger disconnects. This can result in a buildup of objects over time while the debugger is connected. For example, if the debugger sees a running thread, the associated <code><a href="http://developer.android.com/reference/java/lang/Thread.html">Thread</a></code> object is not garbage collected even after the thread terminates.</p></blockquote>
<p>So if you ever wondered why your program crashes eventually when you try to debug it but runs fine when it is run in stand-alone mode, this could be the reason.</p>
<h2>Async task and the UI thread</h2>
<p>Async tasks should be instantiated from the UI thread. Quoting the Android documentation:</p>
<blockquote><p>The AsyncTask class must be loaded on the UI thread. This is done automatically as of <code><a href="http://developer.android.com/reference/android/os/Build.VERSION_CODES.html#JELLY_BEAN">JELLY_BEAN</a></code>.</p></blockquote>
<p>Failure to do this will result in <a href="http://stackoverflow.com/questions/4280330/onpostexecute-not-being-called-in-asynctask-handler-runtime-exception" target="_blank">weird runtime errors</a>. If you instantiate even the first async task incorrectly, it can affect the others.</p>
<h1>Prepare for release</h1>
<h2>Dry run</h2>
<p>If this is the first time you are releasing to the Play Store, release a simple app before releasing the one you have been working on seriously. The dry run will give you a fair idea of what the process is and what to expect.</p>
<p>Android already offers <a href="http://support.google.com/googleplay/android-developer/answer/113469?hl=en" target="_blank">guidelines on what to expect</a> when you upload your app. There is also a separate guideline on <a href="http://support.google.com/googleplay/android-developer/answer/1078870" target="_blank">graphical assets</a> associated with your project. Take a look at the <a href="http://developer.android.com/tools/publishing/preparing.html" target="_blank">preparing for release guide</a> before you take the plunge.</p>
<h1>Shameless plug</h1>
<p>You can find these tips implemented in my voice messaging app &#8211; <a href="http://goo.gl/FfhYj" target="_blank">PhonOn</a>. Check it out on the Play Store.</p>
<p style="text-align: center;"><a href="http://goo.gl/FfhYj"><img class="aligncenter size-full wp-image-134" alt="PhonOn on Android" src="http://deepakbala.me/wp-content/uploads/2013/02/android.png" width="149" height="53" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2013/02/27/pragmatic-tips-and-tricks-for-the-android-programmer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deploying Riak on ec2</title>
		<link>http://deepakbala.me/2013/02/08/deploying-riak-on-ec2/</link>
		<comments>http://deepakbala.me/2013/02/08/deploying-riak-on-ec2/#comments</comments>
		<pubDate>Fri, 08 Feb 2013 11:04:43 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[riak]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=123</guid>
		<description><![CDATA[I&#8217;ve been playing around with Riak for a while now and have deployed it to a couple of production environments. I&#8217;ll share my experiences with EC2 deployments here, since there were some gray areas at the time of writing. I assume you already know what Riak is and have already worked with [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve been playing around with Riak for a while now and have deployed it to a couple of production environments. I&#8217;ll share my experiences with EC2 deployments here, since there were some gray areas at the time of writing. I assume you already know what Riak is and have already worked with it. The same goes for EC2.</p>
<h2>Deployment strategy</h2>
<p>There are a couple of problems to field when deploying Riak.</p>
<p>1. The EC2 instances that are provisioned by default change the following on restart.</p>
<ul>
<li>Private IP address</li>
<li>Public IP address</li>
<li>Private DNS</li>
<li>Public DNS</li>
</ul>
<p>2. EBS volumes provide stable, durable storage, while ephemeral storage provides more predictable performance at the cost of losing data on restarts.</p>
<p>3. Performance.</p>
<h2>IP Address change</h2>
<p>Riak does not like change in IP addresses. You have a couple of solutions.</p>
<p>1. Listen on 0.0.0.0 to accept any connection.</p>
<p>2. Configure a host in /etc/hosts with a valid FQDN. For example, on a 3 node cluster you would have beam listening on riak1.something.com, riak2.something.com and riak3.something.com. When IP addresses change you must change the IPs that these FQDNs point to.</p>
<p>3. Same as #2, except you can use a private DNS server or Route 53 to manage the mapping.</p>
<p>4. Assign a public static EIP to your node.</p>
<p>5. Deploy to Amazon VPC where IP addresses do not change.</p>
<h3>What to pick ?</h3>
<p>#1 does not change the fact that beam instances need to find other Riak nodes in the cluster. You are still stuck trying to get Riak to find other nodes correctly.</p>
<p>#2 is a pain in terms of maintenance. The time taken to bring a cluster back up is proportional to the number of nodes in the cluster, even if you have automated scripts to change IPs.</p>
<p>#3 Same as #2. An over-engineered solution.</p>
<p>#4 You can assign at most 5 EIPs per account and that will cost you. It is also not a good idea to expose Riak to the internet with a static IP.</p>
<p><strong>#5</strong> I&#8217;d recommend this. Your IP address never changes on a private VPC, which makes configuration and cluster management that much easier. The nodes cannot reach the internet without a NAT instance. The con is that it takes some work to understand a VPC environment and set it up. Trust me though, it is worth the effort.</p>
<h2>Storage</h2>
<p>There is a post on the <a href="http://riak-users.197444.n3.nabble.com/EC2-and-RIAK-td2754409.html">riak-users list</a> that does justice to discussing points about storage and which one you should use. There is no single answer to this really. Your storage requirements vary widely based on the backend you choose to run Riak on. Bitcask stores all keys in memory while LevelDB stores x % of the keys in memory.</p>
<p>One rule of thumb to guide you &#8211; if you can sleep at night knowing that you have reliable backups and can survive disk failures, then you can go with ephemeral storage. EBS provides varying levels of performance based on load. However, the chances that you will lose data are much lower. Remember that Riak nodes are meant to recover from failures anyway. Choose whichever storage option seems best for you.</p>
<h2>Performance</h2>
<p>I would recommend that you execute 2 performance tests.</p>
<p>1. Use <a href="http://docs.basho.com/riak/latest/cookbooks/Benchmarking/">Basho bench</a>.</p>
<p>2. Use a custom load test case of your own.</p>
<h3>Why ?</h3>
<p>Primarily because the needs of your load will vary from what Basho Bench can produce. Don&#8217;t get me wrong. I think Basho Bench is awesome. But there are cases where it will not do. Consider an M/R query that is executed at 50 ops/s. By default the number of VMs allocated to handle M/R jobs cannot handle this load. This failure will not show up on Basho Bench since we are loading the DB with GET / PUT / DELETE and not using complex features like M/R, linking, link-walking, full-text query and index query.</p>
<p>Whether you use these additional features on your implementation is up to you. Just keep in mind that load test results vary depending on how the database is used.</p>
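<p>A custom load test does not need to be elaborate. Here is a rough single-threaded sketch in Java, assuming the operation you pass in wraps whatever your application really does against Riak (an M/R query, for instance). <code>LoadSketch</code>, <code>run</code> and <code>percentile</code> are hypothetical names, and no Riak client call is shown.</p>

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Hypothetical load-test harness for illustration only.
final class LoadSketch {

    // Runs the operation totalOps times, paced at roughly opsPerSecond,
    // and returns the latency of each call in nanoseconds.
    static long[] run(Runnable operation, int totalOps, int opsPerSecond) {
        long intervalNanos = 1_000_000_000L / opsPerSecond;
        long[] latencies = new long[totalOps];
        long next = System.nanoTime();
        for (int i = 0; i != totalOps; i++) {
            try {
                // Wait until this call's scheduled slot (no wait if we are late).
                TimeUnit.NANOSECONDS.sleep(Math.max(0L, next - System.nanoTime()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Arrays.copyOf(latencies, i);
            }
            long start = System.nanoTime();
            operation.run();
            latencies[i] = System.nanoTime() - start;
            next += intervalNanos;
        }
        return latencies;
    }

    // Nearest-rank percentile over a sorted copy of the samples.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Stand-in operation; replace it with a real call to your datastore.
        Runnable operation = new Runnable() {
            @Override
            public void run() {
                Math.sqrt(12345.0);
            }
        };
        long[] latencies = run(operation, 20, 200);
        System.out.println("p50 ns: " + percentile(latencies, 50));
        System.out.println("p99 ns: " + percentile(latencies, 99));
    }
}
```

<p>Because it issues calls one at a time, it cannot go faster than a single caller can sustain. A fuller test would spread the operations across several threads.</p>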
<h3>Load and its characteristics</h3>
<p>Your performance curve will vary drastically depending on the load you generate. If your requests are write heavy and the configured write_buffer is insufficient, you will witness horrible performance from good machines. The number of vnodes per instance also determines how much runway you have. Choose these carefully.</p>
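<p>To get a feel for vnodes per instance, here is a bit of arithmetic in Java. It assumes a ring of 64 partitions spread as evenly as possible across the nodes, which a real cluster only approximates; <code>VnodeMath</code> is a hypothetical helper, not part of Riak.</p>

```java
// Hypothetical helper for back-of-the-envelope capacity planning.
final class VnodeMath {

    // Smallest and largest number of vnodes a node holds when ringSize
    // partitions are spread as evenly as possible over nodeCount nodes.
    static int[] vnodesPerNode(int ringSize, int nodeCount) {
        int low = ringSize / nodeCount;
        int high = (ringSize % nodeCount == 0) ? low : low + 1;
        return new int[] { low, high };
    }

    public static void main(String[] args) {
        int[] three = vnodesPerNode(64, 3);
        int[] five = vnodesPerNode(64, 5);
        System.out.println("3 nodes: " + three[0] + "-" + three[1]); // prints 3 nodes: 21-22
        System.out.println("5 nodes: " + five[0] + "-" + five[1]);   // prints 5 nodes: 12-13
    }
}
```

<p>The fewer vnodes each instance carries, the less work it does per node, but the ring size caps how far you can keep adding machines.</p>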
<h2>Best practices</h2>
<p>Basho has a good deal of advice on its site about <a href="http://docs.basho.com/riak/latest/cookbooks/Performance-Tuning-AWS/">tuning performance</a>. Take a look at the <a href="http://basho.com/resources/videos/">riak operations</a> webinar to gain some additional insights on deployment. Feel free to drop an email on the <a href="http://riak-users.197444.n3.nabble.com/">riak users</a> list too should you need more information. Someone always replies with good advice.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2013/02/08/deploying-riak-on-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Android speech bubbles</title>
		<link>http://deepakbala.me/2013/01/18/android-speech-bubbles/</link>
		<comments>http://deepakbala.me/2013/01/18/android-speech-bubbles/#comments</comments>
		<pubDate>Fri, 18 Jan 2013 05:18:25 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[android]]></category>
		<category><![CDATA[bubble]]></category>
		<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=108</guid>
		<description><![CDATA[Android&#8217;s 9 patch drawable is a useful technique to stretch images. The documentation page for this feature could really use some work. Not that it is insufficient, but it is difficult to understand. The tool to create the 9 patch drawable looks pretty scary too. Put in simple terms, the 9 patch drawable allows you [...]]]></description>
				<content:encoded><![CDATA[<p>Android&#8217;s 9 patch drawable is a useful technique to stretch images. The <a title="9 patch" href="http://developer.android.com/tools/help/draw9patch.html">documentation page</a> for this feature could really use some work. Not that it is insufficient, but it is difficult to understand. The tool to create the 9 patch drawable looks pretty scary too. Put simply, the 9 patch drawable allows you to define an image that can be stretched beyond its original height and width. This is useful in several scenarios, speech bubbles for example.</p>
<p>To start with, create an image for your speech bubble.</p>
<h2>Image &#8211; Original dimensions</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/01/bubble_bl.png"><img class="alignleft size-full wp-image-110" title="bubble_bl" src="http://deepakbala.me/wp-content/uploads/2013/01/bubble_bl.png" alt="" width="43" height="27" /></a></p>
<p>This speech bubble is so small that any text that is placed inside it will cause the image to expand and become distorted. Let&#8217;s try setting this image as the background for a cell in a list view.</p>
<h2>Stretched Image</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/01/distort_bubble.png"><img class="alignleft size-full wp-image-111" title="distort_bubble" src="http://deepakbala.me/wp-content/uploads/2013/01/distort_bubble.png" alt="" width="224" height="356" /></a></p>
<p>The result is not pretty. To get around the problem, the image must define boundaries that will tell the Android framework which parts of the image can be stretched. You can use the <a href="http://developer.android.com/tools/help/draw9patch.html">draw9patch</a> tool to set these boundaries or use a custom tool like the <a href="http://www.roundrect.kr/en/application/a-better-9-patch-tool">better-9-patch</a> tool (which I find easier to use).</p>
<p>So how do you use a 9 patch tool to tell Android which areas of the image can be stretched ? You define that by adjusting the boundaries in the tool until the stretchable area within the image is enclosed by a rectangle.</p>
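<p>What the stretchable area means can be sketched with a little Java. This is a simplified model of one axis with a single stretch region, not Android&#8217;s actual drawing code; <code>NinePatchAxis</code> and the pixel positions 12 and 20 are made-up values for illustration. The pixels outside the region keep their size and the region absorbs all the extra width.</p>

```java
// Hypothetical model of one axis of a 9 patch image.
final class NinePatchAxis {

    // Splits targetSize into { fixed start, stretched middle, fixed end }.
    // stretchStart (inclusive) and stretchEnd (exclusive) mark the
    // stretchable pixels in the source image.
    static int[] layout(int sourceSize, int stretchStart, int stretchEnd, int targetSize) {
        int fixedStart = stretchStart;
        int fixedEnd = sourceSize - stretchEnd;
        int middle = Math.max(0, targetSize - fixedStart - fixedEnd);
        return new int[] { fixedStart, middle, fixedEnd };
    }

    public static void main(String[] args) {
        // A 43 pixel wide bubble drawn into a 224 pixel wide cell.
        int[] parts = layout(43, 12, 20, 224);
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // prints 12 189 23
    }
}
```

<p>The same split applies vertically, which is why the corners of the bubble stay crisp while the middle grows.</p>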
<h2>Define stretchable area</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/01/stretch.png"><img class="alignleft size-full wp-image-113" title="stretch" src="http://deepakbala.me/wp-content/uploads/2013/01/stretch.png" alt="" width="779" height="408" /></a></p>
<p>When you save the image you get a new image with marked boundaries. The file name should end with &#8220;.9.png&#8221; (say bubble.9.png). This tells Android that it is a 9 patch drawable.</p>
<h2>Image with boundaries</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/01/bubble_bl.9.png"><img title="bubble_bl.9" src="http://deepakbala.me/wp-content/uploads/2013/01/bubble_bl.9.png" alt="" width="45" height="29" /></a></p>
<p>And that is it! Pop this image into your &#8216;drawable&#8217; folder and you can start using it. The screenshot below is from a project that uses this image. If you&#8217;d like to look at the source code, the project is available on <a href="https://github.com/deepak-bala/SpeechBubble">GitHub</a>. You can also install the app on your phone from the <a href="https://play.google.com/store/apps/details?id=com.demach.bubblespeechexample&amp;feature=search_result">Play Store</a>, where it has been published.</p>
<p><a href="http://deepakbala.me/wp-content/uploads/2013/01/final_image_speech_bubble.png"><img class="alignleft size-full wp-image-114" title="final_image_speech_bubble" src="http://deepakbala.me/wp-content/uploads/2013/01/final_image_speech_bubble.png" alt="" width="495" height="490" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2013/01/18/android-speech-bubbles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Desktop and smart phone apps</title>
		<link>http://deepakbala.me/2012/12/28/desktop-and-smart-phone-apps/</link>
		<comments>http://deepakbala.me/2012/12/28/desktop-and-smart-phone-apps/#comments</comments>
		<pubDate>Fri, 28 Dec 2012 21:58:23 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[android]]></category>
		<category><![CDATA[apps]]></category>
		<category><![CDATA[desktop]]></category>
		<category><![CDATA[iphone]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=103</guid>
		<description><![CDATA[It has occurred to me lately that the developer community is shifting back to writing desktop style apps. I&#8217;m not referring to an app written for the PC. When you think about it apps written for the iPhone / Android share common traits with desktop apps. 1. You must download and install a bundle. 2. Bug fixes [...]]]></description>
				<content:encoded><![CDATA[<p>It has occurred to me lately that the developer community is shifting back to writing desktop-style apps. I&#8217;m not referring to apps written for the PC. When you think about it, apps written for the iPhone / Android share common traits with desktop apps.</p>
<p>1. You must download and install a bundle.</p>
<p>2. Bug fixes require you to download a new version.</p>
<p>3. Getting users to upgrade to version X is cumbersome.</p>
<p>4. &#8216;Upgrade&#8217; code is necessary between versions (like the onUpgrade() callback of SQLiteOpenHelper, which handles database changes).</p>
<p>and so on&#8230;</p>
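<p>To make point 4 concrete, here is a minimal Python sketch of the same versioned-upgrade idea using the standard sqlite3 module. It is an analogy rather than Android code: the table, the columns, the version numbers and the helper names are all hypothetical, and the schema version is stored in SQLite&#8217;s user_version pragma.</p>

```python
import sqlite3

SCHEMA_VERSION = 2  # hypothetical version shipped with this release of the app


def upgrade(conn, old_version):
    """Apply every migration step the installed database has not seen yet."""
    if old_version < 1:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    if old_version < 2:
        conn.execute("ALTER TABLE notes ADD COLUMN created_at TEXT")


def open_database(path):
    conn = sqlite3.connect(path)
    # A brand new database reports user_version 0.
    old_version = conn.execute("PRAGMA user_version").fetchone()[0]
    if old_version < SCHEMA_VERSION:
        upgrade(conn, old_version)
        conn.execute("PRAGMA user_version = %d" % SCHEMA_VERSION)
        conn.commit()
    return conn


conn = open_database(":memory:")
columns = [row[1] for row in conn.execute("PRAGMA table_info(notes)")]
print(columns)
```

<p>Every release has to carry the full chain of upgrade steps, because you never know which old version a user is upgrading from.</p>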
<p>You can use HTML5 and create a web version of your app, but you would miss out on the features exposed by native code (widgets etc). Features that require accessing or controlling hardware can get tricky. HTML is not the ideal markup for communicating a view to a rich touch device like a phone / tablet anyway.</p>
<p>We are stuck with this fate for the foreseeable future. I&#8217;m not sure what would make a good solution either. You could create a markup language, written as a well-formed document, that is sent from a server to a rich client device; the device would interpret the markup and draw a screen using the native UI. It would not matter whether the UI is rendered on Android / iPhone / BlackBerry. A &#8220;spinner&#8221; element, for example, can be rendered equally well on all devices. But I seriously doubt such a markup will come into existence. You&#8217;d need several vendors to implement the UI consistently and adhere to standards. We have enough trouble getting HTML and CSS to work consistently across browsers today <img src='http://deepakbala.me/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
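<p>Purely as a thought experiment, the idea could be sketched in Python like this. The markup vocabulary (screen, label, spinner) and the mapping table are made up for illustration; a real client would instantiate the native widgets instead of returning their names.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from abstract elements to each platform's native widget.
NATIVE_WIDGETS = {
    "android": {"spinner": "android.widget.Spinner", "label": "android.widget.TextView"},
    "iphone": {"spinner": "UIPickerView", "label": "UILabel"},
}


def plan_screen(markup, platform):
    """Translate server-sent markup into a list of (native widget, text) pairs."""
    root = ET.fromstring(markup)
    widgets = NATIVE_WIDGETS[platform]
    return [(widgets[element.tag], element.attrib.get("text", "")) for element in root]


markup = '<screen><label text="Pick a city"/><spinner/></screen>'
android_plan = plan_screen(markup, "android")
iphone_plan = plan_screen(markup, "iphone")
print(android_plan)
print(iphone_plan)
```

<p>The hard part is not the translation table but getting every vendor to agree on what goes into it.</p>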
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2012/12/28/desktop-and-smart-phone-apps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 things your grandchildren will probably say in the future</title>
		<link>http://deepakbala.me/2012/11/02/5-things-your-grandchildren-will-probably-say-in-the-future/</link>
		<comments>http://deepakbala.me/2012/11/02/5-things-your-grandchildren-will-probably-say-in-the-future/#comments</comments>
		<pubDate>Fri, 02 Nov 2012 19:23:27 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=84</guid>
		<description><![CDATA[On a lazy Friday my co-founder @vignesh_giri decided to challenge me to a race. No we don&#8217;t own sports cars. While not slogging over our startup &#8211; Phonon we take stress relieving breaks by playing Need for speed &#8211; Porsche unleashed. It is a relatively old game and I still have the CD. While rummaging through my old [...]]]></description>
				<content:encoded><![CDATA[<p>On a lazy Friday my co-founder <a href="https://twitter.com/vignesh_giri">@vignesh_giri</a> decided to challenge me to a race. No, we don&#8217;t own sports cars. When we are not slogging over our startup, <a href="http://phononapp.com/">Phonon</a>, we take stress-relieving breaks by playing Need for Speed: <a href="http://en.wikipedia.org/wiki/Need_for_Speed:_Porsche_Unleashed">Porsche Unleashed</a>. It is a relatively old game and I still have the CD. While rummaging through my old pile of junk to retrieve the CD, I stumbled upon some gems. They reminded me of the drastic change in technology and product distribution strategy over the last two decades.</p>
<h2>A Microsoft Home CD</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/11/ms_home.png"><img class="alignleft size-full wp-image-87" title="ms_home" src="http://deepakbala.me/wp-content/uploads/2012/11/ms_home.png" alt="" width="314" height="271" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>MS used to sell these to encourage folks to learn about the ocean. This is just one product under the <a href="http://en.wikipedia.org/wiki/Microsoft_Home">Microsoft Home</a> family. This was before Google and Wikipedia became popular. Heck, this was even before the internet became popular (the year was 1993). They cost $100 a pop <img src='http://deepakbala.me/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h2>A floppy drive and some floppies</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/11/fl_drive.png"><img class="alignleft size-full wp-image-88" title="fl_drive" src="http://deepakbala.me/wp-content/uploads/2012/11/fl_drive.png" alt="" width="504" height="476" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/11/fl_disk.png"><img class="alignleft size-medium wp-image-89" title="fl_disk" src="http://deepakbala.me/wp-content/uploads/2012/11/fl_disk-300x216.png" alt="" width="300" height="216" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Oh, the excitement of splitting a 5 MB file across several 1.44 MB disks, only to be told later that there was a checksum error on disk 2. No one born after 2000 can relate to that pain. Even installation disks were distributed this way: &#8216;Please insert disk 3 to continue installing blah&#8217;.</p>
<h2>A 512 MB hard disk (placed near a pencil for scale)</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/11/hard_disk.png"><img class="alignleft size-full wp-image-90" title="hard_disk" src="http://deepakbala.me/wp-content/uploads/2012/11/hard_disk.png" alt="" width="424" height="397" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>This thing weighs in at ~7 kilos. Who needs gym equipment when you have hard disks from the 1990s? When I showed this to my 10-year-old nephew, he said &#8216;What&#8217;s that?&#8217;. I told him it was a hard disk that can hold 512 MB of data. He refused to believe me.</p>
<p>I think I&#8217;ll ask Seagate to replace it someday. The seal is still unbroken, so I guess that means it&#8217;s still covered under warranty :p</p>
<h2>A shareware CD</h2>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/11/cd.png"><img class="alignleft size-full wp-image-91" title="cd" src="http://deepakbala.me/wp-content/uploads/2012/11/cd.png" alt="" width="289" height="296" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>During the early days of the internet, download speeds sucked (think 33 kbps modems, the ones that make a weird sound when they connect to the internet; no one from this generation will hear that either). CDs like these were a popular way to distribute a shareware version of your product. This one came with a car magazine, and &#8216;Need for Speed&#8217; was one of the bundled games. Wallpapers and screensavers were all the rage back then.</p>
<h2>What does this have to do with your grandchildren ?</h2>
<p>Just as we are nostalgic about these devices / distribution methods, our grandchildren will experience something similar. Here are 5 things I expect them to say.</p>
<ol>
<li>Hey, remember when you had to drive your own car? (This will be a sad day)</li>
<li>Here&#8217;s a picture of my granddad working at his office. Oh, that giant box near him? That was called a PC.</li>
<li>This is an old TV my grandmother used to have. You actually had to wait for a program to air at a particular time to watch it! How did people ever tolerate that?</li>
<li>That&#8217;s a picture of my dog Frisky. No, he is not trying to eat paper. Folks used to read most of their news on paper back in the day.</li>
<li>Email? What&#8217;s that?</li>
</ol>
<p>I sincerely hope someone takes #5 by the horns and succeeds. It is a behemoth of a problem which, if solved, can change the way tons of people communicate every day. I think it is a matter of &#8216;when&#8217; and not &#8216;if&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2012/11/02/5-things-your-grandchildren-will-probably-say-in-the-future/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing progress on github issues</title>
		<link>http://deepakbala.me/2012/09/26/visualizing-progress-on-github-issues/</link>
		<comments>http://deepakbala.me/2012/09/26/visualizing-progress-on-github-issues/#comments</comments>
		<pubDate>Wed, 26 Sep 2012 19:56:58 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[github]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=65</guid>
		<description><![CDATA[&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; Countless hours of mine have been spent gawking at the graphs section of my github repos.  Its a great place to visualize how your project is growing / shrinking with time. While there is support for visualizing data on commits made on github, there is no way [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://deepakbala.me/wp-content/uploads/2012/09/octocat.png"><img class="alignleft size-medium wp-image-66" title="octocat" src="http://deepakbala.me/wp-content/uploads/2012/09/octocat-300x300.png" alt="" width="300" height="300" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>I have spent countless hours gawking at the graphs section of my GitHub repos. It&#8217;s a great place to visualize how your project grows or shrinks over time. While GitHub supports visualizing data on commits, there is no built-in way to visualize the issues created against that code. Fortunately, the GitHub <a href="http://developer.github.com/v3/issues/">issues API</a> provides a RESTful way to extract data related to a repo&#8217;s issues.</p>
<p>I wrote a short script that accepts the following parameters for input and churns out some CSV data to the console.</p>
<p>-r &lt;The name of your github repo&gt;</p>
<p>-u &lt;Your user name in github&gt;</p>
<p>-p &lt;Optional parameter &#8211; Your github password. This is needed if your repo is private&gt;</p>
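<p>For illustration, flags like these could be parsed with the standard argparse module. This is a sketch, not the code from my script: the destination names and help texts are assumptions, and only the three flags come from the list above.</p>

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Print GitHub issue data as CSV")
    parser.add_argument("-r", dest="repo", required=True, help="name of your GitHub repo")
    parser.add_argument("-u", dest="user", required=True, help="your GitHub user name")
    parser.add_argument("-p", dest="password", default=None,
                        help="your GitHub password (only needed for private repos)")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-r", "github-issue-vis", "-u", "deepak-bala"])
print(args.repo, args.user, args.password)
```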
<p>The CSV data is printed on the console under two groups:</p>
<ul>
<li>Issue timeline &#8211; A sorted data set of the issue fields created_date and closed_date. This helps you visualize the time span between the creation and closure of each issue. The larger the area between the two curves, the longer issues sat inactive.</li>
</ul>
<p style="padding-left: 60px;"><strong>Issues &#8211; Created vs Closed time lapse</strong><a href="http://deepakbala.me/wp-content/uploads/2012/09/github_c_c_2.png"><img class="alignleft size-full wp-image-74" title="github_c_c_2" src="http://deepakbala.me/wp-content/uploads/2012/09/github_c_c_2.png" alt="" width="881" height="438" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
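<p>The issue timeline described above could be built along these lines. This is a minimal sketch, not the script itself: it assumes the issues have already been fetched as a list of dictionaries carrying the API&#8217;s created_at and closed_at timestamps, and the sample issues are made up.</p>

```python
from datetime import datetime


def parse_time(value):
    # The API returns UTC timestamps such as 2012-09-01T10:00:00Z.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def issue_timeline(issues):
    """Return (created, closed) pairs for closed issues, sorted by creation time."""
    rows = [
        (parse_time(issue["created_at"]), parse_time(issue["closed_at"]))
        for issue in issues
        if issue.get("closed_at")  # skip issues that are still open
    ]
    return sorted(rows)


# Hypothetical sample data standing in for an API response.
sample_issues = [
    {"created_at": "2012-09-02T12:00:00Z", "closed_at": "2012-09-05T18:00:00Z"},
    {"created_at": "2012-09-01T10:00:00Z", "closed_at": "2012-09-03T09:00:00Z"},
    {"created_at": "2012-09-04T08:00:00Z", "closed_at": None},
]

timeline = issue_timeline(sample_issues)
print("created_date,closed_date")
for created, closed in timeline:
    print("%s,%s" % (created.isoformat(), closed.isoformat()))
```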
<ul>
<li>Issues Open / Closed at any time &#8211; A sorted data set that shows the number of open / closed issues at any point in time. A larger gap between the lines indicates more open issues at that point in time.</li>
</ul>
<p style="padding-left: 60px;"><strong>Issues &#8211; Created vs Closed count</strong></p>
<p style="padding-left: 60px;"><a href="http://deepakbala.me/wp-content/uploads/2012/09/github_vis.png"><img class="alignleft size-full wp-image-69" title="github_vis" src="http://deepakbala.me/wp-content/uploads/2012/09/github_vis.png" alt="" width="940" height="473" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
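<p>The running counts behind the second graph can be sketched as follows. Again this is an illustration with made-up issues rather than the script itself; it assumes every issue in the list is closed and relies on the fact that UTC timestamps in this format sort chronologically as plain strings.</p>

```python
def created_closed_counts(issues):
    """Return (timestamp, created so far, closed so far) rows in time order."""
    events = []
    for issue in issues:
        events.append((issue["created_at"], "created"))
        events.append((issue["closed_at"], "closed"))
    events.sort()  # same-format ISO 8601 UTC strings sort chronologically

    created = closed = 0
    rows = []
    for timestamp, kind in events:
        if kind == "created":
            created += 1
        else:
            closed += 1
        rows.append((timestamp, created, closed))
    return rows


# Hypothetical closed issues standing in for an API response.
closed_issues = [
    {"created_at": "2012-09-01T10:00:00Z", "closed_at": "2012-09-03T09:00:00Z"},
    {"created_at": "2012-09-02T12:00:00Z", "closed_at": "2012-09-05T18:00:00Z"},
    {"created_at": "2012-09-04T08:00:00Z", "closed_at": "2012-09-06T08:00:00Z"},
]

counts = created_closed_counts(closed_issues)
for timestamp, created, closed in counts:
    print("%s,%d,%d" % (timestamp, created, closed))
```

<p>The gap between the two counts on any row is the number of issues that were open at that moment.</p>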
<p>The script is available <a href="https://github.com/deepak-bala/github-issue-vis">on GitHub</a>. Feel free to run it on a repo of your liking. Some things to note:</p>
<ol>
<li>The lines on the second graph will always converge since only closed issues are taken into consideration.</li>
<li>The graphs themselves can be misinterpreted because open issues do not form part of the data set. You can always add open issues to the list (in the Python program) by changing the &#8216;state&#8217; query param on the GitHub call.</li>
</ol>
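<p>The GitHub issues API filters on a query parameter named state, which accepts open or closed. A small sketch of how the request URL changes is shown below; the helper name is hypothetical and the script may build its URL differently.</p>

```python
from urllib.parse import urlencode


def issues_url(user, repo, state="closed"):
    """Build the issues endpoint URL for a repo, filtered by issue state."""
    query = urlencode({"state": state})
    return "https://api.github.com/repos/%s/%s/issues?%s" % (user, repo, query)


closed_url = issues_url("deepak-bala", "github-issue-vis")
open_url = issues_url("deepak-bala", "github-issue-vis", state="open")
print(closed_url)
print(open_url)
```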
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2012/09/26/visualizing-progress-on-github-issues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing H1B data with python and pandas</title>
		<link>http://deepakbala.me/2012/08/26/analyzing-h1b-data-with-python-and-pandas/</link>
		<comments>http://deepakbala.me/2012/08/26/analyzing-h1b-data-with-python-and-pandas/#comments</comments>
		<pubDate>Sun, 26 Aug 2012 16:53:23 +0000</pubDate>
		<dc:creator>Deepak</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[pandas]]></category>

		<guid isPermaLink="false">http://deepakbala.me/?p=33</guid>
		<description><![CDATA[The United States department of labor releases a CSV file every quarter with data pertaining to H-2B / H-1B Visas and the like. Under the LCA program the VISAs of interest are H1B, H1B1 (Chile / Singapore) and E3 (Australian). The format of the CSV file is well documented. I diced through some of this [...]]]></description>
				<content:encoded><![CDATA[<p>The United States Department of Labor releases a <a href="http://www.foreignlaborcert.doleta.gov/quarterlydata.cfm">CSV file every quarter</a> with data pertaining to H-2B / H-1B visas and the like. Under the LCA program the visas of interest are H1B, H1B1 (Chile / Singapore) and E3 (Australia). The format of the CSV file is <a href="http://www.foreignlaborcert.doleta.gov/pdf/quarter_2_2012/LCA_Record_Layout_FY12.doc">well documented</a>. I sliced and diced some of this data with Python and the data analysis library <a href="http://pandas.pydata.org/">pandas</a>.</p>
<h2>Before we get to the details&#8230;</h2>
<ul>
<li>If you are interested in the data alone, that is fine. Just skim the sections that involve programming / analysis.</li>
<li>If you know Python but do not know pandas &#8211; no worries. The code is self-explanatory in most places.</li>
<li>If you know Python and pandas, that is awesome. You can play around with the code and analyse the data in greater depth. The full program is available on <a href="https://github.com/deepak-bala/python/blob/master/LcaReader.py">GitHub</a> under the <a href="http://opensource.org/licenses/mit-license.php">MIT license</a>.</li>
</ul>
<p><a href="http://en.wikipedia.org/wiki/H-1B_visa">H1B</a> &#8211; Generic foreign workers</p>
<p><a href="http://en.wikipedia.org/wiki/H-1B_visa#Congressional_yearly_numerical_cap">H1B1</a> &#8211; Workers from Chile and Singapore</p>
<p><a href="http://en.wikipedia.org/wiki/E-3_visa">E3</a> &#8211; Workers from Australia</p>
<h2>Data of interest</h2>
<p>The following columns on the CSV are the ones I found most interesting.</p>
<table width="743" border="1" cellspacing="0" cellpadding="0">
<colgroup>
<col width="233" />
<col width="510" /> </colgroup>
<tbody>
<tr>
<td width="233" height="20"><strong>Column</strong></td>
<td width="510"><strong>Description</strong></td>
</tr>
<tr>
<td height="38"><span style="color: #0000ff;">STATUS</span></td>
<td width="510"><span style="color: #0000ff;">Status associated with the last significant event or decision. Valid values include “Certified,” “Certified-Withdrawn,” “Denied,” and “Withdrawn”</span></td>
</tr>
<tr>
<td height="57"><span style="color: #de1c01;">VISA_CLASS</span></td>
<td width="510"><span style="color: #de1c01;">Indicates the type of temporary application submitted for processing. R = H-1B; A = E-3 Australian; C = H-1B1 Chile; S = H-1B1 Singapore.  Also referred to as “Program” in prior years.</span></td>
</tr>
<tr>
<td height="20"><span style="color: #0000ff;">LCA_CASE_EMPLOYER_NAME</span></td>
<td width="510"><span style="color: #0000ff;">Employer&#8217;s name</span></td>
</tr>
<tr>
<td height="20"><span style="color: #de1c01;">LCA_CASE_EMPLOYER_CITY</span></td>
<td width="510"><span style="color: #de1c01;">Employer&#8217;s city</span></td>
</tr>
<tr>
<td height="20"><span style="color: #0000ff;">LCA_CASE_EMPLOYER_STATE</span></td>
<td width="510"><span style="color: #0000ff;">Employer&#8217;s state</span></td>
</tr>
<tr>
<td height="20"><span style="color: #de1c01;">LCA_CASE_JOB_TITLE</span></td>
<td width="510"><span style="color: #de1c01;">Job title</span></td>
</tr>
<tr>
<td height="20"><span style="color: #0000ff;">LCA_CASE_WAGE_RATE_FROM</span></td>
<td width="510"><span style="color: #0000ff;">Employer&#8217;s proposed wage rate</span></td>
</tr>
</tbody>
</table>
<h2>Munging</h2>
<p>Before the data is sliced and diced, it must be munged (a fancy way of saying cleaned).</p>
<ul>
<li>The wage rate column is empty in some cases &#8211; Set these to 0.</li>
<li>The wage rate is not uniform (salary per hour vs year vs week etc) &#8211; Normalize this to a year.</li>
<li>Employer names contain stray whitespace / periods, which make grouping difficult &#8211; Clean the employer string.</li>
<li>Some of the wage data appears to be incorrect, like the preschool Spanish teacher who makes $2 million a year &#8211; Set any wage above 1 million to 0.</li>
</ul>
<h2>Slice and dice</h2>
<p>The Python program begins by reading the CSV data (all 48 MB of it) into a DataFrame. A DataFrame is a SQL-table-like structure that is composed of Series (columns) and can be sliced and diced in various ways by applying functions.</p><pre class="crayon-plain-tag">import math

import pandas as pd

main_frame_chunks = pd.read_csv('/usr/local/LCAFY2012_Q2.csv')</pre><p>Apply a function to this frame to clean the data. Each row of the frame is processed by this function.</p><pre class="crayon-plain-tag">def normalize_sal(x):
    # Treat missing, NaN or implausibly high wages as 0
    rate = x[RATE_FROM]
    if rate is None or rate == &quot;&quot; or math.isnan(rate) or rate &gt; 1000000:
        x[RATE_FROM] = 0
    # Normalize salary to a year
    if x[WAGE_RATE_UNIT] == WAGE_RATE_UNIT_HOUR:
        x[RATE_FROM] *= (8 * 240)
    elif x[WAGE_RATE_UNIT] == WAGE_RATE_UNIT_MONTH:
        x[RATE_FROM] *= 12
    elif x[WAGE_RATE_UNIT] == WAGE_RATE_UNIT_BI_WEEK:
        x[RATE_FROM] *= 52 / 2
    elif x[WAGE_RATE_UNIT] == WAGE_RATE_UNIT_WEEK:
        x[RATE_FROM] *= 52
    # Clean employer name
    if isinstance(x[EMPLOYER_NAME], str):
        x[EMPLOYER_NAME] = x[EMPLOYER_NAME].replace('.', '').strip()
    return x

# Column-name constants such as RATE_FROM are defined in the full program
main_frame_chunks = main_frame_chunks.apply(normalize_sal, axis=1)</pre><p>Obtaining the top N counts of a particular column requires only one line of code</p><pre class="crayon-plain-tag">pretty_print('Most sought after VISAs')
print(main_frame_chunks[VISA_CLASS].value_counts())

# Output
# ********************************************************
# Most sought after VISAs
# ********************************************************
# H-1B                          165099
# E-3 Australian                  3248
# H-1B1 Chile                      186
# H-1B1 Singapore                  184
# Select Visa Classification         1</pre><p>Grouping data and then applying a function over the groups (say sum()) requires about 4-5 lines of code</p><pre class="crayon-plain-tag">pretty_print('Employers with most number of VISAS in any status')
name_group = main_frame_chunks.groupby([main_frame_chunks[EMPLOYER_NAME], main_frame_chunks[STATUS]])
name_copy_group = name_group[STATUS].count().copy()
# Sort ascending so that the largest counts end up last
name_copy_group = name_copy_group.sort_values()
print(name_copy_group[-top:])</pre><p>You can visualize the data using pandas (it can plot graphs), but I chose to export the data to Excel and make pie charts out of it. Here is some of the visualized data, followed by the program&#8217;s full textual output.</p>
<h3>Most sought after VISAS</h3>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/08/visas.png"><img class="alignleft size-full wp-image-47" title="visas" src="http://deepakbala.me/wp-content/uploads/2012/08/visas.png" alt="" width="548" height="244" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Popular job titles</h3>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/08/positions.png"><img class="alignleft size-full wp-image-46" title="positions" src="http://deepakbala.me/wp-content/uploads/2012/08/positions.png" alt="" width="631" height="362" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Jobs with the highest pay across all companies</h3>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/08/jobs.png"><img class="alignleft size-full wp-image-45" title="jobs" src="http://deepakbala.me/wp-content/uploads/2012/08/jobs.png" alt="" width="497" height="316" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Companies with the most certified VISAs</h3>
<p><a href="http://deepakbala.me/wp-content/uploads/2012/08/company.png"><img class="alignleft size-full wp-image-44" title="company" src="http://deepakbala.me/wp-content/uploads/2012/08/company.png" alt="" width="601" height="440" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h2>Full text output</h2>
<p></p><pre class="crayon-plain-tag">Finished reading data
Processing 168718 rows
Finished normalizing data
********************************************************
Most sought after VISAs
********************************************************
H-1B                          165099
E-3 Australian                  3248
H-1B1 Chile                      186
H-1B1 Singapore                  184
Select Visa Classification         1
********************************************************
Top ten job titles
********************************************************
PROGRAMMER ANALYST          13388
SOFTWARE ENGINEER            6069
COMPUTER PROGRAMMER          4035
SYSTEMS ANALYST              3651
BUSINESS ANALYST             1874
COMPUTER SYSTEMS ANALYST     1851
SENIOR CONSULTANT            1723
PHYSICAL THERAPIST           1676
SOFTWARE DEVELOPER           1495
SENIOR SOFTWARE ENGINEER     1382
********************************************************
Employers with most number of VISAS in any status
********************************************************
LCA_CASE_EMPLOYER_NAME             STATUS   
LARSEN &amp; TOUBRO INFOTECH LIMITED   CERTIFIED     952
ERNST &amp; YOUNG US LLP               CERTIFIED    1142
PATNI AMERICAS INC                 CERTIFIED    1161
ACCENTURE LLP                      CERTIFIED    1255
IBM INDIA PRIVATE LIMITED          CERTIFIED    1464
WIPRO LIMITED                      CERTIFIED    1940
MICROSOFT CORPORATION              CERTIFIED    2085
TATA CONSULTANCY SERVICES LIMITED  CERTIFIED    2134
DELOITTE CONSULTING LLP            CERTIFIED    2194
INFOSYS LIMITED                    CERTIFIED    5276
********************************************************
148021 VISAs are certified and 20697 are not
********************************************************
********************************************************
Employers with the highest salary budget
********************************************************
LCA_CASE_EMPLOYER_NAME
MICROSOFT CORPORATION                       2.239855e+08
BANYAN ENERGY, INC                          2.304000e+08
VIRGO INC                                   2.332100e+08
INTEL CORPORATION                           2.359001e+08
IGATE GLOBAL SOLUTIONS, AN IGATE COMPANY    2.730697e+08
AMAZON CORPORATE LLC                        4.185291e+08
NEWMONT INTERNATIONAL SERVICES LIMITED      4.237428e+08
GOOGLE INC                                  4.445123e+08
INFOSYS LIMITED                             6.228491e+08
QUALCOMM INCORPORATED                       7.204821e+08
Name: LCA_CASE_WAGE_RATE_FROM
********************************************************
The city offering the highest dough (summed over all positions)
********************************************************
LCA_CASE_EMPLOYER_CITY  LCA_CASE_EMPLOYER_STATE
IRVING                  TX                         4.118216e+08
GREENWOOD VILLAGE       CO                         4.368157e+08
SEATTLE                 WA                         4.586871e+08
BELLEVUE                WA                         4.989642e+08
MOUNTAIN VIEW           CA                         6.737251e+08
HOUSTON                 TX                         6.851985e+08
CHICAGO                 IL                         7.032560e+08
PLANO                   TX                         7.072252e+08
SAN DIEGO               CA                         7.694944e+08
NEW YORK                NY                         1.280386e+09
Name: LCA_CASE_WAGE_RATE_FROM
********************************************************
Jobs with the most dough (summed across offers from all employers)
********************************************************
LCA_CASE_JOB_TITLE
RESEARCH SCIENTIST II                                2.258712e+08
CONSULTANT                                           2.291776e+08
CHIEF EXECUTIVE OFFICER                              2.445069e+08
DATABASE ADMINISTRATOR                               2.625037e+08
COMPUTER SYSTEMS ANALYST                             3.641315e+08
REGIONAL DIRECTOR HEALTH SAFETY &amp; LOSS PREVENTION    4.224000e+08
COMPUTER PROGRAMMER                                  4.587185e+08
SYSTEMS ANALYST                                      6.021489e+08
SOFTWARE ENGINEER                                    1.809624e+09
PROGRAMMER ANALYST                                   2.060111e+09
Name: LCA_CASE_WAGE_RATE_FROM
Done</pre><p></p>
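<p>If you know Python but not pandas, the counting and grouping steps used above correspond roughly to the plain-Python sketch below. The rows are made-up sample data, and pandas does all of this far more efficiently on the real file.</p>

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (employer, visa class, yearly wage)
rows = [
    ("ACME INC", "H-1B", 90000),
    ("ACME INC", "H-1B", 110000),
    ("GLOBEX LLC", "E-3 Australian", 80000),
]

# Rough equivalent of value_counts(): how often does each visa class occur?
visa_counts = Counter(visa for _, visa, _ in rows)
print(visa_counts.most_common())

# Rough equivalent of groupby(...).sum(): total wage per employer
wage_by_employer = defaultdict(int)
for employer, _, wage in rows:
    wage_by_employer[employer] += wage

# Sort ascending by total so that the biggest spender comes last
totals = sorted(wage_by_employer.items(), key=lambda item: item[1])
print(totals)
```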
<h2>More on pandas</h2>
<p>Pandas is a great library to perform quick analysis on data. The ease with which you can prune and analyze the data is impressive. Knowledge of the library and its internal structure is of course necessary to make the best use of it. If you&#8217;d like to know more about pandas, the following book is a great read.</p>
<p>&nbsp;</p>
<p><a href="http://www.amazon.com/exec/obidos/ASIN/1449319793/jr_bunk-20"><img src="http://ecx.images-amazon.com/images/I/51PSZ4VIblL._SL500_AA300_.jpg" alt="Python for Data Analysis" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://deepakbala.me/2012/08/26/analyzing-h1b-data-with-python-and-pandas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
