<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>tomkleinpeter.com</title>
	
	<link>http://www.tomkleinpeter.com</link>
	<description />
	<lastBuildDate>Tue, 08 May 2012 19:35:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Spiteful" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="spiteful" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>In San Francisco Next Week</title>
		<link>http://www.tomkleinpeter.com/2011/10/18/in-san-francisco-next-week/</link>
		<comments>http://www.tomkleinpeter.com/2011/10/18/in-san-francisco-next-week/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 18:41:13 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tomkleinpeter.com/?p=139</guid>
		<description><![CDATA[I&#8217;m going to be down in San Francisco on the 26th next week. Anyone want to get some breakfast/lunch/coffee? Shoot me an email at tomkleinpeter@gmail.com.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m going to be down in San Francisco on the 26th next week.  Anyone want to get some breakfast/lunch/coffee?  Shoot me an email at tomkleinpeter@gmail.com.</p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/YqK63gbuS8Q" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2011/10/18/in-san-francisco-next-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Android Market Search Bug</title>
		<link>http://www.tomkleinpeter.com/2011/07/27/android-market-search-bug/</link>
		<comments>http://www.tomkleinpeter.com/2011/07/27/android-market-search-bug/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 18:49:32 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tomkleinpeter.com/?p=127</guid>
		<description><![CDATA[Hey Google folks, I love that Android exists, but these kinds of bugs sort of drive me insane. I know the app is still there. I know the name is &#8220;Audiogalaxy Music&#8221;. Please help my users find it when they search for it by name. Are you caching search results from a device that we [...]]]></description>
			<content:encoded><![CDATA[<p>Hey Google folks, I love that Android exists, but these kinds of bugs sort of drive me insane.  <a href="http://www.tomkleinpeter.com/wp-content/uploads/2011/07/missing-search-small.png"><img src="http://www.tomkleinpeter.com/wp-content/uploads/2011/07/missing-search-small.png" alt="Guess what the name of my app is?" title="No match" width="480" height="288" class="alignnone size-full wp-image-128" /></a></p>
<p>I know <a href="https://market.android.com/details?id=com.audiogalaxy&#038;feature=search_result">the app</a> is still there.  I know the name is &#8220;Audiogalaxy Music&#8221;.  Please help my users find it when they search for it by name.  </p>
<p>Are you caching search results from a device that we don&#8217;t support and then showing them to me?  This feels like that sort of bug.</p>
<p><em>Update &#8212; and now we are back (though that text about no matches still cracks me up):</em><a href="http://www.tomkleinpeter.com/wp-content/uploads/2011/07/missing-search-2-small.png"><img src="http://www.tomkleinpeter.com/wp-content/uploads/2011/07/missing-search-2-small.png" alt="" title="Much better" width="480" height="288" class="alignnone size-full wp-image-134" /></a></p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/0O4QFa7UZk8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2011/07/27/android-market-search-bug/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Introducing the Audiogalaxy API</title>
		<link>http://www.tomkleinpeter.com/2011/03/31/introducing-the-audiogalaxy-api/</link>
		<comments>http://www.tomkleinpeter.com/2011/03/31/introducing-the-audiogalaxy-api/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 18:50:05 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tomkleinpeter.com/?p=110</guid>
		<description><![CDATA[I never posted about it, but in 2008, I started working with a group to relaunch Audiogalaxy. It looked like we had an opportunity to (legally) run the same thing we had before: a DRM-free, all you can eat music downloading service. We spent almost 2 years on that effort, but ultimately it didn&#8217;t work [...]]]></description>
			<content:encoded><![CDATA[<p>I never posted about it, but in 2008, I started working with a group to relaunch Audiogalaxy.  It looked like we had an opportunity to (legally) run the same thing we had before: a DRM-free, all-you-can-eat music downloading service.  We spent almost 2 years on that effort, but ultimately it didn&#8217;t work out (that advice about staying out of the music business is pretty solid, btw).  So we pivoted, and after looking at the chunks of technology we had lying around the shop, we decided a music place-shifting service would be a good start.  </p>
<p>After a year of working on it, we are happy with our backend and our functionality, so we&#8217;ve exposed an API.  Check out my post on the <a href="http://www.audiogalaxy.com/blog/2011/03/introducing-the-audiogalaxy-api/">Audiogalaxy Blog</a> for some more information, and follow <a href="http://www.twitter.com/audiogalaxy/">@audiogalaxy</a> for all the latest news.</p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/jUKI3AHnzcM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2011/03/31/introducing-the-audiogalaxy-api/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Where Are the AB Testing Frameworks?</title>
		<link>http://www.tomkleinpeter.com/2009/01/21/where-are-the-ab-testing-frameworks/</link>
		<comments>http://www.tomkleinpeter.com/2009/01/21/where-are-the-ab-testing-frameworks/#comments</comments>
		<pubDate>Wed, 21 Jan 2009 23:18:49 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=69</guid>
		<description><![CDATA[I read news.yc and reddit/programming pretty regularly to keep up with what is going on in the biz. Based on that reading, I can probably name a dozen different systems for building high scale applications (distributed storage, message queues, caching layers, search engines, etc), but I can&#8217;t name a single AB testing framework other than [...]]]></description>
			<content:encoded><![CDATA[<p>I read <a href="http://news.ycombinator.com">news.yc</a> and <a href="http://www.reddit.com/r/programming">reddit/programming</a> pretty regularly to keep up with what is going on in the biz.  Based on that reading, I can probably name a dozen different systems for building high scale applications (distributed storage, message queues, caching layers, search engines, etc), but I can&#8217;t name a single AB testing framework other than <a href="https://www.google.com/analytics/siteopt">Google Website Optimizer</a>.  That seems like a serious inversion of priorities for most startups.  Everyone with a sign up page should use AB testing.  Not everyone needs a message queue.</p>
<p>Is this because:</p>
<ul>
<li>Nobody needs anything other than Google Website Optimizer?</li>
<li>Startups don&#8217;t actually do AB testing, possibly because they don&#8217;t get enough traffic to get meaningful results, or maybe because they don&#8217;t have time?</li>
<li>AB testing (including the statistical analysis to determine if results are valid) is so simple that everyone just bangs out their own?</li>
<li>As a largely theoretical issue for most startups, scalability is more fun to talk about on the Internet?</li>
<li>Everyone that is using AB testing is so happy that they are trying to suppress information about it so their competitors don&#8217;t start doing it too?</li>
</ul>
<p>If everyone is secretly using some great framework please shoot me an email and let me know.</p>
<p>If you haven&#8217;t thought much about it before, here is <a href="http://exp-platform.com/Documents/GuideControlledExperiments.pdf">a short paper on AB testing</a> from some folks that made Amazon a ton of money.</p>
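<p>For a sense of how small the statistical analysis can be, here is a minimal sketch. It assumes a two-proportion z-test is an acceptable significance check (one common choice, not the only one). The function name and the example numbers are made up for illustration, and it does not guard against the degenerate case where nobody (or everybody) converts.</p>

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 200 of 5000 sign-ups on page A, 250 of 5000 on page B.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

<p>With those made-up numbers the p-value comes out below 0.05, which is the usual (if arbitrary) bar for calling a result significant. The hard part in practice is more likely getting enough traffic than writing the math.</p>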
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/O06600UX24Q" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2009/01/21/where-are-the-ab-testing-frameworks/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Two and a Half Months of Twitter</title>
		<link>http://www.tomkleinpeter.com/2008/09/20/two-and-a-half-months-of-twitter/</link>
		<comments>http://www.tomkleinpeter.com/2008/09/20/two-and-a-half-months-of-twitter/#comments</comments>
		<pubDate>Sat, 20 Sep 2008 20:10:44 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=68</guid>
		<description><![CDATA[After a few months of playing around with Twitter, the service is really growing on me. The ability to have casual IM-ish conversations without any immediacy is nice. Also, having a place to record short thoughts and interesting links that other people might like scratches some sort of itch for me. I wouldn&#8217;t want to [...]]]></description>
			<content:encoded><![CDATA[<p>After a few months of playing around with <a href="http://www.twitter.com/tklein">Twitter</a>, the service is really growing on me.  The ability to have casual IM-ish conversations without any immediacy is nice.  Also, having a place to record short thoughts and interesting links that other people might like scratches some sort of itch for me.  I wouldn&#8217;t want to write up a whole blog post for any of these, but they were all interesting enough to post on twitter:</p>
<ul>
<li>A clever proposal from Google: <a href="http://groups.google.com/group/SDCH">Shared Dictionary Compression over HTTP</a></li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb897561.aspx">Cacheset</a> &#8211; a tool for clearing the windows disk cache (useful for testing cold starts).</li>
<li>Fun fact: the Tesla Roadster carries <a href="http://www.teslamotors.com/blog4/?p=68">3 milligrams of electrons</a> when fully charged.</li>
<li>The ultimate Airplane on a Treadmill debate resource: <a href="http://www.airplaneonatreadmill.com/">www.airplaneonatreadmill.com</a></li>
<li>A 728-ton <a href="http://blog.longnow.org/2008/06/25/728-ton-pendulum/">tuned mass damper</a> in a skyscraper</li>
</ul>
<p>But, I don&#8217;t think I&#8217;ve reached the critical mass of followers necessary to really unlock the Q&#038;A potential of the site.  Having a few hundred technical folks all following each other would be a tremendously useful resource for everyone involved.  For example, I&#8217;m considering upgrading my desktop to 8 or 16GB of RAM.  I&#8217;m going to need a new motherboard, processor, and RAM.  My normal approach for this would be to spend a few hours on Newegg and the hardware review sites trying to figure out where the price/performance curve is and making sure I&#8217;m not getting ripped off.  If someone else has done this same research it would be nice to use their information as a starting point, and twitter provides the kind of free-form conversation necessary for that kind of sharing.  </p>
<p>To really make this work, you need to run one of the desktop apps so you don&#8217;t have to constantly reload the website (I use <a href="http://www.twhirl.org/">Twhirl</a>). </p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/PiSTGV83H5o" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2008/09/20/two-and-a-half-months-of-twitter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Next Gen Productivity Monitoring Software</title>
		<link>http://www.tomkleinpeter.com/2008/08/25/next-gen-productivity-monitoring-software/</link>
		<comments>http://www.tomkleinpeter.com/2008/08/25/next-gen-productivity-monitoring-software/#comments</comments>
		<pubDate>Mon, 25 Aug 2008 16:58:45 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Neat Ideas]]></category>
		<category><![CDATA[intolerance]]></category>
		<category><![CDATA[productivity]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=67</guid>
		<description><![CDATA[Now that I have a new baby, it is even more important to me that the time I spend in front of the computer is spent efficiently and productively. I’ve played around with productivity-monitoring software like RescueTime and TimeSnapper, and they provide a convenient way to record how I wasted my day. It’s a nice [...]]]></description>
			<content:encoded><![CDATA[<p>Now that I have a new baby, it is even more important to me that the time I spend in front of the computer is spent efficiently and productively. I’ve played around with productivity-monitoring software like <a href="http://www.rescuetime.com/">RescueTime</a> and <a href="http://www.timesnapper.com/">TimeSnapper</a>, and they provide a convenient way to record how I wasted my day.  It’s a nice first step, but I’d like to see this class of application expand into 3 new areas: positive feedback, targeted recommendations, and an attention API.</p>
<p><strong>Positive Feedback</strong><br />
Being told that I only spent 10% of my day doing work is good to know, but getting a low number might depress me rather than motivate me.  I suggest a system that actually rewards me when I have a killer day or a great week.   For example, I give the service $25 or $50 up front, and after I meet some sort of goal it buys me something off my Amazon wish list.  </p>
<p>Wouldn’t that be neat?  You’re having a good week, and suddenly a book you want shows up at your door.  The key to this is making the rewards <a href="http://serendip.brynmawr.edu/bb/neuro/neuro05/web1/isiddiqui.html">somewhat random</a>: </p>
<blockquote><p>Several studies have been conducted which targeted neural response to rewards.  The results were unanimous in the fact that when one performed an action over and over again, and was given a reward randomly, dopamine levels rose.  If the reward was given consistently, i.e. every four times the action was performed, the dopamine levels remained constant.</p></blockquote>
<p>A slight variation that might work better: for each contiguous block of productivity over a certain length, you have a chance of earning a credit towards a purchase.  After N credits, the service automatically buys and sends you the item.  Structuring it like this would make the feedback more rapid and allow for a little burst of dopamine each time you get an email saying you earned a credit.  Isn&#8217;t this why MMORPGs are so much fun?  </p>
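<p>The credit variation can be sketched in a few lines. Everything specific here is an assumption for illustration: the 45 minute minimum block, the 25% chance per block, the 5 credits per purchase, and the function names.</p>

```python
import random

def credits_earned(block_minutes, min_block=45, chance=0.25, rng=None):
    """Count credits earned from a day's contiguous productive blocks."""
    rng = rng or random.Random()
    credits = 0
    for minutes in block_minutes:
        # Only blocks at or above the minimum length qualify, and each
        # qualifying block wins a credit with probability `chance`.
        if minutes >= min_block and chance > rng.random():
            credits += 1
    return credits

def should_purchase(total_credits, credits_needed=5):
    """True once enough credits have accumulated to trigger a purchase."""
    return total_credits >= credits_needed

# Hypothetical day: three productive blocks, two long enough to qualify.
today = credits_earned([30, 60, 90], rng=random.Random(2008))
print(today, should_purchase(today))
```

<p>Because each credit is a coin flip rather than a guarantee, the reward stays unpredictable, which is the property the quoted study points at.</p>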
<p>A program to help you get addicted to work is either terrifying or a big win.  Either way, it would be really neat to try.</p>
<p><strong>Targeted Recommendations</strong><br />
Some software is just better than the default stuff that ships with Windows.  For example, I like Textpad and Paint.net a lot more than Notepad and MS Paint.  I’ve also been pleased with my switch from Bloglines to Google Reader, and from web based twitter to Twhirl.  If a program spends all day monitoring my activity, it would be a cinch for it to recommend the tools and websites that are considered “best in class.”  </p>
<p>There is obvious potential here in terms of sponsored recommendations, but it would be nice to see those separated out from community or editor controlled listings.  Recommendations could be driven by some sort of wiki, which would make for all kinds of interesting fights over things like whether Google Reader is better than Bloglines.  Any recommendation could also come with an estimate of the number of people that are currently using it, which would help the cream rise to the top.</p>
<p>Ultimately, it’s not just about how much time you spend slogging away – making good use of computer time is an important dimension of productivity as well.</p>
<p>A slight variation on this idea is to recommend something like <a href="https://addons.mozilla.org/en-US/firefox/addon/4476">LeechBlock</a> if the user is spending too much time on the web.    </p>
<p><strong>Attention API</strong><br />
I know I’ve seen this idea somewhere before, but because things like RescueTime are actually in a position to make it happen, I’m going to mention it here.  Interruptions are usually bad, but there are some times that they are worse than others.  If I’ve been focused on Visual Studio and Windbg for 30 minutes with no breaks, I’m almost certainly in that fascinating “<a href="http://en.wikipedia.org/wiki/Flow_(psychology)">flow</a>” state, and I’m going to be angry if I get an IM or (even worse) if some random app asks me to download a new version.  </p>
<p>To deal with this kind of thing, it would be great to have a standard for publishing my current tolerance for interruptions, just like IM apps publish my presence.  Both desktop apps and remote users could use this to determine if what they want to tell me is important enough to interrupt me.  Of course, this only works if apps pay attention to it,  but first we would need some apps that can accurately measure it.  I&#8217;m not terribly good with naming things, but unless someone has something better, I’m going to suggest calling this value your “inTolerance”.</p>
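<p>To make the idea concrete, here is a rough sketch of how an inTolerance value might be computed and consulted. No such standard exists as far as this post knows, so the class, the function names, the 0.0 to 1.0 scale, and the 30 minute ramp are all hypothetical.</p>

```python
from dataclasses import dataclass

@dataclass
class FocusState:
    focused_minutes: int   # minutes in the current work apps without a break
    in_work_app: bool      # True if the foreground app counts as "work"

def in_tolerance(state, ramp_minutes=30):
    """Return 0.0 (interrupt freely) up to 1.0 (please do not interrupt)."""
    if not state.in_work_app:
        return 0.0
    # inTolerance rises linearly with uninterrupted focus and caps at 1.0.
    return min(1.0, state.focused_minutes / ramp_minutes)

def may_interrupt(state, importance):
    """An interruption goes through only if it is at least as important
    (0.0 to 1.0) as the current inTolerance."""
    return importance >= in_tolerance(state)

# Thirty unbroken minutes in the debugger: an update nag should wait.
deep_focus = FocusState(focused_minutes=30, in_work_app=True)
print(may_interrupt(deep_focus, importance=0.2))
```

<p>An IM client or an auto-updater would call something like may_interrupt before popping anything up, with only the most important messages rated at 1.0.</p>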
<p>So there you go.  One idea to help you spend more time being productive, another to help you make better use of the time you are actually working, and a third to keep you from getting interrupted.  Anyone want to go and implement this stuff?  I’ll be happy to beta test it for you.  </p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/RsBi9LZQO-U" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2008/08/25/next-gen-productivity-monitoring-software/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Netflix Prize Concept  + Google 411 Data</title>
		<link>http://www.tomkleinpeter.com/2008/08/13/netflix-prize-concept-google-411-data/</link>
		<comments>http://www.tomkleinpeter.com/2008/08/13/netflix-prize-concept-google-411-data/#comments</comments>
		<pubDate>Wed, 13 Aug 2008 19:15:14 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Neat Ideas]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[prizes]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=66</guid>
		<description><![CDATA[I’ve really enjoyed watching the Netflix Prize develop. Amazingly, over 3600 teams have submitted a prediction, which makes Netflix the big winner in this contest. The company will undoubtedly end up with a better product due to the amount of interest and research in collaborative filtering they have generated. But ultimately, better movie recommendations don’t [...]]]></description>
			<content:encoded><![CDATA[<p>I’ve really enjoyed watching the <a href="http://www.netflixprize.com/">Netflix Prize</a> develop.  Amazingly, over 3600 teams have submitted a prediction, which makes Netflix the big winner in this contest.  The company will undoubtedly end up with a better product due to the amount of interest and research in collaborative filtering they have generated.  </p>
<p>But ultimately, better movie recommendations don’t matter a whole lot to me.  I’m more interested in the fact that by providing a unique set of data and a prize, they’ve been able to stimulate so much interest.  The other day I was thinking about which companies are in a position to sponsor contests in other fields that might have a bigger impact on my life, and one thought jumped into my head – Google’s 411 phoneme collection service.  <a href="http://www.infoworld.com/archives/emailPrint.jsp?R=printThis&#038;A=/article/07/10/23/Google-wants-your-phonemes_1.html">Marissa Mayer says</a>:</p>
<blockquote><p>
You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model &#8230; that we can use for all kinds of different things, including video search.</p>
<p>The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that</p></blockquote>
<p>Presumably, Google has already done the heavy lifting to manually transcribe a large number of these samples so that they can train their own algorithms.  Why not create a contest that lets teams submit an algorithm that gets trained on a subset of the data and then tested against the rest?  Speech recognition is more complicated than movie recommendations, but making it easy to train and test an algorithm against an interesting number of samples would certainly lower the barrier to entry.  </p>
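<p>The train-on-a-subset, test-on-the-rest setup is simple to sketch. This is only an illustration under stated assumptions: samples are (audio, transcript) pairs, an entry is a function from audio to text, and scoring is exact-match accuracy. A real speech contest would more likely score something like word error rate, and all of the names below are made up.</p>

```python
import random

def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle labeled samples deterministically and split into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

def accuracy(predict, test_set):
    """Fraction of held-out (audio, transcript) pairs the entry gets exactly right."""
    if not test_set:
        return 0.0
    correct = sum(1 for audio, transcript in test_set if predict(audio) == transcript)
    return correct / len(test_set)

# Stand-in data: the "audio" is just a number and the transcript is its string form.
samples = [(i, str(i)) for i in range(10)]
train, held_out = split_samples(samples)
print(len(train), len(held_out), accuracy(str, held_out))
```

<p>The organizer keeps the held-out transcripts private and only reports the score, so entries cannot simply memorize the answers.</p>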
<p>Google would benefit from this in hiring, if nothing else.  It would give them a chance to realistically evaluate the work of all kinds of grad students and researchers, and demonstrate to the candidates the advantages of working for the company with the biggest databases.  </p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/5or4s7tMThs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2008/08/13/netflix-prize-concept-google-411-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Handling Human Error In the Datacenter</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/</link>
		<comments>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/#comments</comments>
		<pubDate>Mon, 11 Aug 2008 19:21:19 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[startups]]></category>
		<category><![CDATA[uptime]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=64</guid>
		<description><![CDATA[When I was working on Live Mesh at Microsoft, I had the good fortune to meet James Hamilton. James is full of good ideas, many of which are captured in his paper “On Designing and Deploying Internet-Scale Services.” There is a lot of wisdom in those pages (Greg Linden had some thoughts on it), but [...]]]></description>
			<content:encoded><![CDATA[<p>When I was working on Live Mesh at Microsoft, I had the good fortune to meet <a href="http://perspectives.mvdirona.com/">James Hamilton</a>.  James is full of good ideas, many of which are captured in his paper <a href="http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf">“On Designing and Deploying Internet-Scale Services.”</a>  There is a lot of wisdom in those pages (Greg Linden had <a href="http://glinden.blogspot.com/2008/03/designing-for-internet-scale.html">some thoughts on it</a>), but I’d like to focus in on this snippet in particular:</p>
<blockquote><p>Design the system to never need human interaction, but understand that rare events will occur where combined failures or unanticipated failures require human interaction. </p></blockquote>
<p>Yes, designing the system to never need human interaction is a <a href="http://www.25hoursaday.com/weblog/2008/08/11/ManagingLargeWebServerFarmsMicrosoftsAutoPilot.aspx">great ideal to shoot for</a>, but when you are working for a startup with three guys and a dozen servers, you don’t have the resources or the justification to do it from Day 1.  It is entirely likely that your business model will <a href="http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html">fail</a> before you lose a single disk.  And since backend refinements don’t pay the bills at a small scale, something with a pair of hands is going to be interacting with your system until you get enough people and servers to justify more automation.  </p>
<blockquote><p>These events will happen and operator error under these circumstances is a common source of catastrophic data loss.</p></blockquote>
<p>That is a wonderfully simple and accurate summary of How Bad Things Happen in your datacenter.  It starts when you lose a hard drive or MySQL crashes, and you have to promote your slave until you can check the master tables, or anything painful but routine happens.  But then, as you are trying to fix things, you notice, for example, that you are almost out of disk space.  When you start trying to fix more than one problem under pressure, you are entering a world of pain.  </p>
<p>The big issue here, as James points out, is that you are going to do something wrong.  You’ll probably use a much stronger word than “wrong” once it is all over, but let’s settle on “stupid” for right now.  </p>
<p>It won&#8217;t feel stupid until after you hit “enter,” but when you are making unfamiliar decisions quickly under pressure, you are extremely likely to overlook something.  Maybe you won&#8217;t shut down mysql before you start a myisamchk from the shell, or maybe you&#8217;ll reverse the arguments to &#8220;tar -cvzf&#8221; and wipe out something important.  Or perhaps you&#8217;ll screw up a firewall rule and block ssh access to the machine you are frantically trying to fix.  Accidentally killing the ssh daemon is another favorite.  The point is that during a stressful situation in the datacenter, the human operator is the biggest potential source of more downtime or “catastrophic data loss.”  </p>
<p>Assuming you can’t automate everything, what can you do?  Well, the absolute best thing you can do is practice.  Corrupt some data on your dev master db, and see how long it takes you to get it restored from backups or a slave copy.  Practice what would happen if you lost a database slave and had to activate a spare machine to take its place (I hope you have at least one spare machine).  But of course, no one at a small startup has time to practice.  Maybe once you hire a full time ops guy, it would be good to make sure he is practicing this sort of thing occasionally.  But when practicing is going to take away from writing code, you aren&#8217;t going to practice.  </p>
<p>Since you aren’t going to practice, what else can you do?  The next best thing is to cultivate the attitude that you are the most likely source of problems.  Don’t worry about hard drives, worry about bad decisions.  Develop some humility about how you expect to behave when you get woken up at 4am to fix a database the morning of your launch or when a switch fails an hour before your big demo.  From that mindset, here are a few things to do:</p>
<p><strong>Script what you can</strong><br />
Off the top of my head, a good place to start would be writing scripts for some of the steps in setting up master/slave replication and manipulating firewall traffic (allowing or blocking external traffic, for instance).  </p>
<p><strong>Use the buddy system</strong><br />
It is not a bad idea to have somebody else there looking at what you are typing, or at least on the phone confirming things verbally.</p>
<p><strong>Take your fingers off the keyboard before you hit enter</strong><br />
Are you in the right directory?  Are you on the right machine?  Are those arguments in the right order?  Can you just rename this old stuff instead of deleting it?  All of these are excellent questions to ask yourself or your coworker while you have your hands in your lap. This is also a good idea when you are doing something scary with SQL, like running any query that doesn&#8217;t have a where clause.  </p>
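<p>The SQL check in particular is easy to hand to a script. Here is a deliberately crude sketch with a made-up function name. It only looks at the text of a single statement, so a WHERE buried in a subquery or a string literal would fool it. Treat it as a prompt to stop and think, not as a safety net.</p>

```python
import re

# Matches UPDATE or DELETE at the start of a statement.
_WRITE = re.compile(r"^\s*(update|delete)\b", re.IGNORECASE)
_WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)

def needs_second_look(sql):
    """True for an UPDATE or DELETE statement that has no WHERE clause."""
    return bool(_WRITE.match(sql)) and not _WHERE.search(sql)

for statement in ("DELETE FROM users", "DELETE FROM users WHERE id = 7"):
    print(statement, "->", needs_second_look(statement))
```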
<p><strong>Slow things down </strong><br />
As soon as you make one mistake, no matter how minor, it is time to slow things down.  Beyond the fact that making a mistake will fluster you, making one mistake demonstrates that right now, you are likely to make mistakes.  That is a huge red flag.  At this point, the safest thing may be to accept a slightly longer downtime just so you can slow things down, get some water, and relax.  Trying to compensate for a little mistake by doing things faster can result in a much, much worse mistake.  Unless you’ve just rolled a server cage down the stairs, there is always a worse mistake you can make.</p>
<p><strong>Make it hard for people you work with to make mistakes</strong><br />
A quality server naming scheme is the easiest thing you can do here.  No colors, deities, countries, snack foods, snakes, etc.  I like $machineType-$number myself, but with distinct number ranges, even between different machine types.  So, don&#8217;t have SQL-001 and Web-001.  One day, some very sleepy datacenter employee may get things mixed up when you call and ask him to reboot Web-001.  I’m sure you’ll get an apology, but you won’t get your uptime back.  So make it harder for him to screw up: if your web machines start at Web-201, he&#8217;ll have to make 2 mistakes before he accidentally reboots your primary database.  </p>
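<p>A naming rule like this is easy to enforce with a small script. The sketch below assumes some example ranges (the Cache type and every boundary are invented for illustration). The only real rule is that no number belongs to two machine types.</p>

```python
# Hypothetical machine types and number ranges; the only rule that matters
# is that no number appears in more than one range.
RANGES = {
    "SQL": range(1, 200),
    "Web": range(201, 400),
    "Cache": range(401, 600),
}

def ranges_are_distinct(ranges):
    """True if no machine number is shared between two machine types."""
    seen = set()
    for numbers in ranges.values():
        if seen.intersection(numbers):
            return False
        seen.update(numbers)
    return True

def machine_name(machine_type, number, ranges=RANGES):
    """Build a $machineType-$number name, rejecting out-of-range numbers."""
    if number not in ranges[machine_type]:
        raise ValueError(f"{number} is outside the {machine_type} range")
    return f"{machine_type}-{number:03d}"

print(ranges_are_distinct(RANGES), machine_name("Web", 201))
```

<p>Running the distinctness check whenever the ranges change keeps a Web-001 from ever sneaking in next to SQL-001.</p>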
<p><strong>Talk about this stuff ahead of time</strong><br />
You probably have plenty of stuff to talk about at lunch with your coworkers, but here are a few conversation starters if you want to sharpen your disaster recovery skills:</p>
<ul>
<li>&#8220;What happens if we lose power to one of our racks?&#8221;</li>
<li>&#8220;How many of our switches could we lose and get the site back up?&#8221;</li>
<li>&#8220;What is the smallest amount of hardware we could lose that would knock us 100% offline?&#8221;</li>
</ul>
<p>This stuff isn’t theoretical.  I woke up at 2am one weekend during FolderShare with a ton of text messages from our cluster.  The kindly folks at the datacenter had been doing power supply maintenance.  At some point, they powered down 2 of our racks.  Then, they powered them right back up.  It wasn’t tough to fix, but it was so unexpected that it took me a few minutes to even realize what had happened.  </p>
<p><strong>Use tricks to deal with the general class of &#8220;running a command on the wrong machine&#8221; problem.</strong><br />
Typing the right command on the wrong machine is obviously something to avoid.  But when you have a sea of ssh windows open, what can you do?  </p>
<ul>
<li>Use a different color background for your terminal to machines hosting master databases versus slaves</li>
<li>Make sure the machine name shows up in the command prompt </li>
</ul>
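<p>Both tricks can be combined in the shell prompt itself. The sketch below builds a bash PS1 string that shows the machine name on a background color chosen by role. The role names and the color choices are assumptions for illustration. The backslash escapes are standard bash prompt sequences.</p>

```python
# ANSI background colors chosen arbitrarily for this sketch.
ROLE_BACKGROUNDS = {"master": "41", "slave": "44"}  # 41 = red, 44 = blue

def bash_prompt(role):
    """Return a bash PS1 value showing user@host:dir on a role-colored background."""
    color = ROLE_BACKGROUNDS.get(role, "0")  # 0 = no special coloring
    # \[ and \] mark non-printing sequences so bash computes line width correctly;
    # \u, \h and \w expand to the user, short hostname and working directory.
    return rf"\[\e[{color}m\]\u@\h:\w\[\e[0m\]\$ "

print(bash_prompt("master"))
```

<p>The printed string would go into PS1 in the shell startup file on each machine, so a master database greets you in red before you type anything.</p>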
<p>Does anyone else have any good ideas or horror stories to tell?  Post a comment and share your wisdom and/or pain.</p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/ZX4-riukSVw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>tklein on twitter</title>
		<link>http://www.tomkleinpeter.com/2008/06/30/tklein-on-twitter/</link>
		<comments>http://www.tomkleinpeter.com/2008/06/30/tklein-on-twitter/#comments</comments>
		<pubDate>Mon, 30 Jun 2008 21:03:54 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=62</guid>
		<description><![CDATA[I&#8217;m on twitter now. Follow me at http://twitter.com/tklein]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m on twitter now.  Follow me at <a href="http://twitter.com/tklein">http://twitter.com/tklein</a></p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/BJp-9y1kOCU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2008/06/30/tklein-on-twitter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dev Diligence: Don’t Invest in the Wrong Code</title>
		<link>http://www.tomkleinpeter.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/</link>
		<comments>http://www.tomkleinpeter.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/#comments</comments>
		<pubDate>Fri, 25 Apr 2008 20:32:54 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Dev Diligence]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=61</guid>
		<description><![CDATA[When I&#8217;m starting a project or thinking about adding functionality to an existing code base, I always consider using any existing code. Sometimes this is obvious &#8211; I&#8217;m not going to write my own RDBMS &#8212; but frequently, it is a more difficult decision than it should be. In making a decision, I look first [...]]]></description>
			<content:encoded><![CDATA[<p>When I&#8217;m starting a project or thinking about adding functionality to an existing code base, I always consider using existing code.   Sometimes this is obvious &#8211; I&#8217;m not going to write my own RDBMS &#8211; but frequently, it is a more difficult decision than it should be.  In making a decision, I look first at the questions that I can actually get answers to:</p>
<ul>
<li>Am I getting more than I need?  It pains me to add a multi-megabyte DLL to a client download for a small amount of functionality.</li>
<li>Will I spend more time learning the interface than I would writing the functionality I need myself?</li>
<li>Is this an active project, and is there any documentation?</li>
<li>If scheduling isn&#8217;t an issue, how much fun would it be to write my own version?</li>
</ul>
<p>Next comes a set of questions that are oftentimes harder to answer:</p>
<ul>
<li>Who else is using it?</li>
<li>Will I be using it the same way as other people who are successfully using it?</li>
<li>What am I going to find out when I put more stress on it than anyone else?</li>
</ul>
<p>One library that passed my gauntlet of questions is <a href="http://libredblack.sourceforge.net/">libredblack</a>. It ended up on a bunch of production servers at FolderShare, and it worked out great.  But there was a catch: I wanted to use it to store large numbers of items, but for every item I put in the tree, the library would allocate an object that held 4 pointers and an enum.  That took 40 bytes on my dev box.  Throw in malloc&#8217;s overhead, and I was up to 48 bytes.  The objects I was storing pointers to would also have some heap overhead, which may be as much as 24 bytes.  So to store 10M items in memory, I&#8217;d need an extra half gigabyte of memory just for overhead. </p>
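<p>The arithmetic behind that estimate, worked through with the figures quoted above.  The 8 bytes of malloc overhead is inferred from the jump from 40 to 48 bytes, and the variable names are only for illustration:</p>

```python
NODE_BYTES = 40          # 4 pointers plus an enum, as measured on the dev box
MALLOC_OVERHEAD = 8      # brings each tree node to 48 bytes
ITEM_HEAP_OVERHEAD = 24  # upper bound quoted for each stored object
ITEMS = 10 * 1000 * 1000

node_total = (NODE_BYTES + MALLOC_OVERHEAD) * ITEMS
worst_case_total = node_total + ITEM_HEAP_OVERHEAD * ITEMS

print(node_total / 1e6)        # 480.0 (MB for the tree nodes alone)
print(worst_case_total / 1e6)  # 720.0 (MB including per-object heap overhead)
```

<p>The tree nodes alone account for roughly half a gigabyte, and the per-object heap overhead can push the total higher.</p>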
<p>A second example from personal experience is <a href="http://librsync.sourceforge.net/">librsync</a>.  Again, the library works exactly as advertised.  But if you want to transfer deltas for large (gigabyte+) files on machines that have hard memory limits (like embedded devices), you need to know that the memory usage is proportional to the file size.  For my situation, I ended up having to adjust the window size as file sizes grew just to keep the memory usage reasonable for large files.  </p>
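<p>One way to scale the window with the file is to cap the number of blocks, on the assumption that memory use grows with the block count.  This is a sketch, not librsync's API: the function name, the cap of one million blocks, and the 2048-byte floor are all hypothetical:</p>

```python
def pick_block_size(file_size, max_blocks=1000000, min_block=2048):
    """Return a block size that keeps the block count at or below max_blocks."""
    needed = -(-file_size // max_blocks)  # ceiling division
    return max(min_block, needed)


print(pick_block_size(10 * 1024 ** 2))  # 2048 (small file: the floor applies)
print(pick_block_size(4 * 1024 ** 3))   # 4295 (4 GB file: the block size grows)
```

<p>Larger blocks mean coarser deltas, so the cap is a trade between memory and transfer size.</p>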
<p>I don&#8217;t want anyone to think I&#8217;m complaining about this stuff &#8211; I&#8217;m a fan of both libraries.  But both of these examples illustrate a class of problem that is particularly frustrating: the one you might not find until you are heavily invested in a solution.  These gotchas won&#8217;t affect most people, and thus aren&#8217;t likely to show up when you are researching possible solutions.  They aren&#8217;t bugs, either, but they might be something you have to deal with.  So the sooner you can find out about them, the better.</p>
<p>Fortunately, the internet has plenty of software built for solving problems like this.  <a href="http://www.devdiligence.com/wishlist">Dev Diligence</a><a href="#footer_1"><sup>[1]</sup></a> is a new wiki I&#8217;ve started to collect details like these.  My goal is to have a reference page for any library or service developers might consider using in their solution.  For sufficiently large libraries, pages for classes or functions might be necessary, but let’s not get ahead of ourselves.  Ultimately, I’d like to have 5 headings for everything in the wiki:</p>
<ul>
<li>Overview: Brief description of the software and a link to the homepage</li>
<li>Short case studies or war stories:  These would include a brief description of how you are using the software, the version you used, and ideally some metrics.  If you used it for a while and then switched to something else, an explanation of that decision is very valuable information.  For libredblack, the relevant metrics would be things like average number of elements in your trees or insertions/deletions per second.</li>
<li>“Gotchas” (like the ones I&#8217;ve mentioned above): Subtle problems (hello, <a href="http://blogs.msdn.com/ricom/archive/2006/02/02/523626.aspx">heap fragmentation</a>) and things that aren&#8217;t necessarily bugs, but issues that may affect your design or help you choose one solution over another.</li>
<li>Alternatives: The name pretty much says it all.  With links, please.</li>
<li>Other Resources:  Links to blog posts, email threads, or reference pages would be great.</li>
</ul>
<p>I’ve gone ahead and created entries for <a href="http://www.spiteful.com/dd/libredblack">libredblack</a>, <a href="http://www.spiteful.com/dd/librsync">librsync</a>, and <a href="http://www.spiteful.com/dd/zlib">zlib</a> based on my experiences. I’d love to see some entries for the following and things like it:</p>
<ul>
<li>libev, libevent, boost.asio, and Twisted</li>
<li>openssl</li>
<li>sqlite and berkeleydb</li>
<li>memcached, spread, the reliable queue solutions (Starling, TheSchwartz, etc.), and anything that uses “pubsub” in its description</li>
<li>libcurl and wininet (stuff like <a href="http://nick.typepad.com/blog/2006/06/microsoft_pleas.html">Nick Bradbury&#8217;s description of a CPU spike in WinInet that can be triggered by chunked-encoding</a> is gold)</li>
</ul>
<p>All of these and more are linked to from the <a href="http://www.spiteful.com/dd/wishlist">WishList</a> page.  </p>
<p>Can you guys help me out?  I’ve got enough people subscribed to this feed that I’m certain at least one of you has used everything on my list.  If you take 10 minutes to write down your experiences, you can make the software world a better place.  To justify doing it on your company’s time, keep this in mind: if you document the fact that you are successfully using a solution, you increase the chance that other people will use it as well.  The more users a solution has, the better it will become.  </p>
<p><a name="footer_1">[1]</a> &#8220;Dev Diligence&#8221; is of course a play on the term <a href="http://en.wikipedia.org/wiki/Due_diligence">&#8220;Due Diligence&#8221;</a>.  </p>
<img src="http://feeds.feedburner.com/~r/Spiteful/~4/caK0AvxhL2A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.tomkleinpeter.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
