<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">

<channel>
	<title>Spiteful.com</title>
	
	<link>http://www.spiteful.com</link>
	<description />
	<pubDate>Fri, 25 Apr 2008 20:32:54 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/Spiteful" type="application/rss+xml" /><item>
		<title>Dev Diligence: Don’t Invest in the Wrong Code</title>
		<link>http://www.spiteful.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/</link>
		<comments>http://www.spiteful.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/#comments</comments>
		<pubDate>Fri, 25 Apr 2008 20:32:54 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Dev Diligence]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=61</guid>
		<description><![CDATA[When I’m starting a project or thinking about adding functionality to an existing code base, I always consider using any existing code.   Sometimes this is obvious—I’m not going to write my own RDBMS —but frequently, it is a more difficult decision than it should be.  In making a decision, I look first [...]]]></description>
			<content:encoded><![CDATA[<p>When I’m starting a project or thinking about adding functionality to an existing code base, I always consider using any existing code.   Sometimes this is obvious—I’m not going to write my own RDBMS —but frequently, it is a more difficult decision than it should be.  In making a decision, I look first at the questions that I can actually get answers to:</p>
<ul>
<li>Am I getting more than I need?  It pains me to add a multi megabyte DLL to a client download for a small amount of functionality.</li>
<li>Will I spend more time learning the interface than I would writing the functionality I need myself?</li>
<li>Is this an active project, and is there any documentation?</li>
<li>If scheduling isn’t an issue, how much fun would it be to write my own version?</li>
</ul>
<p>Next comes a set of questions that are oftentimes harder to answer:</p>
<ul>
<li>Who else is using it?</li>
<li>Will I be using it the same way as other people who are successfully using it?</li>
<li>What am I going to find out when I put more stress on it than anyone else?</li>
</ul>
<p>One library that passed my gauntlet of questions is <a href="http://libredblack.sourceforge.net/">libredblack</a>. It ended up on a bunch of production servers at FolderShare, and it worked out great.  But there was a catch: I wanted to use it to store large numbers of items, but for every item I put in the tree, the library would allocate an object that held 4 pointers and an enum.  That took 40 bytes on my dev box.  Throw in malloc’s overhead, and I was up to 48 bytes.  The objects I was storing pointers to would also have some heap overhead, which may be as much as 24 bytes.  So to store 10M items in memory, I’d need an extra half gigabyte of memory just for overhead. </p>
<p>A second example from personal experience is <a href="http://librsync.sourceforge.net/">librsync</a>.  Again, the library works exactly as advertised.  But if you want to transfer deltas for large (gigabyte+ files) on machines that have hard memory limits (like embedded devices), you need to know that the memory usage is proportional to the file size.  For my situation, I ended up having to adjust the window size as file sizes grew just to keep the memory usage reasonable for large files.  </p>
<p>I don’t want anyone to think I’m complaining about this stuff—I’m a fan of both libraries.  But both of these examples illustrate a class of problem that is particularly frustrating: the one you might not find until you are heavily invested in a solution.  These “gotchas” won’t affect most people, and thus aren’t likely to show up when you are researching possible solutions.  They aren’t bugs, either, but they might be something you have to deal with.  So the sooner you can find out about them, the better.</p>
<p>Fortunately, the internet has plenty of software built for solving problems like this.  <a href="http://www.spiteful.com/dd/wishlist">Dev Diligence</a><a href="#footer_1"><sup>[1]</sup></a> is a new wiki I’ve started to collect details like these.  My goal is to have a reference page for any library or service developers might consider using in their solution.  For sufficiently large libraries, pages for classes or functions might be necessary, but let’s not get ahead of ourselves.  Ultimately, I’d like to have 5 headings for everything in the wiki:</p>
<ul>
<li>Overview: Brief description of the software and a link to the homepage</li>
<li>Short case studies or war stories:  These would include a brief description of how you are using the software, the version you used, and ideally some metrics.  If you used it for a while and then switched to something else, an explanation of that decision is very valuable information.  For libredblack, the relevant metrics would be things like average number of elements in your trees or insertions/deletions per second.</li>
<li>“Gotchas” (like the ones I’ve mentioned above): Subtle problems (hello, <a href="http://blogs.msdn.com/ricom/archive/2006/02/02/523626.aspx">heap fragmentation</a>) and things that aren’t necessarily bugs, but issues that may affect your design or help you choose one solution over another.</li>
<li>Alternatives: The name pretty much says it all.  With links, please.</li>
<li>Other Resources:  Links to blog posts, email threads, or reference pages would be great.</li>
</ul>
<p>I’ve gone ahead and created entries for <a href="http://www.spiteful.com/dd/libredblack">libredblack</a>, <a href="http://www.spiteful.com/dd/librsync">librsync</a>, and <a href="http://www.spiteful.com/dd/zlib">zlib</a> based on my experiences. I’d love to see some entries for the following and things like it:</p>
<ul>
<li>libev, libevent, boost.asio, and Twisted</li>
<li>openssl</li>
<li>sqlite and berkeleydb</li>
<li>memcached, spread, the reliable queue solutions (Starling, TheSchwartz, etc), and anything that uses “pubsub” in its description</li>
<li>libcurl and wininet (stuff like <a href="http://nick.typepad.com/blog/2006/06/microsoft_pleas.html">Nick Bradbury&#8217;s description of a CPU spike in WinInet that can be triggered by chunked-encoding</a> is gold)</li>
</ul>
<p>All of these and more are linked to from the <a href="http://www.spiteful.com/dd/wishlist">WishList</a> page.  </p>
<p>Can you guys help me out?  I’ve got enough people subscribed to this feed that I’m certain at least one of you has used everything on my list.  If you take 10 minutes to write down your experiences, you can make the software world a better place.  To justify doing it on your company’s time, keep this in mind: if you document the fact that you are successfully using a solution, you increase the chance that other people will use it as well.  The more users a solution has, the the better it will become.  </p>
<p><a name="footer_1">[1]</a> &#8220;Dev Diligence&#8221; is of course a play on the term <a href="http://en.wikipedia.org/wiki/Due_diligence">&#8220;Due Diligence&#8221;</a>.  </p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=61&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=61&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=61&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/277855015" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Crashing When Something Feels Wrong</title>
		<link>http://www.spiteful.com/2008/04/14/crashing-when-something-feels-wrong/</link>
		<comments>http://www.spiteful.com/2008/04/14/crashing-when-something-feels-wrong/#comments</comments>
		<pubDate>Mon, 14 Apr 2008 19:47:37 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[Assertions]]></category>

		<category><![CDATA[FolderShare]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=56</guid>
		<description><![CDATA[I’m sort of lazy, so I really like the idea of code that continually checks itself by using assertions.  I even like running production services with assertions turned on.  To be clear, I’m talking about assertions that check for actual bugs in your code – not assertions that socket() didn’t fail.  Still, [...]]]></description>
			<content:encoded><![CDATA[<p>I’m sort of lazy, so I really like the idea of code that continually checks itself by using assertions.  I even like running production services with assertions turned on.  To be clear, I’m talking about assertions that check for actual bugs in your code – not assertions that socket() didn’t fail.  Still, crashing production servers is a contentious issue, but sometimes (hopefully rarely) it is the best thing to do.  For something like FolderShare, crashing a server as soon as there is any hint of an error is vastly safer than possibly deleting someone’s files due to a bug.  Of course, this introduces the risk that you could have multiple servers fail in a short amount of time, but you need to design for that case anyway.  </p>
<p>I originally fell in love with assertions after reading Steve Maguire’s <em>Writing Solid Code</em> many years ago.  After I saw how helpful they could be, I started to structure my code to make it more “assertable.”  For example, I like state machines that use a table of valid transitions combined with assertions.  This prevents anything I don’t explicitly anticipate from happening and is really helpful in a networked, asynchronous world.  </p>
<p>But after developing a few long running services, I’ve started to perceive the need for a new type of assertion that is a bit higher level than a single conditional.  I want something that will let me crash a service (and save a memory dump) when something “feels” wrong.  For the moment, I call these “probabilistic assertions,” and I would have slept better while running FolderShare if I’d implemented them then.</p>
<p>Like all synchronization software, FolderShare had a few nightmare scenarios that I worried about all the time.  If, due to a bug, the service told every client that its files were all deleted, I’d probably have to read a lot of nasty blog posts, and I would feel like crap for a few years.  I would much rather crash all my servers and debug the problem with the system offline than take a chance on pissing off so many people.</p>
<p>Anyway, a normal assertion that looks at a single conditional wouldn’t help in this case.  Asserting that no files have been deleted when a client connects doesn’t make any sense.  But for a large enough sample size, I could assert that 80% of clients that connect shouldn’t have any files deleted.  And I could probably assert that 95% of clients that connect shouldn’t have all their files deleted.  </p>
<p>Functions like these cover most of what I’m thinking about:</p>
<p><img src="http://www.spiteful.com/wp-content/uploads/2008/04/prob_assertions.png" alt="" title="prob_assertions" width="497" height="132" class="aligncenter size-full wp-image-58" /></p>
<p>Depending on your application, these functions could be implemented either as assertions or as triggers to alert an administrator.  For alerts, you could really take this to another level by checking that incoming events are following a certain distribution for example.  Granted that it’s a bit over the top, but I can’t help but imagine checking if a stream of incoming events obeys a Poisson distribution or if the sizes of certain data are following a Gaussian distribution.  </p>
<p>Has anyone seen anything like this in the wild?  I’d love to hear about it.</p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=56&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=56&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=56&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/270205934" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/04/14/crashing-when-something-feels-wrong/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Housekeeping</title>
		<link>http://www.spiteful.com/2008/04/14/housekeeping/</link>
		<comments>http://www.spiteful.com/2008/04/14/housekeeping/#comments</comments>
		<pubDate>Mon, 14 Apr 2008 19:43:33 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/?p=55</guid>
		<description><![CDATA[I don&#8217;t plan on having many non-technical posts here, but I&#8217;m breaking my rule today for a good reason.  I&#8217;ve got a kid now!  My first child, Margot Lee Kleinpeter, was born about 10 days ago.  Between a long, drawn out labor, a few nights on a hospital couch, and fatherhood in [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t plan on having many non-technical posts here, but I&#8217;m breaking my rule today for a good reason.  I&#8217;ve got a kid now!  My first child, Margot Lee Kleinpeter, was born about 10 days ago.  Between a long, drawn out labor, a few nights on a hospital couch, and fatherhood in general, I&#8217;ve fallen a bit behind on publishing.  Much to my surprise, Margot prefers clean diapers and songs to essays on startups and programming.  But, I&#8217;ve got a new post for today and I&#8217;ll hopefully be back on a more normal schedule soon.  In the meantime, enjoy this picture of her sleeping:</p>
<p><img src="http://www.spiteful.com/wp-content/uploads/2008/04/margot.jpg" alt="" title="margot" width="500" height="333" class="aligncenter size-full wp-image-60" /></p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=55&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=55&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=55&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/270203269" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/04/14/housekeeping/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Things That Are Important: Where Clauses</title>
		<link>http://www.spiteful.com/2008/03/24/things-that-are-important-where-clauses/</link>
		<comments>http://www.spiteful.com/2008/03/24/things-that-are-important-where-clauses/#comments</comments>
		<pubDate>Mon, 24 Mar 2008 20:03:45 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Audiogalaxy]]></category>

		<category><![CDATA[MySQL]]></category>

		<category><![CDATA[Startup Lessons]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/03/24/things-that-are-important-where-clauses/</guid>
		<description><![CDATA[When you are running a distributed service in a datacenter, you encounter a lot of interesting problems.  At Audiogalaxy, I ran into all the standard application level bugs, crashes, and race conditions.  Once we had a certain number of machines, we even had to deal with flaky memory, disks, and networking cards.  [...]]]></description>
			<content:encoded><![CDATA[<p><img style="margin: 0pt 0pt 2px 7px; padding: 4px; float: right; display: inline;"  src='http://www.spiteful.com/wp-content/uploads/2008/03/quake-3-bones.jpg' alt='quake-3-bones.jpg' />When you are running a distributed service in a datacenter, you encounter a lot of interesting problems.  At Audiogalaxy, I ran into all the standard application level bugs, crashes, and race conditions.  Once we had a certain number of machines, we even had to deal with flaky memory, disks, and networking cards.  But all of that was pretty typical compared to the weirdest bug I ever had to deal with – the one that was caused by Quake III Arena.  </p>
<p>Audiogalaxy had a small client that simply handled the P2P transfers and a complicated website for everything else, including account settings.  One of the adjustable account settings on the website was the “max number of transfers.”  To encourage users to send as much as they received, we only gave them a single number for this setting.  With a value of 1, a Satellite would only send a single file at a time, but it could only download one file a time as well.</p>
<p>Things were not so simple on the back-end.  For better or for worse, I had designed some flexibility into the system.  The max transfers value was actually stored in two columns in the Users table – MaxSend and MaxRecv.  The back-end – the part that actually looked at these values when it was setting up transfers–had no idea these columns were linked.  The front-end enforced what went into the database, and the back-end obeyed it.  Whenever the Satellite reconnected to the cloud, our server would read the value out of the database and store it in memory for the duration of that connection.  </p>
<p>Of course, somewhere between the frontend and the backend is mysqlclient, but I’ll get to that in a moment.  </p>
<p>Quake III Arena was my game of choice at the time I worked for AG.  We had a few developers that also enjoyed the game, and it was common to find people staying late on the weekend to take advantage of our nice internet connection.  Unfortunately, our nice internet connection had a dozen people running our p2p music sharing client on one side, so it would periodically slow down when someone’s computer started blasting a file out at high speed.  These slowdowns drove us crazy, particularly when they prevented us from using the game’s rail gun effectively.</p>
<p>Good developers like to fix problems, and developers at startups also tend to have access to the database.  So, you can probably imagine what a developer might do.  And if you know a little bit about SQL, you can also imagine what might go horribly wrong.  I never found out who issued the bad query, but I can just imagine how it played out:</p>
<blockquote><p>Hey, I&#8217;ve got an idea about how we can keep the games from lagging tonight.  I’ll just block everyone in the office from sending files.  One simple &#8216;Update Users set MaxSend = 0&#8242; and we should be good to go for the evening…  Why is that query taking so long?  Uh oh&#8230;</p></blockquote>
<p>SQL is good for a lot of things, but I’ve always marveled at how easy it is to destroy an entire table simply by forgetting a where clause.  And thus, in a few short minutes, every one of our 30 million users had a subtle change applied to their accounts.  Did I mention that the single value we displayed on the website for this setting came from the MaxRecv column?  Whoops…</p>
<p>Monitoring the health of the system was one of my jobs, so I kept a close eye on my graph of the “current transfer rate.”  Ultimately, most problems in the system resulted in less files getting transferred, so the global transfer rate was a good proxy for the health of the system.  </p>
<p>Every day of the week plotted a unique and predictable curve that I knew by heart, and so it didn’t take me long to realize that something was wrong.  Transfer rates were dropping.  But why?  I called our ISP and asked if they knew of any problems with the Internet.  Nope.  We had exactly the right number of clients connected.  No one had trenched over a fiber optic cable in the middle of nowhere.  Requests were coming into the system at the normal rate; they just weren’t getting fulfilled.  Microsoft hadn’t pushed any patches out that might have firewalled off half the world. </p>
<p>Clients generally stayed connected for days or weeks at a time.  As they gradually reconnected, more and more of the network got their new MaxSend setting and dutifully started not sending anything. Users weren’t complaining – it was perfectly normal for rare songs to be inaccessible, and nobody noticed if his client just wasn’t sending anything.  </p>
<p>After tearing my hair out for a day or so about this, I finally realized I was seeing a lot more “client busy – no free slots” type messages than I usually did while tail –f’ing the log files. Digging into that, I noticed some other funny messages, and eventually I was staring in shock at the results of a “select MaxSend, MaxRecv from Users limit 1000.”  </p>
<p>Fixing the problem was easy enough: &#8220;Update Users set MaxSend = MaxRecv,&#8221; but you can imagine I spent quite some time staring at that query before issuing it.  </p>
<p><img style="margin: 0pt 0pt 2px 7px; padding: 4px; float: right; display: inline;"  src='http://www.spiteful.com/wp-content/uploads/2008/03/mysql-i-am-a-dummy.png' alt='mysql-i-am-a-dummy' />So what&#8217;s the moral of the story?  Don&#8217;t let your developers have access to the production database?  Maybe, but that isn’t practical for a small startup.  Better logging?  That certainly could help.  Force everyone to access the database using the &#8211;i-am-a-dummy flag for MySQL?  That is not a bad idea and will get you some of the way there, but a shoddily written script can do exactly the same kind of damage.  Backups?  Sure, we had backups, but we were adding customers so quickly that restoring data more than a few hours old would have pissed off many thousands of people.  An Admin class of users, with configurable policy that prevented them from sending files between 7pm and 3am on weekends?   Yeah, right.  </p>
<p>If you run a big and complicated system, problems you will never predict are going to happen and cause your system to do impossibly weird things you don&#8217;t expect.  You must invest in tools to give you visibility into your system.  My transfer rate graph was the only reason I was even able to go looking for a problem.  I knew something was wrong, and it was just a matter of digging until I found it.  Let your admins see into the system (specifically – how the system is behaving right now) so that they can develop intuition about what it should look like.  Finding a bug in production is never fun.  But it is going to happen, and it is always better if you find it before your users do.</p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=47&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=47&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=47&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/257222199" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/03/24/things-that-are-important-where-clauses/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Programmer’s Toolbox Part 3: Consistent Hashing</title>
		<link>http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/</link>
		<comments>http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/#comments</comments>
		<pubDate>Mon, 17 Mar 2008 20:32:11 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Toolbox]]></category>

		<category><![CDATA[consistent hashing]]></category>

		<category><![CDATA[dynamo]]></category>

		<category><![CDATA[memcached]]></category>

		<category><![CDATA[partitioning]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/</guid>
		<description><![CDATA[Next up in the toolbox series is an idea so good it deserves an entire article all to itself: consistent hashing.  
Let’s say you&#8217;re a hot startup and your database is starting to slow down.  You decide to cache some results so that you can render web pages more quickly.  If you [...]]]></description>
			<content:encoded><![CDATA[<p>Next up in the toolbox series is an idea so good it deserves an entire article all to itself: consistent hashing.  </p>
<p>Let’s say you&#8217;re a hot startup and your database is starting to slow down.  You decide to cache some results so that you can render web pages more quickly.  If you want your cache to use multiple servers (scale horizontally, in the biz), you’ll need some way of picking the right server for a particular key.  If you only have 5 to 10 minutes allocated for this problem on your development schedule, you’ll end up using what is known as the naïve solution: put your N server IPs in an array and pick one using key % N. </p>
<p>I kid, I kid &#8212; I know you don&#8217;t have a development schedule.  That&#8217;s OK.  You&#8217;re a startup.</p>
<p>Anyway, this ultra simple solution has some nice characteristics and may be the right thing to do.  But your first major problem with it is that as soon as you add a server and change N, most of your cache will become invalid.  Your databases will wail and gnash their teeth as practically everything has to be pulled out of the DB and stuck back into the cache.  If you’ve got a popular site, what this really means is that someone is going to have to wait until 3am to add servers because that is the only time you can handle having a busted cache.  Poor Asia and Europe &#8212; always getting screwed by late night server administration.</p>
<p>You’ll have a second problem if your cache is read-through or you have some sort of processing occurring alongside your cached data.  What happens if one of your cache servers fails?  Do you just fail the requests that should have used that server?  Do you dynamically change N?  In either case, I recommend you save the angriest twitters about your site being down.  One day you&#8217;ll look back and laugh.  One day.</p>
<p>As I said, though, that might be OK.  You may be trying to crank this whole project out over the weekend and simply not have time for a better solution.  That is how I wrote the caching layer for Audiogalaxy searches, and that turned out OK.  The caching part, at least.  But if had known about it at the time, I would have started with a simple version of consistent hashing.  It isn’t that much more complicated to implement and it gives you a lot of flexibility down the road. </p>
<p>The technical aspects of consistent hashing have been well explained in other places, and you’re crazy and negligent if you use this as your only reference.  But, I’ll try to do my best.  Consistent hashing is a technique that lets you smoothly handle these problems:</p>
<ul>
<li>Given a resource key and a list of servers, how do you find a primary, second, tertiary (and on down the line) server for the resource?</li>
<li>If you have different size servers, how do you assign each of them an amount of work that corresponds to their capacity?</li>
<li>How do you smoothly add capacity to the system without downtime?  Specifically, this means solving two problems:
<ul>
<li>How do you avoid dumping 1/N of the total load on a new server as soon as you turn it on?</li>
<li>How do you avoid rehashing more existing keys than necessary?</li>
</ul>
</li>
</ul>
<p>In a nutshell, here is how it works.  Imagine a 64-bit space.  For bonus points, visualize it as a ring, or a clock face.  Sure, this will make it more complicated when you try to explain it to your boss, but bear with me:</p>
<p><center><img src='http://www.spiteful.com/wp-content/uploads/2008/03/consistent_hashing_simple.png' alt='consistent_hashing_simple.png' /></center></p>
<p>That part isn’t very complicated.  </p>
<p>Now imagine hashing resources into points on the circle. They could be URLs, GUIDs, integer IDs, or any arbitrary sequence of bytes.  Just run them through MD5 or SHA and shave off everything but 8 bytes (and if anyone tells you that you shouldn’t use MD5 for this because it isn’t secure, just nod and back away slowly.  You have identified someone not worth arguing with).  Now, take those freshly minted 64-bit numbers and stick them onto the circle:</p>
<p><center><img src='http://www.spiteful.com/wp-content/uploads/2008/03/consistent_hashing_resources.png' alt='consistent_hashing_resources.png' /></center></p>
<p>Finally, imagine your servers.  Imagine that you take your first server and create a string by appending the number 1 to its IP. Let&#8217;s call that string IP1-1.  Next, imagine you have a second server that has twice as much memory as server 1.  Start with server #2’s IP, and create 2 strings from it by appending 1 for the first one and 2 for the second one.  Call those strings IP2-1 and IP2-2.  Finally, imagine you have a third server that is exactly the same as your first server, and create the string IP3-1.  Now, take all those strings, hash them into 64-bit numbers, and stick them on the circle with your resources:</p>
<p><center><img src='http://www.spiteful.com/wp-content/uploads/2008/03/consistent_hashing_full.png' alt='consistent_hashing_full.png' /></center></p>
<p>Can you see where this is headed?  You have just solved the problem of which server to use for resource A.  You start where resource A is and head clockwise on the ring until you hit a server.  If that server is down, you go to the next one, and so on and so forth.  In practice, you’ll want to use more than 1 or 2 points for each server, but I’ll leave those details as an exercise for you, dear reader.</p>
<p>Now, allow me to use bullet points to explain how cool this is:</p>
<ul>
<li>Assuming you’ve used a lot more than 1 point per server, when one server goes down, every other server will get a share of the new load.  In the case above, imagine what happens when server #2 goes down.  Resource A shifts to server #1, and resource B shifts to server #3 (Note that this won’t help if all of your servers are already at 100% capacity.  Call your VC and ask for more funding).</li>
<li>You can tune the amount of load you send to each server based on that server’s capacity.  Imagine this spatially – more points for a server means it covers more of the ring and is more likely to get more resources assigned to it.
<p>You could have a process try to tune this load dynamically, but be aware that you&#8217;ll be stepping close to problems that control theory was built to solve.  Control theory is more complicated than consistent hashing.</li>
<li>If you store your server list in a database (2 columns: IP address and number of points), you can bring servers online slowly by gradually increasing the number of points they use.  This is particularly important for services that are disk bound and need time for the kernel to fill up its caches.  This is one way to deal with the datacenter variant of the <a href="http://en.wikipedia.org/wiki/Thundering_herd_problem">Thundering Herd Problem</a>.
<p>Here I go again with the control theory &#8212; you <em>could</em> do this automatically.  But adding capacity usually happens so rarely that just having somebody sitting there watching top and running SQL updates is probably fine.  Of course, EC2 changes everything, so maybe you’ll be hitting the books after all.</li>
<li>If you are really clever, when everything is running smoothly you can go ahead and pay the cost of storing items on both their primary and secondary cache servers.  That way, when one server goes down, you’ve probably got a backup cache ready to go.</li>
</ul>
<p>Pretty cool, eh?</p>
<p>I want to hammer on point #4 for a minute.  If you are building a big system, you really need to consider what happens when machines fail.  If the answer is “we crush the databases,” congratulations: you will get to observe a cascading failure.  I love this stuff, so hearing about cascading failures makes me smile.  But it won’t have the same effect on your users.  </p>
<p>Finally, you may not know this, but you use consistent hashing every time you put something in your cart at Amazon.com.  Their <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf">massively scalable data store, Dynamo</a>, uses this technique.  Or if you use Last.fm, you’ve used a great combination: consistent hashing + memcached.  They were kind enough to <a href="http://www.last.fm/user/RJ/journal/2007/04/10/392555/">release their changes</a>, so if you are using memcached, you can just use their code without dealing with these messy details.  But keep in mind that there are more applications to this idea than just simple caching.  Consistent hashing is a powerful idea for anyone building services that have to scale across a group of computers.</p>
<p>A few more links:</p>
<ul>
<li><a href="http://www.lexemetech.com/2007/11/consistent-hashing.html">Another blog post about consistent hashing</a></li>
<li><a href="http://citeseer.ist.psu.edu/karger97consistent.html">The original paper</a></li>
</ul>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=43&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=43&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=43&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/253215320" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The Lawsuit, Shutdown, and Aftermath</title>
		<link>http://www.spiteful.com/2008/03/14/the-lawsuit-shutdown-and-aftermath/</link>
		<comments>http://www.spiteful.com/2008/03/14/the-lawsuit-shutdown-and-aftermath/#comments</comments>
		<pubDate>Fri, 14 Mar 2008 19:04:22 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Audiogalaxy]]></category>

		<category><![CDATA[FolderShare]]></category>

		<category><![CDATA[History]]></category>

		<category><![CDATA[RIAA]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/03/14/the-lawsuit-shutdown-and-aftermath/</guid>
		<description><![CDATA[This is the final article in my three part series about building Audiogalaxy.  If you haven&#8217;t already, you may want to read part one and part two first.
2002
By 2002, things were going along quite nicely from a technology standpoint.  Running the C backend on the hardware we had purchased for the Java version [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is the final article in my three part series about building Audiogalaxy.  If you haven&#8217;t already, you may want to read <a href="http://www.spiteful.com/2008/03/11/always-refer-to-your-v1-as-a-prototype/">part one</a> and <a href="http://www.spiteful.com/2008/03/13/users-with-a-tattoo-of-your-logo-check/">part two</a> first.</em></p>
<p><strong>2002</strong><br />
By 2002, things were going along quite nicely from a technology standpoint.  Running the C backend on the hardware we had purchased for the Java version meant we were very much below our capacity, and we made it past 1 million simultaneous clients.  I was pleased that I had learned the right lessons from V1 and designed a solid architecture.  </p>
<p>I was also feeling good about the other major service I looked after &#8212; search.  Audiogalaxy had sponsored my senior engineering project in 2000 &#8212; a high performance MySQL indexing service.  That, combined with a caching service I built in 2001, had solved the search problem.  We were handling 80 million page views every day, most of them searches.</p>
<p>But having some of the scalability problems taken care of didn&#8217;t mean I was taking it easy.  We were getting more and more DMCA takedown notices, and I&#8217;d been working on tools to identify all variations of matching songs for a long time. Michael went to Washington, DC to talk to a group of RIAA folks, and I ended up on a conference call.  Their tech guy, Dan Farmer (yes, <a href="http://www.svn.net/datamonk/sjmn.05.04.95.html">that</a> <a href="http://en.wikipedia.org/wiki/Dan_Farmer">Dan Farmer</a>), grilled me on the details of the filtering system.</p>
<p>The filtering system was the most frustrating piece of tech I&#8217;ve ever worked on.  Audiogalaxy was amazingly effective at distributing rare music, and so no matter how many variations on artist and song names we blocked, things kept slipping by.  I tried every trick that I could think of and blocked huge numbers of songs, but it was like trying to hold back the tide.</p>
<p>Eventually, the RIAA decided we weren&#8217;t doing enough.  On May 24th, a gruff and tattooed messenger walked into the office with The Envelope, and that was that.  Michael called in the lawyers and spent weeks trying to find a solution.  In the end, he decided that settling it as quickly as possible was the best strategy. </p>
<p>I shut Audiogalaxy down from my apartment on the afternoon of June 17, 2002.  It was a beautiful, Austin summer day.  I knew what was coming, so I had stayed home to enjoy the sunshine.  Mid afternoon, my cell phone rang, and David gave me the word to shut it down. </p>
<p>I typed out the command, paused for just a moment, and hit Enter.  </p>
<p>For the amazing amount of effort that we put into building the system, destroying it was anti-climatic.  The script I ran pushed a new configuration setting to several hundred servers and returned seconds later.  The transfers that were currently active were allowed to finish, and then everything stopped.  It was a moment I won&#8217;t forget, and I regret that I didn&#8217;t do it with the rest of team in the same room.</p>
<p>Over the next few weeks, I tried to enjoy being free from the stress of maintaining such a big operation 24/7, but I couldn&#8217;t shake the feeling that at 24, my career had already peaked.  I didn&#8217;t really appreciate the experience I had gained at the time, and I was mostly consumed with the thought that I would never again have so many users love my software.  I had no idea what I would do next. </p>
<p>Most of the staff was laid off shortly after that.  After our traffic plummeted, Geoff and I spent a week unracking hundreds of servers from the data center in north Austin and driving them down I35 to the office.  The company partnered with Rhapsody to offer a music service, and we tried to think of other things we could do with Audiogalaxy.  But eventually, we decided it was time to move on.</p>
<p><img src='http://www.spiteful.com/wp-content/uploads/2008/03/unracked_servers.jpg' alt='unracked_servers.jpg' /></p>
<p>Every cloud has a silver lining, though.  During the lawsuit, Michael spent a lot of time working from home while collaborating on documents with the law firm.  Keeping up-to-date versions of the documents on his home and work computers was annoying.  He knew that as more and more people worked with multiple computers in their daily lives, that this would become a common problem that needed a simple technical fix.  And thus the idea for our next startup, FolderShare, was born.</p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=36&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=36&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=36&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/251565984" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/03/14/the-lawsuit-shutdown-and-aftermath/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Users With a Tattoo of Your Logo?  Check.</title>
		<link>http://www.spiteful.com/2008/03/13/users-with-a-tattoo-of-your-logo-check/</link>
		<comments>http://www.spiteful.com/2008/03/13/users-with-a-tattoo-of-your-logo-check/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 19:00:19 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Audiogalaxy]]></category>

		<category><![CDATA[/dev/epoll]]></category>

		<category><![CDATA[History]]></category>

		<category><![CDATA[startups]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/03/13/users-with-a-tattoo-of-your-logo-check/</guid>
		<description><![CDATA[This is the second article in my three part history about building Audiogalaxy.com. You should probably read the first one first.
2000
I came back from my Christmas break feeling less burnt out. I focused on designing a backend that could handle 100,000 simultaneous Satellites and then started building it.  To free me up from working [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is the second article in my three part history about building Audiogalaxy.com. You should probably <a href="http://www.spiteful.com/2008/03/11/always-refer-to-your-v1-as-a-prototype/">read the first one</a> first.</em></p>
<p><strong>2000</strong><br />
I came back from my Christmas break feeling less burnt out. I focused on designing a backend that could handle 100,000 simultaneous Satellites and then started building it.  To free me up from working on the client, Michael bought a copy of the Steven&#8217;s networking book and started working on a C version of the Satellite core.  And David hired <a href="http://www.kuro5hin.org/story/2002/6/21/171321/675">Kennon Ballou</a> to help build the next-generation web interface.</p>
<p>The new backend and client went live in April, using my humble website from V1.  Traffic started growing steadily, and by the end of May, we had about 3,000 clients connected at peak times.  Sometime around the end of July, there was a Napster injunction scare, which pushed us over the 8,000 mark.  We released version 0.6 of the client and David’s beautiful new website in September.  At that point, our peak load started increasing by thousands of users every week. </p>
<p>In the way that only 22-year-old unattached guys can do, I revolved my life around work.  The kitchen was full of free food, and if I didn&#8217;t eat there, I ate lunch and dinner with Michael or anyone else that happened to be around.  I went to the gym in the evening with Michael and Geoff and then back to the office.  When things got too intense, the distractions of 6th Street were a short walk away.  I remember going back to the office to work on low-priority stuff while I sobered up before driving home.  At my apartment, my furniture was limited to a twin mattress, a chair, and a folding table for my computer.  Every week we had more users, and every time something in the cloud melted down and we had to scramble to fix it, our crew became a little bit more tightly knit.  And no, I didn&#8217;t date very much.  </p>
<p>This was the year that we figured out how to manage a huge website with a tiny team.  Geoff handled the hardware, I handled any services I had written and did a little bit of MySQL work, and Michael did everything else: he built Linux kernels, handled Apache and the load balancers, and dealt with any tricky aspects of MySQL.  We had a set of internal pages for monitoring the site, and we checked them as soon as we got out of bed, reflexively throughout the day, and right before going to sleep.  We all had weird schedules, so we managed to cover a good chunk of every day, and it was always fair game to wake someone else up at 4am to diagnose a problem. This wasn’t really designed or planned – it just happened because it needed to be done.</p>
<p>The excitement about how quickly we were scaling up was tempered in December.  We had an amazing advertising contract for a flat $5 CPM.  Eventually, they realized how much money they owed us and decided to just not pay us.  Michael had to let a few reviewers go and borrow $75,000 to make payroll.  We all got a lesson about how low the startup roller coaster can go.</p>
<p><strong>2001</strong><br />
Thanks to the lessons we learned from V1, the hardware, rather than the software, became the bottleneck for V2 of the Satellite service.  We quickly surpassed our goal of 100,000 simultaneous users and realized that target was missing at least one zero.  We watched graphs of how many users were online, but instead of seeing beautiful diurnal curves, we saw harsh, flat caps as we bumped into capacity limits. Due to limitations of Java at the time, each server could only handle 2000 simultaneous users, and our users let us know this was a problem. Michael kept a fax with &#8220;Buy A New Server!&#8221; scrawled across it hanging up in his office. </p>
<p>We definitely tried.  And we definitely bought a lot of servers.  Every few weeks we would get a batch of parts delivered to the datacenter.  Geoff would drive out and spend 24 highly caffeinated hours assembling and racking 20 new servers with a manual screwdriver that often left his hands bloody.  But it seemed like we could never keep up.  We filled out one entire row at the datacenter and started working on a second:</p>
<p><center><img src='http://www.spiteful.com/wp-content/uploads/2008/03/datacenter_work.jpg' alt='datacenter_work.jpg' /></center></p>
<p>In the fall of 2001 I read about /dev/epoll and decided it was the key to a vastly more scalable backend written in C.  We hired <a href="http://awesame.org/">Steven Hazel</a>, a talented and practical developer from the FreeNet project to help with the project, and I knew we had the right guy when he asked how we were going to avoid <a href="http://en.wikipedia.org/wiki/Second-system_effect">Second System Syndrome</a> during the port.</p>
<p>Despite the difficulties of converting a lot of the string processing we did from Java to C, testing the new version was pretty easy.  We loaded the service down with assertions and rigid state machines, and simply ran a single instance in the cluster.  The flood of traffic would initially drive the process into an assertion within seconds.  </p>
<p>As we worked through the bugs, the uptime turned into minutes, then hours, and eventually the service virtually never crashed.  With hundreds of instances deployed, we got so much traffic that we were able to remove all the bugs we were likely to run into.  We had one or two machines that would crash every month or two with inscrutable core files.  Because it was always the same machine, I eventually attributed this to faulty memory.  The idea that you could write software that was more reliable than hardware was fascinating to me. </p>
<p>In fact, almost everything about the scale of the software fascinated me.  I found that a system with hundreds of thousands of clients and thousands of events per seconds behaved like a physical machine built out of springs and weights.  If one server process stalled for a moment, effects could ripple throughout the cluster.  Sometimes, it seemed like there were physical oscillations – huge bursts of traffic would cause everything to slow down and back off, and then things would recover enough to trigger another burst of load.  I had never even imagined that these sorts of problems existed in the software world, and I found myself wishing I had taken control theory more seriously in college. </p>
<p>Keeping up with the traffic at this time was difficult, but in retrospect, it was really a lot of fun.  I had graduated from UT in December of 2000 and moved downtown within walking distance of both 6th Street and the office.  I spent the summer on a completely nocturnal cycle, partially because of the Texas heat, but mainly because restarting services was easier at 3 in the morning.  I was tired of staying up late to deploy new code, so I just changed my schedule.  Audiogalaxy users had led me to a set of live trance mixes from clubs in Europe, which turned me into a diehard electronica fan, and driving around Texas to catch DJs on the weekend was much easier if staying up until 8am was normal.  I bought some turntables and a lot of vinyl.  And a couch.  I loved the light in my apartment when I got home in the morning.</p>
<p><center><img src='http://www.spiteful.com/wp-content/uploads/2008/03/downtown_apartment.jpg' alt='downtown_apartment.jpg' /></center></p>
<p>Audiogalaxy had gotten big enough at this point that it was not uncommon to overhear strangers talking about it.  Meeting people in the wild that loved the company was one of the best perks of working at the company.  The community-based features of the site had also really taken off and eighty thousand messages were posted per day at our peak.  A couple met on our message boards and got married shortly after that.  A community of several hundred spinning instructors formed to share ideas about music selection during class.  Several years later, a die-hard message board fan got the AG logo tattooed onto his leg:</p>
<p><center><img src='http://www.spiteful.com/wp-content/uploads/2008/03/agtat.jpg' alt='Yup, that is an Audiogalaxy Tattoo' /></center></p>
<p>At some point in 2001, I focused my problem solving skills on why I didn&#8217;t have a girlfriend.  Perhaps, I thought, it had something to do with the fact that I had built my wardrobe of jeans and identical gray t-shirts on principals of efficiency rather than good taste.  I put a small amount of effort into this by upgrading from Doc Marten boots to Doc Marten dress shoes, and I did it all just in time.  A few months later, a beautiful, smart, and fascinating friend of Michael and Geoff&#8217;s from high school moved to Austin and wanted to meet some people.  Jeanie tells me that even though she thought I was cute from the beginning, she wasn&#8217;t that interested until she found out I had written the backend for the Satellite Service.  I think it was the new shoes.  She and I started dating and got married in 2004. </p>
<p>Life was hectic but good.  We were having so much success it was almost like something bad was about to happen.  </p>
<p><em>Continued in <a href="http://www.spiteful.com/2008/03/14/the-lawsuit-shutdown-and-aftermath/">Part 3</a></em></p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=35&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=35&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=35&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/250929181" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/03/13/users-with-a-tattoo-of-your-logo-check/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Always Refer to Your V1 As a Prototype</title>
		<link>http://www.spiteful.com/2008/03/11/always-refer-to-your-v1-as-a-prototype/</link>
		<comments>http://www.spiteful.com/2008/03/11/always-refer-to-your-v1-as-a-prototype/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 15:22:43 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Audiogalaxy]]></category>

		<category><![CDATA[History]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/03/11/always-refer-to-your-v1-as-a-prototype/</guid>
		<description><![CDATA[Reading all the SXSW press always makes me nostalgic for my time in Austin and the 3 years I spent working at Audiogalaxy.com, a music startup.  Here is a history of the company from the perspective of the first full-time programmer. This is the first of three parts.
Getting the Job
My career at Audiogalaxy got [...]]]></description>
			<content:encoded><![CDATA[<p><em>Reading all the SXSW press always makes me nostalgic for my time in Austin and the 3 years I spent working at Audiogalaxy.com, a music startup.  Here is a history of the company from the perspective of the first full-time programmer. This is the first of three parts.</em></p>
<p><strong>Getting the Job</strong><br />
My career at Audiogalaxy got started by chance one spring afternoon in 1999.  At the time, I was a computer-engineering student with 4 more classes to finish before I graduated that fall.  Like most of my friends, I was planning on taking an entry-level engineering job at one of the big companies in Austin, but things didn’t end up turning out that way.</p>
<p>I ran into Michael Merhej on the steps of the Electrical and Computer Engineering building and stopped to chat about the business I had heard he had started.  Michael and I had been lab partners in an electrical engineering class a few semesters before, and he had struck me as a high energy guy with an amazing capacity to focus on the things he was interested in.  We talked a little bit about Audiogalaxy, and he told me that they really needed some web developers.  A music website sounded a lot more fun than working for the engineering career center like I had done the previous year, so I told him I was interested. </p>
<p>My &#8220;interview&#8221; took place at Michael&#8217;s low-budget apartment that he shared with three other guys.  He had just finished shipping some new servers off to the datacenter, and a mix of computer component packaging and laundry were strewn about the apartment.  David McArthur, the other co-founder of AG, asked me a few questions about HTML, and then they offered me a part time contracting job as a web developer. </p>
<p>At that time, the company had two areas of interest.  The first was an FTP search engine that Michael had written the previous year.  Back in the stone age of MP3 distribution, FTP sites were the preferred technology.  FTP servers allowed admins to enforce upload/download ratios, and so running an FTP site was a fine way to increase the size of your own music collection.  Sites had different ratios, were up during different times of day, and were just plain hard to find. This created a market for information about them, and AG filled it.  Michael had written one of the leading FTP search engines the year before and was getting an impressive amount of traffic.  </p>
<p>The second part of the company targeted unsigned artists trying to get exposure.  Artists could upload their MP3s, and the staff would listen to the music, categorize it, and write a few blurbs.  Then, the music went on the site for anyone to download.  Originally, bands had to use FTP to upload their music, which was too difficult.  One of my first projects was &#8220;Web-FTP,” which provided an HTTP based system for uploading content and allowed for integration with the tools the music staff used to handle the music.</p>
<p><strong>Summer, 1999</strong><br />
I started working when finals ended.  During the summer, I worked part-time because I also had a full-time internship at Tivoli.  Tivoli, which had recently been acquired by IBM, had started a boot camp program to try to develop engineering talent.  It was a fun program, but more importantly, it wasn&#8217;t a very high stress job.  This left me with lots of time in the evenings and weekends to get my Audiogalaxy work done.</p>
<p>For my first month or two at AG, the company had no offices.  I worked from home for the first few weeks, and then David gave everyone a key to his house and set up some computers in the back.  Later in the summer, we moved into offices located on the less distracting part of 6th street.  By the end of the summer, I was signed up for the startup life.  I finished my internship at Tivoli, quit my campus job at UT, dropped 2 of the classes I was scheduled to take that fall, and started working for AG full time. </p>
<p>The salary was just OK, but that didn&#8217;t really matter because I had 55,000 stock options!  Once we hit it big and had our IPO, I knew I would make a lot of money.  That seems insane and naïve now, but with all the other crazy startups out there at the time, it made complete sense to 21-year-old Tom.  Overall, it seemed like an interesting opportunity, and the job market was so good back then that I knew I would have other options if the company folded.  Besides, working on a music website with friends in downtown Austin seemed like a pretty cool gig.  </p>
<p>At this point, there were 4 other full time employees who stayed with the company until the very end:</p>
<ul>
<li>Michael Merhej – Founder/CEO</li>
<li>David McArthur – Founder/handled the web site design and managed its development.  He and Michael had met working together at a computer lab on campus.</li>
<li>Eric Zappa – Managed all the music reviewers and had previously worked for a label.</li>
<li>Geoff Midler –Did everything from tech support to handling servers.  He had known Michael since high school.</li>
</ul>
<p>All of those folks also went on to work for FolderShare, but that is another story. </p>
<p>After I started working full-time, I wrapped up Web-FTP and ported the FTP search to PHP (Michael had originally written it as a C CGI program).  Around that time, we were frustrated with how few people were running FTP servers, so we started kicking around ideas about trying to distribute our own.  We wanted something that would take just a few clicks to install and would ping our servers so that we could index them.  We had some notes for this sketched out on the whiteboard when Michael downloaded a little program called Napster.</p>
<p><strong>Fall, 1999</strong><br />
Michael immediately knew that Napster was the future and that FTP sites were about to become a footnote.  Since there weren&#8217;t any other programmers on staff at the time, I was given the task of coding a competing system.  Without knowing it, I also signed up to be on call for data center problems 24/7 for the next 8 years. I hadn&#8217;t planned on living the life of a sysadmin, but the defining characteristic of startup life is doing whatever it takes to get the job done.  Michael drove me to the Sprint store and helped me pick out my very first cell phone the next month.</p>
<p>Napster was impressive, but we wanted to do things a little bit differently.  First, we wanted to keep users coming back to our website.  This was back in the day of high CPM deals, so the more website traffic we could get, the better.  Second, we wanted to avoid partitioning our network.  One problem with Napster was that you could only search for files shared by people connected to the same server as your client, which greatly reduced the availability of the music.</p>
<p>The project got started around September 24th, 1999.  I remember the date because that evening I drove out to Southpark Meadows with my friend John to catch a Phish concert.  I spent the time in the car talking about some of my ideas for the service, and I started hacking on it that weekend.</p>
<p>The learning curve to build a P2P client and the accompanying cloud services was steep.  I had worked with Java before, but I’d never built any real, shipping software with it.  But if I had learned anything in engineering school, it was the ability to work intense hours absorbing and applying new information.</p>
<p>It took a lot of 100-hour weeks, but by the end of the year we had shipped a flawed, early version of the Satellite.  I had created a piece of software that only scaled to a few thousand users, but I had also learned a lot of lessons.  More importantly, I had gotten a taste of how amazing it was to work on networked software that connected thousands of people and computers. With 2 classes left, I had also screwed up my 4.0 GPA by getting a C in a silly linear algebra class, but I had become so obsessed with the Satellite that I didn&#8217;t mind.  I had bigger problems at work.  </p>
<p>The client, written in Java and using JNI to show a native WIN32 UI, was flaky.  The server was massively bottlenecked on the database.  It took hours to start a transfer for a requested file, and we would get bug reports that said things like, &#8220;My computer&#8217;s clock runs faster when I&#8217;m using the Satellite.”  Sigh.  You really do have to plan to throw one away. </p>
<p>Before I went home for Christmas that year, Michael and I sat down.  We talked through what a service that didn&#8217;t use the database as much would look like, and we set a goal of building a service that could handle 100,000 simultaneous users.  A cluster of servers that passed messages between each other to coordinate file transfers sounded like a good idea.  I printed out a list of all the queries that the current service did and took them home to think.</p>
<p><em>Continued in <a href="http://www.spiteful.com/2008/03/13/users-with-a-tattoo-of-your-logo-check/">Part 2</a></em></p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=34&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=34&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=34&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/249557684" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/03/11/always-refer-to-your-v1-as-a-prototype/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Lessons Learned Scaling the Audiogalaxy Search Engine</title>
		<link>http://www.spiteful.com/2008/03/07/lessons-learned-scaling-the-audiogalaxy-search-engine/</link>
		<comments>http://www.spiteful.com/2008/03/07/lessons-learned-scaling-the-audiogalaxy-search-engine/#comments</comments>
		<pubDate>Fri, 07 Mar 2008 23:22:12 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Audiogalaxy]]></category>

		<category><![CDATA[Caching]]></category>

		<category><![CDATA[Compression]]></category>

		<category><![CDATA[Monitoring]]></category>

		<category><![CDATA[MySQL]]></category>

		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/03/07/lessons-learned-scaling-the-audiogalaxy-search-engine/</guid>
		<description><![CDATA[Tackling a big, new programming challenge frequently means that you don&#8217;t actually understand the problem until after you solve it.  Because of that, one of the most important things to do after nailing down a solution is to think about better solutions.  Post-mortems seem to be more popular from a &#8220;how could we [...]]]></description>
			<content:encoded><![CDATA[<p><img style="float: right; padding: 4px; margin: 0 0 2px 7px;display: inline;" src="http://www.spiteful.com/wp-content/uploads/2008/03/whoops_small.jpg" alt="Photo by flickr user vadania" height="176" width="250" />Tackling a big, new programming challenge frequently means that you don&#8217;t actually understand the problem until after you solve it.  Because of that, one of the most important things to do after nailing down a solution is to think about better solutions.  Post-mortems seem to be more popular from a &#8220;how could we have managed this project better&#8221; angle, but the technical aspect should not be overlooked.</p>
<p>Sure, constant criticism of everything you do may not make you a happier person, but even if you don&#8217;t actually implement your new ideas, reevaluating your designs will make you a stronger architect.  Just try to keep the habit from spilling over too much into your personal life. <img src='http://www.spiteful.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>In that spirit, I&#8217;ve put together a follow-up to my post about the <a href="http://www.spiteful.com/2008/02/29/design-details-of-audiogalaxycoms-high-performance-mysql-search-engine/">Audiogalaxy.com MySQL search engine</a>.  To keep from getting discouraged about all the improvements I overlooked, I&#8217;ve split the list up into 2 categories.  The first contains ideas I was correct in not spending time on during the initial development. Although these ideas may have been interesting programming challenges, at the time, they simply wouldn&#8217;t have made the company any money.  (I know it sucks, but sometimes programmers forget their job is to make money for the company&#8211;not to write cool software.) The second category contains things that I probably should have done.  They may have been easy to do if I had put them into the initial code base and tested them along with everything else, or they may have been things that would have saved the company money.  These are the ones to really think about.  If I find any clever ideas or algorithms that would have helped, I like to add them to <a href="http://www.spiteful.com/category/toolbox/">my toolbox</a> to consider for next time.</p>
<h3><img src="http://www.spiteful.com/wp-content/uploads/2008/03/thumbs_down.gif" style="border: 0pt none ; vertical-align: middle" /> Fun things I couldn&#8217;t justify doing</h3>
<p><strong>Silky Smooth Searching</strong><br />
I&#8217;m not sure if anyone is actually using this technique (or what the official name is), but the basic idea is to start caching results for a search before the user actually clicks the search button.  I&#8217;m sure all of you really savvy internet types know you can hit enter milliseconds after typing a search term, but I know there are plenty of people that don&#8217;t.   I suspect that with some Ajax magic, you could take advantage of the precious few hundred milliseconds it takes to move the mouse over to the &#8220;submit&#8221; button after typing a query.  During that time, the backend could be busily building a cache of search results or pre-fetching them from disk. </p>
<p>As neat as this would have been, I don&#8217;t think I could have ever justified the development time.  A smoother search experience may have made users on fast connections slightly happier, but we had more important things to work on (of course, for some companies, <a href="http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html">shaving milliseconds off response times really is important</a>). </p>
<p><strong>Better relevance</strong><br />
We sorted results by historical popularity, current availability, and a few other tricks.  One thing I would have really liked to do is take into account more information about the ordering of search terms. For example, when a user searched for &#8220;A B&#8221;, results that contained &#8220;A B&#8221; possibly should have been ahead of results that contained &#8220;B A.&#8221;  This would have been neat, but I really had no incentive to add the extra complexity.  Achieving the right weight balance on something like this would have taken a lot of experimentation.  Experimentation means downtime, and the potential payoff didn&#8217;t look big enough to justify it.</p>
<p><strong>Search power tools</strong><br />
This includes things like supporting quoted phrases to match text exactly, removing results using -, magic words to restrict results to certain bit rates, etc.  While programmers and technical folks love this sort of thing, I would bet that most people don&#8217;t. And as much as I wanted to put stuff like this in, the existing search was good enough.  It is always important to remember that adding any feature has an ongoing cost in code maintenance and debuggability, so you really have to be careful when adding non-essential ones.</p>
<p><strong>Sharded Indexes</strong><br />
One technique to speed up non-cached searches (which were primarily disk IO bound) would have been to partition my index across multiple servers.  For a single search, multiple index servers would each return a chunk of search results that the cache servers would aggregate into a single result set.  This would have become a requirement once our index grew too large, but we did not get to that point.</p>
<p>This was a tempting project for me because I didn&#8217;t know how to do it and I love learning about this sort of thing.  However, we had such a high hit rate in our caching layer that most of our searches were lightening fast, so adding this huge chunk of complexity wouldn&#8217;t have helped the company.</p>
<h3><img src="http://www.spiteful.com/wp-content/uploads/2008/03/thumbs_up.gif" style="border: 0pt none ; vertical-align: middle" /> Things I should have done</h3>
<p><strong>Faster indexing</strong><br />
Every few months we had to take the site down overnight to clean out our multi-hundred million row databases (I assure you this will be a bullet point when I do a critique of the entire system design).  After removing all the stale data, it was faster to recreate the search index than to edit the existing one.  The indexing process was fairly straightforward&#8211;it simply walked through all the rows in a single database server.  As you might imagine, this tended to bottleneck on disk IO (both in reading rows from the DB and writing them out in the index).  An indexing process that ran on multiple index servers and took advantage of more disk spindles could have shaved hours of downtime off the site each month, which would have been a big win.</p>
<p>As a more general rule, whenever you are thinking about doing anything with disk IO, ask yourself whether or not you can speed things up by using more spindles.  Disk IO is going to be a bottleneck in the datacenter for a long time.  Even after solid state disks get mass acceptance, it will never hurt to at least consider how you can use more than one of them at once.  As a side note, it is always a good idea to have a dollar figure in mind for what downtime costs per minute.  Knowing that can help you make better decisions about how you invest your developer resources.</p>
<p><strong>Redundant cache servers</strong><br />
Each cache server handled a unique subset of searches, which meant that some percentage of searches would fail when one of them went down.  This design was easy to implement and the cheapest way to get good performance, but it periodically led to avoidable downtime.  Fixing this would have helped work around another problem&#8211;the IO crunch if the cache for one machine was wiped or rebooted.  Designing for this sort of failure is so important and so easy that I&#8217;ll cover the best way to do it in a future article.</p>
<p><strong>More compression</strong><br />
As I&#8217;ve mentioned, one common bottleneck throughout the system was disk IO.  One way we could have addressed this is by compressing anything we stored on disk. Raw lists of IDs don&#8217;t compress very well, but the compression ratio can be improved by sorting the list and storing it as an initial offset plus a set of deltas.  While &#8220;1,2,3,4&#8243; might not compress well, &#8220;1,1,1,1&#8243; will compress fabulously.  This optimization is slightly iffy &#8212; we only had 4 index servers and 4 cache servers, so the savings would have to be pretty spectacular to save a significant amount of money. Despite that, I put this into the thumbs-up list because it would have been easy to do at the very beginning of the project.</p>
<p><a href="http://www.zlib.net/">Zlib</a> is already easy to use, but it is wise to integrate it at the very beginning of your project.  One obvious performance problem with this sort of service is buffer copying, and the sooner you think about compression, the saner the buffer management will be.  One final caveat though&#8211;compressing data to send across a network may not be worth it.  Gigabit ethernet is awfully fast, and I found it was generally less of a bottleneck than the disk.</p>
<p><strong>More caching</strong><br />
I mentioned that the cache servers only provided a list of match IDs to the web server&#8217;s PHP script, which then had to hit one of the database slaves before rendering the page.  Those queries were indexed by ID, so they were able to use the higher performance slaves, which were capable of something like 1000 queries per second.  However, the cache server could have easily done those final queries once and saved the results alongside the cached IDs.</p>
<p>I always like to go back to the numbers for decisions like this.  If it took 2 queries per page to fetch the info and the cluster was handling 2000 searches per second, saving that information in the cache layer could have saved us about 4 database slaves.  Total costs associated with those servers would be at least $10K, so this would have been an interesting change assuming the design didn&#8217;t increase IO usage in the cluster layer significantly.  Of course, the change would also have to have been simple enough that it didn&#8217;t require any big changes to the architecture, introduce time consuming new bugs, or take more than a few days to code and test.  </p>
<p>A project like this is exactly what startups are good at doing.  I would have been excited to build this over a weekend I might otherwise have taken off because I know I could put it into production and see the effects right away.  I would have initially enabled it only for some small percentage of queries on a single server and rolled it out at 2 or 3 in the morning.  </p>
<p><strong>Better Monitoring</strong><br />
Usually, I could easily determine if there was a problem with the site simply by watching the rate of incoming requests and the rate of completed transfers.  Those two numbers were very predictable, and any problem on the front or back end was likely to cause a drop in one or both of them.  The tricky part came when I had to narrow down which machine was having the problem.</p>
<p>Good monitoring is a key part of scalability. One best practice I recommend for custom services is to publish a set of values via HTTP that can be easily parsed and dumped into RRDTool or viewed in lynx from the command line.  Our search cluster didn&#8217;t support this, and finding problems usually meant secure shelling into each box and checking the error log.  This was fine with a small number of servers, but it would have been much easier to see anomalies with a set of graphs tracking how many queries per second each server was doing.</p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=28&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=28&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=28&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/247641249" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/03/07/lessons-learned-scaling-the-audiogalaxy-search-engine/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Design details of Audiogalaxy.com’s high performance MySQL search engine</title>
		<link>http://www.spiteful.com/2008/02/29/design-details-of-audiogalaxycoms-high-performance-mysql-search-engine/</link>
		<comments>http://www.spiteful.com/2008/02/29/design-details-of-audiogalaxycoms-high-performance-mysql-search-engine/#comments</comments>
		<pubDate>Fri, 29 Feb 2008 22:00:11 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
		<category><![CDATA[Audiogalaxy]]></category>

		<category><![CDATA[C]]></category>

		<category><![CDATA[Inverted Index]]></category>

		<category><![CDATA[Managing Gigabytes]]></category>

		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.spiteful.com/2008/02/29/design-details-of-audiogalaxycoms-high-performance-mysql-search-engine/</guid>
		<description><![CDATA[As I mentioned before, search was one of most interesting problems I worked on at Audiogalaxy.  It was one of the core functions of the site, and somewhere between 50 to 70 million searches were performed every day.  At peak times, the search engine needed to handle 1500-2000 searches every second against a [...]]]></description>
			<content:encoded><![CDATA[<p>As I mentioned before, search was one of most interesting problems I worked on at Audiogalaxy.  It was one of the core functions of the site, and somewhere between 50 to 70 million searches were performed every day.  At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows.  It was frequently hard to design for more than 10 or 100x our current traffic (and our growth was so dramatic that there wasn&#8217;t really ever time to spend more than a few weeks on the problem), so it wasn&#8217;t until the 4th major iteration on the cluster design that I really considered the scale problems to be solved.</p>
<p>Here is a diagram of the architecture I ended up with (component explanations are below):<br />
<center><img src="http://www.spiteful.com/wp-content/uploads/2008/02/searcharchitecture2.png" alt="Audiogalaxy Search Architecture" border="1" /></center>Arrows indicate the direction the request flows for any request/response pairs.  What you see here is the result of a few years of evolutionary improvements, not a single top-down design. Generally, things changed when I had a few weeks to focus on search. It was enough time to make improvements but rarely enough time to scrap everything and start over.</p>
<p><strong>Satellite Servers (300+ servers, written in C)</strong><br />
The file sharing clients (the satellites) kept connections open to the satellite servers, and these servers were the ultimate authority about the availability of files.  Information about availability was used to sort results immediately before they were displayed to the user.</p>
<p><strong>Gateways (4 servers, written in C)</strong><br />
Gateways were necessary whenever a cache server or a web server needed to query anything in the satellite server cluster.  Because there were so many satellite servers, it wasn&#8217;t feasible to query all of them in a reasonable amount of time.  To address this problem, the gateways helped aggregate information and route requests.</p>
<p>For processing searches, the gateways were the source of the &#8220;current availability&#8221; for every search result.  The availability was used to help sort the results &#8212; we wanted to show files that were currently available higher in the search results than files that were not.  Gateways could respond to these queries without querying any other servers, so it was feasible to query for large numbers of results at once.</p>
<p><strong>Caches (4 servers, written in C)</strong><br />
I called them caching servers, but they ultimately did most of the heavy lifting for each search in addition to caching the results.  They accepted queries and paging offsets from the web servers, cached results from the index servers to disk, queried the databases and gateways for sorting information, and returned a set of results.  In earlier versions of the search, results were not cached by a dedicated service, and most of the processing was done in PHP on the web servers.  Manipulating large arrays of results in PHP was a huge perf issue, but it was initially nice to be able to tweak the search algorithm in a scripting language without dealing with recompiling and redeploying.</p>
<p>These servers were also used to sort results by relevance.  Results were sorted twice.  The first sort was by historical availability, and it occurred when a result set (which could have been many thousands of rows) was initially cached to disk.  The historical availability was stored in a database and was updated daily by adding the current availability to 1/2 the previous historical availability.  The second sort, based on current availability information from the gateways, occurred whenever a result set was accessed from disk.  This wasn&#8217;t a true sort; it simply made sure that regardless of the historical availability, any currently unavailable song did not show up in the results before a currently available song.  This ensured that new songs which could be downloaded immediately got bumped up ahead of songs which used to be popular but no one was currently sharing.</p>
<p>The cache servers were partitioned such that each unique search had its results cached on a single service.  Each search string was hashed into a unique ID, and that number was used to determine which cache server handled the results.   To improve the efficiency of the cache layer, I took advantage of the fact that our search engine ignored word ordering and duplicates, so I stripped out duplicates and sorted the words before hashing. Thus, searches for &#8220;A A B&#8221;, &#8220;B A&#8221;, or &#8220;A B B&#8221; all hashed to the same ID and thus the same results.</p>
<p>For efficiency, the cache servers would cache potentially large numbers of results for new searches.  As the user paged through search results, each page could be fulfilled by the cache file from the initial batch.</p>
<p>Each server ended up with many thousands of cached results stored on disk.  To avoid overwhelming the file system, we had 65K folders in two levels of 256 folders each, named 00 to FF.  Each results file was identified by the hash of the query string with the first 4 hex digits controlling which folder it ended up in.</p>
<p><strong>Index Servers (4 servers, written in C)</strong><br />
These servers implemented an inverted index for all the searchable content in our MySQL databases.  The index had 2 components: an enormous in-memory hashtable of words and disk-based storage of the rowIDs that matched each word.  Obviously, index servers processed queries, but they also monitored the database for two types of changes.  First, each server looked for new rows in the primary table.  Second, it looked for new rows in an Updates table that was used to tell the search engine to re-index existing rows.  Because there may have been several hundred million indexed rows, it wasn&#8217;t feasible for the search engine to continually spider the whole table.  Therefore, I used the Updates table to trigger changes when I deleted or edited a row that was already indexed.</p>
<p>To process queries, I used the inverted index algorithm straight out of <a href="http://books.google.com/books?id=2F74jyPl48EC&amp;dq=managing+gigabytes&amp;pg=PP1&amp;ots=5P9PFq5Xb6&amp;sig=ImZAkxkrEcHDSWpwAqwi5t5kLIM&amp;hl=en&amp;prev=http://www.google.com/search?q=managing+gigabytes&amp;ie=utf-8&amp;oe=utf-8&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a&amp;sa=X&amp;oi=print&amp;ct=title&amp;cad=one-book-with-thumbnail">Managing Gigabytes</a>.  Each query was broken into words, and each word was used as a key into the in-memory hashtable.  The hashtable record contained the count of how many rows matched that word and an offset to the disk to read the full ID list.  The service would then iterate through the words from smallest num rows to largest and intersect the word lists.  To efficiently intersect the lists, it would walk the smaller list and do binary searches over the next larger one.  Each intersection produced a new list that was intersected with the next larger list.</p>
<p>The algorithm itself is simple and powerful, but the first tricky part was managing disk IO, which is always the bottleneck for a service like this.  I didn&#8217;t want a search for &#8220;The UncommonWords&#8221; to pull all the IDs that use the word <em>the </em>off the disk.  The second tricky part dealt with managing updates to the index.  If the service indexed a new row the the word <em>the </em>in it, I didn&#8217;t want to risk having to write the entire word list back to disk.  So, the service kept a small ID list in memory and combined it with the disk backed lists as necessary.  Periodically, the in-memory lists were combined with the main ones and flushed to disk.</p>
<p>We had several servers running this process, and each one kept a duplicate index.  Fortunately, we never had to deal with sharding the index across multiple servers or a hash table of words that wouldn&#8217;t fit in memory.  Speaking of which, I&#8217;d love to read some papers on how internet-scale search engines actually do that.  Does anyone have any recommendations?</p>
<p><strong>Web Servers (Apache, running PHP scripts)</strong><br />
After everything was designed  and implemented, the web servers didn&#8217;t have much work to do in the final version of the search.  They established a TCP connection to the appropriate cache server, sent in the search string and how many results they wanted along with a paging offset, and then read the matching row IDs and popularity back.  Then, they looked up the matching rows in the MySQL database to get the original strings and rendered the page.</p>
<p><strong>Communication Protocols </strong><br />
All of the communication between my services used a custom protocol, not SOAP or anything HTTP based.  Most of the bandwidth between the services was consumed sending long lists of integer IDs back and forth, and a custom protocol made that easy to optimize.  I think it also simplified using long-lived connections between the cache and the index layer.</p>
<p><strong>The Life Of A Query</strong><br />
So, tying it all together, here is a run through of what happened when a user ran a search:</p>
<ol>
<li>A user hits &#8220;Search&#8221; and an HTTP POST is sent to one of the load balanced web servers</li>
<li>The web server strips the duplicates from the search term, sorts the words, hashes the result and uses that value to pick a cache server.  It establishes a TCP connection to the cache server and sends the query, blocking while it waits for the results.</li>
<li>The cache server accepts the request from the web server.  If a file of search results does not exist for the query, it randomly picks one of the index servers and uses a pooled connection to send it a request.</li>
<li>The index server receives the request and builds a list of row IDs using the inverted index algorithm described above. It sends the list of IDs back to the cache server.</li>
<li>Assuming it didn&#8217;t have the results cached, the cache server reads the IDs back from the index server.  It then queries a database to get the historical availability.  Using that, it sorts the results and writes them out to disk.</li>
<li>Using the sorted results (either from disk or from memory if it just got them from the index server), the cache server sends a list of IDs to a random Gateway server, which responds with the current availability of each file.</li>
<li>The cache layer adjusts the order of the results based on the current availability.  Then, it indexes into the results based on the requested offset from the web server and writes a few dozen IDs back to the web server.</li>
<li>The web server process unblocks and reads the matching IDs back from the cache server.  It hits the database to get anything necessary to render the page, formats the results, and gzips them out to the web browser.</li>
<li>The user gets the response and smiles (hopefully).</li>
</ol>
<p>So, in a few hundred milliseconds a simple search could easily touch 4 servers plus half a dozen different databases.</p>
<p>My favorite part of all of this was running tail -f against the logs on an index server.  We had an excellent hit rate for our caching layer, so the only searches I really saw there were serious misspellings&#8211;and usually humorous ones at that.</p>
<p>Update: I&#8217;ve added another article covering some of the <a href="http://www.spiteful.com/2008/03/07/lessons-learned-scaling-the-audiogalaxy-search-engine/">lessons learned building the search engine. </a></p>
<ul class="promotions" style="padding: 0; list-style-image: none; list-style-type: none;"><h5 style="margin: 0pt; font-size:1.2em;">Would you like to...</h5><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Save this for later, <a href="http://www.spiteful.com/lets_bounce.php?p=24&key=delicious&f"><img src="http://www.spiteful.com/wp-content/plugins/sociable/images/delicious.small.gif"/> del.ici.ously?</a></li><li style="font-size: 1.1em; margin: 0 0 3px 6px;">Help those web 2.0 valuations? <a href="http://www.spiteful.com/lets_bounce.php?p=24&key=reddit&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/reddit_tiny_2.png" /></a> <a href="http://www.spiteful.com/lets_bounce.php?p=24&key=digg&f"><img style="vertical-align: middle; border: 0;" src="http://www.spiteful.com/wp-content/plugins/sociable/images/digg-it-tiny.gif"/></a></li></ul></ul><img src="http://feeds.feedburner.com/~r/Spiteful/~4/243557269" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spiteful.com/2008/02/29/design-details-of-audiogalaxycoms-high-performance-mysql-search-engine/feed/</wfw:commentRss>
		</item>
	</channel>
</rss><!-- Dynamic Page Served (once) in 1.183 seconds -->
