<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6067234520252164707</id><updated>2024-09-20T18:27:13.110-07:00</updated><category term="professional"/><category term="storagemonkeys"/><category term="dns"/><title type='text'>Max Kalashnikov</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default?redirect=false'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>11</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-1431717065714913988</id><published>2011-04-11T07:41:00.000-07:00</published><updated>2011-04-11T07:41:33.541-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><title type='text'>The myth of the &quot;commodity&quot; server (for memory)</title><content type='html'>Over the past several years, I keep stumbling upon deployment systems and such concepts as &quot;sharding&quot; which use as their &lt;i&gt;raison d&#39;être&lt;/i&gt; the ability to scale across an arbitrary number of cheap, &quot;commodity&quot; (usually 1U) servers.&lt;br /&gt;
&lt;br /&gt;
The implication is that &quot;larger&quot; servers either have a higher price per performance or are somehow more difficult to administer[1]. I reject both suppositions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
The day of &quot;big iron&quot; is well past us. This isn&#39;t to say one can&#39;t still buy large machines, or even run Linux on an IBM z-series, but for most practical intents, there are only two classes of Linux[2] server hardware.&lt;br /&gt;
&lt;br /&gt;
The larger class is based on the quad-processor Xeon 7xxx series motherboards. These machines are, I admit, less bang for the buck, if one&#39;s &quot;bang&quot; is fungible processor power and/or memory.&lt;br /&gt;
&lt;br /&gt;
Everything else, however, has either linear or even sub-linear pricing.&lt;br /&gt;
&lt;br /&gt;
Let&#39;s look at the current pricing from Dell, which I find to be the cheapest of the brand-name vendors:&lt;br /&gt;
&lt;br /&gt;
CPU(s) (cores@clock)  server model  memory slots  price&lt;br /&gt;
&lt;br /&gt;
X3430 (4@2.4) R310 4mem $1257 &lt;br /&gt;
E5620 (4@2.4) R410 8mem $1319&lt;br /&gt;
E5620 (4@2.4) R510 8mem $1418&lt;br /&gt;
E5620 (4@2.4) R610 12mem $1762&lt;br /&gt;
E5620 (4@2.4) T610 12mem $1537 &lt;br /&gt;
E5620 (4@2.4) R710 18mem $1712&lt;br /&gt;
E5620 (4@2.4) T710 18mem $1498&lt;br /&gt;
&lt;br /&gt;
1*E6510 (4@1.73) R810 16mem $3821&lt;br /&gt;
2*E7520 (4@1.86) R810 32mem $5531 &lt;br /&gt;
2*E7520 (4@1.86) R910 32mem $5790&lt;br /&gt;
4*E7520 (4@1.86) R910 64mem $8855&lt;br /&gt;
&lt;br /&gt;
These are all configured with rack rails with cable arms and as little memory as possible, assuming one would buy commodity memory. What&#39;s notable is that the &quot;small&quot; machines with 4 and 8 memory slots are under 10% cheaper than the next ones up and that the 18-slot models are cheaper than the 12-slotters.&lt;br /&gt;
&lt;br /&gt;
If one is memory-bound[3], the best deal for the money is the 5U-tall T710. If you&#39;re fortunate enough to be in a facility with plenty of power but not plenty of space, then the 2U-tall R710 makes sense for the extra 15%. Either way, assembling that many memory slots out of the smaller 1Us is going to be more expensive, more space and power consuming, and will yield less usable memory, since each box has some common OS overhead.&lt;br /&gt;
&lt;br /&gt;
What I also find notable is that the higher-end servers, though over twice as expensive for the cheapest model, are still cheaper and smaller for the memory slots than enough 1Us. Even over the 2Us, the price premium is under 50% for the base system, and likely a good deal less once the memory itself is included.&lt;br /&gt;
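&lt;br /&gt;
For the comparison-minded, the table above reduces to a quick dollars-per-memory-slot calculation. A minimal sketch (prices and slot counts are just the Dell quotes listed above, so treat the output as of-the-moment only):&lt;br /&gt;
&lt;pre&gt;
# Dollars per memory slot, using the Dell quotes in the table above.
configs = [
    (&quot;R310, 1x X3430&quot;, 1257, 4),
    (&quot;R410, 1x E5620&quot;, 1319, 8),
    (&quot;R510, 1x E5620&quot;, 1418, 8),
    (&quot;R610, 1x E5620&quot;, 1762, 12),
    (&quot;T610, 1x E5620&quot;, 1537, 12),
    (&quot;R710, 1x E5620&quot;, 1712, 18),
    (&quot;T710, 1x E5620&quot;, 1498, 18),
    (&quot;R810, 1x E6510&quot;, 3821, 16),
    (&quot;R810, 2x E7520&quot;, 5531, 32),
    (&quot;R910, 2x E7520&quot;, 5790, 32),
    (&quot;R910, 4x E7520&quot;, 8855, 64),
]

for name, price, slots in configs:
    print(f&quot;{name:16s} ${price:5d}  {slots:2d} slots  ${price / slots:5.0f}/slot&quot;)
&lt;/pre&gt;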
&lt;br /&gt;
Since memory density increases with Moore&#39;s law, if you have 3% monthly growth or less and you comfortably[4] fit into one of the $1500 servers, there&#39;s no need to worry about &quot;sharding&quot; due to memory. Similarly, if you&#39;re at 10% monthly growth (more than tripling every year), you have about 2 years to grow into the then-current larger machines, assuming that the number of memory slots per same-cost server[5] doesn&#39;t increase.&lt;br /&gt;
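&lt;br /&gt;
To make that growth arithmetic concrete, here is a back-of-the-envelope sketch; the 10%-per-month figure and the slot-count headroom are just the assumptions from the paragraph above, not measurements:&lt;br /&gt;
&lt;pre&gt;
# How long does a given memory headroom last at a given monthly growth rate?
import math

def months_until_outgrown(headroom_factor, monthly_growth):
    # usage * (1 + g)^m = usage * headroom  =&gt;  m = ln(headroom) / ln(1 + g)
    return math.log(headroom_factor) / math.log(1.0 + monthly_growth)

print(round(1.10 ** 12, 2))                         # 10% per month compounds to ~3.14x per year
print(round(months_until_outgrown(4.5, 0.10), 1))   # 18 memory slots vs 4: ~15.8 months of headroom
print(round(months_until_outgrown(18.0, 0.10), 1))  # if density also doubles twice: ~30 months
&lt;/pre&gt;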
&lt;br /&gt;
For a startup, 2 years is a lot of engineering time that could be spent on actually driving the growth rather than focusing on how to handle it if it happens to appear.&lt;br /&gt;
&lt;br /&gt;
For now, pricing of CPU &quot;horsepower&quot; across the different servers is left as an exercise to the reader who enjoys comparing benchmarks.&lt;br /&gt;
&lt;br /&gt;
[1] The virtualization proponents seem to go both ways on this, the other way being the subdivision of larger servers into several smaller, virtual machines.&lt;br /&gt;
&lt;br /&gt;
[2] Linux on x86 is the only one that counts these days, right?&lt;br /&gt;
&lt;br /&gt;
[3] Often the case with modern languages such as Java and Python. The practice of using memcached or other in-memory databases similarly leads to memory scarcity.&lt;br /&gt;
&lt;br /&gt;
[4] That is, without paying a huge premium for the highest density memory, which premium often only exists for a short period of time.&lt;br /&gt;
&lt;br /&gt;
[5] Or, rather, per processor, unless we go back to a serial connection technology like FB-DIMM.&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/1431717065714913988/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/04/myth-of-commodity-server-for-memory.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/1431717065714913988'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/1431717065714913988'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/04/myth-of-commodity-server-for-memory.html' title='The myth of the &quot;commodity&quot; server (for memory)'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-413988807744647176</id><published>2011-04-04T10:38:00.000-07:00</published><updated>2011-04-04T10:38:19.063-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><title type='text'>When is it time for a senior sysadmin?</title><content type='html'>In the quest for the &quot;perfect&quot; startup to join, I have my own personal guidelines as to company size and growth. However, I also tend to ask questions to determine if it&#39;s too early or too late for me (as a system administrator) to be of adequate help.&lt;br /&gt;
&lt;br /&gt;
I&#39;m not just a porridge-swilling Goldilocks when it comes to this kind of timing. If it&#39;s too early[1], I&#39;m going to get bored while the company wastes its money, which isn&#39;t good for anyone. Too late, and I end up being incapable of overcoming legacy hurdles, which is a source of frustration and the appearance of ineffectiveness, again not good for anyone.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: large;&quot;&gt;Growth&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
The first thing I look at is growth, since that&#39;s the single most reliable sign that it may be &quot;too early.&quot; Merely modest growth can still be a great challenge for someone who is by all accounts a senior sysadmin, but not for me, which is why I probe it early.&lt;br /&gt;
&lt;br /&gt;
For a startup with the expectation of &quot;hockey stick&quot; style growth, I would say the right time is anywhere on the elbow part (of greatest slope change). The nearly-horizontal part means it&#39;s too early, since that can last an indeterminate amount of time and such minimal[2] growth can be handled by developers sharing the load of administration.&lt;br /&gt;
&lt;br /&gt;
I look for 10% monthly growth or doubling yearly as a minimum. Any metric that can be credibly linked to infrastructure works, including bandwidth, users, revenue, servers, even employees. I have yet to run into the need for having a maximum. Does anyone have a suggestion of a growth rate that&#39;s clearly up the handle of the hockey stick? Factor of 10 yearly? &lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;font-size: large;&quot;&gt;Employees&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Another metric I use is number of employees, or, more specifically, number of technical employees. The &quot;too late&quot; case has an easy rule of thumb: everyone technical needs to sit in the same room and still be able to communicate effectively with each other. My experience is that this is a common early startup model. Once people have walls (even cube walls) and doors separating them, there&#39;s just enough of an &quot;us vs. them&quot; mentality that a sysadmin can no longer absorb enough of everything that&#39;s going on to effectively influence how things are done in the future[3].&lt;br /&gt;
&lt;br /&gt;
I&#39;m not sure there&#39;s a danger of there being a &quot;too early&quot; case, but I&#39;d be hard pressed to recommend a sysadmin being one&#39;s first or second hire.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: large;&quot;&gt;Number of Servers&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
A common, though, to my mind, less significant, metric is the number of servers. The reason I consider it of secondary importance is that it doesn&#39;t translate well to overall environment complexity. Put another way, the existing number of servers doesn&#39;t translate well to the eventual number of servers once a sysadmin is on board.&lt;br /&gt;
&lt;br /&gt;
Still, if you can run everything on one or two servers, it&#39;s probably too early. If you have a couple hundred and you don&#39;t already have someone dedicated to thinking about them, it&#39;s too late.&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;font-size: large;&quot;&gt;Server/Services Spending&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;More significant than the number of servers is how much is being spent on the hardware (if applicable), hosting, and services, such as ISPs and CDNs. My sweet spot is that this needs to be about twice the salary of a sysadmin, since I can often cut those expenditures in half[4]&lt;/span&gt;. Less than a sysadmin&#39;s salary and it&#39;s too early. More than 5 times a sysadmin&#39;s salary and it&#39;s too late, though, like with growth, I have yet to see this be an issue in the real world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] Granted, it may only be too early for &lt;i&gt;me&lt;/i&gt; but not for a junior sysadmin. That&#39;s a philosophical question for another post. I&#39;ve found, however, that most startups don&#39;t want to spend the time and money to eventually hire two people rather than waiting and getting just one. &lt;br /&gt;
&lt;br /&gt;
[2] It&#39;s important to remember to normalize against technological progress. Even I/O performance progresses linearly, though it doesn&#39;t follow the geometric progression of Moore&#39;s Law.&lt;br /&gt;
&lt;br /&gt;
[3] Including influencing development process and tools, if not providing them outright. I&#39;ve heard this method called &quot;DevOps,&quot; but I just consider it to be good startup system administration.&lt;br /&gt;
&lt;br /&gt;
[4] Easily justifying my own salary, if needed, but, more importantly, revealing the negotiation over $10k one way or the other seem the silly waste of time that it is.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/413988807744647176/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/04/when-is-it-time-for-senior-sysadmin.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/413988807744647176'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/413988807744647176'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/04/when-is-it-time-for-senior-sysadmin.html' title='When is it time for a senior sysadmin?'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-379254259901445375</id><published>2011-03-25T11:48:00.000-07:00</published><updated>2011-03-25T11:48:04.128-07:00</updated><title type='text'>OpenStreetMap is a ghetto of stagnation.</title><content type='html'>Having interacted with a few other mappers, particularly in disputes, I had the odd impression that either they were a bit, shall we say, mentally challenged, or struggled with language. Now I know why.&lt;br /&gt;
&lt;br /&gt;
Fully a year later, one of the people in charge communicates with me and, in summary, says that the community is favored over map quality every time. Wow.&lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
So what is this community? Where do its members &quot;hang out?&quot; There&#39;s a plethora of choice, and, apparently, they&#39;re all equally inadequate, except for the mailing lists, which, despite being shockingly anachronistic[1], are held up as the pillar of excellence as a discussion venue. Never mind that the same OSMF board member who did so is also 
complaining about &quot;toxic&quot; participants on the mailing lists, who are 
there only to argue and aren&#39;t otherwise active mappers. &lt;br /&gt;
&lt;br /&gt;
The fora I&#39;ve found so far are:&lt;br /&gt;
&lt;br /&gt;
Wiki&lt;br /&gt;
Wiki Talk&lt;br /&gt;
Forums&lt;br /&gt;
Meetups&lt;br /&gt;
Help pages&lt;br /&gt;
Individual OSM email&lt;br /&gt;
Out-of-band regular e-mail&lt;br /&gt;
and, of course, the mailing lists and their archives. &lt;br /&gt;
&lt;br /&gt;
That&#39;s quite a dauntingly fragmented set of channels, even for an earnest participant. At best, they strike me as a significant distraction from the task at hand.&lt;br /&gt;
&lt;br /&gt;
All this means smart, dedicated, motivated mappers are going to get systematically chased away, while those who oppose change but are good at playing politics will stay. Sound familiar? I fear that this is always the logical conclusion to any such Wiki-like &quot;crowdsourcing&quot; effort.&lt;br /&gt;
&lt;br /&gt;
I had such high hopes for the octo-chicken. Still, it may work, as it seems to have mostly worked for Wikipedia. Here&#39;s hoping for a worthy fork in the meantime.&lt;br /&gt;
&lt;br /&gt;
[1] Even when I started with the &#39;net a quarter century ago, they already seemed quaintly backwards, compared to Usenet. I&#39;m pretty confident OSM isn&#39;t nearly that old.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/379254259901445375/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/03/openstreetmap-is-ghetto-of-stagnation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/379254259901445375'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/379254259901445375'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/03/openstreetmap-is-ghetto-of-stagnation.html' title='OpenStreetMap is a ghetto of stagnation.'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-6296316261448379799</id><published>2011-03-01T09:38:00.000-08:00</published><updated>2011-03-01T09:38:37.282-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="dns"/><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><title type='text'>Secondary DNS</title><content type='html'>Here&#39;s my advice for &quot;secondary&quot;&amp;nbsp; DNS service. I recommend running the master unlisted (&quot;stealth master&quot;) and using it only to serve zone transfer to the slaves. It can also be a good idea to have a backup &quot;stealth&quot; slave that could become the master.&lt;br /&gt;
&lt;br /&gt;
I call them &quot;slaves&quot; even though, in registration terms, I think they&#39;re still called &quot;primary&quot; and &quot;secondary.&quot; I have yet to find a practical distinction, and, with a stealth master, there could be confusion.&lt;br /&gt;
 &lt;br /&gt;
Make sure to have at least one listed slave whose hostname is under a different TLD (.com, .org, .net, or a ccTLD).&lt;br /&gt;
&lt;br /&gt;
A list of my preferred providers, reasonably priced:&lt;br /&gt;
&lt;br /&gt;
DNS Made Easy (per 5-10 million query pricing)&lt;br /&gt;
BackupDNS (flat per zone per month)&lt;br /&gt;
EasyDNS (per million query pricing)&lt;br /&gt;
DNS Unlimited (cheap per million query pricing)&lt;br /&gt;
Durable DNS (per million query pricing)&lt;br /&gt;
No-IP &quot;squared&quot; (flat per domain per year)&lt;br /&gt;
&lt;br /&gt;
Not all of them support configuring more than one master, but they all have web access to effect the changes.&lt;br /&gt;
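&lt;br /&gt;
Whichever providers you pick, one check worth automating is that every listed slave answers and has pulled the latest zone from the stealth master. A minimal sketch, assuming the standard &quot;dig&quot; utility is installed; the zone, master address, and slave names below are placeholders:&lt;br /&gt;
&lt;pre&gt;
# Compare the SOA serial on the hidden (stealth) master against each listed slave.
import subprocess

ZONE = &quot;example.com&quot;                                    # placeholder zone
HIDDEN_MASTER = &quot;192.0.2.1&quot;                             # the unlisted master
LISTED_SLAVES = [&quot;ns1.example.net&quot;, &quot;ns2.example.org&quot;]  # the published NS hosts

def soa_serial(server):
    # &quot;dig +short&quot; prints the SOA record; the serial is the third field
    out = subprocess.check_output(
        [&quot;dig&quot;, &quot;+short&quot;, &quot;@&quot; + server, ZONE, &quot;SOA&quot;], text=True)
    return int(out.split()[2])

master_serial = soa_serial(HIDDEN_MASTER)
for slave in LISTED_SLAVES:
    serial = soa_serial(slave)
    status = &quot;ok&quot; if serial == master_serial else &quot;STALE&quot;
    print(f&quot;{slave}: serial {serial} ({status}; master has {master_serial})&quot;)
&lt;/pre&gt;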
&lt;br /&gt;
More detailed advice may be forthcoming.&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/6296316261448379799/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/03/secondary-dns.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/6296316261448379799'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/6296316261448379799'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/03/secondary-dns.html' title='Secondary DNS'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-7737048172574528855</id><published>2011-02-03T11:29:00.001-08:00</published><updated>2011-02-07T22:41:54.484-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><title type='text'>Virtualization for databases (bad idea)</title><content type='html'>Originally in response to this (excerpt of a) discussion on LinkedIn:&lt;br /&gt;
&lt;blockquote&gt;
&lt;span class=&quot;comment-body&quot; data-li-comment-text=&quot;&quot;&gt;
I think this is a LINUX issue! Because in linux the I/O is buffered or 
delegated to a proccess. When you install Postgres or any DB, Postgres 
tell to the OS that it can&#39;t wait to do the I/O, it must be done 
inmediattly. But what happens in a virtualized environment?
&lt;/span&gt;&lt;/blockquote&gt;
There&#39;s no such thing as telling the OS to do an I/O immediately, as opposed to waiting. It&#39;s the other way around: non-buffered I/O requires waiting for it to actually complete. This is important for such features as data integrity (knowing it was written to the platter, or, perhaps, in the case of SSDs, that the silicon was erased and written to).&lt;br /&gt;
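&lt;br /&gt;
To show what &quot;waiting for it to actually complete&quot; looks like from the application side, here is a minimal sketch of the kind of durable append a database does for its journal; the path is only an example:&lt;br /&gt;
&lt;pre&gt;
# A durable append: the write is not &quot;done&quot; until fsync() returns,
# i.e. until the OS reports the data as being on stable storage.
import os

def durable_append(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)   # block until the device acknowledges the write
    finally:
        os.close(fd)

durable_append(&quot;/tmp/example-journal.log&quot;, b&quot;transaction committed\n&quot;)
&lt;/pre&gt;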
&lt;br /&gt;
The real problem is that virtualization is fundamentally flawed. What is an operating system for, in the first place? It&#39;s the interface between the hardware and the applications. Virtualization breaks this, without, IMO, adequate benefit.&lt;br /&gt;
&lt;br /&gt;
Put another way, virtualization abstracts away hardware, to a lowest common denominator. It is therefore an unsurprising result that the subsequent performance is consistent with the lowest common denominator as well. &quot;Commodity hardware&quot; is a myth[1].&lt;br /&gt;
&lt;br /&gt;
One of my greatest tools as a sysadmin is my knowledge of hardware, how it fits together, and how it interacts with the OS. Take that away from me by insisting on virtualization or ordering off a hosting provider&#39;s menu of servers, and I, too, suffer from the lowest common denominator syndrome.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] Really, it&#39;s that non-commodity &quot;big iron&quot; is extinct in my world, especially with the demise of Sun.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/7737048172574528855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/02/originally-in-response-to-this-excerpt.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/7737048172574528855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/7737048172574528855'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/02/originally-in-response-to-this-excerpt.html' title='Virtualization for databases (bad idea)'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-3531101425067701402</id><published>2011-01-22T10:53:00.002-08:00</published><updated>2011-01-22T11:01:22.284-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><title type='text'>You only just swallowed us, I know, but please cough us back up.</title><content type='html'>I was asked recently what my ideal scenario to retain me long-term, and it occurred to me, after answering otherwise, that there does exist such a situation. Our new overlords would have to spin us off and let us operate independently, as a wholly-owned subsidiary.&lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
My own role has not been rendered completely irrelevant, as I had feared, just stagnant. The closest thing I currently have to a boss managed to finagle our keeping our own deployment process and administrative control of our servers. For now, this means the hosting provider. Later, it means mostly virtual boxes in eBay&#39;s datacenter(s). It probably won&#39;t be significantly worse than what we have now, since our provider&#39;s internal network has had numerous failures.&lt;br /&gt;
&lt;br /&gt;
However, since my next step, right before the acquisition, was going to be to move to our own datacenter, there will be no moving forward. I&#39;ll be stuck with the already outgrown scaling (for lack of a better term) model and no control of the network, hardware, or provisioning. The most powerful tools with which I am adept won&#39;t be available to me.&lt;br /&gt;
&lt;br /&gt;
There will also be no opportunity for mentorship or participation in hiring other sysadmins, something I have found adds significantly to my overall job satisfaction. No, joining eBay Ops (cue &quot;Central Services&quot; jingle from &lt;i&gt;Brazil&lt;/i&gt;) is not an option, since I enjoy being productive.&lt;br /&gt;
&lt;br /&gt;
If we were spun off, the lip service given to continuing what we were doing, just with eBay&#39;s resources behind us, could actually be made to be true. We would be free of the usual bureaucratic encumbrances, all-downside purchasing process (no buyers, just forms)[1], crippling &quot;collaboration&quot; tools like Exchange and Skype, and the temptation to shoehorn what&#39;s still a nimble startup operation into a nearly immobile behemoth&#39;s infrastructure.&lt;br /&gt;
&lt;br /&gt;
We could still sub-lease their campus and maybe even be eBay-galaxy-of-companies employees so as to&amp;nbsp; share benefits (though even those are lackluster and an administrative time sink). However, we would control our own destiny in terms of hiring, purchasing, and operating our service. Integration with eBay&#39;s services would be via API, as it would otherwise, since the code bases have, to put it mildly, irreconcilable differences.&lt;br /&gt;
&lt;br /&gt;
I very seriously doubt, however, that this could ever happen, since there&#39;s too much potential for loss of face somewhere up the chain of command. In the meantime, I&#39;ll continue to help in what ways I can and be on the lookout for another suitable startup.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] Unless it&#39;s over a million dollars. The purchasing department has a great scam going. They&#39;ve managed to appear to have very low costs, because they outsourced everything one might think they do. The accounting work is off-shore, and the request, quote, purchase, and receiving tasks are all pushed onto all employees in the guise of self-service. Of course, it&#39;s still Purchasing that dreams up the Byzantine policies everyone else is expected to implement.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/3531101425067701402/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/acquisition-do-they-ever-go-well-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/3531101425067701402'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/3531101425067701402'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/acquisition-do-they-ever-go-well-for.html' title='You only just swallowed us, I know, but please cough us back up.'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-9030994407537690852</id><published>2011-01-21T13:26:00.000-08:00</published><updated>2011-01-22T11:01:22.285-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><category scheme="http://www.blogger.com/atom/ns#" term="storagemonkeys"/><title type='text'>Compression at &quot;Internet&quot; scale (originally posted to StorageMonkeys November 22, 2009)</title><content type='html'>One of the things I&#39;ve learned, having been in more traditional 
&quot;Enterprise&quot; environments and &quot;Internet&quot; companies, is that the latter have storage scale issues an order of magnitude or two larger than the former.&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;

 Fortunately, there&#39;s also a difference in the nature of the data, such 
that the most voluminous (and, arguably, most valuable) data, web access
 logs, are highly compressible (5-20x) with the right algorithm.  
Compression is important at this scale for reducing I/O and increasing 
speed of access, not for reducing the number of bits &quot;spinning&quot; on platters.&lt;br /&gt;
&lt;br /&gt;
A
 solution must work in real-time. There is some flexibility in that 
average load is rarely anywhere near peak load. However, my experience 
is that paying for unused capacity is better than depending on 
catching up on backlogs during off-peak times. In the former case, the 
consequences of a poor estimate are finite and predictable but not so in
 the latter. &lt;br /&gt;
&lt;br /&gt;
Assuming one wants to use the data, a solution must
 decompress at least as fast as it compresses. I haven&#39;t run into this 
as a problem, since the readily available algorithms easily meet such a 
requirement. A possible issue could be with parallel processing of the 
compression but centralized processing of the decompression, such as to 
load into a decision-support database.&lt;br /&gt;
&lt;br /&gt;
Performance has to be no 
more than O(n) for memory (distributed compressors) or O(n) for CPU 
(central compressor). Fortunately, the former appears easily satisfied 
by available algorithms, so long as &quot;n&quot; is log event volume, not average
size of each log event.&lt;br /&gt;
&lt;br /&gt;
HTTP logs are extremely self-similar, so just throwing Lempel-Ziv at them is sub-optimal. Experimenting, I&#39;ve found that descendants like LZMA do quite well (around 5x), but that seems to be the top end, at a not particularly impressive speed. This may be great for general-purpose compression but not for this special purpose.&lt;br /&gt;
&lt;br /&gt;Though
 they&#39;ll often have plenty of natural-language embedded within, large 
text compressors (such as those tuned for the Hutter Prize cf. &lt;a href=&quot;http://mattmahoney.net/dc/text.html&quot; title=&quot;Matt Mahoney&#39;s page&quot;&gt;http://mattmahoney.net/dc/text.html&lt;/a&gt;)
 aren&#39;t ideal, either. I speculate that this is due to a much higher 
incidence of abbreviations and numerals, but I&#39;m hardly qualified.&lt;br /&gt;
&lt;br /&gt;
Another
possibility would be to configure/customize one&#39;s web server to log in a
 pre-compressed format. I generally reject this out of hand, because it 
removes much of the self-documenting nature of verbose logs. Moreover, 
it can&#39;t predict the future to determine the frequency of a current log 
event. To do so would mean maintaining a buffer, which may as well be on
 the disk of another server, the current situation. Perhaps more to the 
point, my operational philosophy discourages burdening something 
critical like a web server with something ancillary like log 
compression.&lt;br /&gt;
&lt;br /&gt;
The best option I&#39;ve found so far is the PPMd 
algorithm, primarily as implemented in software by the 7zip package. 
Specifically, with order 7 and 1GB of memory, a modern CPU will compress
my web logs 10:1 at 10MB/s. Its main disadvantages are being memory heavy, with an identical footprint for compression and decompression, and the lack of a parallel implementation.&lt;br /&gt;
&lt;br /&gt;
I don&#39;t yet have any good data,
partly because the fast pace of startups means the character of the 
logs I work with changes and partly because I rarely have the luxury of 
trying more than one method on the same data. However, once I do, I&#39;ll 
post some hard number comparisons between LZMA and PPMd with various 
tuning options.&lt;br /&gt;
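&lt;br /&gt;
In the meantime, the harness I have in mind looks roughly like this; it assumes a 7z binary on the PATH, the log path is a placeholder, and the PPMd switch syntax should be double-checked against your 7-Zip version:&lt;br /&gt;
&lt;pre&gt;
# Rough compression-ratio comparison on one sample log file:
# LZMA via the standard library, PPMd via the external 7z binary.
import lzma, os, subprocess, time

LOG = &quot;access.log&quot;                 # placeholder path to a sample web access log
raw = os.path.getsize(LOG)

t0 = time.time()
with open(LOG, &quot;rb&quot;) as f:
    lzma_size = len(lzma.compress(f.read(), preset=6))
print(f&quot;LZMA: {raw / lzma_size:5.1f}x in {time.time() - t0:5.1f}s&quot;)

t0 = time.time()
# assumed switch syntax for order-7 PPMd with 1GB of memory; verify for your version
subprocess.run([&quot;7z&quot;, &quot;a&quot;, &quot;-m0=PPMd:o=7:mem=1g&quot;, &quot;out.7z&quot;, LOG],
               check=True, stdout=subprocess.DEVNULL)
ppmd_size = os.path.getsize(&quot;out.7z&quot;)
print(f&quot;PPMd: {raw / ppmd_size:5.1f}x in {time.time() - t0:5.1f}s&quot;)
&lt;/pre&gt;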
&lt;br /&gt;
Next year, look for my musings on compression of database redo/write-ahead logs.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/9030994407537690852/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/compression-at-internet-scale.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/9030994407537690852'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/9030994407537690852'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/compression-at-internet-scale.html' title='Compression at &quot;Internet&quot; scale (originally posted to StorageMonkeys November 22, 2009)'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-6041470497379992324</id><published>2011-01-21T13:23:00.000-08:00</published><updated>2011-01-22T11:01:22.285-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><category scheme="http://www.blogger.com/atom/ns#" term="storagemonkeys"/><title type='text'>Storage on the cheap - lessons learned (originally posted to StorageMonkeys July 11, 2009)</title><content type='html'>Having purchased, assembled, configured, and turned up quite a number
 of storage arrays, where a major concern was total cost, I&#39;ve come up 
with something of a checklist of best practices.&lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
Use cheap, 
commodity, desktop SATA drives. They&#39;re as good as, if not better than, 
&quot;enterprise&quot; models. They&#39;re certainly cheaper per performance.&lt;br /&gt;
&lt;br /&gt;
If advanced administration, failover, or clustering features, such as from Veritas, are needed, use SAS HBAs.&lt;br /&gt;
&lt;br /&gt;
Otherwise, use SAS RAID cards. They tend to support more attached devices and may even be cheaper.&lt;br /&gt;
&lt;br /&gt;
Make
 sure to buy disks from multiple batches for use within a RAID. That is,
 have a mix of drive models and sub-models, manufacturers, and even end 
vendors.&lt;br /&gt;
&lt;br /&gt;
Bad batch syndrome is potentially the most catastrophic. Corollary: Don&#39;t buy models so new that there&#39;s only one batch.&lt;br /&gt;&lt;br /&gt;Buy only drives which support NCQ. The price premium, if any, is negligible.&lt;br /&gt;
Even if there&#39;s no performance gain for a particular use case, there&#39;s no downside to having it turned on everywhere.&lt;br /&gt;
To that end, turn NCQ on for all new adapters connecting new disks.&lt;br /&gt;&lt;br /&gt;If coming into a legacy environment, turn off NCQ unless absolutely certain that all existing disks support it.&lt;br /&gt;Problems/corruption can be insidiously subtle.&lt;br /&gt;&lt;br /&gt;Before use, write (zeros are fastest) to the entire device. This will trigger any bad blocks to be reallocated.&lt;br /&gt;After that, run a SMART scan on the whole device and check for clean results. This will catch any (very rare) infant mortality. (See the burn-in sketch after this checklist.)&lt;br /&gt;It also indelibly &quot;stamps&quot; the drive as having been tested.&lt;br /&gt;&lt;br /&gt;Install smartmontools on all servers. It&#39;s small and otherwise takes no resources.&lt;br /&gt;Running the smartd daemon is another matter. That&#39;s a monitoring concern.&lt;br /&gt;&lt;br /&gt;Turn on all the idle/background SMART tests supported by each device.&lt;br /&gt;&lt;br /&gt;Discard (permanently stop using) a disk at the first sign of trouble.&lt;br /&gt;A SMART error or even warning is trouble.&lt;br /&gt;A write error is trouble.&lt;br /&gt;A read error (assuming the disk has been zeroed) is trouble.&lt;br /&gt;A timeout, unless positively isolated to the disk itself, is not trouble.&lt;br /&gt;&lt;br /&gt;For external connectors, use only the screw-on type. For SAS, that&#39;s SFF-8470.&lt;br /&gt;This does mean spending more money.&lt;br /&gt;Often, one must use internal connections (e.g. SFF-8087) with an adapter.&lt;br /&gt;The
 latching connectors are all too easily disconnected (sometimes only 
partially, which can be worse than fully) and/or too fragile.&lt;br /&gt;&lt;br /&gt;Locate equipment such that storage cables can be short but have enough slack.&lt;br /&gt;Always provide good strain relief on all external cables. This means cable ties at strategic points.&lt;br /&gt;Test for adequate slack and clearance by sliding all connected and neighboring equipment.&lt;br /&gt;&lt;br /&gt;Add between 3% and 6% (of active disks) hot spares. That should last 2-3 years without human intervention.&lt;br /&gt;By then, replace all the disks, not just the failed ones, as your failure rate will, otherwise, accelerate heavily.&lt;br /&gt;Time your transition to take advantage of technology and/or price improvements but assume closer to 2 years than 3.&lt;br /&gt;&lt;br /&gt;RAID1(+0) is far more flexible and simpler than RAID5. It performs much better in degraded and recovery modes.&lt;br /&gt;A good implementation can nearly double read performance, especially on contentious operations.&lt;br /&gt;It costs only 60% more than a 4 column (+1 parity) RAID5 or an 8 column RAID6.&lt;br /&gt;&lt;br /&gt;Don&#39;t oversubscribe the system bus.&lt;br /&gt;PCI-X 64bit@133MHz is only 1067MB/s half-duplex. (i.e. could be adequate for highly asymmetric read/write)&lt;br /&gt;PCIe x4 is 1000MB/s full-duplex.&lt;br /&gt;SAS 4-lane is 1200MB/s full-duplex.&lt;br /&gt;&lt;br /&gt;Once
 everything is assembled, measure these maximum throughputs. Do so at 
each layer, including the HBA/RAID card and each spindle.&lt;br /&gt;&lt;br /&gt;At each layer with a dirty region log (DRL) and/or journaling option, opt to use it.&lt;br /&gt;If
 practical, &quot;waste&quot; a whole spindle on it. Otherwise, locate it 
somewhere highly contentious or low-demand, such as the boot disk.&lt;br /&gt;&lt;br /&gt;Similarly, try simulating a failure at each layer and measure the recovery time. That will be the minimum under no load.&lt;br /&gt;&lt;br /&gt;If the block size an application or database uses can be tuned, raise it to the highest possible.&lt;br /&gt;Conversely, use the smallest supported stripe unit width size.&lt;br /&gt;Set number of columns such that full stripe width is an even multiple (or, better yet, factor) of block size.&lt;br /&gt;For RAID5, this usually means 4 (plus parity), 8, or (rarely) 16.&lt;br /&gt;4 columns plus parity is particularly well suited to PCIe-to-SAS hardware RAID5, since there&#39;s a 4:5 PCIe:SAS bandwidth ratio.&lt;br /&gt;&lt;br /&gt;For redundant components (e.g. cables, expanders, power supplies), test hot-swappability.&lt;br /&gt;Do so at different &quot;duty&quot; (simulated outage) cycles and flap rates.&lt;br /&gt;Test flip-flopping between the two components.&lt;br /&gt;
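&lt;br /&gt;
The zero-then-scan burn-in above is easy to script. A minimal sketch, assuming coreutils, util-linux, and smartmontools are installed; the device name is a placeholder, and this will destroy everything on it:&lt;br /&gt;
&lt;pre&gt;
# Burn-in for a new disk: zero the whole device, then run a long SMART
# self-test and check the results. DESTRUCTIVE - double-check the device name.
import subprocess

DEV = &quot;/dev/sdX&quot;   # placeholder; set to the disk under test

def run(*cmd):
    print(&quot;+&quot;, &quot; &quot;.join(cmd))
    subprocess.run(cmd, check=True)

size_bytes = int(subprocess.check_output([&quot;blockdev&quot;, &quot;--getsize64&quot;, DEV]))
run(&quot;dd&quot;, &quot;if=/dev/zero&quot;, &quot;of=&quot; + DEV, &quot;bs=1M&quot;,
    &quot;count=&quot; + str(size_bytes // (1024 * 1024)), &quot;oflag=direct&quot;, &quot;status=progress&quot;)

run(&quot;smartctl&quot;, &quot;-t&quot;, &quot;long&quot;, DEV)   # starts the long self-test in the background
# ... wait for the estimated duration smartctl prints, then check the results ...
run(&quot;smartctl&quot;, &quot;-H&quot;, DEV)           # overall health; non-zero exit on failure
run(&quot;smartctl&quot;, &quot;-A&quot;, DEV)           # attributes: look at reallocated/pending sector counts
&lt;/pre&gt;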
&lt;br /&gt;
If
 you can ever check all these off, I&#39;ll be impressed. Still, I hope it 
helps other cheapskates out there avoid a few pitfalls. &lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/6041470497379992324/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/storage-on-cheap-lessons-learned.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/6041470497379992324'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/6041470497379992324'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/storage-on-cheap-lessons-learned.html' title='Storage on the cheap - lessons learned (originally posted to StorageMonkeys July 11, 2009)'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-6755509773208621327</id><published>2011-01-21T13:21:00.001-08:00</published><updated>2011-01-22T11:01:22.286-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><category scheme="http://www.blogger.com/atom/ns#" term="storagemonkeys"/><title type='text'>&quot;Dark&quot; storage: wastefulness or just good engineering? (originally posted to StorageMonkeys June 24, 2009)</title><content type='html'>Having recently read more and more discussion about so-called dark 
storage, I&#39;ve been reminded of something I routinely try to impress upon
 managers, especially clients: unless your use case is archiving, total 
bytes is a poor metric for storage.&lt;br /&gt;
&lt;br /&gt;
In fact, the term &quot;storage&quot; 
itself may be partly to blame for the continued misconception. One need 
only glance at the prices of commodity disks to recognize that there 
isn&#39;t anything near a linear relationship between cost and bytes stored.&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
A
 quarter century ago was the golden age of the mini-computer, and the 
reign of the micro- was dawning. The Fujitsu Eagle was, at least in the 
semiconductor industry here in Silicon Valley, very popular, so it will 
be my yardstick. At a third of a gigabyte in usable space and just under
 1.9MB/s, one could read or write the whole thing in just under 3 
minutes. Today, a 1.5TB Barracuda is 4500 times the size but only 66 
times the throughput, so it takes over 3 hours to go through the whole 
thing. A 6th-generation 450GB Cheetah is better, at under an hour.&lt;br /&gt;
&lt;br /&gt;
I
 like the Eagle&#39;s 3 minutes as a rule of thumb. That&#39;s 21GB on larger, 
modern, 7200 RPM disks, and I suggest that everything beyond that may as
 well be considered superfluous or archive storage. Accepting this 
measure end-to-end means that one would only want 72GB accessible to a 
host off each 4Gb/s FC or 216GB per 4x SAS. Ouch.&lt;br /&gt;
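&lt;br /&gt;
The arithmetic behind those figures is simple enough to keep in a scratch script; the throughput numbers below are the same rough ones used above, not benchmarks:&lt;br /&gt;
&lt;pre&gt;
# Full-pass time for a whole device, and how much capacity fits in a fixed
# time budget (the 3-minute &quot;Eagle&quot; rule), using decimal GB (1 GB = 1000 MB).
MINUTES = 3

def full_pass_minutes(capacity_gb, mb_per_s):
    return capacity_gb * 1000 / mb_per_s / 60

def budget_capacity_gb(mb_per_s, minutes=MINUTES):
    return mb_per_s * 60 * minutes / 1000

print(round(full_pass_minutes(0.33, 1.9)))          # Fujitsu Eagle: ~3 minutes
print(round(full_pass_minutes(1500, 125) / 60, 1))  # 1.5TB Barracuda at ~125MB/s: ~3.3 hours
print(round(budget_capacity_gb(117)))               # modern 7200RPM (~117MB/s): ~21GB
print(round(budget_capacity_gb(400)))               # 4Gb/s FC (~400MB/s): ~72GB per host link
print(round(budget_capacity_gb(1200)))              # 4x SAS (~1200MB/s): ~216GB
&lt;/pre&gt;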
&lt;br /&gt;
A whitepaper 
from Xiotech criticizes storage vendors&#39; performance numbers as being 
misleading, since they are based on short-stroking benchmarks, rather 
than representing the performance of the whole disk.&lt;br /&gt;
&lt;br /&gt;
I suggest 
that short-stroking disks as a matter of course and leaving the rest 
purposefully &quot;dark&quot; is smart engineering. Suddenly, those 160GB drives 
look much more appealing than the 1.5TB ones, at least for 
performance-sensitive uses, such as databases.&lt;br /&gt;
&lt;br /&gt;
Certainly, there 
are use cases where data beyond the 3 minute limit is still useful: 
anything that rarely, if ever, gets read. That tends to include backups,
 archives, audit trails, and even database intent logs. One may be able 
to have all these coexist on the same spindles as the &quot;high performance&quot;
 uses, but it would require careful forethought and testing.&lt;br /&gt;
&lt;br /&gt;
My 
21GB example with a 160GB disk means 87% &quot;dark,&quot; to simulate an Eagle. 
It&#39;s a high percentage but nothing to be alarmed about, as long as it&#39;s 
done with full awareness.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/6755509773208621327/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/dark-storage-wastefulness-or-just-good.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/6755509773208621327'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/6755509773208621327'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/dark-storage-wastefulness-or-just-good.html' title='&quot;Dark&quot; storage: wastefulness or just good engineering? (originally posted to StorageMonkeys June 24, 2009)'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-1714410547744725400</id><published>2011-01-21T13:18:00.002-08:00</published><updated>2011-01-22T11:01:22.286-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><category scheme="http://www.blogger.com/atom/ns#" term="storagemonkeys"/><title type='text'>Why are DRAM SSDs so pricey? (originally posted to StorageMonkeys June 10, 2009 )</title><content type='html'>As a UNIX veteran who has a vague recollection of /dev/drum, I keep 
thinking that it would be really nice to have a device to swap to that&#39;s
 somewhere between disk and memory in terms of speed and cost (total 
installed cost, not just each module).&lt;br /&gt;
&lt;br /&gt;
Mostly, I feel 
constrained by the 32-48GB limits on moderately priced ($1-3k) servers. 
To go higher, for even modest processor speeds, is a $5-$10k premium.  
Moreover, DRAM doesn&#39;t really wear out, and it would be nice to put 
older, lower density modules to use. &lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
The trouble is, what I&#39;ve 
found so far is either very low capacity, priced much higher than the 
memory modules themselves, or both.  I&#39;m not particularly interested in 
adding 4GB of fast swap to a 48GB machine, though ACARD has something 
for $250 with a 48GB limit, with high density modules, defeating my 
second purpose. Similarly, I&#39;m not interested in paying $10k for 16GB of
 RAM SSD ($625/GB?!) when I could just dump that money into the base 
server and get much faster access. &lt;br /&gt;
&lt;br /&gt;
I&#39;m not a hardware guy (in 
the EE sense), so I&#39;m genuinely curious about this. Is it really that 
difficult/expensive to stick a memory controller (northbridge?) onto a 
SATA interface? Am I being too cynical in assuming that it&#39;s mere market
 &quot;segmentation&quot; without a low-end consumer segment? &lt;br /&gt;
&lt;br /&gt;
What I 
described already exists with the name &quot;motherboard,&quot; but the software 
package &quot;scst&quot; seems woefully incomplete. For example, the MPT-Fusion 
driver is still described as &quot;alpha&quot; or early development, so I&#39;m not 
holding my breath on reliability, let alone performance. I&#39;m sure 
participation by the vendors would help. LSI, are you listening?</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/1714410547744725400/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/why-are-dram-ssds-so-pricey-originally.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/1714410547744725400'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/1714410547744725400'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/why-are-dram-ssds-so-pricey-originally.html' title='Why are DRAM SSDs so pricey? (originally posted to StorageMonkeys June 10, 2009 )'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6067234520252164707.post-4987575597655973906</id><published>2011-01-21T13:17:00.001-08:00</published><updated>2011-01-22T11:01:22.287-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="professional"/><category scheme="http://www.blogger.com/atom/ns#" term="storagemonkeys"/><title type='text'>All about the Benjamins (originally posted to StorageMonkeys June 9, 2009)</title><content type='html'>The choice of the unit of measure of storage is interesting to me 
because it&#39;s otherwise tough to measure price for performance.&lt;br /&gt;
&lt;br /&gt;
I
 remain agape at the price tag on high-end, supposedly high-performance,
 storage systems. Connected by FibreChannel or gigabit Ethernet, that&#39;s a
 limit of 400 and 110 MB/s, respectively. (Yes, I know of 8Gb/s FC and 
10GE, but these are prohibitively expensive, if supported. Even link-aggregated GigE practically tops out at 880MB/s.) I&#39;m thinking that 
writes across 40 7200RPM disks could saturate an FC link, and it would 
take fewer than 20 15k disks. Neither of these strikes me as impractical
 or unusual sizes of storage arrays, even doubling those numbers for 
RAID 1. More importantly, such arrays don&#39;t strike me as high 
performance.&lt;br /&gt;
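&lt;br /&gt;
To put rough numbers behind &quot;saturate,&quot; here is the back-of-the-envelope version; the per-spindle throughputs are round-number assumptions consistent with the figures above, not benchmarks:&lt;br /&gt;
&lt;pre&gt;
# How many spindles does it take to saturate a given link?
import math

links = {&quot;4Gb/s FC&quot;: 400, &quot;GigE&quot;: 110, &quot;8x GigE bonded&quot;: 880, &quot;4x SAS&quot;: 1200}
spindles = {&quot;7200RPM random&quot;: 10, &quot;15k random&quot;: 25, &quot;7200RPM sequential&quot;: 100}

for link, link_mbps in links.items():
    for disk, disk_mbps in spindles.items():
        count = math.ceil(link_mbps / disk_mbps)
        print(f&quot;{link:15s} saturated by ~{count:3d} x {disk}&quot;)
&lt;/pre&gt;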
&lt;br /&gt;
Particularly shocking is that a brand name &quot;SAN&quot; 
solution of such a size would cost in the neighborhood of a quarter 
million dollars and be at its performance limit. Granted, it might be 
half that price without fancy management and replication software, 
whereas the less fancy alternative, at one tenth to one fifth the cost, 
would still be expandable from a performance standpoint. How much does 
the Veritas database suite cost these days?&lt;br /&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
The cheaper 
alternative, which I have implemented and benchmarked, is using 
Serial-Attached SCSI (SAS) instead of FibreChannel and commodity SATA 
disks instead of 10k or 15k spindles. Although it&#39;s not necessarily &quot;SAN&quot;
 in the marketing sense, SAS readily supports multiple hosts per bus. 
It&#39;s also typically implemented as 4x 300MB/s channels on one connector 
for interfacing to expanders (a rough equivalent to FC switches). An x4 
PCIe slot is actually the limiting throughput factor for one of these, 
as each x1 lane is only 250MB/s. Even with RAID1, rolling my own array 
would cost $25k (including labor), maybe double that for Dell brand 
MD1000s. One could then spend twice again the same amount to get triple 
the throughput on the same server(s), before running up against the 
limit. Additional fanciness can be gained from 3rd-party storage 
software vendors, especially in this economy, for under 6 figures.&lt;br /&gt;
&lt;br /&gt;
That&#39;s
 for truly random I/O. For sequential I/O, such as for logs, the 
situation is even more egregious: only 4 7.2k spindles would saturate a 
(dedicated) FC link. If it&#39;s paired for redundancy, one would need a 
second pair for the non-sequential, perhaps introducing some management 
complexity, unless FC link aggregation becomes common enough to be 
standardized.&lt;br /&gt;
&lt;br /&gt;
Another issue I&#39;ve had come up in conversation is reliability and/or maintenance. This &lt;a href=&quot;http://usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html&quot;&gt;Usenix paper&lt;/a&gt;
 belies the notion that SATA disks are any less reliable than others. 
With a 3-6% annualized replacement rate, that&#39;s 2-5 disks per year, or 
about 15% or 12 disks over 2.5 years, on an 80-disk array. I&#39;ve actually
 already included this (4 spares per 20 non-spares) in the $25k above.&lt;br /&gt;
&lt;br /&gt;
Somewhere
 between 2 and 3 years, you&#39;re going to have to bite the bullet, spend 
another $25k for twice as much space, and migrate the old data, assuming
 you&#39;re not already upgrading for other reasons. Woe is you. You&#39;ll just
 have to resort to drowning your sorrows in the hundreds of grand you 
saved, never mind the headache of shipping disks back and forth.&lt;br /&gt;
&lt;br /&gt;
The Storage Emperor&#39;s new clothes are looking mighty skimpy, indeed.</content><link rel='replies' type='application/atom+xml' href='http://blog.maxkalashnikov.com/feeds/4987575597655973906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/all-about-benjamins-originally-posted.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/4987575597655973906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6067234520252164707/posts/default/4987575597655973906'/><link rel='alternate' type='text/html' href='http://blog.maxkalashnikov.com/2011/01/all-about-benjamins-originally-posted.html' title='All about the Benjamins (originally posted to StorageMonkeys June 9, 2009)'/><author><name>Max</name><uri>http://www.blogger.com/profile/04705387565124551855</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwu8ar9I34d4Doi9OHJhKuL2bDunqnI27v1lBBUEFdLbVy3NflbFTU9O2mtXYXgZFDqmS9_u9MFnHTH7WBr4L2ivvW5dxzEekrWvbCgW0ecu2wQSpX-iA966_Fi8PVg/s220/caee81e6dd93fec608e6cc26.png'/></author><thr:total>0</thr:total></entry></feed>