<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CU8BR3k9eip7ImA9WhVXEkg.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128</id><updated>2012-04-12T11:04:16.762-07:00</updated><category term="scalable database" /><category term="distributed" /><category term="TPCC" /><category term="MySQL" /><category term="appliance" /><category term="cluster" /><category term="web" /><category term="SSD" /><category term="Conference" /><category term="Web 2.0 Expo" /><category term="internet" /><category term="use case" /><category term="performance" /><category term="No SQL" /><category term="architects" /><category term="scaling" /><category term="architecture" /><category term="ACID" /><category term="database" /><category term="Percona" /><title>Clustrix</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://blog.clustrix.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://blog.clustrix.com/" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>21</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" 
rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/Clustrix" /><feedburner:info uri="clustrix" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;AkQMSHs5fCp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-1511784577488573308</id><published>2012-02-01T12:28:00.000-08:00</published><updated>2012-02-01T12:33:09.524-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T12:33:09.524-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="appliance" /><category scheme="http://www.blogger.com/atom/ns#" term="SSD" /><title>Why Build an Appliance?</title><content type="html">&lt;div class="entry-content"&gt;
Clustrix sells appliances.  We marry our 
software with industry-standard hardware to make plug-and-play devices 
that drop in and work on the network.  This is the same model used by my former company Isilon (now EMC), NetApp, and pretty much all successful storage vendors.&amp;nbsp; Why do this instead of selling software by itself?&lt;br /&gt;
&lt;br /&gt;
&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-g8AiCqtks5g/Tyma0mTdz-I/AAAAAAAAAA8/7hhdvbFwQKo/s1600/SSD+320+angle+right+1to1.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="204" src="http://4.bp.blogspot.com/-g8AiCqtks5g/Tyma0mTdz-I/AAAAAAAAAA8/7hhdvbFwQKo/s320/SSD+320+angle+right+1to1.jpg" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Qualifying SSDs is particularly tricky because of the huge variation in quality.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
The number one reason for the appliance 
model is quality control.  By  reducing the supported hardware set, we 
drastically increase the QA time we get on the intended hardware.  QA 
time spent looking for hardware interactions really matters in a product that 
stores customers’ data.  The bar is simply set much higher there.&amp;nbsp; As an example, we do extensive testing of 
durability of data on our  specific hardware.  Depending on the hard 
drive controller, controller  firmware, drives, and drive firmware, we 
have gotten very different  results on exactly when a specific piece of 
data makes it to stable storage.   Many drives and even controllers lie about when the data is safe.   Some drives implement tagged queuing or FUA poorly.  I’ve even seen  drives return from a sync command without 
actually having synced the  data.  The only way to ensure data integrity
 on a storage system is to  properly characterize the hardware with an extensive test regime and control every 
piece of that storage system.   With that control, we can form 
relationships with the vendors to fix  bugs that we expose in the 
hardware and firmware.  At Clustrix, we have a  variety of different 
pieces of test software to exercise the disk  subsystem and verify the 
data is safe every time.  With that test  software, we have rejected 
many pieces of hardware and countless  versions of firmware that didn’t 
make the cut.  The same rigorous  process goes into qualifying 
networking, InfiniBand, NVRAM, processing, and memory components.  This
 kind of focused testing and qualification  is not possible on a 
software-only product.&lt;br /&gt;
&lt;br /&gt;
The second critical benefit of the 
appliance model is much tighter  integration between hardware and 
software.  At Clustrix, we are able to  monitor the hardware in the box 
and present that data seamlessly in our  “system” database.  For 
example, “SELECT * from system.memory” will give  you all the details 
for the memory installed on the system and tell you  if there are any 
correctable or uncorrectable ECC errors.  We have  logic to send alerts 
on correctable errors and safely shut down the node  on an uncorrectable
 error.  On Clustrix, this hardware-specific data sits right alongside
 database-specific data like queries per second and disk-full 
percentage, which allows exceptionally easy integration with tools like 
Nagios and Cacti.  This sort of tight integration is  only possible on 
an appliance.&lt;br /&gt;
&lt;br /&gt;
Finally, being an appliance makes the Clustrix 
database much easier  to install, manage, and use.  Creating a high 
performance, bullet-proof  database is no longer a science project.  You
 no longer have to put  together the pieces, get all the right versions 
of firmware, get the  right versions of the kernel and libraries, the 
right version of the  database software and make sure all the parts are 
tuned to work  together.  The Clustrix appliance is an integrated and 
tuned database  right out of the box.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-1511784577488573308?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/816shaKk_6I" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/1511784577488573308/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=1511784577488573308" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/1511784577488573308?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/1511784577488573308?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/816shaKk_6I/why-build-appliance.html" title="Why Build an Appliance?" 
/><author><name>Aaron Passey</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-g8AiCqtks5g/Tyma0mTdz-I/AAAAAAAAAA8/7hhdvbFwQKo/s72-c/SSD+320+angle+right+1to1.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2012/02/why-build-appliance.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0cMQXs9eyp7ImA9WhRbEEg.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-5793605014732947273</id><published>2012-01-31T15:04:00.000-08:00</published><updated>2012-01-31T15:04:40.563-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-31T15:04:40.563-08:00</app:edited><title>Scripting SSH with Python</title><content type="html">The other day Aaron Passey, our CTO, pointed out to me that I've written
 the same bit of code in different languages at each of the last 3 
places I've worked. I figured that since this seems to be something 
commonly useful, yet not obviously available, I'd write something about 
it.&lt;br /&gt;
&lt;br /&gt;
The scenario is that we have some code which wants to do remote ssh 
calls. Some variant of this code exists within the Isilon cluster 
management code, and we use it at Clustrix within our clx command line 
tool, as well as the database update scripts. We're already running on a
 Unix box (or a variant such as OS X or Linux) and we have access to an ssh 
client, so this is totally doable from code, but it's not immediately 
obvious how.&lt;br /&gt;
&lt;br /&gt;
I'll present this in Python, but the same applies to C, or any other
 language. I'm aware of the Paramiko library for Python which is 
supposed to have support for this. That may do everything this can do - I
 don't know. This is 100 lines of code, pretty easy to follow, and maybe
 instructive. This is all about utilizing ssh and our friendly posix 
primitives.&amp;nbsp; After the walkthrough, I've included the entire source at 
the bottom.&lt;br /&gt;
&lt;br /&gt;
The obvious thing that we'd like to do is fork off an ssh process 
and read its stdout. The easy way to do this in Python is with 
os.popen2(). This will give us back the process's stdin and stdout:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;(sin, sout) = os.popen2(cmd)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
This, however, will not work. ssh wants a pseudo tty (a pty). If it's not 
running in one, it just exits. This is where the helpful Python pty 
module comes in:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;(pid, f) = pty.fork()&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Now we've got ssh 
running in the right environment.&amp;nbsp; The two args we got back are 
important: pid is the process id of our forked ssh process, and f is a 
unix fileno (not to be confused with a python file handle) which is the 
combined stdin and stdout of the process. It's important to remember 
that f isn't something reference counted. We're going to need to 
explicitly close it.&lt;br /&gt;
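&lt;br /&gt;
To see those two return values in action without involving ssh, here is a minimal Python 3 sketch that runs a harmless command inside a pty and collects what it prints. The helper name run_in_pty and the choice of echo as the command are illustrative assumptions, not part of the code in this post:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import pty


def run_in_pty(argv):
    """Run argv inside a pseudo-terminal and return everything it printed."""
    pid, fd = pty.fork()
    if pid == 0:
        # Child: replace this process with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)

    # Parent: fd is a raw descriptor for the master side of the pty.
    chunks = []
    try:
        while True:
            try:
                data = os.read(fd, 1024)
            except OSError:
                # Linux reports an I/O error once the child side is closed.
                break
            if not data:
                # Other platforms signal end-of-file with an empty read.
                break
            chunks.append(data)
    finally:
        os.close(fd)        # not reference counted, so close it by hand
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return b"".join(chunks).decode(errors="replace")
```
&lt;br /&gt;
Calling run_in_pty(["echo", "hello"]) should return text containing hello. Because the output passes through a terminal, line endings typically come back as a carriage return plus newline.&lt;br /&gt;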
&lt;br /&gt;
Now that we have the basic mechanism in place, let's build us a little ssh class:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;class SSH:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def __init__(self, ip, passwd, user, port):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.ip = ip&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.passwd = passwd&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.user = user&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.port = port&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
This is structured so you can create one
 SSH object per target box and reuse it to do different commands. We'll 
also include the ability to push and pull files. Our first method will 
be the command handler. It will take only one argument (other than 
self), which will be the command to run:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def run_cmd(self, c):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = pty.fork()&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if pid == 0:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.execlp("ssh", "ssh", '-p %d' % self.port,&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.user + '@' + self.ip, c)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return (pid, f)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the 
pty fork works just like the os fork in that if the pid is 0 it means 
we're the child, and if non 0 it means we're the parent.&lt;br /&gt;
&lt;br /&gt;
Since 
this is a raw unix fileno, the file closed condition is a little weird. 
Reading it will block until something is available and then return 
results. But when the descriptor closes it throws an os error. I'd 
rather be able to handle the reads just in a loop (or maybe it's because
 I'm an old C programmer) so I'm going to wrap the read:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def _read(self, f):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = ''&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = os.read(f, 1024)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; except Exception, e:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # this always fails with io error&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pass&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span 
style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return x&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
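&lt;br /&gt;
For readers on Python 3, the same wrapper might look like the sketch here. It is a variation rather than a translation: it assumes that only the I/O error a pty master raises after the child exits should count as end-of-file, and it lets any other error propagate. The name read_chunk is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
import errno
import os


def read_chunk(fd, size=1024):
    """Return the next chunk from fd, or b"" once the other end has gone away."""
    try:
        return os.read(fd, size)
    except OSError as exc:
        if exc.errno == errno.EIO:
            # Linux raises EIO on a pty master after the child has exited.
            return b""
        raise  # anything else is a real error and should not be hidden
```
&lt;br /&gt;
Note that os.read returns bytes in Python 3, so callers would need to decode the result before running string regular expressions over it.&lt;br /&gt;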
&lt;br /&gt;
Once we've got this thing forked, and can read from it, we need to 
get our results back out. There's one additional thing we need to be 
prepared for: ssh might want to ask us some questions. We've all seen 
this before:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;harmony:~$ ssh paulmini&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;The authenticity of host 'paulmini (10.1.2.125)' can't be established.&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;RSA key fingerprint is 7e:91:5d:5d:06:fe:3f:24:94:84:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div id=":1j9"&gt;
&lt;wbr style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;/wbr&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;c0:75:96:8c:d1:f1.&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Are you sure you want to continue connecting (yes/no)? &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
We just want to say "yes" and move on. ssh might also ask us for a 
password if we don't have key-based authentication set up, and we'll need to be 
prepared to handle that. We'll handle all of these actions in an 
ssh_results method:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def ssh_results(self, pid, f):&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
First, let's initialize our output buffer:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output = ""&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Now
 we'll read our first chunk and see if ssh is asking anything of us. If 
it wants to know whether we really want to continue connecting because the 
target isn't in ~/.ssh/known_hosts, we'll say yes. If it asks us for the 
password, we'll provide it.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # check for authenticity of host request&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("authenticity of host", got)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.write(f, 'yes\n')&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Read until we get ack&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while True:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("Permanently added", got)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;/span&gt;

&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # check for passwd request&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("assword:", got)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # send passwd&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.write(f, self.passwd + '\n')&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # read two lines&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tmp = self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tmp += self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("Permission denied", tmp)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; raise Exception("Invalid passwd")&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # passwd was accepted&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = tmp&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
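&lt;br /&gt;
The prompt detection can also be pulled out into a small pure function, which makes it easy to test without a live ssh session. This is only a sketch: the two patterns are the ones used in this post, while the function name classify_prompt and its return labels are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
import re

# Patterns taken from the walkthrough; "assword:" matches either capitalization.
HOST_KEY_PROMPT = re.compile(r"authenticity of host")
PASSWORD_PROMPT = re.compile(r"assword:")


def classify_prompt(text):
    """Return "host_key", "password", or None for a chunk of ssh output."""
    if HOST_KEY_PROMPT.search(text):
        return "host_key"
    if PASSWORD_PROMPT.search(text):
        return "password"
    return None
```
&lt;br /&gt;
A caller would write 'yes\n' to the descriptor for "host_key", write the password for "password", and treat None as ordinary command output.&lt;br /&gt;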
&lt;br /&gt;
Preliminaries done, we can now enter our results loop:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while got and len(got) &amp;gt; 0:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output += got&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
We
 know we've now read everything because our _read returned empty. This 
also means that the ssh process has ended. There's a defunct zombie 
process sitting there and we're going to have to clean that up. We could
 handle the SIGCHLD and waitpid it, but in this case, since we know the 
pid and we know the process is done, it's much simpler to call waitpid directly. Also, remember 
that since f isn't a reference counted Python object we're going to need
 to manually clean that up:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.waitpid(pid, 0)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.close(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return output&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
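&lt;br /&gt;
One thing the cleanup throws away is the status that os.waitpid hands back. If the remote command's exit code matters, it could be decoded roughly as in this Python 3 sketch. The helper names spawn and reap are hypothetical, and a local sh stands in for ssh so the example runs anywhere:&lt;br /&gt;
&lt;br /&gt;
```python
import os
import pty


def spawn(argv):
    """Fork argv inside a pty and return (pid, fd) to the parent."""
    pid, fd = pty.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # exec failed; do not continue as a second parent
    return pid, fd


def reap(pid, fd):
    """Wait for the child, close the pty descriptor, and return the exit code.

    Any output should already have been drained from fd before calling this.
    """
    _, status = os.waitpid(pid, 0)
    os.close(fd)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)
```
&lt;br /&gt;
With this, reap(*spawn(["sh", "-c", "exit 3"])) returns 3. When the child is ssh, it normally exits with the remote command's status, or 255 if ssh itself failed.&lt;br /&gt;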
&lt;br /&gt;
And
 that's basically it. With some niceties for pushing and pulling files, 
and some error handling, the completed code looks like this:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;#&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;# Remote ssh cmds&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;#&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;import pty, re, os, sys, stat, getpass&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;class SSHError(Exception):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def __init__(self, value):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.value = value&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def __str__(self):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier 
New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return repr(self.value)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;class SSH:&lt;/span&gt;&lt;/span&gt;

&lt;span style="font-size: x-small;"&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def __init__(self, ip, passwd, user, port):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.ip = ip&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.passwd = passwd&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.user = user&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.port = port&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def run_cmd(self, c):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = pty.fork()&lt;/span&gt;&lt;br style="font-family: 
&amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if pid == 0:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.execlp("ssh", "ssh", '-p %d' % self.port,&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.user + '@' + self.ip, c)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return (pid, f)&lt;/span&gt;&lt;/span&gt;

&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def push_file(self, src, dst):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = pty.fork()&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if pid == 0:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.execlp("scp", "scp", '-P %d' % self.port,&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; src, self.user + '@' + self.ip + ':' + dst)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return (pid, f) &lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def push_dir(self, src, dst):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = pty.fork()&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if pid == 0:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.execlp("scp", "scp", '-P %d' % self.port, "-r", src,&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; self.user + '@' + self.ip + ':' + dst)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return (pid, f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def _read(self, f):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = ''&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = os.read(f, 1024)&lt;/span&gt;&lt;br style="font-family: 
&amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; except Exception, e:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # this always fails with io error&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pass&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return x&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def ssh_results(self, pid, f):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output = ""&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;/span&gt;

&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # check for authenticity of host request&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("authenticity of host", got)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.write(f, 'yes\n') &lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Read until we get ack&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while True:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("Permanently added", got)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;/span&gt;

&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # check for passwd request&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("assword:", got)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # send passwd&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.write(f, self.passwd + '\n')&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # read two lines&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tmp = self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tmp += self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; m = re.search("Permission denied", tmp)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if m:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; raise SSHError("Invalid passwd")&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # passwd was accepted&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = tmp&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while got and len(got) &amp;gt; 0:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output += got&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; got = self._read(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.waitpid(pid, 0)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; os.close(f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return output&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" 
/&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def cmd(self, c):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = self.run_cmd(c)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return self.ssh_results(pid, f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def push(self, src, dst):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; s = os.stat(src)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if stat.S_ISDIR(s[stat.ST_MODE]):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = self.push_dir(src, dst)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (pid, f) = self.push_file(src, dst)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return self.ssh_results(pid, f)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;def ssh_cmd(ip, passwd, cmd, user=getpass.getuser(), port=22):&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; s = SSH(ip, passwd, user, port)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; return s.cmd(cmd)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;def ssh_push(ip, passwd, src, dst, user=getpass.getuser(), port=22):&lt;/span&gt;&lt;/span&gt;
&lt;span style="font-size: x-small;"&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; s = SSH(ip, passwd, user, port)&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return s.push(src, dst)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
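The prompt matching in ssh_results can be exercised without a live connection. Here is a minimal sketch, not part of the module above: classify_chunk and build_ssh_argv are hypothetical helper names introduced for illustration, the address 10.0.0.5 is a placeholder, and the patterns are the same strings the class searches for. The sketch also passes the port as two separate arguments, the conventional argv form, and decodes bytes so that it runs on Python 3 as well.&lt;br /&gt;

```python
import re

# The same strings that ssh_results searches for, checked in the same order.
PROMPTS = [
    ("host_key", re.compile("authenticity of host")),
    ("password", re.compile("assword:")),
    ("denied", re.compile("Permission denied")),
]


def classify_chunk(chunk):
    """Name the prompt found in one chunk of ssh output, or 'output' if none."""
    if isinstance(chunk, bytes):
        # os.read returns bytes on Python 3; decode before matching.
        chunk = chunk.decode("utf-8", "replace")
    for name, pattern in PROMPTS:
        if pattern.search(chunk):
            return name
    return "output"


def build_ssh_argv(user, ip, port, command):
    """Build an argument list for os.execvp, with the port as its own argument."""
    return ["ssh", "-p", str(port), "%s@%s" % (user, ip), command]


if __name__ == "__main__":
    # Hypothetical host and command, for illustration only.
    print(classify_chunk("root@10.0.0.5's password: "))
    print(build_ssh_argv("root", "10.0.0.5", 22, "uptime"))
```

Keeping the matching in a pure function like this makes it possible to check the prompt handling against captured ssh output before pointing the script at a real host.&lt;br /&gt;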
&lt;br /&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/5793605014732947273/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=5793605014732947273" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/5793605014732947273?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/5793605014732947273?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/wJq0ZMCmtKE/scripting-ssh-with-python.html" title="Scripting SSH with Python" /><author><name>Paul Mikesell</name><uri>http://www.blogger.com/profile/04944019448874102240</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://blog.clustrix.com/2012/01/scripting-ssh-with-python.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUQESHc4fSp7ImA9WhRUFkQ.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-7888993520831085718</id><published>2012-01-27T11:33:00.000-08:00</published><updated>2012-01-27T11:41:49.935-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-27T11:41:49.935-08:00</app:edited><title>Running a devnode cluster across multiple boxes</title><content type="html">In my last post, we used &lt;span style="font-family: 'Courier New', Courier, 
monospace;"&gt;devnodectl&lt;/span&gt; to fire up a simple 3-node cluster, with all nodes running on the same system. &amp;nbsp;This is interesting insofar as it demonstrates the functionality of the cluster, but we're certainly quite resource limited trying to emulate three nodes on one box. &amp;nbsp;In this post, I'll show how to get &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; running on different physical servers, so you can begin to see the potential of horizontal scaling available with Clustrix. &lt;br /&gt;
&lt;div&gt;
Along the way I'll also cover:&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;Manual control of &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; (vs. &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt;)&lt;/li&gt;
&lt;li&gt;How, where, and what to look for in the logs&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Requirements&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;Three Red Hat/CentOS 6 or equivalent clients&lt;/li&gt;
&lt;li&gt;All clients must be on the same subnet&lt;/li&gt;
&lt;li&gt;The clients should NOT be running &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysqld&lt;/span&gt; (so port 3306 will be free to &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Firing up the devnode instances&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
First, install the DevKit RPMs, as described in &lt;a href="http://blog.clustrix.com/2012/01/getting-started-with-clustrix-devkit.html"&gt;my last blog post&lt;/a&gt;, on each of your clients, and make sure a writeable working directory is available on each (I'll use&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;/data/clustrix&lt;/span&gt;). &amp;nbsp;If you followed along with the last exercise, please stop those nodes (&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl stop&lt;/span&gt;), and I also recommend cleaning out the prior state with &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;rm -rf /data/clustrix/*&lt;/span&gt;. &amp;nbsp;The flags we'll specify below will overwrite old state as needed, but the state of the old nodes 2 and 3 would still be sitting on your first client, which might become confusing. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&amp;nbsp;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
Now we're going to start up &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; directly, and also change the flags around quite a bit. &amp;nbsp;I recommend opening three terminal windows, one for each node; connect (ssh, presumably) to one client in each window, then run the following commands, one per&amp;nbsp;window:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;client1$ /opt/clustrix/bin/devnode -clusterpath /data/clustrix -setpnid 1 -eth eth0 -clean -nclean 4 -vdev-size 2048 -logfile -&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;client2$ /opt/clustrix/bin/devnode -clusterpath /data/clustrix -setpnid 2 -eth eth0 -clean -nclean 4 -vdev-size 2048 -logfile - -noautostart&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;client3$ /opt/clustrix/bin/devnode -clusterpath /data/clustrix -setpnid 3 -eth eth0 -clean -nclean 4 -vdev-size 2048 -logfile - -noautostart&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
We talked about the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-clusterpath&lt;/span&gt;, &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-nclean&lt;/span&gt;, and &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-vdev-size&lt;/span&gt; flags last time (recap: where the simulated node stores its data, how many disks it has, and how big each one is; note that I'm allocating 8GB per node here). &amp;nbsp;Let's pick apart the other flags:&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-setpnid&lt;/span&gt; sets a Physical Node ID -- normally this would come from a node's MAC address&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-eth&lt;/span&gt;&amp;nbsp;tells devnode to use the ethernet interface for inter-node communication; when we created our three node cluster on the same physical machine with devnodectl, it specified -unix to use UNIX sockets instead. &amp;nbsp;On real nodes, we'd be using InfiniBand for this purpose.&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-clean&lt;/span&gt; tells the cluster to wipe all prior state&amp;nbsp;&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-noautostart&lt;/span&gt; for the second and third nodes avoids a devnode restart step, as will be explained further below&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-logfile -&lt;/span&gt; means to log to stdout instead of to a log file&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
This last option is how I prefer to run, because staring at logfiles is how I live. &amp;nbsp;For our purposes today I think it will be most instructive for you as well. &amp;nbsp;It should be noted that normally these logs go to &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;/data/clustrix/p1/devnode.log&lt;/span&gt; (substitute &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;p2&lt;/span&gt;, &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;p3&lt;/span&gt;, etc. for other nodes). &amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
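&lt;div&gt;
To make the difference between the three command lines easier to see, here is a minimal shell sketch that builds them from the node number. &amp;nbsp;The function names devnode_cmd and vdev_total_mb are made up for illustration; only flags already shown in this post are used, and the assumption that -vdev-size is given in MB is inferred from the 8GB-per-node figure above (4 x 2048).&lt;/div&gt;
&lt;pre&gt;
```shell
# Hypothetical helper (not part of the DevKit): print the devnode command
# line for a given node number, using only the flags shown in this post.
devnode_cmd() {
    pnid="$1"
    cmd="/opt/clustrix/bin/devnode -clusterpath /data/clustrix -setpnid $pnid -eth eth0 -clean -nclean 4 -vdev-size 2048 -logfile -"
    # Nodes other than the first get -noautostart, so they wait to be
    # added to a cluster instead of starting as a cluster-of-one.
    if [ "$pnid" != "1" ]; then
        cmd="$cmd -noautostart"
    fi
    printf '%s\n' "$cmd"
}

# Assumption: -vdev-size is in MB, so disks x size gives MB per node.
vdev_total_mb() {
    echo $(( $1 * $2 ))
}

devnode_cmd 1
devnode_cmd 2
vdev_total_mb 4 2048
```
&lt;/pre&gt;
&lt;div&gt;
The last line prints 8192, which matches the 8GB per node noted above if the MB assumption holds.&lt;/div&gt;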
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
If a bunch of FATAL errors scroll by, the most likely culprit is a port conflict:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;2012-01-25 11:22:38 ERROR cp/cp_sock.c:74 cp_bind(): stream_listen(IPv4(0.0.0.0:2048)): Address already in use&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;2012-01-25 11:22:38 FATAL core/segv.c:93 main_segv_handler(): Program received a fatal signal on core 0 &amp;nbsp;fiber 0&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;2012-01-25 11:22:38 FATAL core/segv.c:95 main_segv_handler(): C stack trace:&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;0x000000000052d976&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;bind_done()&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;&amp;lt;no lines read&amp;gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;0x00000000008c7c91&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;scheduler_run_one_item()&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;&amp;lt;no lines read&amp;gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;0x00000000008c8893&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;scheduler_main_loop()&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;&amp;lt;no lines read&amp;gt;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The above indicates that port 2048 (the control port, which we will cover later) is already in use. &amp;nbsp;You'd see this if you tried to run the above commands on the same box, instead of 3 different boxes. &amp;nbsp;If you're running &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysqld&lt;/span&gt; on one of your boxes, devnode will instead fail on the MySQL port (3306), like this:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;n1 2012-01-25 11:37:47 ERROR mysql/server/mysql_proto.c:1171 listen_on_port(): stream_listen(IPv4(0.0.0.0:3306)): Address already in use&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;n1 2012-01-25 11:37:47 FATAL dbcore/dbstate.c:104 dbconf_done(): Error handling dbconf chain: Address already in use: dbconf/INIT_MYSQL_PROTO failed (unable to create TCP socket)&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
You can work around these by specifying the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-anyport&lt;/span&gt; flag, in which case you'll need to look back through the logs to find which port it's chosen:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;2012-01-25 11:41:17 INFO dbcore/driver.ct:90 driver_publish_address(): pnid p3 &lt;b&gt;control port 33274&lt;/b&gt; sw f71a74d89c512ac&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;n1 2012-01-25 11:41:17 INFO dbcore/driver.ct:90 driver_publish_address(): pnid p3 &lt;b&gt;mysql port 36290&lt;/b&gt; sw f71a74d89c512ac&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
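&lt;div&gt;
Rather than scrolling back through the log by eye, you could filter for the published ports. &amp;nbsp;The sketch below assumes the log lines look like the two shown above; published_ports is a hypothetical helper name, and grep -o is a common (GNU and BSD) extension rather than strict POSIX.&lt;/div&gt;
&lt;pre&gt;
```shell
# Hypothetical helper: read devnode log lines on stdin and print the ports
# announced by driver_publish_address(), one per line.
published_ports() {
    grep -F 'driver_publish_address()' | grep -oE '(control|mysql) port [0-9]+'
}

# Sample input: the two log lines quoted in this post.
printf '%s\n' \
    '2012-01-25 11:41:17 INFO dbcore/driver.ct:90 driver_publish_address(): pnid p3 control port 33274 sw f71a74d89c512ac' \
    'n1 2012-01-25 11:41:17 INFO dbcore/driver.ct:90 driver_publish_address(): pnid p3 mysql port 36290 sw f71a74d89c512ac' \
    | published_ports
```
&lt;/pre&gt;
&lt;div&gt;
When logging to a file instead of stdout, the same filter could be fed from the node's log, for example: cat /data/clustrix/p3/devnode.log | published_ports&lt;/div&gt;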
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Adding nodes 2 and 3 to your cluster&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
Normally fresh Clustrix nodes start up in a "cluster-of-one"; you can connect to any node and then pull in the others to form a larger cluster. &amp;nbsp;This mechanism involves a process restart (on real nodes, an initd-like process called nanny takes care of this); to avoid this, we used the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-noautostart&lt;/span&gt; flag when starting nodes 2 and 3, so they don't start up as "cluster-of-one", and can't be accessed via mysql until they are added to a cluster.&lt;/div&gt;
&lt;div&gt;
So, connect to your first client (client1 above, the one where you did not have the -noautostart flag):&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@hefty mainline1]$ mysql -h beta001 -u root&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; use system;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Reading table information for completion of table and column names&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;You can turn off this feature to get a quicker startup with -A&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Database changed&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; select * from available_node_details;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+------+------------+----------------------------------------------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| pnid | name &amp;nbsp; &amp;nbsp; &amp;nbsp; | value &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+------+------------+----------------------------------------------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | cluster &amp;nbsp; &amp;nbsp;| nparrish &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | sw_version | 1112854534402937516 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | version &amp;nbsp; &amp;nbsp;| 5.0.45-clustrix-v3.2-7371-0f71a74d89c512ac-release |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | iface_ip &amp;nbsp; | 10.2.12.188 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | iface_mac &amp;nbsp;| 00:25:90:34:69:04 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | hostname &amp;nbsp; | loeb.colo.sproutsys.com &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p3 &amp;nbsp; | started &amp;nbsp; &amp;nbsp;| 2012-01-25 20:44:33.037318 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | cluster &amp;nbsp; &amp;nbsp;| nparrish &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | sw_version | 1112854534402937516 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | version &amp;nbsp; &amp;nbsp;| 5.0.45-clustrix-v3.2-7371-0f71a74d89c512ac-release |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | iface_ip &amp;nbsp; | 10.2.12.194 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | iface_mac &amp;nbsp;| 00:25:90:34:70:0a &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | hostname &amp;nbsp; | sainz.colo.sproutsys.com &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| p2 &amp;nbsp; | started &amp;nbsp; &amp;nbsp;| 2012-01-24 23:52:17.085502 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+------+------------+----------------------------------------------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;14 rows in set (0.00 sec)&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
So we're looking at &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;system.available_node_details&lt;/span&gt;, which shows the nodes that can be seen on the network (here via eth0; on real nodes, via InfiniBand) and that are not already part of another cluster. &amp;nbsp;It's sort of a key/value pair table, extended to include the pnid (Physical Node ID -- recall we set this with &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-setpnid&lt;/span&gt;). &amp;nbsp;We're really most interested in the hostname -- I can see my two other clients, great. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Now we add these nodes into the cluster using the ALTER CLUSTER query:&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; alter cluster add p2;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 0 rows affected (0.00 sec)&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; alter cluster add p3;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 0 rows affected (0.02 sec)&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; select * from nodeinfo\G&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;*************************** 1. row ***************************&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; nodeid: 1&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;started: 2012-01-25 20:50:50&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ntptime: 2012-01-25 20:54:13&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp;node uptime: 2012-01-20 23:15:52&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; hostname: beta001.colo.sproutsys.com&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; iface_name: eth0&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; iface_ip: 10.2.13.11&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;iface_mac_addr: 00:30:48:c3:e7:5c&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pnid: p1&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;cores: 12288&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;*************************** 2. row ***************************&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; nodeid: 3&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;started: 2012-01-25 20:50:57&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ntptime: 2012-01-25 20:54:13&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp;node uptime: 2011-04-12 01:05:08&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; hostname: loeb.colo.sproutsys.com&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; iface_name: eth0&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; iface_ip: 10.2.12.188&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;iface_mac_addr: 00:25:90:34:69:04&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pnid: p3&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;cores: 805313044&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;*************************** 3. row ***************************&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; nodeid: 2&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;started: 2012-01-25 20:50:53&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ntptime: 2012-01-25 20:54:13&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp;node uptime: 2011-10-25 17:31:38&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; hostname: sainz.colo.sproutsys.com&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; iface_name: eth0&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; iface_ip: 10.2.12.194&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;iface_mac_addr: 00:25:90:34:70:0a&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pnid: p2&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;cores: 3145744&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;3 rows in set (0.01 sec)&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
And you're ready to rock and roll. &amp;nbsp;(Yes, the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;cores&lt;/span&gt; value is a little funny -- on Clustrix nodes this would tell you how many CPU cores are available on each node.)&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
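&lt;div&gt;
If you have more than a couple of nodes to pull in, the ALTER CLUSTER statements used earlier can be generated rather than typed. &amp;nbsp;This is only a sketch: add_nodes_sql is a hypothetical helper name, and it simply prints the same statement form shown above for each pnid you pass it.&lt;/div&gt;
&lt;pre&gt;
```shell
# Hypothetical helper: print one ALTER CLUSTER statement per pnid argument,
# in the same form as the statements typed at the mysql prompt above.
add_nodes_sql() {
    for pnid in "$@"; do
        printf 'alter cluster add %s;\n' "$pnid"
    done
}

add_nodes_sql p2 p3
```
&lt;/pre&gt;
&lt;div&gt;
The output could then be piped into the mysql client connected to the first node (for example: add_nodes_sql p2 p3 | mysql -h beta001 -u root system), though I'd check the pnids against available_node_details first.&lt;/div&gt;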
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Accessing Your Cluster&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
With each node using the standard MySQL port, it's a little less fussy to connect to the cluster, as you no longer need to find and specify a different port for each node. &amp;nbsp;As before, connect to any node and you see the same database instance. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Normally a Clustrix cluster is configured with a Virtual IP (VIP), a distinct IP address that load-balances connections across all the nodes. &amp;nbsp;This is partially implemented within the base OS of our appliance nodes, so unfortunately we cannot provide this functionality with the DevKit. &amp;nbsp;You can, however, implement simple load balancing with a tool like&amp;nbsp;&lt;a href="http://haproxy.1wt.eu/"&gt;HAProxy&lt;/a&gt;&amp;nbsp;(we use this internally as a simple solution to cut over between clusters located on different subnets, where DSR (direct server return) is not possible). &amp;nbsp;I'll see about writing this up as another blog post. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Please bear in mind that while we've now got our &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; processes distributed across multiple servers, they are now communicating over ethernet instead of via UNIX sockets (memory). &amp;nbsp;Ethernet is a far cry from the InfiniBand which connects real Clustrix nodes. &amp;nbsp;Beyond IB being a higher throughput, lower latency interconnect, our software is tuned for these characteristics, so we're not going to see world-beating performance with our simulated cluster running over ethernet. &amp;nbsp;For that, I'll refer you to our &lt;a href="http://blog.clustrix.com/2011/10/percona-evaluates-clustrix-and-mysql.html"&gt;prior blog post&lt;/a&gt; on Percona's TPCC test!&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Recap&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
So we've shown how to get &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; running on different nodes, communicating with each other via ethernet. &amp;nbsp;While a little more "real" than having them all running on a single box, the performance characteristics are going to be orders of magnitude off from the capabilities of proper Clustrix nodes. &amp;nbsp;This does provide a simulacrum of the platform to develop against, and we'd welcome the opportunity to move you from there to deploying on real hardware. &amp;nbsp;As always, your feedback or questions are welcome, in the comments or&amp;nbsp;&lt;a href="https://groups.google.com/a/clustrix.com/group/support-public/topics"&gt;support forum&lt;/a&gt;. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-7888993520831085718?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/nz7G1dAkyIo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/7888993520831085718/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=7888993520831085718" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7888993520831085718?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7888993520831085718?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/nz7G1dAkyIo/running-devnode-cluster-across-multiple.html" title="Running a devnode cluster across multiple boxes" /><author><name>Nathan Parrish</name><uri>http://www.blogger.com/profile/00747063650590372849</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://blog.clustrix.com/2012/01/running-devnode-cluster-across-multiple.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkAGR3g_cCp7ImA9WhRUFU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-3799260177649569836</id><published>2012-01-23T17:23:00.000-08:00</published><updated>2012-01-25T12:52:06.648-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-25T12:52:06.648-08:00</app:edited><title>Getting started with Clustrix DevKit</title><content type="html">We recently released our Clustrix Developers Kit to enable 
folks to experiment with the Clustrix solution right away, without having to take delivery of actual hardware or coordinate a hosted evaluation. &amp;nbsp;The kit includes the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; binary, which is the database code minus the few hardware-specific bits to run on our appliance hardware, and a &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt; utility to simplify wrangling multiple instances of &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; together into a functional cluster. &amp;nbsp;With this kit, it is possible to create a fully functional cluster, load data onto it, and run your application against it. &amp;nbsp;Obviously there are limitations: CPU, memory, and disk I/O capacity are going to be divided between multiple simulated nodes, and while we can simulate failure scenarios (as we will do in a future post), actual fault tolerance requires Clustrix nodes. &lt;br /&gt;
&lt;br /&gt;
In this post, I'll walk you through:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Installing the RPMs that comprise the DevKit&lt;/li&gt;
&lt;li&gt;Using &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt; to start up a cluster&lt;/li&gt;
&lt;li&gt;Accessing the database instance with the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&lt;/span&gt; client&lt;/li&gt;
&lt;li&gt;Dumping and importing data from a MySQL instance&lt;/li&gt;
&lt;li&gt;Starting MySQL replication as a slave&lt;/li&gt;
&lt;/ol&gt;
&lt;div&gt;
Along the way I want to highlight the following features of Clustrix:&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;Drop-in compatibility with MySQL, including replication&lt;/li&gt;
&lt;li&gt;Single-instance database (no sharding!)&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
Let's get started!&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Installation&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
First, download the DevKit using the "Try it now" button on the main page at www.clustrix.com. &amp;nbsp;You'll need to download and install the common and devnode packages as follows:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[root@beta001 v3.2]# rpm -ivh clustrix-common-v3.2-493.x86_64.rpm clustrix-devnode-v3.2-7371.x86_64.rpm&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Preparing... &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;########################################### [100%]&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp;1:clustrix-common &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;########################################### [ 50%]&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp;2:clustrix-devnode &amp;nbsp; &amp;nbsp; &amp;nbsp; ########################################### [100%]&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
If you run into any errors installing the RPMs, the likely cause is missing dependencies, which you may be able to fill in using yum. &amp;nbsp; Otherwise, you'll probably need to find a client with a more recent Red Hat/CentOS install (for this exercise, I'm using CentOS 6). &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
What does the RPM actually install? &amp;nbsp;Everything installs under&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;/opt/clustrix&lt;/span&gt;: support libraries in&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;lib/&lt;/span&gt;, and the two executables we care about in &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;bin/&lt;/span&gt;:&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; and &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt;. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
You'll also need a data directory for the cluster to store files which will serve as virtual disks. &amp;nbsp;It is highly recommended that you store these on local disk rather than over NFS. &amp;nbsp;Bear in mind that with proper Clustrix nodes, this storage is provided with SSDs, thus the throughput capacity of your local storage will be a constraining factor in performance of a devnode cluster (a subject I plan to revisit in a future blog post). &amp;nbsp;Here I'm going to use &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;/data/clustrix&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;, which is the default for the &lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;utility (if you use a different dir, you'll need to use the&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt; -d/--data-dir&lt;/span&gt;&lt;span style="font-family: inherit;"&gt; argument each time you run &lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;):&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[root@beta001 v3.2]# mkdir /data/clustrix&lt;/span&gt;&lt;/div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;
&lt;div&gt;
[root@beta001 v3.2]# chown nparrish /data/clustrix&lt;/div&gt;
&lt;div style="font-size: small;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
If you are running as a non-root user (which is advised), you'll also want to &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;chown&lt;/span&gt; the directory to your username, as shown.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Initializing the cluster&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt; takes care of starting up &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; processes, specifying flags so that they automatically join together into a cluster. &amp;nbsp;In later blog posts we'll take a more manual approach, in order to demonstrate things like adding nodes to an existing cluster, running &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; instances on multiple clients, and fault tolerance features. &amp;nbsp;But for now, firing up the cluster is as simple as:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@beta001 ~]$ /opt/clustrix/bin/devnodectl --init --nodes 3 start&lt;/span&gt;&lt;/div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;
&lt;div style="font-size: small;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
Let's look at the output this returns:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[exec] /opt/clustrix/bin/devnode -clusterpath /data/clustrix -cluster nparrish -setpnid 1 -anyport -unix -glue -noautostart -nclean 4 -vdev-size 256&lt;/span&gt;&lt;/div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;
&lt;div&gt;
[exec] /opt/clustrix/bin/devnode -clusterpath /data/clustrix -cluster nparrish -setpnid 2 -anyport -unix -noautostart -nclean 4 -vdev-size 256&lt;/div&gt;
&lt;div&gt;
[exec] /opt/clustrix/bin/devnode -clusterpath /data/clustrix -cluster nparrish -setpnid 3 -anyport -unix -noautostart -nclean 4 -vdev-size 256&lt;/div&gt;
&lt;div style="font-size: small;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
This shows us the actual &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; commands being run. &amp;nbsp;I'll draw your attention to a few of the options specified:&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-clusterpath /data/clustrix&lt;/span&gt;: All cluster state, including the files used for virtual disks, will be in this dir.&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-nclean 4 -vdev-size 256&lt;/span&gt;: This gives us 4 disks of 256MB each, for each of the three nodes.&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Waiting for node 1 to enter quorum.&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;done.&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Node 1 [RUNNING]: /data/clustrix/p1&lt;/span&gt;&lt;/div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;
&lt;div&gt;
&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql: 3306&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;control: 52924&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;healthmon:3581&lt;/div&gt;
&lt;div&gt;
&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql socket: /data/clustrix/p1/mysql.sock&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Node 2 [RUNNING]: /data/clustrix/p2&lt;/div&gt;
&lt;div&gt;
&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql: 57059&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;control: 57079&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;healthmon:52498&lt;/div&gt;
&lt;div&gt;
&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql socket: /data/clustrix/p2/mysql.sock&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Node 3 [RUNNING]: /data/clustrix/p3&lt;/div&gt;
&lt;div&gt;
&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql: 58157&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;control: 2048&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;healthmon:52002&lt;/div&gt;
&lt;div&gt;
&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql socket: /data/clustrix/p3/mysql.sock&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Example command to access your cluster:&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;mysql -u root -S /data/clustrix/p3/mysql.sock&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
This tells us how to access each of the three nodes that have been started. &amp;nbsp;The -anyport option means that each node gets a random MySQL port (after first trying the default 3306, which you can see node 1 gets -- if your client is already running a &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysqld&lt;/span&gt; instance, every node would get some other port number), and this output tells you which port your clients should connect to. &amp;nbsp;You can also connect via socket, as shown in the example. &amp;nbsp;The control and healthmon ports will be covered in a future blog post. &amp;nbsp;&lt;/div&gt;
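&lt;div&gt;
Since the ports change from run to run, it can be handy to pull them out of this output with a script rather than by eye. &amp;nbsp;Here is a minimal sketch, assuming the status layout shown above; &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;node_mysql_ports&lt;/span&gt; is a made-up helper name, not something that ships with the DevKit:&lt;/div&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch only: node_mysql_ports is a hypothetical helper, not part of the DevKit.
# It reads devnodectl-style status text on stdin and prints "node port" pairs,
# assuming the "Node N [RUNNING]: ..." and "mysql: PORT ..." layout shown above.
node_mysql_ports() {
    awk '
        $1 == "Node"   { node = $2 }        # remember which node we are in
        $1 == "mysql:" { print node, $2 }   # TCP port line; skips "mysql socket:"
    '
}

# Example using two of the nodes from the output above:
printf 'Node 1 [RUNNING]: /data/clustrix/p1\n\tmysql: 3306\tcontrol: 52924\n\tmysql socket: /data/clustrix/p1/mysql.sock\nNode 3 [RUNNING]: /data/clustrix/p3\n\tmysql: 58157\tcontrol: 2048\n' | node_mysql_ports
```
&lt;/pre&gt;
&lt;div&gt;
For the sample text this prints one line per node, with the node number followed by its MySQL port. &amp;nbsp;The match on the first field being exactly "mysql:" is what keeps the "mysql socket:" lines out of the result.&lt;/div&gt;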
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Connecting to the cluster&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@beta001 ~]$ mysql -h 127.0.0.1 -P 58157 -u root&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Welcome to the MySQL monitor. &amp;nbsp;Commands end with ; or \g.&lt;/span&gt;&lt;/div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;
&lt;div&gt;
Your MySQL connection id is 3074&lt;/div&gt;
&lt;div&gt;
Server version: 5.0.45&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.&lt;/div&gt;
&lt;div&gt;
This software comes with ABSOLUTELY NO WARRANTY. This is free software,&lt;/div&gt;
&lt;div&gt;
and you are welcome to modify and redistribute it under the GPL v2 license&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
mysql&amp;gt;&amp;nbsp;select @@version;&lt;/div&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----------------------------------------------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| @@version &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----------------------------------------------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| 5.0.45-clustrix-v3.2-7371-0f71a74d89c512ac-release |&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----------------------------------------------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;1 row in set (0.00 sec)&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
A few things to note here:&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;This is the standard, off-the-shelf &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&lt;/span&gt;&amp;nbsp;command line client (hence the Oracle copyright)&lt;/li&gt;
&lt;li&gt;I connected using the IP for localhost; the mysql client tries to be smart and use the default mysql socket (&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;/var/lib/mysql/mysql.sock&lt;/span&gt;) if you specify &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-h localhost&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Clustrix reports server version 5.0.45 for compatibility only; in the full version string from &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;@@version&lt;/span&gt;,&amp;nbsp;the relevant Clustrix part starts at v3.2&lt;/li&gt;
&lt;li&gt;We are connecting as the built-in root user, which initially has no password&lt;/li&gt;
&lt;/ul&gt;
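&lt;div&gt;
If you would rather keep typing localhost, the stock client has a &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--protocol=TCP&lt;/span&gt; option that forces a TCP connection instead of the socket. &amp;nbsp;A minimal sketch (&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql_tcp&lt;/span&gt; is a made-up wrapper name, and the port you pass is whatever your own run reported):&lt;/div&gt;
&lt;pre&gt;
```shell
# Sketch only: mysql_tcp is a hypothetical wrapper, not a DevKit command.
# Usage: mysql_tcp PORT [extra mysql arguments...]
mysql_tcp() {
    port=$1
    shift
    # --protocol=TCP stops the client from switching to the default socket
    # when the host is given as "localhost".
    mysql --protocol=TCP -h localhost -P "$port" -u root "$@"
}
```
&lt;/pre&gt;
&lt;div&gt;
With the ports from my run, &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql_tcp 58157 test&lt;/span&gt; would connect to node 3 with &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;test&lt;/span&gt; as the default database.&lt;/div&gt;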
&lt;div&gt;
Let's create a table and insert a little bit of data:&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; use test;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Database changed&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; create table foo (id int key, v varchar(100));&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 0 rows affected (0.09 sec)&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; insert into foo values (1, 'bar'), (2, 'baz');&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 2 rows affected (0.00 sec)&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; select * from foo;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----+------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| id | v &amp;nbsp; &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----+------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| &amp;nbsp;1 | bar &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| &amp;nbsp;2 | baz &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----+------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;2 rows in set (0.03 sec)&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Nothing fancy, but let's now connect to a different node and see how things look:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; exit&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Bye&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@beta001 ~]$ mysql -h 127.0.0.1 -P 57059 -u root test&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Reading table information for completion of table and column names&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;You can turn off this feature to get a quicker startup with -A&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[...]&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; show tables;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| Tables_in_test |&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| foo &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----------------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;1 row in set (0.00 sec)&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; select * from foo;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----+------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| id | v &amp;nbsp; &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----+------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| &amp;nbsp;2 | baz &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;| &amp;nbsp;1 | bar &amp;nbsp;|&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;+----+------+&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;2 rows in set (0.01 sec)&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Note that here I've connected to a different node, but I have the exact same view of my data. &amp;nbsp;You might also note that the rows were returned in a different order. &amp;nbsp;On Clustrix, rows are distributed across multiple nodes, so if no &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;ORDER BY&lt;/span&gt; clause is specified, they may come back in a different order from one query to the next. &amp;nbsp;This differs from MySQL, which in practice usually returns rows in primary key order for a simple query like this (since it only has to read from one place), although SQL does not guarantee any particular order without &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;ORDER BY&lt;/span&gt; there either.&lt;/div&gt;
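&lt;div&gt;
If your application depends on row order, ask for it explicitly. &amp;nbsp;As a small sketch against the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;foo&lt;/span&gt; table created above (&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;ordered_rows&lt;/span&gt; is a made-up helper name; the port argument is whichever node you want to ask):&lt;/div&gt;
&lt;pre&gt;
```shell
# Sketch only: ordered_rows is a hypothetical helper, not a DevKit command.
# Usage: ordered_rows PORT
ordered_rows() {
    # -N drops the column header and -B prints tab-separated rows;
    # ORDER BY on the primary key fixes the row order regardless of node.
    mysql -h 127.0.0.1 -P "$1" -u root -N -B -e 'select id, v from foo order by id' test
}
```
&lt;/pre&gt;
&lt;div&gt;
Running it against two different node ports and comparing the results is a quick way to convince yourself that every node serves the same data.&lt;/div&gt;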
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;mysqldump and clustrix_import&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
Let's import data from an existing MySQL instance (here running on a different client, called &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;hefty&lt;/span&gt;). &amp;nbsp;We start by dumping with &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysqldump&lt;/span&gt;:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@hefty tmp]$ mysqldump -h 127.0.0.1 --single-transaction --master&lt;/span&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-data=2 sbtest &amp;gt; /tmp/sbtest.sql&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--single-transaction&lt;/span&gt; ensures that all tables in the database are consistent with respect to each other&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--master-data=2&lt;/span&gt; gives us the position in the replication binlog that corresponds to that single transaction, so we can start up a replication slave&lt;/li&gt;
&lt;li&gt;Here we are just dumping a specific database, &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;sbtest&lt;/span&gt;. &amp;nbsp;We'll also need to create this database on our cluster, as the resulting dump file will not have &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;DROP/CREATE DATABASE&lt;/span&gt; statements for it.&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
We can then import this using the&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&lt;/span&gt; client:&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@hefty tmp]$ mysql -h beta001 -P 3306 sbtest &amp;lt; /tmp/sbtest.sql&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
A few things to note here:&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;As noted, we need to indicate which database to import into, so we must first run &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;CREATE DATABASE sbtest;&lt;/span&gt; and then pass&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;sbtest &lt;/span&gt;&lt;span style="font-family: inherit;"&gt;as an argument to indicate the target database.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="font-family: inherit;"&gt;If we were importing into actual nodes, we'd use a tool called clustrix_import, which does the inserts in parallel; the tool has not been (de)tuned for use with devnode, so using these together will result in an out-of-memory condition.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
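&lt;div&gt;
Put together, the create-then-import sequence could be scripted roughly like this. &amp;nbsp;It is only a sketch: &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import_dump&lt;/span&gt; is a made-up function name, and like the command above it assumes your default client credentials are accepted by the cluster:&lt;/div&gt;
&lt;pre&gt;
```shell
# Sketch only: import_dump is a hypothetical wrapper around the two steps above.
# Usage: import_dump HOST PORT DATABASE DUMPFILE
# Add -u/-p options to the mysql calls as your setup requires.
import_dump() {
    host=$1 port=$2 db=$3 dump=$4
    # A single-database dump has no CREATE DATABASE statement, so create it first
    # and stop if that fails.
    mysql -h "$host" -P "$port" -e "CREATE DATABASE $db" || return 1
    # Feed the dump to the client, naming the target database as the last argument.
    cat "$dump" | mysql -h "$host" -P "$port" "$db"
}
```
&lt;/pre&gt;
&lt;div&gt;
For the example in this post the call would be &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;import_dump beta001 3306 sbtest /tmp/sbtest.sql&lt;/span&gt;.&lt;/div&gt;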
&lt;div&gt;
Recall that these nodes were created with four 256MB disks each, so by default the cluster will accommodate only a small (~1GB) dump file. &amp;nbsp;If you want larger disks, use the&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--drive-size&lt;/span&gt; option when running &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl --init&lt;/span&gt; (see &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl -h&lt;/span&gt; output for details).&lt;/div&gt;
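&lt;div&gt;
To size the virtual disks for your own data, it helps to start from the raw total. &amp;nbsp;A quick sketch of the arithmetic (&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;raw_capacity_mb&lt;/span&gt; is a made-up helper name):&lt;/div&gt;
&lt;pre&gt;
```shell
# Sketch only: raw_capacity_mb is a hypothetical helper for the arithmetic.
# Usage: raw_capacity_mb NODES DISKS_PER_NODE DISK_SIZE_MB
raw_capacity_mb() {
    echo $(( $1 * $2 * $3 ))
}

raw_capacity_mb 3 4 256   # the values used in this post
```
&lt;/pre&gt;
&lt;div&gt;
For three nodes with four 256MB disks each this prints 3072, i.e. about 3GB of raw virtual disk across the cluster. &amp;nbsp;The usable ~1GB quoted above is well below that raw total; treating the difference as space taken by data copies and internal overhead is an assumption here, not a documented breakdown.&lt;/div&gt;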
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;MySQL Replication&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: inherit;"&gt;We'll dedicate a whole future blog post to replication, but for just a taste let's start up a slave to catch up from the point at which we took the above dump. &amp;nbsp;First we need to find the log file and position, which is stored as a comment in our dump file:&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@hefty tmp]$ head -24 /tmp/sbtest.sql&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-- MySQL dump 10.11&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[....]&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-- Position to start replication or point-in-time recovery from&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;-- CHANGE MASTER TO MASTER_LOG_FILE='hefty-bin.000001', MASTER_LOG_POS=610868;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div style="font-family: inherit;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
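&lt;div&gt;
Rather than reading the header by eye, you can also pull that statement out of the dump with a one-liner. &amp;nbsp;A minimal sketch, assuming the comment format shown above (&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;binlog_position&lt;/span&gt; is a made-up helper name):&lt;/div&gt;
&lt;pre&gt;
```shell
# Sketch only: binlog_position is a hypothetical helper, not a DevKit command.
# Prints the CHANGE MASTER statement recorded by mysqldump --master-data=2,
# with the leading "-- " comment marker removed.
# Usage: binlog_position DUMPFILE
binlog_position() {
    grep -m 1 '^-- CHANGE MASTER TO' "$1" | sed 's/^-- //'
}
```
&lt;/pre&gt;
&lt;div&gt;
For the dump taken above, &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;binlog_position /tmp/sbtest.sql&lt;/span&gt; would print the statement with the log file and position, ready to be extended with the host and user.&lt;/div&gt;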
&lt;div style="font-family: inherit;"&gt;
So we just need to add the hostname and user account to use in order to point our cluster at this MySQL instance (note that we can run this on any of our nodes):&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; CHANGE MASTER TO MASTER_LOG_FILE='hefty-bin.000001', MASTER_LOG_POS=610868, MASTER_HOST='hefty', MASTER_USER='root';&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 0 rows affected (0.04 sec)&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; start slave;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Query OK, 0 rows affected (0.02 sec)&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&amp;gt; show slave status\G&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;*************************** 1. row ***************************&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Slave_Name: default&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Slave_Status: Running&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Master_Host: hefty&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Master_Port: 3306&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Master_User: root&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Master_Log_File: hefty-bin&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Slave_Enabled: Enabled&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Log_File_Seq: 2&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Log_File_Pos: 267&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Last_Error: no error&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;Connection_Status: Connected&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; Relay_Log_Bytes_Read: 0&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Relay_Log_Current_Size: 0&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;Seconds_Behind_Master: 0&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;1 row in set (0.01 sec)&lt;/span&gt;&lt;/div&gt;
&lt;div style="font-family: inherit;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div style="font-family: inherit;"&gt;
And we're off and replicating any write events on the MySQL instance. &amp;nbsp;&lt;/div&gt;
&lt;div style="font-family: inherit;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Stopping devnodes&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
To shut down your cluster, run &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl stop&lt;/span&gt;:&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[nparrish@beta001 clustrix]$ /opt/clustrix/bin/devnodectl stop&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[node /data/clustrix/p1] kill -9 21089&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[node /data/clustrix/p2] kill -9 20078&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[node /data/clustrix/p3] kill -9 20079&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
As the output shows, the processes are simply stopped with kill -9. &amp;nbsp;You can restart them with&amp;nbsp;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl start&lt;/span&gt;, this time leaving out the&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt; --init&lt;/span&gt; and &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;--nodes&lt;/span&gt; options; note that these processes will probably get different mysql ports than in the prior run. &amp;nbsp;We'll spend more time on restarting &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; processes when we cover fault tolerance in a future post. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;Recap&lt;/span&gt;&lt;/div&gt;
&lt;div style="font-family: inherit;"&gt;
So we've covered some of the initial steps you'll be taking with the Clustrix developers kit:&lt;/div&gt;
&lt;div&gt;
&lt;ol&gt;
&lt;li&gt;Installing the RPMs&lt;/li&gt;
&lt;li&gt;Starting up a simple three node cluster with &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnodectl&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Connecting to the cluster with mysql&amp;nbsp;&lt;/li&gt;
&lt;li&gt;Using &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysqldump&lt;/span&gt; to get data from an existing MySQL instance, then importing it into your cluster with &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;mysql&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Using MySQL replication to set up your cluster as a slave to that MySQL instance&lt;/li&gt;
&lt;/ol&gt;
&lt;div&gt;
If you already know MySQL, most of this should look pretty familiar, and that's exactly the point -- the drop-in compatibility you get with Clustrix means there's no application rewrite involved. &amp;nbsp;Out of the box you get a cluster which hosts a single-instance database, with no federation or sharding mess. &amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"&gt;What's next&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
Stay tuned for more. &amp;nbsp;Future topics will include a closer look at how data is stored on Clustrix, deeper coverage of replication capabilities, running &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;devnode&lt;/span&gt; on multiple hosts, fault tolerance, examining query performance, and more. &amp;nbsp;&lt;/div&gt;
&lt;div&gt;
Please come talk to us on the support forums at&amp;nbsp;&lt;a href="https://groups.google.com/a/clustrix.com/group/support-public/topics"&gt;https://groups.google.com/a/clustrix.com/group/support-public/topics&lt;/a&gt; if you run into any problems or have more technical questions. &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-3799260177649569836?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/eJW3dfCRmIE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/3799260177649569836/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=3799260177649569836" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/3799260177649569836?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/3799260177649569836?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/eJW3dfCRmIE/getting-started-with-clustrix-devkit.html" title="Getting started with Clustrix DevKit" /><author><name>Nathan Parrish</name><uri>http://www.blogger.com/profile/00747063650590372849</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2012/01/getting-started-with-clustrix-devkit.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DkUDRHY_fSp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-605129942318752241</id><published>2011-11-04T18:25:00.000-07:00</published><updated>2012-02-01T11:24:35.845-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.845-08:00</app:edited><title>Distributed Database Architectures: Distributed Storage / Centralized Compute</title><content type="html">In my previous post I wrote about &lt;a href="http://sergeitsar.blogspot.com/2011/11/distributed-database-architectures.html"&gt;shared disk architectures&lt;/a&gt; and the problems they introduce. It's common to see comparisons between shared disk and shared nothing architectures, but that distinction is too coarse to capture the differences between various shared nothing approaches.&lt;br /&gt;&lt;br /&gt;Instead, I'm going to characterize the various "shared-nothing" style systems by their query evaluation architectures. Most systems fall into one of the following buckets:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Centralized compute&lt;/li&gt;&lt;li&gt;Limited distributed compute&lt;/li&gt;&lt;li&gt;Fully distributed compute&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Centralized Compute: MySQL Cluster &lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.mysql.com/products/cluster/"&gt;MySQL Cluster&lt;/a&gt; consists of two basic roles used for servicing user queries: a compute role and a storage/data role. The compute node is the front end which takes in the query, plans it, and executes it. 
The compute node communicates with the storage nodes remotely to fetch any data relevant to the query.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-9Blp95pXStI/TrRqMqbvF7I/AAAAAAAAB6s/Osizuu0oBv0/s1600/dist+storage+cent+compute+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-9Blp95pXStI/TrRqMqbvF7I/AAAAAAAAB6s/Osizuu0oBv0/s1600/dist+storage+cent+compute+1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;In the distributed storage model, data is no longer shared between the nodes at the page level. Instead, the storage nodes expose a higher-level API which allows the compute node to fetch row ranges based on the available access paths (i.e. indexes).&lt;br /&gt;&lt;br /&gt;In such a system, storage-level locks associated with the data are now managed exclusively by the storage node itself. A compute node does not cache any data; instead, it always asks the set of storage nodes responsible for the data. Because no node holds a cached copy that could go stale, this design eliminates the cache coherence overhead problem. 
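&lt;br /&gt;&lt;br /&gt;As a rough illustration of this division of labor, here is a toy Python sketch. It is not MySQL Cluster's actual API: the &lt;i&gt;StorageNode&lt;/i&gt; and &lt;i&gt;ComputeNode&lt;/i&gt; classes, the &lt;i&gt;fetch_by_acol&lt;/i&gt; method, and the &lt;i&gt;acol&lt;/i&gt;/&lt;i&gt;bcol&lt;/i&gt; columns are all hypothetical names chosen for the example. Each storage node answers index lookups with rows rather than pages, and the compute node, which keeps no cache, does all remaining filtering and counting itself:&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict


class StorageNode:
    """Hypothetical storage node: owns a slice of the table plus an index on acol."""

    def __init__(self, rows):
        self.rows = list(rows)
        # Access path: index from an acol value to the rows holding it.
        self.acol_index = defaultdict(list)
        for row in self.rows:
            self.acol_index[row["acol"]].append(row)

    def fetch_by_acol(self, value):
        # Row-level API: returns matching rows, never raw disk pages.
        return list(self.acol_index.get(value, []))


class ComputeNode:
    """Hypothetical compute node: keeps no cache and evaluates the query alone."""

    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes
        self.rows_moved = 0

    def count_where(self, acol, bcol):
        matches = 0
        for node in self.storage_nodes:
            fetched = node.fetch_by_acol(acol)  # index lookup on the storage side
            self.rows_moved += len(fetched)     # every fetched row crosses the network
            # The residual filter and the aggregation happen only here.
            matches += sum(1 for row in fetched if row["bcol"] == bcol)
        return matches


# Toy data: 100 rows spread over 4 storage nodes.
table = [{"acol": i % 10, "bcol": i % 3} for i in range(100)]
nodes = [StorageNode(table[i::4]) for i in range(4)]
compute = ComputeNode(nodes)
result = compute.count_where(acol=1, bcol=2)
```
&lt;/pre&gt;
In this toy run, 10 of the 100 rows cross from the storage nodes to the compute node, and only 3 of them survive the residual filter, so most of the transferred data is discarded after the move.&lt;br /&gt;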
&lt;br /&gt;However, this architecture still suffers from extensive data movement and centralized query evaluation.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strike&gt;Cache coherence overhead&lt;/strike&gt;&lt;/li&gt;&lt;li&gt;Extensive data movement&lt;/li&gt;&lt;li&gt;Centralized query evaluation&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;MySQL Query Evaluation in Action&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Consider the following example:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;SELECT count(*) FROM mytable WHERE acol = 1 AND bcol = 2&lt;/i&gt;&lt;/div&gt;&lt;br /&gt;Assumptions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;an index over &lt;i&gt;acol&lt;/i&gt;&lt;/li&gt;&lt;li&gt;10% of the rows in the table match &lt;i&gt;acol = 1&lt;/i&gt;&lt;/li&gt;&lt;li&gt;3% of the rows match &lt;i&gt;acol = 1 and bcol = 2&lt;/i&gt;&lt;/li&gt;&lt;li&gt;a total table size of 1 billion rows&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-Hn8mR3U2VKM/TrSJM0DHF0I/AAAAAAAAB7U/r_6xxc4xBic/s1600/dist+storage+cent+compute+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-Hn8mR3U2VKM/TrSJM0DHF0I/AAAAAAAAB7U/r_6xxc4xBic/s1600/dist+storage+cent+compute+2.png" /&gt;&amp;nbsp;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;In the diagram above, the arrows represent the flow of data through the system. 
As you can see, most of the query evaluation in the example is done by a single compute node.&amp;nbsp; The system generated a &lt;b&gt;data movement of 100 million rows&lt;/b&gt; (the 10% of the 1 billion rows that match &lt;i&gt;acol = 1&lt;/i&gt;), and only &lt;b&gt;a single node performed the additional filtering and the aggregate count&lt;/b&gt;. &lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;It's an improvement over a shared disk system, but it still has some serious limitations. Such a system could be well suited for simple key access (i.e. queries that touch only a few specific rows), but anything more complex will generally result in poor performance.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;As with the shared disk system, adding more nodes will not help improve single query execution, and queries which operate over large volumes of data have the potential to saturate the message bus between the nodes.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-605129942318752241?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/q7ofgIGksJE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/605129942318752241/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=605129942318752241" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/605129942318752241?v=2" /><link rel="self" type="application/atom+xml" 
href="http://www.blogger.com/feeds/6763982002758299128/posts/default/605129942318752241?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/q7ofgIGksJE/distributed-database-architectures.html" title="Distributed Database Architectures: Distributed Storage / Centralized Compute" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-9Blp95pXStI/TrRqMqbvF7I/AAAAAAAAB6s/Osizuu0oBv0/s72-c/dist+storage+cent+compute+1.png" height="72" width="72" /><thr:total>2</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/11/distributed-database-architectures.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUDRHY9fip7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-7850895367987723231</id><published>2011-11-04T14:51:00.000-07:00</published><updated>2012-02-01T11:24:35.866-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.866-08:00</app:edited><title>Distributed Database Architectures: Shared Disk</title><content type="html">One of the most common questions I get about &lt;a href="http://www.clustrix.com/"&gt;Clustrix&lt;/a&gt; is "How is Clustrix different than Database X?" Depending on who asks the question, Database X tends to be anything from Oracle RAC to MySQL Cluster. So I decided to put together a primer on different types of Distributed Database Architectures, and what each one means for real world applications. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Shared Disk Databases&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Shared disk databases fall into a general category where multiple database instances share some physical storage resource. 
With a shared disk architecture, multiple nodes coordinate access to a shared storage system at a block level.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-ABKLHvI2pz8/TrMvECJwiII/AAAAAAAAB5E/NLS98-fFHgM/s1600/shared+disk+1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-ABKLHvI2pz8/TrMvECJwiII/AAAAAAAAB5E/NLS98-fFHgM/s1600/shared+disk+1.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Adding clustering to a standalone database through a shared disk cache.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Several of the older generation databases fall into this category, including &lt;a href="http://www.oracle.com/technetwork/database/clustering/overview/index.html"&gt;Oracle RAC&lt;/a&gt;, &lt;a href="http://www-01.ibm.com/software/data/db2/linux-unix-windows/editions-features-purescale.html"&gt;IBM DB2 pureScale&lt;/a&gt;, &lt;a href="http://www.sybase.com/manage/shared-disk-clustering"&gt;Sybase&lt;/a&gt;, and others.&lt;br /&gt;&lt;br /&gt;These systems all started out as single-instance databases. The easiest way to add clustering to an existing database stack is to share the storage system between multiple independent nodes. All of these databases already did paged disk access through a buffer manager cache. Make the caches talk to each other to manage concurrency, and bam, you have a distributed database!&lt;br /&gt;&lt;br /&gt;Most of the database stack remains the same. You have the same planner, the same optimizer, the same query execution engine. 
It's a low-risk way to extend the database to multiple nodes. You can add more processing power to the database, keep data access transparency in the application layer, and get some amount of fault tolerance.&lt;br /&gt;&lt;br /&gt;But such systems also have serious limitations which prevent them from gaining widespread adoption, especially for applications which require scale. They simply don't scale for most workloads &lt;span style="font-size: x-small;"&gt;(footnote: see Scaling with Sharding below)&lt;/span&gt;, and they are extremely complex to administer.&lt;br /&gt;&lt;br /&gt;Almost every new distributed database system built in the last 10 years has embraced a different architecture, mainly because the shared disk model has the following problems:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Cache coherence overhead&lt;/li&gt;&lt;li&gt;Extensive data movement across the cluster&lt;/li&gt;&lt;li&gt;Centralized query execution (only a single node participates in query resolution)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Cache Coherence and Page Contention&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Let's assume the following workload as the first example:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;DB 1: UPDATE mytable SET mycol = mycol + 1 WHERE mykey = &lt;span style="color: red;"&gt;X&lt;/span&gt;&lt;/i&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;DB 2: UPDATE mytable SET mycol = mycol + 1 WHERE mykey =&lt;span style="color: blue;"&gt; Y&lt;/span&gt;&lt;/i&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;Now let's further assume that some of the keys for that update statement will share a page on disk. 
So we don't have contention on the same key, but we do have some contention on the same disk page.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-s24ipE1npMI/TrM3mo4CObI/AAAAAAAAB5c/a6AF0p0XTuQ/s1600/shared+disk+2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-s24ipE1npMI/TrM3mo4CObI/AAAAAAAAB5c/a6AF0p0XTuQ/s1600/shared+disk+2.png" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Page contention in shared-disk systems.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Both nodes receive similar update queries. Both nodes must update the same physical page. But in order to do it safely, they must ensure that &lt;i&gt;all copies of the page are consistent across the cluster&lt;/i&gt;. &lt;br /&gt;&lt;br /&gt;Managing such consistency comes at a cost. Consider some of the requirements which have to be satisfied:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Every node must acquire a page lock in order to read from or write to the page. In a clustered environment, such locking results in communication between nodes.&lt;/li&gt;&lt;li&gt;If a node acquires a write lock on a page and modifies it, it must notify all other nodes to invalidate their cached copies.&lt;/li&gt;&lt;li&gt;If a node gets a page invalidation request, it must re-fetch the latest copy of the page, which also results in more network communication.&lt;/li&gt;&lt;li&gt;Each node caches pages from the entire data set, which means that the effective cache size of the cluster is only as large as the cache on any single node. Adding more nodes does not scale cache efficiency. &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Now imagine adding more nodes to your cluster. 
You end up with a system which has &lt;b&gt;non-linear scaling in message complexity and data movement&lt;/b&gt;.&amp;nbsp; It's common to see such systems struggle beyond 4 nodes on typical OLTP workloads.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data Movement&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Consider another example which poses problems for shared disk systems:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;SELECT sum(mycol) FROM mytable WHERE something = x&lt;/i&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Let's assume that &lt;i&gt;mytable&lt;/i&gt; contains 1 billion rows and that the predicate &lt;i&gt;something = x&lt;/i&gt; matches only 1,000 of them. With a shared disk system, if the predicate results in a table scan, &lt;b&gt;the entire contents of the 1-billion-row table must be transferred &lt;/b&gt;to the node evaluating the query!&amp;nbsp;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;And from the previous example, we see that it's not just a matter of data movement across the SAN infrastructure. The system must also maintain cache coherence, which means &lt;b&gt;lots of cache management traffic between all the nodes.&lt;/b&gt; Such queries on large data sets can bring the whole cluster to its knees.&lt;/div&gt;&lt;br /&gt;&lt;b&gt;Centralized Query Resolution&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;It's also worthwhile to point out that &lt;b&gt;only a single node within the cluster can participate in query evaluation&lt;/b&gt;. So even if you throw more nodes at your system, it's not generally going to help you speed up that slow query. 
In fact, adding more nodes may slow down the system as the cost of managing cache coherence increases in a larger cluster.&lt;/div&gt;&lt;br /&gt;&lt;b&gt;Scaling with Sharding&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;It's possible to scale with shared disk, but it requires an extensive engineering effort. The application must be written (a) to keep affinity of data accesses to nodes and (b) to keep each shard within its own data silo. You &lt;b&gt;lose transparency of data access&lt;/b&gt; at the application level. The app can no longer send any query to any one of the nodes without risking a catastrophic performance problem.&lt;br /&gt;&lt;br /&gt;While you end up with a very expensive means of implementing sharding, there is some advantage to this approach over regular sharding. In case of node failure, you can more easily bring in a hot standby without having to keep a dedicated slave for each shard. But in practice, getting fault tolerance right with a SAN-based shared disk infrastructure is no easy task -- just ask anyone who manages SANs for a living.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-vCW-H3tCXy0/TrRZE-R3kRI/AAAAAAAAB6k/9kLi7H6ubnE/s1600/shared+disk+3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-vCW-H3tCXy0/TrRZE-R3kRI/AAAAAAAAB6k/9kLi7H6ubnE/s1600/shared+disk+3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-7850895367987723231?l=blog.clustrix.com' alt='' 
/&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/5cy0UpL5XK4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/7850895367987723231/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=7850895367987723231" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7850895367987723231?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7850895367987723231?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/5cy0UpL5XK4/distributed-database-architectures_04.html" title="Distributed Database Architectures: Shared Disk" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-ABKLHvI2pz8/TrMvECJwiII/AAAAAAAAB5E/NLS98-fFHgM/s72-c/shared+disk+1.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/11/distributed-database-architectures_04.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkYMQX04cSp7ImA9WhVTFEo.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-7675236556491653244</id><published>2011-10-25T08:34:00.001-07:00</published><updated>2012-02-28T15:03:00.339-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-28T15:03:00.339-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="performance" /><category scheme="http://www.blogger.com/atom/ns#" term="TPCC" /><title>Percona Evaluates Clustrix 
and MySQL</title><content type="html">&lt;br /&gt;&lt;div class="MsoPlainText"&gt;Percona ran a Percona-written TPCC benchmark against Percona's MySQL Server on an Intel SSD-based machine, Percona MySQL Server on a FusionIO-based machine, and a variety of different sizes of Clustrix's 
database clusters running Clustrix's Sierra relational database.&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;The results show that Clustrix:&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;/div&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Scales linearly&lt;/b&gt; from 3 to 9 nodes.&lt;/li&gt;&lt;li&gt;Is &lt;b&gt;431% faster&lt;/b&gt; than the tested FusionIO system.&lt;/li&gt;&lt;li&gt;Was tested in a configuration with 2x redundancy for fault tolerance.&lt;/li&gt;&lt;/ol&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-UfdtzZQhnLk/Tqb8DZJrAqI/AAAAAAAAB38/fY9cRWeQT6I/s1600/clustrix-perf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://4.bp.blogspot.com/-UfdtzZQhnLk/Tqb8DZJrAqI/AAAAAAAAB38/fY9cRWeQT6I/s400/clustrix-perf.png" height="361" border="0" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;Find the complete results and testing methodology in the &lt;a href="http://www.clustrix.com/Portals/146389/docs/clustrix_tpcc_percona.pdf"&gt;Percona-Clustrix TPCC Evaluation&lt;/a&gt;.&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;There are a few conclusions that can be drawn from these results.&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;b&gt;Scalability&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;The first is that the Clustrix database really is scalable. If you look at the median throughput graph on page 7, the six-node results are two times the three-node results, and the nine-node results are three times the three-node performance at concurrency. 
This is exactly the behavior you need to see as a website or similar application scales. As a website gets more and more users, concurrency goes up and aggregate performance requirements continue to grow. This is in stark contrast to the MySQL-based systems tested, which followed the traditional performance curve up to their peak, and then down toward unresponsiveness as the concurrency increased.&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;b&gt;Performance&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="MsoPlainText"&gt;The second conclusion we draw from this report is that the performance exhibited by Clustrix is also more consistent than that of the MySQL systems. If you look at Appendix A, you can see graphs representing the instantaneous performance throughout the test. Clustrix maintains more consistent performance and lower response times even at high concurrency -- exactly when your website needs it most.&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="MsoPlainText"&gt;&lt;b&gt;Redundancy and Fault Tolerance&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;A critical thing to note about this comparison is that &lt;b&gt;the MySQL-based systems have no redundancy&lt;/b&gt;. They either have a single FusionIO card or SSD drives in a RAID 0 configuration. If any component in the system fails (drive, motherboard, memory, etc.), the entire database will fail. In the Clustrix systems, fault tolerance is built in -- all data is stored redundantly and there is no single point of failure. The only way to get a similar level of fault tolerance with MySQL (but still not as seamless as Clustrix) is to have two servers with replication. 
Generating a replication feed would further slow down the MySQL systems, so we left them unencumbered by replication to get the best possible numbers.&lt;/div&gt;&lt;br /&gt;&lt;div class="MsoPlainText"&gt;&lt;b&gt;Complex Workload that Scales&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;The most interesting conclusion, perhaps, is the more general statement that transactional, relational, full-featured SQL can scale, and Clustrix has proved it in this testing and at multiple production customer deployments around the world. The TPCC test is a relatively complicated workload. There are &lt;b&gt;multi-statement transactions, joins, aggregates, foreign key constraints, and a large percentage of writes&lt;/b&gt;. &lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;These are exactly the things the NoSQL movement has proclaimed as too difficult to scale. They couldn't make true transactions scale, so they introduced eventual consistency. They couldn't make relational calculus scale, so they forced denormalization and complicated logic onto the app developers. NoSQL "solutions" have thrown all this out, and the one bone they throw the app developers is the concept of a "flexible data model." &lt;/div&gt;&lt;div class="MsoPlainText"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoPlainText"&gt;That's not a revolution; it is a feature, and a simple one at that (see our co-founder’s &lt;a href="http://sergeitsar.blogspot.com/2011/02/clustrix-as-document-store-blending-sql.html"&gt;post&lt;/a&gt; on how this can easily be done in a SQL database and how Clustrix has implemented it). You don't have to give up transactions or ACID properties, relational semantics, or any of the rich SQL features in the name of performance or scale. 
SQL will perform and scale just fine.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-7675236556491653244?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/C69g19QTyoM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/7675236556491653244/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=7675236556491653244" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7675236556491653244?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7675236556491653244?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/C69g19QTyoM/percona-evaluates-clustrix-and-mysql.html" title="Percona Evaluates Clustrix and MySQL" /><author><name>Aaron Passey</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-UfdtzZQhnLk/Tqb8DZJrAqI/AAAAAAAAB38/fY9cRWeQT6I/s72-c/clustrix-perf.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/10/percona-evaluates-clustrix-and-mysql.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEENRXs6fCp7ImA9WhdTF0o.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-4246303844118118372</id><published>2011-07-15T15:15:00.000-07:00</published><updated>2011-07-15T16:31:34.514-07:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-07-15T16:31:34.514-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="use case" /><category scheme="http://www.blogger.com/atom/ns#" term="scalable database" /><title>Use Cases: TheLadders and iOffer</title><content type="html">Earlier this year we set out to create &lt;a href="http://www.clustrix.com/case-studies/"&gt;use cases&lt;/a&gt; with several of our customers. The two companies whose use cases we are releasing today are &lt;a href="http://www.theladders.com/"&gt;TheLadders&lt;/a&gt; and &lt;a href="http://www.ioffer.com/"&gt;iOffer&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.theladders.com/"&gt;TheLadders.com&lt;/a&gt; is the world’s largest online job search service that lists jobs with annual salaries of $100,000 or more. TheLadders was in search of a database solution that was highly scalable and fault-tolerant and that allowed the team to avoid sharding.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.ioffer.com/"&gt;iOffer&lt;/a&gt; is the fastest-growing destination for interactive social commerce with a vibrant global community connecting visitors from over 190 countries in every language via millions of item listings. 
iOffer was trying to find a database that was MySQL-compatible and did not require changes to their existing architecture, database schemas or applications.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.clustrix.com/case-studies/"&gt;Take a look&lt;/a&gt; and see how Clustrix has solved these companies' database problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-4246303844118118372?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/P8ALZCOutXE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/4246303844118118372/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=4246303844118118372" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4246303844118118372?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4246303844118118372?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/P8ALZCOutXE/use-cases-theladders-and-ioffer.html" title="Use Cases: TheLadders and iOffer" /><author><name>Reanna</name><uri>http://www.blogger.com/profile/02147899287943196246</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/07/use-cases-theladders-and-ioffer.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DkYMRXYyfyp7ImA9WhZbFUQ.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-8618634929895258243</id><published>2011-06-20T11:12:00.000-07:00</published><updated>2011-06-20T11:23:04.897-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-20T11:23:04.897-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="MySQL" /><category scheme="http://www.blogger.com/atom/ns#" term="No SQL" /><category scheme="http://www.blogger.com/atom/ns#" term="Conference" /><category scheme="http://www.blogger.com/atom/ns#" term="Percona" /><category scheme="http://www.blogger.com/atom/ns#" term="database" /><title>Percona Live NYC: Keynote</title><content type="html">Check out Sergei Tsarev's keynote, "&lt;a href="http://www.percona.tv/percona-live/why-sql-wins"&gt;Why SQL Wins&lt;/a&gt;," from the &lt;a href="http://www.percona.com/live/nyc-2011/"&gt;Percona Live NYC&lt;/a&gt; Conference.&lt;br /&gt;&lt;br /&gt;Sergei takes a historical look at database architectures and their current trends. 
He explores why NoSQL is not the answer to our current problems, and outlines why we should not throw away decades of relational database research and development.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-8618634929895258243?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/N6JS9fpmSSw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/8618634929895258243/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=8618634929895258243" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/8618634929895258243?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/8618634929895258243?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/N6JS9fpmSSw/check-out-sergei-tsarevs-keynote-why.html" title="Percona Live NYC: Keynote" /><author><name>Reanna</name><uri>http://www.blogger.com/profile/02147899287943196246</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/06/check-out-sergei-tsarevs-keynote-why.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ck8HQn48fCp7ImA9WhZbEks.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-2762794081442851082</id><published>2011-06-16T14:46:00.001-07:00</published><updated>2011-06-16T14:47:13.074-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-16T14:47:13.074-07:00</app:edited><title>New 
Website</title><content type="html">We refreshed our &lt;a href="http://www.clustrix.com/"&gt;website design&lt;/a&gt;! We're still working on getting more content for it, so look for changes in the upcoming weeks!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-2762794081442851082?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/avLU0f4pz7Y" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/2762794081442851082/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=2762794081442851082" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/2762794081442851082?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/2762794081442851082?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/avLU0f4pz7Y/new-website.html" title="New Website" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/06/new-website.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkYNSHg9eSp7ImA9WhZbFUQ.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-7639627072388970289</id><published>2011-03-25T11:04:00.000-07:00</published><updated>2011-06-20T11:23:19.661-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-20T11:23:19.661-07:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="MySQL" /><category scheme="http://www.blogger.com/atom/ns#" term="Web 2.0 Expo" /><category scheme="http://www.blogger.com/atom/ns#" term="Conference" /><title>Upcoming Events</title><content type="html">&lt;div class="entry-content"&gt; &lt;p&gt;Please join Clustrix, Inc. at the &lt;a href="http://www.web2expo.com/webexsf2011/"&gt;Web 2.0 Expo&lt;/a&gt;  next week on March 28-31, 2011 in San Francisco, CA. Stop by our booth,  #630, and test out our fault tolerant failure demonstrations in the  Exhibit Hall. In addition to our live demo, we will be giving away cool  prizes throughout the day and a big prize at the end of each day. &lt;/p&gt; &lt;p&gt;CEO Paul Mikesell will be handing out the big giveaway at  approximately 2:30pm at the end of each expo day, must be present to  win.&lt;/p&gt; &lt;p&gt;Clustrix will also be attending the &lt;a href="http://en.oreilly.com/mysql2011/"&gt;MySQL Conference &amp;amp; Expo&lt;/a&gt; on April 11-14, 2011 in Santa Clara, CA as a Gold Sponsor. 
&lt;/p&gt; &lt;p&gt;Be sure to mark your calendars and stop by our booths because there are “no limits” to your opportunities with Clustrix.&lt;/p&gt;          &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-7639627072388970289?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/Bo5TNamdJkU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/7639627072388970289/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=7639627072388970289" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7639627072388970289?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7639627072388970289?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/Bo5TNamdJkU/upcoming-events.html" title="Upcoming Events" /><author><name>Reanna</name><uri>http://www.blogger.com/profile/02147899287943196246</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/03/upcoming-events.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUDRHYzfip7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-7052327350528479013</id><published>2011-02-28T09:49:00.000-08:00</published><updated>2012-02-01T11:24:35.886-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.886-08:00</app:edited><title>Profile Driven Performance Optimization</title><content 
type="html">Recently, one of our support engineers noticed that a customer cluster got a little sluggish during a routine maintenance operation. In looking back at historical data, we noticed that the cluster saw a decrease in transaction throughput. The system should have done its cleanup in the background at a low priority, but that didn't happen. I thought it would be interesting to describe our process for addressing a performance problem like this one. I will also share the tools we developed and used.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Reproduce, Measure, and Automate&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The symptom our support engineer noticed was a "sluggish" system. More accurately, we saw that query latencies increased and overall system throughput dropped. So our first step was to reproduce the problem under some approximation of the customer's workload. It was a bit tricky to get all of the prerequisite conditions just right, but we finally landed on the right set of circumstances.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-RTmHUI63NQw/TWhi4McDktI/AAAAAAAABrI/wRKFUMHg220/s1600/test4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://lh4.googleusercontent.com/-RTmHUI63NQw/TWhi4McDktI/AAAAAAAABrI/wRKFUMHg220/s1600/test4.png" /&gt;&lt;/a&gt;&lt;/div&gt;Once we reproduced the issue, the next step was to automate the test to ensure that we could consistently demonstrate the problem. It boiled down to using one of our standard performance harnesses with a couple of tweaks.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Profile&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Our most commonly used profiling tool is based on the Linux oprofile infrastructure. But we had to modify it to better suit our system. Why? 
Because substantial portions of our codebase are written in an event-driven/&lt;a href="http://en.wikipedia.org/wiki/Continuation-passing_style"&gt;continuation passing style&lt;/a&gt; programming model. Call graph reporting in oprofile is highly dependent on the stack, and the C stack often does not contain enough information to provide a valuable analysis in our system.&lt;br /&gt;&lt;br /&gt;To get around this problem, we modified oprofile to gather additional information from the system. The key concept was adding a method for tagging a series of continuation calls and getting oprofile to sample these tags. The tag itself includes enough information to map the executing code to a logical system module hierarchy. The screenshot below shows an example output from one of our reporting tools.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-9A87ylb1nIA/TWhtMYtM-gI/AAAAAAAABrM/B6dudqgLeZg/s1600/gopd-for-blog.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="308" src="https://lh6.googleusercontent.com/-9A87ylb1nIA/TWhtMYtM-gI/AAAAAAAABrM/B6dudqgLeZg/s640/gopd-for-blog.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Coming back to our original problem, the screenshot shows the output of the profile analysis tool during the dip in performance. The tool allows us to isolate the report to a specific processor. In this case we're looking at the output for our management core. 
&lt;br /&gt;&lt;br /&gt;Interpreting the results, we see that:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The execution engine dominates performance, and most of that is in the storage engine&lt;/li&gt;&lt;li&gt;The storage engine spends 63% of all its cycles on CRC32 calculations&lt;/li&gt;&lt;li&gt;Our particular maintenance task was erroneously bound to the management core &lt;/li&gt;&lt;li&gt;The task should have a lower priority&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;CRC Performance&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The root cause of the performance degradation was a scheduling issue. However, it caused me to take a closer look at our CRC algorithm. We use the CRC implementation from zlib, and it should be reasonably well optimized. Whipping up a quick benchmark, I found that the zlib implementation takes about 41us to checksum a 32k block. In the test, we were reading about 224MB/s per node. That is about 7,168 blocks of 32k per second, or roughly 293ms of checksum work in every second. No wonder we saw a performance drop.&lt;br /&gt;&lt;br /&gt;I looked around and found a paper from Intel describing their &lt;a href="ftp://download.intel.com/technology/comms/perfnet/download/slicing-by-8.pdf"&gt;slice-by-eight CRC approach&lt;/a&gt;. A quick benchmark revealed that the same 32k checksum could be done in 27us -- a 37% improvement over zlib's version.&lt;br /&gt;&lt;br /&gt;With Nehalem-based processors, Intel introduced an SSE instruction for a hardware-based CRC computation. I haven't had a chance to conduct my own benchmarks, but the following &lt;a href="http://www.strchr.com/crc32_popcnt"&gt;results&lt;/a&gt; suggest that we could get a 2-3x speedup over slice-by-eight.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Source Code&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I'm releasing the &lt;a href="http://www.clustrix.com/wp-content/uploads/2011/02/opd.tgz"&gt;source for the tools&lt;/a&gt; referenced in this post. 
The user space oprofile daemon replacement requires our database runtime which I am not making public.&lt;br /&gt;&lt;ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-7052327350528479013?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/4dJdNVYQYR4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/7052327350528479013/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=7052327350528479013" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7052327350528479013?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/7052327350528479013?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/4dJdNVYQYR4/profile-driven-performance-optimization.html" title="Profile Driven Performance Optimization" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lh4.googleusercontent.com/-RTmHUI63NQw/TWhi4McDktI/AAAAAAAABrI/wRKFUMHg220/s72-c/test4.png" height="72" width="72" /><thr:total>1</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/02/profile-driven-performance-optimization.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DkUDRHc5eSp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-4057819847391649010</id><published>2011-02-07T07:50:00.000-08:00</published><updated>2012-02-01T11:24:35.921-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.921-08:00</app:edited><title>Clustrix as a Document Store: Blending SQL and JSON Documents</title><content type="html">Many responses to my previous post claimed that my comparisons to MongoDB were unfair because MongoDB is a "document store" and Clustrix is a SQL RDBMS.&amp;nbsp; Somehow that distinction made almost any comparison invalid. In response, I spent my weekend coding up a prototype document store interface for Clustrix to demonstrate that &lt;b&gt;a different data model is not in itself an architectural differentiator&lt;/b&gt;. &lt;br /&gt;&lt;br /&gt;At first I thought of just creating a different front end for Clustrix; our &lt;a href="http://sergeitsar.blogspot.com/2011/02/sierra-and-clustrix-database-stack.html"&gt;architectural model&lt;/a&gt; makes this easy. However, I quickly decided against it because, in many ways, it would be much more limiting than a SQL-based interface, which offers joins, flexible aggregation, subqueries, and more.&lt;br /&gt;&lt;br /&gt;Instead, I extended our SQL syntax to support native operations on JSON objects. 
So now you can do the following:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;clustrix&amp;gt; create table files (id int primary key auto_increment, doc json);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Query OK, 0 rows affected (0.04 sec)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;clustrix&amp;gt; insert into files (doc) values ('{"foo": {"bar": 1}, "baz": [1,2,3,4]}');&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Query OK, 1 row affected (0.00 sec)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;clustrix&amp;gt; select id, files.doc::foo.baz from files where files.doc::foo.bar = 1;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;+----+--------------------+&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;| id | files.doc::foo.baz |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;+----+--------------------+&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;|&amp;nbsp; 1 | [1,2,3,4]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
|&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;+----+--------------------+&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;1 row in set (0.00 sec)&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;clustrix&amp;gt; create index foo_bar on files (doc::foo.bar);&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Query OK, 0 rows affected (0.08 sec)&lt;/div&gt;&lt;br /&gt;The database has native support for dealing with JSON documents. We're not simply storing text blobs inside of some column and getting them back. We're exposing the contents of the JSON document to the underlying planner and execution engine. Immediately, we get the following advantages:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Ability to do joins across document collections&lt;/li&gt;&lt;li&gt;Extremely powerful and flexible query language&lt;/li&gt;&lt;li&gt;Ability to index into JSON objects&lt;/li&gt;&lt;li&gt;Transactional semantics built in&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Taking it a little further, I added a select clause modifier which instructs the database to return all row data as a JSON document, including fields which come from a relational column. 
The following example shows how the database can seamlessly join between our JSON data type and other data types in the system, and then return the result as a JSON object.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;clustrix&amp;gt; select _json_ f.doc, u.doc::username&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;gt;&amp;nbsp;&amp;nbsp; from files f&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;gt;&amp;nbsp;&amp;nbsp; join users u on f.doc::user_id = u.user_id&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;gt;&amp;nbsp; where f.doc::foo.bar = 1\G&lt;br /&gt;*************************** 1. row ***************************&lt;br /&gt;json: {"f.doc": {"foo": {"bar": 1}, "baz": [1,2,3,4]}, "u.doc::username": "sergei"}&lt;br /&gt;1 row in set (0.01 sec)&lt;/div&gt;&lt;br /&gt;We can continue to extend the syntax, for example by adding operators to manipulate lists within JSON, or by adding optional schema checking for the contents of the JSON (something along the lines of a DTD for XML). I'm sure you can think of more.&lt;br /&gt;&lt;br /&gt;In any case, one can build a system which combines the best characteristics of a document store with the power of SQL. 
Both models can coexist in the same database, allowing the developer to truly choose the data model which best suits his or her needs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-4057819847391649010?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/4mRUb4XbxWY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/4057819847391649010/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=4057819847391649010" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4057819847391649010?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4057819847391649010?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/4mRUb4XbxWY/clustrix-as-document-store-blending-sql.html" title="Clustrix as a Document Store: Blending SQL and JSON Documents" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><thr:total>3</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/02/clustrix-as-document-store-blending-sql.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUDRHc_eCp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-6445436204027684650</id><published>2011-02-02T15:53:00.000-08:00</published><updated>2012-02-01T11:24:35.940-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.940-08:00</app:edited><title>Sierra 
and the Clustrix Database Stack</title><content type="html">A lot of the responses to my previous posts criticized my choice of comparing a document database like MongoDB to a relational database like Clustrix. I tried to examine aspects of the database architecture which have nothing to do with the data model, but somehow the data model would always come up. There is a set of concerns that's common to all database systems, whether the system is a document store or a relational database.&lt;br /&gt;&lt;br /&gt;But first, think about how your database would handle the following workload. Choose whatever data model you find best suited for my use case.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;get me all the records from foo where a = ? and b = ?&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;/span&gt;foo has an index over A and another index over B.&lt;/li&gt;&lt;li&gt; A and B have non-uniform data distributions. A is 90% value "X" and B is 90% value "Y".&lt;/li&gt;&lt;li&gt;We have 1 billion records in foo.&lt;/li&gt;&lt;li&gt; The database has a choice: index A, index B, or scan and filter.&lt;/li&gt;&lt;li&gt;50% of the queries are a = X and b = Z, the other 50% are a = Z and b = Y&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;What does your database do?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If the database always chooses index A, or always chooses index B, then half of the queries look up the value that covers 90% of the table, and it ends up examining 900 million rows 50% of the time.&lt;br /&gt;&lt;br /&gt;We'll come back to my question later. First, I will briefly describe what the database stack in Clustrix looks like. 
I need to set a foundation so that my next set of posts makes sense.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_VHQJkYQ5-dY/TUnZW_lhjKI/AAAAAAAABq8/iF_iwZ-UZUo/s1600/clustrix-stack.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_VHQJkYQ5-dY/TUnZW_lhjKI/AAAAAAAABq8/iF_iwZ-UZUo/s1600/clustrix-stack.png" /&gt;&lt;/a&gt;&lt;/div&gt;To anyone who has experience with database systems, the stack should look very familiar. We keep fairly strict interface abstractions between the various portions of the stack.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The Protocol Handler and Query Parser are responsible for taking in user connections and translating SQL into our intermediate representation called &lt;b&gt;Sierra&lt;/b&gt;. We can actually support multiple dialects at these two layers; the constraint is that we must be able to express the query language in Sierra. And &lt;b&gt;Sierra is much more expressive than SQL&lt;/b&gt;. The value of Sierra is that it provides an extensive planner framework for reasoning about distributed database queries.&lt;br /&gt;&lt;br /&gt;So the Planner/Optimizer only accepts Sierra. It runs through a search space of possible plans and prunes them based on cost estimates. After coming up with the best plan candidate, it translates the plan into another intermediate representation used by the Distributed Compiler, which reasons out the physical execution plan for our query. Finally, we compile the query into machine code and execute it.&lt;br /&gt;&lt;br /&gt;As a performance optimization, we cache the compiled programs and plans. Clustrix does not need to optimize and compile every query, and you don't need to use prepared statements to get this behavior.&lt;br /&gt;&lt;br /&gt;Back to my question. What did you come up with? 
Well, I can tell you what happens in Clustrix. During the planning phase of Sierra, we examine the various statistics that the database keeps about the data distribution in our indexes. The statistics include tracking:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Number of distinct values across index columns&lt;/li&gt;&lt;li&gt;Quantile distributions&lt;/li&gt;&lt;li&gt;Hotlist tracking top n values within a column&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Clustrix correctly chooses the plan which will result in the least number of rows examined for all input parameters.&lt;br /&gt;&lt;br /&gt;Now, don't get me wrong. I am not claiming that data distribution statistics are unique to Clustrix. On the contrary, they are very common in any modern RDBMS. I'm using it as an example of a requirement that's independent of any data model, and it's actually a very important feature to have.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-6445436204027684650?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/g5GURfE-Unk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/6445436204027684650/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=6445436204027684650" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6445436204027684650?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6445436204027684650?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/g5GURfE-Unk/sierra-and-clustrix-database-stack.html" title="Sierra and the Clustrix Database Stack" 
/><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_VHQJkYQ5-dY/TUnZW_lhjKI/AAAAAAAABq8/iF_iwZ-UZUo/s72-c/clustrix-stack.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/02/sierra-and-clustrix-database-stack.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUDRHc9fyp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-4526996759410953573</id><published>2011-02-01T10:01:00.000-08:00</published><updated>2012-02-01T11:24:35.967-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.967-08:00</app:edited><title>MongoDB vs. Clustrix Comparison: Part 2</title><content type="html">&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In Part 1 of my comparison, I ran some performance benchmarks to  establish that relational systems can scale performance. In this post I  would like to focus more on the High Availability and Fault Tolerance  aspects of the two systems. The post will go over the approach of each  system and what it means for fault tolerance and availability. I will  also conduct a test of the claims: I'm going to fail a node by pulling  power to see what happens.&lt;br /&gt;&lt;b&gt;&lt;br /&gt;A Primer on Clustrix Data Distribution&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Clustrix has a fine grained approach to data distribution. The following graphic demonstrates the basic concepts and terminology used by our system. 
Notice that unlike MongoDB (and many other systems for that matter), Clustrix applies a per-index distribution strategy.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_VHQJkYQ5-dY/TUi4Xrcoz6I/AAAAAAAABq4/hDv1CsjWrLU/s1600/slices-intro.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_VHQJkYQ5-dY/TUi4Xrcoz6I/AAAAAAAABq4/hDv1CsjWrLU/s1600/slices-intro.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;There are many interesting implications for query evaluation and execution in our model, and the topic deserves its own set of posts. For the curious, you can get a brief &lt;a href="http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-whitepaper-01-no-on-sql-mysql-object-key-value-store-database-scaling.pdf"&gt;introduction to our evaluation model&lt;/a&gt; from our white paper on the subject. For this post, I'm going to stick to how our model applies to fault tolerance and availability.&lt;br /&gt;&lt;br /&gt;You can find documentation for &lt;a href="http://www.mongodb.org/display/DOCS/Sharding"&gt;MongoDB's distribution approach&lt;/a&gt; on their website. In brief, MongoDB chooses a single distribution key for the collection. 
Indexes are co-located with the primary key shard.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Clustrix Fault Tolerance and Availability Demo&lt;/b&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://i.ytimg.com/vi/L4PAo_du6fI/0.jpg" height="266" width="320"&gt;&lt;param name="movie" value="http://www.youtube.com/v/L4PAo_du6fI?f=user_uploads&amp;c=google-webdrive-0&amp;app=youtube_gdata" /&gt;&lt;param name="bgcolor" value="#FFFFFF" /&gt;&lt;embed width="320" height="266" src="http://www.youtube.com/v/L4PAo_du6fI?f=user_uploads&amp;c=google-webdrive-0&amp;app=youtube_gdata" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;&lt;b&gt;Fault Tolerance&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Both Clustrix and MongoDB rely on replicas for fault tolerance. A loss of a node results in a loss of some copy of the data which we can find elsewhere in the system. The MongoDB team put together a good set of documentation &lt;a href="http://www.mongodb.org/display/DOCS/Replication"&gt;describing their replication model&lt;/a&gt;. Perhaps one of the most salient differences between the two approaches is the granularity of data distribution.&amp;nbsp; The unit of recovery on Clustrix is the replica (a small portion of an index), while the unit of recovery in MongoDB is a full instance of a Replication Set.&lt;br /&gt;&lt;br /&gt;For Clustrix, this means that the reprotection operation happens in a many-to-many fashion. Several nodes copy small portions of data from each of their disks to several other nodes onto many disks. 
The advantages of this approach are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;No single disk in the system becomes overloaded with writes or reads&lt;/li&gt;&lt;li&gt;No single node hotspot for driving the reprotect work&lt;/li&gt;&lt;li&gt;Incremental progress toward full protection&lt;/li&gt;&lt;li&gt;Independent replica factors for each index (e.g. primary key 3x, indexes 2x)&lt;/li&gt;&lt;li&gt;Automatic reprotection which doesn't require operator intervention&lt;/li&gt;&lt;li&gt;All replicas are always consistent&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;One of the interesting aspects of the system is the &lt;b&gt;complete automation of every recovery task&lt;/b&gt;. It's built in. I don't have to do anything to make that happen. So if I have a 10-node system, and a node fails, in an hour or so I will have a completely protected 9-node system without any operator intervention at all. When the 10th node comes back, the system will simply perceive a distribution imbalance and start moving data back onto that node.&lt;br /&gt;&lt;br /&gt;While the Replica Sets feature in MongoDB is nicer than replication in, say, MySQL, it's still highly manual. So in contrast with the above list for Clustrix, for MongoDB we have:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Manual intervention to recover from failure&lt;/li&gt;&lt;li&gt;The data is moved in a one-to-one fashion&lt;/li&gt;&lt;li&gt;All data within a Replica Set has the same protection factor&lt;/li&gt;&lt;li&gt;Failures can lead to inconsistency&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Availability&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Both systems rely on having multiple copies of data for availability. I've seen a lot of interesting discussion recently about the CAP theorem and what it means for real-world distributed database systems. 
It's another deep topic which really deserves its own set of posts, so I'll simply link to a couple of posts on the subject which I find interesting and illuminating:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors"&gt;Stonebraker of VoltDB on partition tolerance&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://perspectives.mvdirona.com/2010/02/24/ILoveEventualConsistencyBut.aspx"&gt;James Hamilton of Amazon/AWS on consistency&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;At Clustrix, we think that Consistency, Availability, and Performance are much more important than Partition tolerance. &lt;b&gt;Within a cluster, Clustrix keeps availability in the face of node loss while keeping strong consistency guarantees.&lt;/b&gt; But we do require that more than half of the nodes in the cluster group membership are online before accepting any user requests. So a cluster provides fully ACID compliant transactional semantics while keeping a high level of performance, but you need a majority of the nodes online.&lt;br /&gt;&lt;br /&gt;However, Clustrix also offers a lower level of consistency in the way of asynchronous replication between clusters. So if you want to set up a disaster recovery target in another physical location over a high-latency link, we're able to accommodate that mode. It simply means that your backup cluster may be out of date by some number of transactions.&lt;br /&gt;&lt;br /&gt;MongoDB has relaxed consistency all around. The Replication Set itself uses an asynchronous replication model. The MongoDB guys are &lt;a href="http://www.mongodb.org/display/DOCS/Replica+Set+Design+Concepts"&gt;upfront about the kinds of anomalies&lt;/a&gt; they expose. The end user gets the equivalent of &lt;b&gt;read uncommitted&lt;/b&gt; isolation. 
Mongo's claim is that they do this because they (1) can achieve higher performance, and (2) "&lt;i&gt;merging back old operations later, after another node has accepted writes, is a hard problem.&lt;/i&gt;" &lt;span style="color: black;"&gt;Yes. Distributed protocols are a hard problem, but it doesn't mean you should punt on them.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Availability Continued&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There's also a more nuanced discussion to availability. One of the principal design features of Clustrix has been to aim for lock-free operation whenever possible. We have Multi-Version Concurrency Control (MVCC) deeply ingrained in the system. It allows a transaction to see a consistent snapshot of the database without interfering with writes. So a read in our system will not block a write.&lt;br /&gt;&lt;br /&gt;Building on top of MVCC, Clustrix has implemented &lt;b&gt;a transactionally safe, lockless, and fully consistent method for moving data in the cluster without blocking any writes to that data&lt;/b&gt;. All of this happens completely automatically. No administrator intervention required. So when the Rebalancer decides to move a replica from Node 1 to Node 3, the replica can continue to take writes. We have a mechanism to sync changes to the source replica with the target replica without limiting the replica availability.&lt;br /&gt;&lt;br /&gt;Compare that to what many other systems do: read lock the source to get a consistent view for a replica copy. You end up locking out writers for the duration of the data copy. 
So while your data is available for reads, it is not available for writes.&lt;br /&gt;&lt;br /&gt;After a node failure (or to be more precise, a replica failure within a set), MongoDB advocates the following approach:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Quiesce the master (read lock)&lt;/li&gt;&lt;li&gt;Flush dirty buffers to disk&amp;nbsp; (fsync)&lt;/li&gt;&lt;li&gt;Take an LVM snapshot of the resulting files&lt;/li&gt;&lt;li&gt;Unlock the master&lt;/li&gt;&lt;li&gt;Move the data files over to the slave&lt;/li&gt;&lt;li&gt;Let the slave catch up from the snapshot&lt;/li&gt;&lt;/ol&gt;So a couple of points: (a) &lt;b&gt;MongoDB is not available for writes&lt;/b&gt; during steps (1) and (2),&amp;nbsp; and (b) it's a &lt;b&gt;highly manual process&lt;/b&gt;. It reminds me very much of the MySQL best practices for setting up a slave. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I've seen a lot of heated debates about Consistency, Availability, Performance, and Fault Tolerance. These issues are deeply interconnected and it's difficult to write about any of them in isolation. Clustrix maintains a high level of performance without sacrificing consistency or a very high degree of availability. I know that it's possible to build such a system because we actually built it. 
And you shouldn't sacrifice these features in your application because you believe it's the only way to achieve good performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-4526996759410953573?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/mQ8WRE02eeA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/4526996759410953573/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=4526996759410953573" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4526996759410953573?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4526996759410953573?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/mQ8WRE02eeA/mongodb-vs-clustrix-comparison-part-2.html" title="MongoDB vs. 
Clustrix Comparison: Part 2" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_VHQJkYQ5-dY/TUi4Xrcoz6I/AAAAAAAABq4/hDv1CsjWrLU/s72-c/slices-intro.png" height="72" width="72" /><thr:total>6</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/02/mongodb-vs-clustrix-comparison-part-2.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUDRHczcCp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-6210544906923762590</id><published>2011-01-30T19:55:00.000-08:00</published><updated>2012-02-01T11:24:35.988-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T11:24:35.988-08:00</app:edited><title>MongoDB vs. Clustrix Comparison: Part 1 -- Performance</title><content type="html">&lt;span style="font-weight: bold;"&gt;UPDATE:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You can now &lt;a href="http://www.clustrix.com/wp-content/uploads/2011/01/mongo-bench.tgz"&gt;download the benchmark&lt;/a&gt; code. As mentioned in my post, I populated both databases with 300M rows. I played around with the best client/host/thread ratio for each database to achieve peak throughput.  I used MongoDB v1.6.5.&lt;br /&gt;&lt;br /&gt;For the MongoDB read tests, I used the read-test.cpp harness. I didn't have time to do a proper getopt parsing for it, so I would modify it by hand for test runs. But it's very straight forward.&lt;br /&gt;&lt;br /&gt;The C version of the MySQL/Clustrix harness is not included because I used a Clustrix internal test harness framework -- to save time. It doesn't impart Clustrix any advantage in the test, and it relies too much on our infrastructure to be of any value. 
You can still use the Python-based test harness -- it just requires a lot of client CPU power.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;UPDATE 2:&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There's an &lt;a href="http://news.ycombinator.com/item?id=2161753"&gt;interesting conversation&lt;/a&gt; over at Hacker News about this post. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Introduction&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;With all the recent buzz about NoSQL and non-relational databases, the marketing folks at Clustrix asked a question: Do we have the right solution for today's market? It's a fair question, especially since Clustrix is a fully featured RDBMS with a SQL interface. And we've all heard that SQL doesn't scale, right?&lt;br /&gt;&lt;br /&gt;So that brings us around to the next question: What do people actually want out of their database? Surely it's not simply the absence of a SQL-based interface. If that were the case,&amp;nbsp; &lt;a href="http://en.wikipedia.org/wiki/Berkeley_DB"&gt;Berkeley DB&lt;/a&gt; would be a lot more popular than, say, &lt;a href="http://www.sqlite.org/"&gt;SQLite&lt;/a&gt;. Over the years, we've had many conversations with people about their database needs. Over and over, the same list of must-have features for any modern system has come up:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li style="font-weight: bold;"&gt;Incrementally Scalable Performance&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;High Availability and Fault Tolerance&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Ease of Overall System Management&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Interestingly enough, we never heard that SQL or the relational model was the root of all their problems. 
It appears that the anti-SQL sentiment came around with this sort of false reasoning.&lt;br /&gt;&lt;blockquote&gt;I have a SQL based RDBMS.&lt;br /&gt;I can't seem to scale my database beyond a single system.&lt;br /&gt;Therefore SQL is the problem.&lt;/blockquote&gt;&lt;br /&gt;The NoSQL movement embraced this reasoning. The NoSQL proponents began to promote all sorts of "scalable" systems at the expense of venerable DBMS features like durability. And they kept going. What else don't we need? Well, we don't need Consistency! Why? Because that's really hard to do and keep performance.&amp;nbsp; Slowly but surely, these systems would claim to have a panacea for all of your scalable database needs at the expense of cutting features we've come to expect from 40 years of database systems design.&lt;br /&gt;&lt;br /&gt;Well, that's just bullshit. There is absolutely nothing about SQL or the relational model preventing it from scaling out. &lt;br /&gt;&lt;br /&gt;Over the next set of posts, I'm going to compare &lt;a href="http://mongodb.org/"&gt;MongoDB&lt;/a&gt; and &lt;a href="http://www.clustrix.com/"&gt;Clustrix&lt;/a&gt; using the above evaluation criteria: Scalable Performance,  Availability and Fault Tolerance, and Ease of Use.  I am going to start with Performance because no one believes that you can grow a relational database to Internet Scale. And to put the results into context, I chose to compare Clustrix to MongoDB because (1) it doesn't support SQL, (2) it can transparently scale to multiple nodes, and (3) it seems to be the new poster child for NoSQL.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Performance&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Conducting performance benchmarks is always challenging. First, you have to decide on a model workload. Next, you have to accurately simulate that workload. The system under test should be as close as possible to your production environment. The list gets long. 
In the end, no benchmark is going to be perfect. The best you can hope for is reasonable.&lt;br /&gt;&lt;br /&gt;So I looked at some common themes in the workloads from &lt;a href="http://gigaom.com/cloud/clustrix-lifts-the-curtain-on-early-database-customers/"&gt;some of our customers&lt;/a&gt;, and decided that I would simulate a basic use case of keeping metadata about a collection of 1 Billion files. Whether you're a cloud-based file storage provider or a photo sharing site, the use case is familiar. The test would use the appropriate access patterns for the database. Since MongoDB does not support joins, I'm not going to put it at a disadvantage by moving join logic into the application. Instead, I'm going to make full use of the native document-centric interface.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Benchmark Overview&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;10 node cluster&lt;/span&gt; of dedicated hosts (SSD, 8 cores/node, 32GB ram/node)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;2x replication&lt;/span&gt; factor&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A data set containing information about &lt;strike&gt;&lt;span style="font-weight: bold;"&gt;1 Billion files&lt;/span&gt;&lt;/strike&gt;&lt;span style="font-weight: bold;"&gt; &lt;span style="color: red;"&gt;300 Million&lt;/span&gt; &lt;/span&gt;(see below)&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;A read only performance test&lt;/li&gt;&lt;li&gt;A read/write performance test&lt;/li&gt;&lt;li&gt;Both databases will use the exact same hardware &lt;/li&gt;&lt;/ul&gt;The test uses the following schema:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;CREATE TABLE files (&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span 
style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; id &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;                  bigint     NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; path&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;                 varchar    NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; size&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;                 bigint     NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; server_id&amp;nbsp;&amp;nbsp;&amp;nbsp;            int        NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; deleted&amp;nbsp; &amp;nbsp; &amp;nbsp;              smallint   NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; last_updated         datetime   NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; created&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;              datetime   NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; user_id&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;              bigint     NOT NULL,&lt;/span&gt;&lt;/div&gt;&lt;div 
style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; PRIMARY KEY (id),&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; KEY server_id_deleted (server_id, deleted),&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; KEY user_id_updated (user_id, last_updated),&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; KEY user_id_path (user_id, path)&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;);&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Additionally, the data set has the following characteristics:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;path&lt;/span&gt; is a randomly generated string between the length of 32 and 128 characters&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;server_id&lt;/span&gt; has a distribution of 0-32&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;deleted&lt;/span&gt; has 1% value 1, and the rest 0 (lumpy data distributions tests)&lt;/li&gt;&lt;li&gt;for MongoDB, we use &lt;span style="font-style: italic; font-weight: bold;"&gt;user_id&lt;/span&gt; as the shard key&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Test 1: Loading the Initial Data Set&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The test harness itself is a multi-host, multi-process, and multi-threaded python application. 
Because of the &lt;a href="http://wiki.python.org/moin/GlobalInterpreterLock"&gt;GIL&lt;/a&gt; in Python, I ended up designing the test harness so that it forks off multiple processes, with each process having some number of threads. It also turns out that I needed more than 10 client machines to saturate the cluster with reads using Python, so I rewrote the read tests using C++ for MongoDB and C for Clustrix.&lt;br /&gt;&lt;br /&gt;While populating the dataset into MongoDB,&amp;nbsp; I kept on running into a huge drop off in performance at around 55-60M rows. A 10 node cluster has an aggregate of 320GB of RAM and 80 160GB SSD drives. That's more than enough iron to handle that much data. As I started to dig in more, I saw that 85% of the data was distributed to a single node. MongoDB had split the data into multiple chunks, but its balancer could not (would not?) move the data to other nodes. Once the database size exceeded that node's available memory, everything went to shit. The box started thrashing pretty badly. It seems that under a constant high write load, MongoDB is unable to automatically redistribute data within the cluster.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span id="goog_1576347085"&gt;&lt;/span&gt;&lt;span id="goog_1576347086"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_VHQJkYQ5-dY/TUOsCf5fduI/AAAAAAAABqk/6SHSqvlasZE/s1600/mongo-write-fail.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/TUOsCf5fduI/AAAAAAAABqk/6SHSqvlasZE/s1600/mongo-write-fail.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;To get the test going, I split the files collection into an even distribution. Without any load on the cluster, I watched MongoDB move the chunks onto the 10 replica sets for an even layout. 
Now I was finally getting somewhere.&lt;br /&gt;&lt;br /&gt;Immediately, I noticed that MongoDB had a highly variable write throughput. I was also surprised at how low the numbers were. Which led me to discover Mongo's concurrency control: a single mutex over the database instance. Furthermore, in tracking the insert performance along with memory utilization, I could see that getting to more than 300 million records would spill some of the data set to disk. While that's a reasonable benchmark for a database, I decided that I would keep the data set memory resident for Mongo's sake.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_VHQJkYQ5-dY/TUOt52JGdmI/AAAAAAAABqo/ADMuOUMrKxc/s1600/insert.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/TUOt52JGdmI/AAAAAAAABqo/ADMuOUMrKxc/s1600/insert.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The drop-off on Clustrix happened because not all of the load scripts finished at the same time. A couple of the client nodes were slower, so they took a bit longer to finish up their portion of the load.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;For a system which eschews consistency and durability, the write performance on MongoDB looks atrocious. Initially, I expected Mongo to completely trash Clustrix on write performance.&amp;nbsp; The result was a complete surprise. 
Here's why I think Clustrix did so much better:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Highly concurrent b-tree implementation with fine-grained locking&lt;/li&gt;&lt;li&gt;Buffer manager and transaction log tuned for SSD&lt;/li&gt;&lt;li&gt;Completely lock-free Split and Move operations (very cool stuff, another post)&lt;/li&gt;&lt;/ul&gt;Conversely, Mongo did so poorly because it: &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Has a big fat lock, severely limiting concurrency&lt;/li&gt;&lt;li&gt;Relies entirely on the kernel buffer manager&lt;/li&gt;&lt;/ul&gt;Total time to load was &lt;b&gt;0:37 &lt;/b&gt;(hh:mm)&lt;b&gt; on Clustrix&lt;/b&gt; and &lt;b&gt;4:47 on MongoDB&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b style="color: #990000;"&gt;Clustrix loaded the data about 7.75x as fast as MongoDB!&lt;/b&gt;&lt;br /&gt;And that's with fully durable and fully consistent writes on the Clustrix side.&lt;br /&gt;&lt;ul&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Test 2: Read Only&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;The read test consists of the following basic workloads:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;get the 10 latest updated files for a specific user&lt;/li&gt;&lt;li&gt;count the number of deleted files on a given server id&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;/ul&gt;&lt;br /&gt;I chose these queries because they are representative of the types of queries our example application would generate, and they are not simple point selects. Getting a distributed hash table working is easy. But DHTs tend to fall apart fairly quickly when queries start introducing ordering, examining multiple rows,&amp;nbsp; or other non key-value lookups. In other words, real-world use.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;MongoDB&lt;/b&gt;&lt;br /&gt;C++ test harness. 
Peak throughput at 64 concurrent client threads.&lt;br /&gt;&lt;div style="color: #444444;"&gt;db.files.find({user_id: user_id}).sort({last_updated: -1}).limit(10)&lt;/div&gt;&lt;div style="color: #990000;"&gt;55,103 queries/sec&lt;/div&gt;&lt;div style="color: #444444;"&gt;db.files.find({'server_id': server_id, 'deleted': 1}).count()&lt;/div&gt;&lt;div style="color: #38761d;"&gt;675 queries/sec&lt;/div&gt;&lt;br /&gt;&lt;b&gt;Clustrix&lt;/b&gt;&lt;br /&gt;C test harness. Peak throughput at 256 concurrent client threads. &lt;br /&gt;&lt;div style="color: #444444;"&gt;select * from benchmark.files where user_id = .. order by last_updated desc limit 10 &lt;/div&gt;&lt;div style="color: #38761d;"&gt;56,641 queries/sec&amp;nbsp;&lt;/div&gt;&lt;div style="color: #444444;"&gt;select count(1) from benchmark.files where server_id = .. and deleted = 1&lt;/div&gt;&lt;div style="color: #990000;"&gt;625 queries/sec&lt;/div&gt;&lt;br /&gt;So on the read-only test, MongoDB and Clustrix are within 3% of each other for test1. Clustrix is about 3% faster on test1 and MongoDB is 8% faster on test2. I captured a profile of Clustrix during a test1 run, and saw that the execution engine dominates CPU time (as opposed to, say, SQL parsing or query planning). In looking at the profiles during test2 runs on Clustrix, I saw that we had a bunch of idle time in the system, so there's room for optimization.&lt;br /&gt;&lt;br /&gt;But real-world loads tend to be read/write, so let's see how Mongo does when we add writes to the equation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Test 3: Read/Write &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My initial plan called for a combination of read-centric and write-centric loads. It seems that most web infrastructures are heavier on the reads than writes, but there are many exceptions. 
In Clustrix, we use &lt;a href="http://en.wikipedia.org/wiki/Multiversion_concurrency_control"&gt;Multi-Version Concurrency Control&lt;/a&gt;, which means that readers are never blocked by writers. We handle both read-heavy and write-heavy workloads equally well. Since MongoDB seems to do much better with reads than writes, I decided to stick to a read-centric workload.&lt;br /&gt;&lt;br /&gt;The Clustrix test shows very little drop off in performance for reads. On the Mongo side, I expected to see a drop off in performance directly proportional to the amount of write load.&lt;br /&gt;&lt;br /&gt;However, what I saw was mind-blowing: &lt;b&gt;Mongo completely starved the readers&lt;/b&gt;! The following graph shows the query load on one of the 10 shards during the write portion of the test. I simply started up &lt;b&gt;a single write thread&lt;/b&gt; while letting the read test run. The writer was active for all of 60 seconds, and it took Mongo an additional 15 seconds to recover after the writer stopped.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_VHQJkYQ5-dY/TUO3RAn8SNI/AAAAAAAABqs/FJKgl_HgBWA/s1600/mongo-rw-fail.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_VHQJkYQ5-dY/TUO3RAn8SNI/AAAAAAAABqs/FJKgl_HgBWA/s1600/mongo-rw-fail.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;MongoDB&lt;br /&gt;&lt;span style="color: red;"&gt;FAIL&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;Clustrix&lt;br /&gt;test1: 4,425 writes/sec and 49,099 reads/sec (total 53,524 queries/sec) (92% read / 8% write) &lt;br /&gt;test2: 4,450 writes/sec and 625 reads/sec&lt;br /&gt;&lt;br /&gt;The test2 aggregate query is much more computationally expensive compared to an insert. So the read/write ratio for test2 became very skewed. Note that on test2 Clustrix did not drop in read throughput at all. 
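&lt;br /&gt;&lt;br /&gt;As a quick sanity check on those ratios, here is a minimal Python sketch that recomputes the read/write mix from the throughput figures above. The function name read_write_mix is made up for illustration; only the four throughput numbers come from the test runs.&lt;br /&gt;
&lt;pre&gt;
```python
def read_write_mix(reads_per_sec, writes_per_sec):
    """Return (total queries/sec, read %, write %), percentages rounded to whole numbers."""
    total = reads_per_sec + writes_per_sec
    read_pct = round(100 * reads_per_sec / total)
    write_pct = round(100 * writes_per_sec / total)
    return total, read_pct, write_pct


# Throughput figures reported above for the Clustrix read/write runs.
test1 = read_write_mix(reads_per_sec=49099, writes_per_sec=4425)
test2 = read_write_mix(reads_per_sec=625, writes_per_sec=4450)

print("test1: %d queries/sec, %d%% read / %d%% write" % test1)
print("test2: %d queries/sec, %d%% read / %d%% write" % test2)
```
&lt;/pre&gt;
For test1 this reproduces the 92% read / 8% write split. For test2 the mix flips to roughly 12% read / 88% write, which is the skew described above.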
&lt;br /&gt;&lt;br /&gt;Overall, you can see why every modern DBMS chooses to go with the MVCC model for concurrency control.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The SQL relational model can clearly scale. The only place where MongoDB could compete with Clustrix was on pure read-only workloads. But that's just not representative of real-world application loads.&lt;br /&gt;&lt;br /&gt;Building a scalable distributed system is more about good architecture and solid engineering. Now that we have scale and performance out of the way, I'm going to review the other important aspects of a DBMS in my upcoming posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-6210544906923762590?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/jziHchKu-X4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/6210544906923762590/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=6210544906923762590" title="22 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6210544906923762590?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6210544906923762590?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/jziHchKu-X4/mongodb-vs-clustrix-comparison-part-1.html" title="MongoDB vs. 
Clustrix Comparison: Part 1 -- Performance" /><author><name>Sergei</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://3.bp.blogspot.com/_VHQJkYQ5-dY/SoixbUYDYOI/AAAAAAAAAro/so2zLlfjOuc/S220/profile.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_VHQJkYQ5-dY/TUOsCf5fduI/AAAAAAAABqk/6SHSqvlasZE/s72-c/mongo-write-fail.png" height="72" width="72" /><thr:total>22</thr:total><feedburner:origLink>http://blog.clustrix.com/2011/01/mongodb-vs-clustrix-comparison-part-1.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4DQno4fip7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-4450245759362029219</id><published>2010-10-22T11:02:00.002-07:00</published><updated>2012-02-01T12:42:53.436-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T12:42:53.436-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="MySQL" /><category scheme="http://www.blogger.com/atom/ns#" term="No SQL" /><title>SQL Is Not The Problem</title><content type="html">&lt;div class="entry-content"&gt;
There is a lot of buzz about NoSQL out 
there.  According to the hype,  NoSQL solves scalability issues, NoSQL 
simplifies development, NoSQL  has unlimited write performance, etc.  
While these claims may  technically be true, they are in no way 
exclusive to the NoSQL offerings  out there.  All of these things are 
achievable in a SQL database with  full ACID and relational 
capabilities.  In a recent post, Derrick Harris  asks &lt;a href="http://gigaom.com/cloud/will-scalable-data-stores-make-nosql-a-non-starter-2/"&gt;“Will Scalable Data Stores Make NoSQL a Non-Starter?”&lt;/a&gt;.
   He makes the point that scalable SQL is a reality today, so why would 
 someone go to a NoSQL solution?  What benefit could it possibly bring? 
  Would anyone choose to give up transactions if they didn’t have to?   
Would anyone choose to give up ACID properties?  Would anyone choose to 
 give up the ability to do relational operations?  How would giving up  
any of these things simplify development?  Looking at the comments on  
Derrick’s post provides a clue.  They mention the “flexible schema” as  
the key feature.&lt;br /&gt;
&lt;br /&gt;
A flexible schema is defined by the ability to 
assign arbitrary  properties to an object without having to have defined
 columns for those  properties.  For example, in a single table, object A
 could represent a  picture and have “height” and “width” properties 
while object B could  represent an audio stream that has “length” and 
“bitrate” properties.   In a traditional RDBMS environment, you’d 
probably create a separate table for each object type, which 
can be inconvenient and possibly inefficient.&lt;br /&gt;
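As a rough illustration of that definition, here is a short Python sketch. It models the single table as a list of property maps; the names objects and properties_of are hypothetical, and the property values are invented for the example.&lt;br /&gt;
&lt;pre&gt;
```python
# One "table" holding objects with different, undeclared properties.
objects = [
    {"id": "A", "kind": "picture", "height": 600, "width": 800},
    {"id": "B", "kind": "audio", "length": 215, "bitrate": 128},
]


def properties_of(obj):
    """Return the sorted names of the properties this object happens to carry."""
    return sorted(key for key in obj if key not in ("id", "kind"))


for obj in objects:
    print(obj["id"], properties_of(obj))

# A lookup has to tolerate properties that some objects do not have.
widths = [obj.get("width") for obj in objects]
print(widths)
```
&lt;/pre&gt;
Nothing declares “height” or “bitrate” ahead of time; each object simply carries the properties it needs, and a reader has to cope with the ones that are missing.&lt;br /&gt;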
Is that it?  Why 
wouldn’t you just add that feature to a scalable SQL  database rather 
than throw away all the good things SQL databases have  to offer?  You 
could easily add a “map” column to a table that allows  you to store an 
arbitrary map of key/values associated with an object.   You now have 
the tools to implement a flexible schema while maintaining  relational 
capabilities, ACID properties, and full scalability.  It’s  the best of 
both worlds.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-4450245759362029219?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/H9HSI_zaOto" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/4450245759362029219/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=4450245759362029219" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4450245759362029219?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/4450245759362029219?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/H9HSI_zaOto/sql-is-not-problem_22.html" title="SQL Is Not The Problem" /><author><name>Aaron Passey</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2010/10/sql-is-not-problem_22.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0UCSXo8fyp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-6154326611436564754</id><published>2010-06-25T10:58:00.001-07:00</published><updated>2012-02-01T12:47:48.477-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T12:47:48.477-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scaling" /><category scheme="http://www.blogger.com/atom/ns#" term="architecture" /><category scheme="http://www.blogger.com/atom/ns#" term="web" /><category 
scheme="http://www.blogger.com/atom/ns#" term="database" /><title>Surprising Advice For Startups</title><content type="html">&lt;div class="entry-content"&gt;
Clustrix just exhibited at two good shows: &lt;a href="http://event.gigaom.com/structure/"&gt;Structure&lt;/a&gt; and &lt;a href="http://velocityconf.com/velocity2010"&gt;Velocity&lt;/a&gt;.  I spent my time at Structure.  One of the highlights of the show for Clustrix was the &lt;a href="http://gigaom.com/2010/06/24/structure-2010-the-future-of-sql-in-the-cloud/"&gt;The Future of SQL In the Cloud&lt;/a&gt;
  panel that Paul was on.  There was some good discussion about SQL vs. 
 NoSQL and different products available for scaling your database.  The 
 title is a bit of a misnomer, however.  Database challenges are  
orthogonal to the cloud.  The database is a critical challenge no matter
  how the application is deployed.&lt;br /&gt;
&lt;br /&gt;
But this isn’t what surprised me the most at Structure.  That came in the &lt;a href="http://gigaom.com/2010/06/24/structure-2010-how-does-a-company-scale-in-real-time/"&gt;How Does a Company Scale in Real Time?&lt;/a&gt;
  panel.  This panel had representatives from PayPal, Engine Yard,  
Yahoo!, Facebook, and Zynga.  These panelists are all responsible for  
ensuring their products continue to function smoothly.  These companies 
 are all recognizable and successful.  I was really hoping to hear some 
 good insight into operating large-scale web properties and perhaps some
  good advice for new web startups.  I think the money question came at 
 around 16:30 in the video.  Jonathan asks the panelists to rewind the  
clock and say what they would do differently and to give advice to  
startups for what they should focus on to avoid application bottlenecks.
   There was a lot of advice offered.  There was talk of establishing a 
 single sign-on infrastructure.  There was emphasis on avoiding SQL  
entirely, saying it is far and away the biggest scalability problem in  
web architectures today.  There was a suggestion you should invest up  
front in separating out your architecture.  You should think hard about 
 the abstraction layers in key parts of your system and get the  
infrastructure right first.  Another panelist talked about getting the right  
instrumentation and metrics in your application.  You should put in the 
 right levels of abstraction, and the right levels of caching.  You need
  to force yourself to run your application on two servers.  Think about
  sharding and replicating at the beginning and force your scaling  
challenges to come up in advance before you have to scale.&lt;br /&gt;
&lt;br /&gt;
None of
 this advice resonated with me.  Startups have enough  challenges 
finding the right business model and securing the initial set  of 
customers.  Why would you spend your very limited money, and more  
importantly, time on infrastructure and architecture of the back-end  
before you’ve proved your business?  Matthew Mengerink, VP of Customer  
Quality, Engineering Services, and Site Operations at PayPal finally  
stepped in and made some sense.  He says he would do nothing differently. 
  To him, a startup working on architecture is a waste of a dollar.  He 
 advises spending your time and money on making the business model work.
   Thank you, Matthew!&lt;br /&gt;
&lt;br /&gt;
This discussion reminded me of a post titled &lt;a href="http://coderoom.wordpress.com/2010/05/18/start-in-the-middle/"&gt;Start In the Middle&lt;/a&gt;
 I read a while ago.  It offers a bit of advice that should be
  obvious but so often is not: solve the interesting bit of the problem
  first.  Prove there’s value in the core idea, and &lt;i&gt;then&lt;/i&gt; flesh 
out  the infrastructure around it.  Do what makes your business unique 
first  and put the majority of time into that.  Delay building the rest 
of the  surrounding architecture until it’s really needed or, when 
possible, just  buy those bits.  The rest of that BS is not what your 
customers see and  not what makes you money.&lt;br /&gt;
&lt;br /&gt;
So what’s my advice? 
 Make the code and architecture as simple as you  possibly can but no 
simpler.  Don’t abstract too early and don’t make  premature 
optimizations.  Rewriting code and adding abstractions later  is not a 
sign of coding failure; rather, it’s a sign the product is  successful.  
Choose the data model that fits your problem.  That is  frequently SQL. 
 Full transactional and relational SQL is exceptionally  expressive and 
it fits many problems so well.  Why wouldn’t you use it?   Don’t 
re-invent things that are not part of your core business.  When  the 
product becomes successful and the load starts to ramp up, collect hard  
data to find the bottlenecks and fix them when they become a problem.   
When your data shows that your database is that bottleneck, perhaps  
Clustrix can help.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-6154326611436564754?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/ruZi3hmS9jU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/6154326611436564754/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=6154326611436564754" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6154326611436564754?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6154326611436564754?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/ruZi3hmS9jU/surprising-advice-for-startups_25.html" title="Surprising Advice For Startups" /><author><name>Aaron Passey</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2010/06/surprising-advice-for-startups_25.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0EAQH48eCp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-6126483752556916162</id><published>2010-04-29T10:56:00.001-07:00</published><updated>2012-02-01T12:54:01.070-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T12:54:01.070-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scaling" /><category scheme="http://www.blogger.com/atom/ns#" term="cluster" /><category scheme="http://www.blogger.com/atom/ns#" term="distributed" 
/><category scheme="http://www.blogger.com/atom/ns#" term="architects" /><title>Not All Clusters Are the Same</title><content type="html">&lt;div class="entry-content"&gt;
There are a wide variety of techniques out
 there for clustering  storage appliances.  The question is: what 
problem are you really trying  to solve?  If you look at &lt;a href="http://www.isilon.com/"&gt;Isilon&lt;/a&gt;’s
  clustered storage appliances (where I was chief architect), you’ll see
  that the clustering is done at the block level.  The block addresses 
are  generalized into a generic (node, drive, block_num) tuple and the  
on-disk data structures simply use that generalized address everywhere a
  block address would normally be used (plus a bunch of details I’m  
glossing over).  The communication on the back end of an Isilon cluster 
 consists of block reads and writes, transaction messages, and lock messages 
(plus  some other miscellaneous bits).  Each read or write operation is 
 controlled by the initiator, and the smallest granularity of locking is
  at the block level.  Cache lives both at the disk and at the 
initiator.   If you were to put it into an architecture category, you’d 
call it an  InfiniBand SAN (Storage Area Network).  This is perfect for a
 file  system.  This architecture lends itself to zero-copy, extremely  
high-performance file access for streaming files, very low CPU  
utilization on the nodes holding the disks (which allows the addition of
  the accelerator nodes for high-speed Fibre Channel and 10GbE), infinite
  scalability, and extremely low latency for operations on cached data.&lt;br /&gt;
&lt;br /&gt;
However,
 it doesn’t support high read/write concurrency on a single  file.  
Imagine if you ran an OLTP database with a high write load using  an 
architecture like that.  With the locking done at the block level,  you 
can never expect to get high concurrency for items smaller than a  
block.  Every node that wants to write to a block would have to get an  
exclusive lock on that block, which invalidates other nodes’ caches.  If
  you had an active table with massive read/write load sitting on top of
 a  cluster like this, performance would tank, dominated by lock  
contention.  Then why do some databases take this approach to scale?   
How can you possibly make a shared-backend cluster resembling a SAN and 
 expect it to scale with a database workload, as some have done?  How  
can you make an expandable storage engine plug-in and expect the entire 
 database to scale?  What works extremely well for a file system does 
not  work at all for a database.  We need a new approach.&lt;br /&gt;
&lt;br /&gt;
Clustrix
 has a new approach.  Rather than shipping the data blocks on  the back 
end, we ship the queries.  That may sound like an innocuous  statement, 
but really it has a far-reaching impact on the architecture.   To learn 
more, read the white paper &lt;a href="http://www.clustrix.com/Default.aspx?app=LeadgenDownload&amp;amp;shortpath=docs%2fClustrix_A_New_Approach.pdf"&gt;A New Approach: Sierra Distributed Database Engine&lt;/a&gt;
  that I wrote on this subject.  It shows how Clustrix has taken a novel
  approach to solve the clustered database problem, resulting in a  
database system that can handle high concurrency at any scale.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-6126483752556916162?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/jiqj3-RCsDA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/6126483752556916162/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=6126483752556916162" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6126483752556916162?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6126483752556916162?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/jiqj3-RCsDA/not-all-clusters-are-same_29.html" title="Not All Clusters Are the Same" /><author><name>Aaron Passey</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2010/04/not-all-clusters-are-same_29.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A04BRH8_fyp7ImA9WhRbEU8.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-5702243044788987701</id><published>2010-04-28T10:53:00.001-07:00</published><updated>2012-02-01T12:59:15.147-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-01T12:59:15.147-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="MySQL" /><category scheme="http://www.blogger.com/atom/ns#" term="ACID" /><category 
scheme="http://www.blogger.com/atom/ns#" term="No SQL" /><category scheme="http://www.blogger.com/atom/ns#" term="database" /><title>The Tyranny of the OR vs. the Genius of the AND</title><content type="html">&lt;div class="entry-content"&gt;
Jim Collins and Jerry Porras first coined this phrase in &lt;i&gt;Built to Last&lt;/i&gt;.
   In their book, they  describe how choosing between seemingly  
contradictory concepts—focusing on this or that—leads to missed  
opportunities.  Is the product low cost or high quality?  Do I focus on 
 short-term opportunities or long-term strategy?  Should the company be 
 bold or conservative?  &lt;i&gt;Built to Last&lt;/i&gt; is focused on business and 
 what makes great companies continue to succeed year after year.   
Collins and Porras discovered that the best companies find a way to  
embrace the positive aspects of both sides of a dichotomy, and instead  
of choosing, they find a way to have both.&lt;br /&gt;
&lt;br /&gt;
These same dichotomies 
exist in technology.  Some common dichotomies  include: low power vs. 
high performance, high stability vs. new  features, and ease of use vs. 
flexibility.  These are false choices.   You can have high performance 
and low power—just look at the huge gains  made through die shrinks.  
You can have new features without sacrificing  stability with the right 
development process and proper QA.  You can  have an easy-to-use product
 that also enables maximum flexibility.  All  it takes is thoughtful 
design.  We can embrace “the genius of the AND”  here, too.&lt;br /&gt;
&lt;br /&gt;
The tyranny of the &lt;i&gt;OR&lt;/i&gt;
 is still strong in the database world.   Do you want relational and 
ACID capabilities or do you want to be able  to scale?  An entire 
movement has been formed to answer this exact  either-or question: the 
NoSQL movement.  Digg has blogged about it (&lt;a href="http://about.digg.com/blog/looking-future-cassandra"&gt;http://about.digg.com/blog/looking-future-cassandra&lt;/a&gt; and &lt;a href="http://about.digg.com/blog/saying-yes-nosql-going-steady-cassandra"&gt;http://about.digg.com/blog/saying-yes-nosql-going-steady-cassandra&lt;/a&gt;),
  talking about how they did a large rewrite of their code to move to  
Cassandra, working around limitations in the process.  If you look at  
the Digg blogs, they really did have a point.  Once you scaled beyond  
what a single box could do, you had two options: partition the database or
  move away from traditional relational databases.  In both cases, you  
lost relational capabilities.  You could have scale &lt;i&gt;or&lt;/i&gt; you could have the functionality you needed.&lt;br /&gt;
&lt;br /&gt;
They &lt;i&gt;did&lt;/i&gt;
 have a point.  Clustrix now offers a third option.   The Clustrix 
database offers full relational and transactional  capabilities &lt;i&gt;and&lt;/i&gt;
 it can scale.  The Clustrix product speaks the  MySQL protocol on the 
wire (but uses no MySQL code) and allows seamless  online scaling.  It 
supports arbitrary relational calculus at any  scale.  It supports full 
ACID semantics—not eventual consistency—yet  writes continue to scale.  
Clustrix has learned to embrace the genius of  the AND.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-5702243044788987701?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/kOYEvC2pagk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/5702243044788987701/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=5702243044788987701" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/5702243044788987701?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/5702243044788987701?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/kOYEvC2pagk/tyranny-of-or-vs-genius-of-and_28.html" title="The Tyranny of the OR vs. 
the Genius of the AND" /><author><name>Aaron Passey</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2010/04/tyranny-of-or-vs-genius-of-and_28.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEICRnYycCp7ImA9WhZbFUQ.&quot;"><id>tag:blogger.com,1999:blog-6763982002758299128.post-6103841656123052871</id><published>2010-04-15T10:00:00.000-07:00</published><updated>2011-06-20T10:56:07.898-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-20T10:56:07.898-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scaling" /><category scheme="http://www.blogger.com/atom/ns#" term="internet" /><category scheme="http://www.blogger.com/atom/ns#" term="web" /><title>How Clustrix was born</title><content type="html">&lt;div class="entry-content"&gt; &lt;p&gt;Back in 2005 I was at Isilon.  We had developed the first (and still  the best) truly distributed scale-out NAS solution.  We were on the road  to an IPO and our customers were excited about our products and what we  had done with them. I was on the phone with Jake, who worked for one of  our largest customers.  Jake said this to me: “This is great for what  you guys have done of our basic file storage, but can you do anything  about our databases?”.  BAM!  Just like that I got super excited about  this idea and started looking around to see who else had these types of  scalability and fault tolerance issues with their databases.  As you  probably know, it turns out that it was an issue for just about  everyone.&lt;/p&gt; &lt;p&gt;I got together with Sergei Tsarev (Clustrix co-founder) and we  started working on the problem.  What is it about these databases that  isn’t scaling?  Storage usage and query performance.  Ok, so why isn’t  it scaling?  
Because the query processor is only as big as the box can  be, and scaling up with bigger and bigger boxes is a non-starter  (forklift upgrades, leaving the commodity price curve, etc.).  We saw  some systems out there doing things with virtualized clustered storage  engines running behind traditional query planners, but those always  failed to scale because you wind up pulling all of this data, as well as  dealing with locking and concurrency, over the network.  We realized  that the only way to bring true scalability is to fan out the work as  you grow the cluster.&lt;/p&gt; &lt;p&gt;As Aaron (our CTO) says – ‘Bring the query to the data, not the data to the query’.   In his whitepaper, &lt;a href="http://peewit/wp-content/uploads/2010/04/clustrix-whitepaper-01-no-on-sql-mysql-object-key-value-store-database-scaling.pdf"&gt; A New Approach: Sierra Distributed Database Engine&lt;/a&gt;,   Aaron provides a nice description of how that concept became the seed  for the revolutionary technology that is the heart of our Clustered  Database Systems.&lt;/p&gt;          &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6763982002758299128-6103841656123052871?l=blog.clustrix.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Clustrix/~4/KQFC_JjBAks" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.clustrix.com/feeds/6103841656123052871/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6763982002758299128&amp;postID=6103841656123052871" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6103841656123052871?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6763982002758299128/posts/default/6103841656123052871?v=2" /><link 
rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Clustrix/~3/KQFC_JjBAks/how-clustrix-was-born.html" title="How Clustrix was born" /><author><name>Reanna</name><uri>http://www.blogger.com/profile/02147899287943196246</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.clustrix.com/2010/04/how-clustrix-was-born.html</feedburner:origLink></entry></feed>

