<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;A04FQn46eCp7ImA9WhRUE0U.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782</id><updated>2012-01-24T16:25:13.010+09:00</updated><category term="performance tuning" /><category term="awk" /><category term="mysql" /><category term="unix" /><category term="os" /><category term="twitter" /><category term="howto" /><category term="spider" /><category term="oltp" /><category term="benchmark" /><category term="dbt2" /><category term="mysql cluster" /><category term="sharding" /><category term="chrome" /><category term="google" /><title>Samurai MySQL</title><subtitle type="html">A Japanese Geek's MySQL Blog.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://samurai-mysql.blogspot.com/" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>6</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/SamuraiMysql" /><feedburner:info uri="samuraimysql" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;CUYMSHw9fSp7ImA9WxBaGUk.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782.post-7521867821875225744</id><published>2010-03-30T18:58:00.001+09:00</published><updated>2010-03-30T18:59:49.265+09:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-03-30T18:59:49.265+09:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="spider" /><category scheme="http://www.blogger.com/atom/ns#" term="mysql" /><category scheme="http://www.blogger.com/atom/ns#" term="twitter" /><category scheme="http://www.blogger.com/atom/ns#" term="sharding" /><title>Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine</title><content type="html">The history tells that a single RDBMS node cannot handle tons of traffics on web system which come from all over the world, no matter how the database is tuned. MySQL has implemented a master/slave style replication built-in for long time, and it has enabled web applications to handle traffics using a scale-out strategy. Having many slaves has been suitable for web sites where most of traffics are reads. Thus, MySQL's master/slave replication has been used on many web sites, and is being used still.&lt;br /&gt;
&lt;br /&gt;
However, when a site grow large, amount of traffic may exceed the replication's capacity. In such a case, people may use memcached. It's an in-memory, very fast and well-known KVS, key value store, and its read throughput is far better than MySQL. It's been used as a cache for web applications to store 'hot' data with MySQL as a back-end storage, as it can reduce read requests to MySQL dramatically.&lt;br /&gt;
&lt;br /&gt;
While 1:N replication can scale read workload and memcached can reduce read requests, it cannot ease write load well. So, write traffic gets higher and higher when a web site becomes huge. On such web sites, a technique called "Sharding" has been used; it's a technique that the application choose an appropriate MySQL server from several servers.&lt;br /&gt;
&lt;br /&gt;
In that way, MySQL+memcached has been a de-fact standard data store on huge web sites for long time.&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
Since web applications are getting larger still, especially on social media sites, write load is getting higher and higher as people communicate in real-time. In such area, yet another technique is required to handle the write load. Then, some people have chosen NoSQL solutions instead of MySQL+memcached. NoSQL is a kind of buzz word, IMHO, which represents non-relational databases which doesn't require SQL access. Despite lack of SQL access, some NoSQL softwares are suitable for huge scale web applications, like Cassandra. Although people cannot JOIN records on NoSQL system, it is not possible on RDBMS over the shards as well. So, MySQL isn't used as a RDBMS, is used as a data store without joins in other words, on such a web application in the first place.&lt;br /&gt;
&lt;br /&gt;
For further information of this kind of thoughts, I recommend you to read Mark Calleghan's post: &lt;a href="http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html"&gt;http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html&lt;/a&gt;&lt;br /&gt;
and this post: &lt;a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king"&gt;http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Technically, it is possible to handle huge amount of traffics using MySQL, but a running cost gets expensive, Twitter says. As these techniques are separate ones, so those people have to spent their time to learn all of three who implement the application over them and manage them. On the other hand, &lt;a href="http://en.wikipedia.org/wiki/Apache_Cassandra"&gt;Cassandra&lt;/a&gt; can handle more traffics as a single database management system, so people only have to learn it instead of three. Sounds great? But, is it a really good choice?&lt;br /&gt;
&lt;br /&gt;
No! They're not aware of yet another solution, say &lt;span style="font-size:xx-large; font-style:bold"&gt;SPIDER storage engine!&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size:x-large; font-style:bold"&gt;SPIDER for MySQL&lt;/span&gt;&lt;br /&gt;
&lt;a href="http://spiderformysql.com/"&gt;http://spiderformysql.com/&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
SPIDER is a storage engine developed by a Japanese MySQL hacker, Mr. Kentoku Shiba, it makes use of MySQL's partitioning functionality and store partitioned data onto remote servers. I may say it's a Sharding storage engine. While flexibility of MySQL's storage engine API enables such an engine, but I value Kentoku's design a lot.&lt;br /&gt;
&lt;br /&gt;
The following picture depicts how SPIDER storage engine works. (This is a snippet from the site above.)&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_3l-X4JQ1EX4/S6f4IHWlpdI/AAAAAAAAATk/_e4oLASegOE/s1600-h/Screen+shot+2010-03-23+at+7.50.38+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://3.bp.blogspot.com/_3l-X4JQ1EX4/S6f4IHWlpdI/AAAAAAAAATk/_e4oLASegOE/s320/Screen+shot+2010-03-23+at+7.50.38+AM.png" width="293" /&gt;&lt;/a&gt;&lt;/div&gt;In this entry, I do not explain how to use SPIDER storage engine, but I tell you how great its ability is. If you want to try it out, please refer to &lt;a href="http://datacharmer.blogspot.com/2009/04/test-driving-spider-storage-engine.html"&gt;Giuseppe Maxia's post.&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Please look at the following graph, which represents an INSERT performance comparing a single MySQL server (InnoDB), 2 SPIDER node + 2 backend MySQL server and 4 SPIDER node + 4 backend MySQL Server. You can see how good it scales.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_3l-X4JQ1EX4/S6f4PT3tg_I/AAAAAAAAATs/64Rrf9vgEmw/s1600-h/Screen+shot+2010-03-23+at+7.40.44+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="443" src="http://4.bp.blogspot.com/_3l-X4JQ1EX4/S6f4PT3tg_I/AAAAAAAAATs/64Rrf9vgEmw/s640/Screen+shot+2010-03-23+at+7.40.44+AM.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;The next graph is a SELECT performance. Read scales pretty good as well.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_3l-X4JQ1EX4/S6f4Q_aURtI/AAAAAAAAAT0/itunj7dOCjI/s1600-h/Screen+shot+2010-03-23+at+7.43.34+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="448" src="http://1.bp.blogspot.com/_3l-X4JQ1EX4/S6f4Q_aURtI/AAAAAAAAAT0/itunj7dOCjI/s640/Screen+shot+2010-03-23+at+7.43.34+AM.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;Red circles indicate where working set sizes exceed memory sizes. While performance drops when a working set size exceeds the available memory size, SPIDER is able to expand the memory so that a working set fits in it. SPIDER can make use of memory on all remote servers, as if there is a huge buffer pool in total.&lt;br /&gt;
&lt;br /&gt;
For more information about SPIDER's performance test, please refer to &lt;a href="http://www.slideshare.net/Kentoku/spider-performance-testbench-mark04242009"&gt;Kentoku's slide&lt;/a&gt;. It's surprising.&lt;br /&gt;
&lt;br /&gt;
The most significant problem for twitter is to scale out read/write load with less running cost. Unfortunately, they had chosen NoSQL solution due to the fact that "MySQL replication + memcached + sharding" cannot handle write intensive workload well. However, such a problem can be resolved using SPIDER storage engine with MySQL!&lt;br /&gt;
&lt;br /&gt;
Generally, KVS cannot solve certain problems like below:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;JOIN&lt;/li&gt;
&lt;li&gt;Sort (ORDER BY)&lt;/li&gt;
&lt;li&gt;Aggregation (GROUP BY)&lt;/li&gt;
&lt;/ul&gt;When using KVS, these problems can be handled using MapReduce, however, we can process the same task using a very simple SQL in general. Thus, SQL allows us to develop a complex logic very efficiently. When I ask Kentoku permission to write an article about his storage engine, he told me his philosophy like below:&lt;br /&gt;
&lt;blockquote style="border-style:dotted; border-width:thin; padding: 10px; background-color:ivory"&gt;I think that the most significant benefit to use RDB is its usefulness and flexibility. It is a very important characteristic for developers in order to keep the application competitive, especially for those developers who have to add new features/functionalities day by day, like web services. I develop SPIDER storage engine in order to provide developers such useful and flexible RDB's characteristics, even on the environment where the traffic and data is huge thus Sharding is required.&lt;br /&gt;
&lt;/blockquote&gt;I 100% agree with his opinion. If you are facing the problem caused by high traffic and huge data just like twitter, please consider to use SPIDER storage engine before migrating to NoSQL solutions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1528820875793831782-7521867821875225744?l=samurai-mysql.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/ZGJedgt6jIZhy5_tPI4ba_jalls/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/ZGJedgt6jIZhy5_tPI4ba_jalls/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/ZGJedgt6jIZhy5_tPI4ba_jalls/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/ZGJedgt6jIZhy5_tPI4ba_jalls/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/SamuraiMysql/~4/Wtb-_7swK-0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/7521867821875225744/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://samurai-mysql.blogspot.com/2010/03/not-only-nosql-uber-scaling-out-with.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/7521867821875225744?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/7521867821875225744?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/SamuraiMysql/~3/Wtb-_7swK-0/not-only-nosql-uber-scaling-out-with.html" title="Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_3l-X4JQ1EX4/S6f4IHWlpdI/AAAAAAAAATk/_e4oLASegOE/s72-c/Screen+shot+2010-03-23+at+7.50.38+AM.png" height="72" width="72" /><thr:total>7</thr:total><feedburner:origLink>http://samurai-mysql.blogspot.com/2010/03/not-only-nosql-uber-scaling-out-with.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkYCSHY6eSp7ImA9WxBaFEs.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782.post-1320531722866822571</id><published>2009-12-30T07:46:00.002+09:00</published><updated>2010-03-25T07:02:49.811+09:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-03-25T07:02:49.811+09:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="awk" /><category scheme="http://www.blogger.com/atom/ns#" term="mysql" /><category scheme="http://www.blogger.com/atom/ns#" term="unix" /><title>awk and mysqldump</title><content type="html">Awk, which has been existing for long time on UNIX like operating system, may sound legacy for people who live in modern web-based programming world. However, awk is really sophisticated and useful tool for various purposes. Today, I'll show you how to use it with mysqldump ;)&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
One feature which is missing on the current mysqldump command is to separate dump files per DB or table. It can store all data into a large single file only, whether we run the command with --result-file option or redirect the output to some file. Of course it is possible to run mysqldump command several times per DB or table, but it cannot produce a consistent snapshot at a certain time, each dump may be time-shifted in other words, unless all involved tables are locked during dump. So, we need to backup database using a single mysqldump instance if we need a consistent data.&lt;br /&gt;
&lt;br /&gt;
Now, you can make use of awk! It is a really powerful text processor. As an output of mysqldump is text, you can process the output using awk. See the following awk script:&lt;br /&gt;
&lt;pre name="code" class="js"&gt;#!/usr/bin/awk -f

function is_new_db(db_name) {
  for (i = 1; i &lt;= num_db; i++) {
    if (db_name == db[i]) {
      return 0;
    }
  }
  return 1;
}

BEGIN {
  num_db = 0
  num_prelines = 0
  num_postlines = 0
  current_file = "/dev/null"
}

/^\-\-/ {
  if ($2 == "Current" &amp;&amp; $3 == "Database:") {
    close(current_file);
    db_name = $4
    gsub("`", "", db_name);
    current_file = db_name ".sql";
    if (is_new_db(db_name)) {
      db[++num_db] = db_name;
      print "--\n" $0 "\n--\n" &gt; current_file;
      for (i = 1; i &lt;= num_prelines; i++)
        print prelines[i] &gt;&gt; current_file;
    }
  } else if (num_db == 0) {
    num_prelines++;
    prelines[num_prelines] = $0;
  } else if ($2 == "Dump" &amp;&amp; $3 == "completed") {
    num_postlines++;
    postlines[num_postlines] = "";
    num_postlines++;
    postlines[num_postlines] = $0;
  } else {
    print $0 &gt;&gt; current_file
  }
  next;
}

/^\/\*.+\*\/;/ {
  if (match($0, "character|collation")) {
    print $0 &gt;&gt; current_file;
  } else if (match($0, "SET")) {
    if (num_db == 0) {
      if (match(prelines[num_prelines], "^\-\-")) {
        num_prelines++;
        prelines[num_prelines] = "";
      }
      num_prelines++;
      prelines[num_prelines] = $0;
    } else {
      num_postlines++;
      postlines[num_postlines] = $0;
    }
  } else {
    print $0 &gt;&gt; current_file;
  }
  next;
}

{ print $0 &gt;&gt; current_file }

END {
  for (i = 1; i &lt;= num_db; i++) {
    current_file = db[i] ".sql";
    print "" &gt;&gt; current_file
    for (j = 1; j &lt;= num_postlines; j++) {
      print postlines[j] &gt;&gt; current_file;
    }
  }
}
&lt;/pre&gt;Save the script into a file named separate-dump.awk etc, and make it exectable (chmod u+rx). You can separate dump files per DB by using the script like below:&lt;br /&gt;
&lt;pre name="code" class="js"&gt;shell&gt; mysqldump -A --single-transaction --master-data=2 --flush-logs | ./separate-dump.awk
&lt;/pre&gt;Then, dump files named like "database_name.sql" are created under your current directory! Of course, you can also process an existing dump file like below:&lt;br /&gt;
&lt;pre name="code" class="js"&gt;shell&gt; ./separate-dump.awk &lt; dump.sql
&lt;/pre&gt;
Making use of "good old fashioned" unixy command tools will make us happy ;)&lt;br /&gt;
&lt;br /&gt;
Enjoy!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1528820875793831782-1320531722866822571?l=samurai-mysql.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/sjFoWZWilQqV0t-kSNjTEKvI9dE/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/sjFoWZWilQqV0t-kSNjTEKvI9dE/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/sjFoWZWilQqV0t-kSNjTEKvI9dE/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/sjFoWZWilQqV0t-kSNjTEKvI9dE/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/SamuraiMysql/~4/VzZe_PbBVZE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/1320531722866822571/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://samurai-mysql.blogspot.com/2009/12/awk-and-mysqldump.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/1320531722866822571?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/1320531722866822571?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/SamuraiMysql/~3/VzZe_PbBVZE/awk-and-mysqldump.html" title="awk and mysqldump" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><thr:total>2</thr:total><feedburner:origLink>http://samurai-mysql.blogspot.com/2009/12/awk-and-mysqldump.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU8CQXcycSp7ImA9WxBSGUo.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782.post-5311288661698613930</id><published>2009-12-28T13:17:00.000+09:00</published><updated>2009-12-28T13:17:40.999+09:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-28T13:17:40.999+09:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mysql" /><title>MySQL Cheat Sheet v1.0</title><content type="html">For those (or those who have friends) who newly start using MySQL, I release a simple "MySQL Cheat Sheet". You can print it in A4 size pretty. Print it out and put it on the wall in front of you ;)&lt;br /&gt;
&lt;br /&gt;
You can download it from here: &lt;a href="http://www.mysqlpracticewiki.com/files/cheat-sheet-en.pdf"&gt;http://www.mysqlpracticewiki.com/files/cheat-sheet-en.pdf&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
The license is CC-BY-SA. You can redistribute it and modify it under CC license.&lt;br /&gt;
&lt;br /&gt;
Happy New Year!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1528820875793831782-5311288661698613930?l=samurai-mysql.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/1Nry-eLRaAmTniDBoWEDFEJP6aU/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/1Nry-eLRaAmTniDBoWEDFEJP6aU/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/1Nry-eLRaAmTniDBoWEDFEJP6aU/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/1Nry-eLRaAmTniDBoWEDFEJP6aU/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/SamuraiMysql/~4/FXFvfzgWDWg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/5311288661698613930/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://samurai-mysql.blogspot.com/2009/12/mysql-cheat-sheet-v10.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/5311288661698613930?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/5311288661698613930?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/SamuraiMysql/~3/FXFvfzgWDWg/mysql-cheat-sheet-v10.html" title="MySQL Cheat Sheet v1.0" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><thr:total>4</thr:total><feedburner:origLink>http://samurai-mysql.blogspot.com/2009/12/mysql-cheat-sheet-v10.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUMNQXs7fip7ImA9WxJUFkg.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782.post-7017971438278698837</id><published>2009-07-15T20:14:00.002+09:00</published><updated>2009-07-15T20:31:30.506+09:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-15T20:31:30.506+09:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="os" /><category scheme="http://www.blogger.com/atom/ns#" term="mysql" /><category scheme="http://www.blogger.com/atom/ns#" term="chrome" /><category scheme="http://www.blogger.com/atom/ns#" term="google" /><title>Expectation For Chrome OS</title><content type="html">I was really impressed by the news; Google plans to release "Chrome OS". When I heard it, I felt like some big puzzle in my mind was resolved as if I had a key hole and the news was a key. Most of people may think that Chrome OS will be one of Linux distribution simply which implements Chrome OS. But totally I don't think so. It will never be such a boring OS, I guess!! When I imagine the upcoming OS, I am a bit excited.&lt;br /&gt;&lt;br /&gt;While the technology details have not been published, I expect that Chrome OS architecture will be like below:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_3l-X4JQ1EX4/Sl26bVzXDoI/AAAAAAAAAOA/ZDI5oDfDQlc/s1600-h/chrome-os-arch.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 268px;" src="http://4.bp.blogspot.com/_3l-X4JQ1EX4/Sl26bVzXDoI/AAAAAAAAAOA/ZDI5oDfDQlc/s400/chrome-os-arch.png" alt="" id="BLOGGER_PHOTO_ID_5358644110512557698" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Please note that this is my personal expectation, so the actual architecture will differ ;)  If the act will totally different from this, please go ahead and laugh about this picture!&lt;br /&gt;&lt;br /&gt;Anyway, let's go to the details about imaginary Chrome OS architecture in my mind.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;1. UI = Chrome Browser&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'm confident that &lt;span style="color: rgb(255, 102, 0);font-size:100%;" &gt;&lt;span style="font-weight: bold;"&gt;Chrome Browser will be an only UI which a user can access&lt;/span&gt;&lt;/span&gt;. There will be no UNIX shells which we are familiar with. Geeks may say "OS without shells are not UNIX!". It's true. But Google doesn't target such geeks at all. All operation will be performed via Chrome Browser.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;2. Local Web Server&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;If my assumption above is correct, &lt;span style="font-weight: bold; color: rgb(255, 102, 0);"&gt;Chrome OS will have a web server inside&lt;/span&gt; like lighttpd or Apache for its own use. Why? If the only UI which a user can access is Chrome Browser, how does he or she shutdown the computer? I expect that Chrome OS will have a system management page which allows a user to manage or configure the system, which is a kind of Web application running on the local web server.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;3. User Application Runs On Web Server&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;It is needless to say, Chrome OS will make use of Google cloud. However, it's a not good idea that doing everything, even a trivial thing, on the cloud. Will Chrome OS run Calculator application on the cloud? I don't think so. If Chrome OS will do such a stupid thing, Google cloud will not able to handle megatons of requests. Then, such a trivial &lt;span style="font-weight: bold; color: rgb(255, 102, 0);"&gt;application will run on the client using Javascript or as an web application&lt;/span&gt;. In either case, Chrome OS will have rich applications, because there are lots of web programmers who can develop such applications all over the world.&lt;br /&gt;&lt;br /&gt;Why this is interesting is; if Chrome OS can open up the network port to the internal web server, users can access the application running on the own computer for each other. This means that&lt;span style="font-weight: bold; color: rgb(255, 102, 0);"&gt; the desktop application suddenly turn out to be a web application!&lt;/span&gt; Sounds exciting? In order to keep the security, Chrome OS may use Open IDs provided by Google.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;4. Good Bye Files&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Probably, this could be the most significant change upon Chrome OS. I expect that &lt;span style="font-weight: bold; color: rgb(255, 102, 0);"&gt;Chrome OS will not have any method to access files on the computer like "Windows Explorer"&lt;/span&gt;. Using files for data store and arranging files on a tree based filesystem are really primitive, such a method has been used since dinosaurs ruled the earth. One should imagine the content of file based on the filename, but a filename does not always describe its content correctly, so managing data using files is messy. People really want to do so? No! They simply want to store data, keep data and search data quickly when needed. Using files upon tree based filesystem is not the only way to manage data, and it's an inefficient way.&lt;br /&gt;&lt;br /&gt;Instead, &lt;span style="font-weight: bold; color: rgb(255, 102, 0);"&gt;Chrome OS will store data on a database management system&lt;/span&gt;. Local web applications will access database as if usual web applications make use of underlying database nowadays. Then, people will leave away from managing files.&lt;br /&gt;&lt;br /&gt;If so, what kind of RDBMS will be used on Chrome OS? The most probable answer is "sqlite". However, the problem is that sqlite cannot handle queries quickly when data grows large. Then, Chrome OS should employ more scalable RDBMS like MySQL, firebird or PostgreSQL IMHO. &lt;span style="font-weight: bold; color: rgb(255, 0, 0);"&gt;I strongly recommend Google to use MySQL as a MySQLer&lt;/span&gt;. If Google thinks that GPLv2 is not suitable for Chrome OS, and that's why MySQL is not employed by Chrome OS, please consider to apply FOSS Exception, which enables OSS project to release the software using other licenses than GPLv2 if the license is listed in: &lt;a href="http://www.mysql.com/about/legal/licensing/foss-exception/"&gt;http://www.mysql.com/about/legal/licensing/foss-exception/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;5. Robust Security&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;On Chrome OS, &lt;span style="font-weight: bold; color: rgb(255, 102, 0);"&gt;a user will be unable to invoke programs like we do now&lt;/span&gt;, and it will be unnecessary. Since the only program which user will access is Chrome Browser, it can be invoked by OS in advance. OS can invoke the browser as "root" user, then it can change the user afterward using setuid() system call. In this way, Chrome Browser program image on the filesystem can be owned by root and can be invoked by root only. An ordinal user doesn't have to invoke any programs on Chrome OS, because a user can do everything upon the browser. Even an ordinal user doesn't have to access any files on the computer at all.&lt;br /&gt;&lt;br /&gt;In this case, even if the browser is cracked and operated by a malicious user, the browser cannot invoke any programs! How secure such OS is?! In addition, people will be able to get off scanning virus on the computer everyday. Because there will be no user programs to scan!&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;.....&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;While the above is completely my personal imagination. I'm lost in wild fancies like this, however if it is true, Chrome OS can be recognized as "It's our attempt to re-think what operating systems should be." as &lt;a href="http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html"&gt;described in the official blog&lt;/a&gt;. AFAICT, I will be a fun of Chrome OS if it is true :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1528820875793831782-7017971438278698837?l=samurai-mysql.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/_2H5OHRiUvN-RZwdWr1XiUYUHtE/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/_2H5OHRiUvN-RZwdWr1XiUYUHtE/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/_2H5OHRiUvN-RZwdWr1XiUYUHtE/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/_2H5OHRiUvN-RZwdWr1XiUYUHtE/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/SamuraiMysql/~4/yrUA2ovSB_0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/7017971438278698837/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://samurai-mysql.blogspot.com/2009/07/expectation-for-chrome-os.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/7017971438278698837?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/7017971438278698837?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/SamuraiMysql/~3/yrUA2ovSB_0/expectation-for-chrome-os.html" title="Expectation For Chrome OS" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_3l-X4JQ1EX4/Sl26bVzXDoI/AAAAAAAAAOA/ZDI5oDfDQlc/s72-c/chrome-os-arch.png" height="72" width="72" /><thr:total>14</thr:total><feedburner:origLink>http://samurai-mysql.blogspot.com/2009/07/expectation-for-chrome-os.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0ADQ305cCp7ImA9WxVbGUQ.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782.post-5155406612070047733</id><published>2009-04-06T14:16:00.007+09:00</published><updated>2009-04-06T14:29:32.328+09:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-04-06T14:29:32.328+09:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mysql cluster" /><category scheme="http://www.blogger.com/atom/ns#" term="mysql" /><category scheme="http://www.blogger.com/atom/ns#" term="performance tuning" /><category scheme="http://www.blogger.com/atom/ns#" term="sharding" /><title>Distribution Awareness - Must For MySQL Cluster</title><content type="html">&lt;div style="text-align: center;"&gt;&lt;span style="font-style: italic; font-weight: bold; color: rgb(255, 102, 0);"&gt;"Performance will go down when number of data nodes increases on MySQL Cluster setup"&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Have you ever heard such rumor? This is a little bit true, but mostly false. Such performance disadvantages are caused by badly designed schema in most cases. Schema design is really important in order to bring out MySQL Cluster performance.&lt;br /&gt;&lt;br /&gt;MySQL Cluster stores and distribute rows amongst node groups according to PRIMARY KEY per row. If there is no bias on its PRIMARY KEY values, tables are distributed evenly. MySQL Cluster stores data like sharding.&lt;br /&gt;&lt;br /&gt;Lookups based on PRIMARY KEY is lightning fast on MySQL Cluster, because the SQL node knows which data node have the target row. On the other hand, if a table is accessed without PK, the SQL node doesn't know where the target rows are stored and should access all the data nodes, thus performance will go down. Look at two figures below:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcoRJvLI/AAAAAAAAAMw/dQBLx_JtI5g/s1600-h/Picture+14.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 229px;" src="http://3.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcoRJvLI/AAAAAAAAAMw/dQBLx_JtI5g/s400/Picture+14.png" alt="" id="BLOGGER_PHOTO_ID_5321443256235048114" border="0" /&gt;&lt;/a&gt;Lookup by PK&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcpMEMUI/AAAAAAAAAM4/1JTIL4JNCDg/s1600-h/Picture+15.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 287px;" src="http://2.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcpMEMUI/AAAAAAAAAM4/1JTIL4JNCDg/s400/Picture+15.png" alt="" id="BLOGGER_PHOTO_ID_5321443256482148674" border="0" /&gt;&lt;/a&gt;Read without PK&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;When looking-up by PK, the SQL node send a single request to the data node which has the row, and the data node responds. However, looking-up without PK specified, the data node which receives a request transfers the request to other data nodes. This increases the network traffic between the nodes significantly. (These two operations are handled completely differently within NDBAPI.)&lt;br /&gt;&lt;br /&gt;When a SQL nodes starts a transaction, it select one data node as a transaction coordinator, TC. TC is responsible for the transaction until the transaction finishes, and all communications between SQL node and data node go through TC. So, even if lookups are based on PK, additional communications between data nodes may be required when accessing second or later tables within the transaction. See the figure below.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_3l-X4JQ1EX4/SdmTAn2is-I/AAAAAAAAANI/E5HA8fLnLqU/s1600-h/Picture+17.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 270px;" src="http://4.bp.blogspot.com/_3l-X4JQ1EX4/SdmTAn2is-I/AAAAAAAAANI/E5HA8fLnLqU/s400/Picture+17.png" alt="" id="BLOGGER_PHOTO_ID_5321446073621984226" border="0" /&gt;&lt;/a&gt;These problems will be exposed when the number of data nodes is increased. Then, the network will become the bottle-neck. The former case causes many network traffics, and the latter case could be impacted when more than one tables are updated within a transaction.&lt;br /&gt;&lt;br /&gt;MySQL Cluster, as of 6.3, implements a feature which can mitigate these problems - so called Distribution Awareness. With Distribution Awareness, developers can design the schema so that they are distributed by any keys included in PK. We can expect that tables are accessed more efficiently with Distribution Awareness.&lt;br /&gt;&lt;br /&gt;We can see the effect of Distribution Awareness with benchmark tests such as DBT-2. When I tested its effect on my PC, I got the following result:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcc_0gbI/AAAAAAAAAMo/_3YZT35Ly4M/s1600-h/Picture+9.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 192px;" src="http://1.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcc_0gbI/AAAAAAAAAMo/_3YZT35Ly4M/s400/Picture+9.png" alt="" id="BLOGGER_PHOTO_ID_5321443253209563570" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This benchmark is measured using a single machine, and data nodes are created on the same OS instance. While this benchmark doesn't reflect the real world's performance because the cluster is configured with a single machine, it would help understanding how Distribution Awareness impact the performance.&lt;br /&gt;&lt;br /&gt;Benchmark environment overview:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;OS:Ubuntu 8.10&lt;/li&gt;&lt;li&gt;MySQL Cluster 6.3.23&lt;/li&gt;&lt;li&gt;CPU:AMD/2 cores/2.3GHz&lt;/li&gt;&lt;li&gt;Physical Memory:8GB&lt;/li&gt;&lt;li&gt;HDD:SATA 7200rpm&lt;/li&gt;&lt;li&gt;DataMemory=400MB&lt;/li&gt;&lt;li&gt;IndexMemory=64M&lt;/li&gt;&lt;li&gt;NoOfReplicas=2&lt;/li&gt;&lt;li&gt;Data Nodes:10 (5 node groups)&lt;/li&gt;&lt;li&gt;SQL Node:1&lt;/li&gt;&lt;li&gt;Benchmark Software: DBT-2/# warehouses=5&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Each result stands for the following.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;InnoDB&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;This is just for comparison. MySQL Server is configured with innodb_buffer_pool_size=4G so that all data is cached in the memory.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;NDB w/o DA&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Distribution Awareness does not take effect. Data is loaded as MyISAM firstly, then they are simply converted to NDB using ALTER TABLE tbl_name ENGINE NDB statement.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;NDB w/DA&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Distribution Awareness takes effect. Most of tables on DBT-2 test have a column refers PK on `warehouses` table, and such a column is included in PK. So, I altered the table using the following statements so that they are distributed by warehouse id.&lt;br /&gt;&lt;pre name="code" class="sql"&gt;mysql&gt; alter table customer engine ndb partition by key (c_w_id);&lt;br /&gt;mysql&gt; alter table district engine ndb partition by key (d_w_id);&lt;br /&gt;mysql&gt; alter table history add h_rowid int unsigned not null auto_increment, add primary key (h_rowid, h_w_id);&lt;br /&gt;mysql&gt; alter table history engine ndb partition by key (h_w_id);&lt;br /&gt;mysql&gt; alter table item engine ndb;&lt;br /&gt;mysql&gt; alter table new_order engine ndb partition by key (no_w_id);&lt;br /&gt;mysql&gt; alter table order_line engine ndb partition by key (ol_w_id);&lt;br /&gt;mysql&gt; alter table orders engine ndb partition by key (o_w_id);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You can distribute rows amongst data nodes based on a given key using PARTITION BY KEY clause. As the key should be included in the explicitly defined PK in order to use PARTITION BY KE clause, I added a new AUTO-INCREMENT column to `history` table.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;NDB w/full DA&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;While `item` table is not correlated to `warehouses` table at all, Distribution Awareness will not take effect when `item` table and other tables are accessed within a single transaction. In order to make Distribution Awareness always effective, I converted `item` table and `warehouse` table to MyISAM.&lt;br /&gt;&lt;pre name="code" class="sql"&gt;mysql&gt; alter table item engine myisam;&lt;br /&gt;mysql&gt; alter table warehouse engine myisam;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;NDB single node&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;This is just for comparison, with the following configuration.&lt;br /&gt;DataMemory=4GB&lt;br /&gt;IndexMemory=640MB&lt;br /&gt;NoOfReplica=1&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Considerations&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;You can see a big difference with or without Distribution Awareness. The benchmark was best when all transactions are fully distribution aware, which was achieved by converting an uncorrelated table to MyISAM. While `item` is a kind of read-only table, it can be deployed as a MyISAM table practically in production.&lt;br /&gt;&lt;br /&gt;MySQL Cluster's shared nothing architecture is very similar to sharding. So, when tables are defined like sharding and accessed like sharding, MySQL Cluster demonstrates the best performance. For example, sharded application accesses its underlying database using user-id by determining which database node has required data. In that case, user-id is always required when accessing a database. Similarly, in order to make SQL query distribution aware on MySQL Cluster, condition like "AND userid = 12345" should be always specified.&lt;br /&gt;&lt;br /&gt;If Distribution Awareness does not take effect, the network will become a bottle-neck when number of data nodes increases. This causes performance degradation when data nodes increase. On the other hand, if Distribution Awareness takes effect properly, the network may rarely be the bottle-neck in this way. So, the performance is likely to scale as data node increases. So, whether tables and queries can be Distribution Aware or not is a very important point to determine if you will use MySQL Cluster or not in your application.&lt;br /&gt;&lt;br /&gt;You can see if Distribution Awareness is effective or not using EXPLAIN PARTITIONS statement. If only a single partition is listed, the query is distribution aware!&lt;br /&gt;&lt;pre name="code" class="sql"&gt;&lt;br /&gt;mysql&gt; EXPLAIN PARTITIONS SELECT no_o_id FROM new_order WHERE no_w_id = 2  AND  no_d_id = 7\G&lt;br /&gt;*************************** 1. row ***************************&lt;br /&gt;           id: 1&lt;br /&gt;  select_type: SIMPLE&lt;br /&gt;        table: new_order&lt;br /&gt;   partitions: p5  &amp;lt;--- OK if only a single partition is listed like this&lt;br /&gt;         type: ref&lt;br /&gt;possible_keys: PRIMARY&lt;br /&gt;          key: PRIMARY&lt;br /&gt;      key_len: 8&lt;br /&gt;          ref: const,const&lt;br /&gt;         rows: 10&lt;br /&gt;        Extra: &lt;br /&gt;1 row in set (0.00 sec)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The benchmark is done on a single machine. So, communications between nodes are faster than the real case because they do not go through a network. If the cluster is configured using multiple machines, difference between with or without Distribution Awareness will be more significant.&lt;br /&gt;&lt;br /&gt;You may think like "MySQL Cluster is messy because I should take care of its distribution... Sharding is easier than it for me, isn't it?" However, you have to implement your own logic to separate load between databases when sharding, on the other hand, on MySQL Cluster, you can achieve the same thing by tuning your tables and queries a little bit. So the trade off is "Implementing own sharding logic" vs "Tuning tables and queries". Which is suitable for you? ;)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;As the number of data nodes increases...&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Performance goes down without Distribution Awareness.&lt;/li&gt;&lt;li&gt;Performance scales with Distribution Awareness.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;In order to make use of Distribution Awareness, you have to tune both tables and queries:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;CREATE TABLE ... ENGINE NDB PARTITION BY KEY (userid);&lt;/li&gt;&lt;li&gt;SELECT ... WHERE ... and userid = 12345;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;You can determine if Distribution Awareness is take effect using...:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;EXPLAIN PARTITIONS SELECT ...\G&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;You application may scale if it is distribution aware. MySQL Cluster's Distribution Awareness comes in handy because you don't have to implement additional logics on your application. So, please consider using MySQL Cluster if you are going to shard your application.&lt;br /&gt;&lt;br /&gt;Enjoy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1528820875793831782-5155406612070047733?l=samurai-mysql.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/D3b_pe_-BGKluC9IC0uRi40nkKY/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/D3b_pe_-BGKluC9IC0uRi40nkKY/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/D3b_pe_-BGKluC9IC0uRi40nkKY/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/D3b_pe_-BGKluC9IC0uRi40nkKY/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/SamuraiMysql/~4/YrZW86qj5bw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/5155406612070047733/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://samurai-mysql.blogspot.com/2009/04/distribution-awareness-must-on-mysql.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/5155406612070047733?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/5155406612070047733?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/SamuraiMysql/~3/YrZW86qj5bw/distribution-awareness-must-on-mysql.html" title="Distribution Awareness - Must For MySQL Cluster" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_3l-X4JQ1EX4/SdmQcoRJvLI/AAAAAAAAAMw/dQBLx_JtI5g/s72-c/Picture+14.png" height="72" width="72" /><thr:total>1</thr:total><feedburner:origLink>http://samurai-mysql.blogspot.com/2009/04/distribution-awareness-must-on-mysql.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkYAQHg4fCp7ImA9WxVUFU0.&quot;"><id>tag:blogger.com,1999:blog-1528820875793831782.post-5807854002268188738</id><published>2009-03-20T07:38:00.003+09:00</published><updated>2009-03-20T08:09:01.634+09:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-20T08:09:01.634+09:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="oltp" /><category scheme="http://www.blogger.com/atom/ns#" term="mysql" /><category scheme="http://www.blogger.com/atom/ns#" term="howto" /><category scheme="http://www.blogger.com/atom/ns#" term="dbt2" /><category scheme="http://www.blogger.com/atom/ns#" term="benchmark" /><title>Settingup DBT-2</title><content type="html">DBT-2 is a TPC-C like OLTP benchmark, and very popular amongst many MySQL users. It is used by MySQL QA team to test the stability and performance before release. However, steps to setup DBT-2 is a little bit messy, and its README files include some dummy information. So I introduce you these steps below:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;1. Download it!&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You can download the source code from here: &lt;a href="http://osdldbt.sourceforge.net/"&gt;http://osdldbt.sourceforge.net/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;2. Required packages&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The following perl packages are required to build DBT-2. Unfortunately, configure script doesn't complain even if they are missing. Install them using, e.g. CPAN.&lt;br /&gt;&lt;br /&gt;shell&gt; sudo cpan Statistics::Descriptive&lt;br /&gt;shell&gt; sudo cpan Test::Parser&lt;br /&gt;shell&gt; sudo cpan Test::Reporter&lt;br /&gt;&lt;br /&gt;If you want to make a graph from the output, you have to install gnuplot in advance. e.g. Ubuntu users can install it like below:&lt;br /&gt;&lt;br /&gt;shell&gt; sudo apt-get install gnuplot&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;3. Compile it&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You can build the software in very standard way, configure and make. For MySQL users, --with-mysql option is mandatory. If you want to specify install path, then use --prefix option.&lt;br /&gt;&lt;br /&gt;shell&gt; ./configure --with-mysql=/usr/local/mysql [options] &amp;amp;&amp;amp; make&lt;br /&gt;shell&gt; sudo make install&lt;br /&gt;&lt;br /&gt;Ubuntu users may need to install build-essentials package in advance.&lt;br /&gt;&lt;br /&gt;shell&gt; sudo apt-get install build-essentials&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;4. Generate and load data&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;shell&gt; datagen -w 10 -d /var/tmp/dbt2 --mysql&lt;br /&gt;&lt;br /&gt;datagen is a command to generate data under the given directory. -w option specifies the number of warehouse, more is larger data. If you want to test CPU bound test, use small value.&lt;br /&gt;&lt;br /&gt;shell&gt; scripts/mysql/build_db.sh \&lt;br /&gt;-w 3 -d dbt2 -f /var/tmp/dbt2 -s /tmp/mysql.sock \&lt;br /&gt;-h localhost -u user -p password -e INNODB&lt;br /&gt;&lt;br /&gt;Then, you can load the data generated by datagen program using build_db.sh, located under scripts/mysql directory under the source code directory. The following indicates the size of data with 10 warehouses for example.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;mysql&gt; select table_name,sum(data_length) from tables where table_schema='dbt2' group by table_name with rollup;&lt;br /&gt;+------------+------------------+&lt;br /&gt;| table_name | sum(data_length) |&lt;br /&gt;+------------+------------------+&lt;br /&gt;| customer   |        200081408 |&lt;br /&gt;| district   |            16384 |&lt;br /&gt;| history    |         25755648 |&lt;br /&gt;| item       |         11026432 |&lt;br /&gt;| new_order  |          4702208 |&lt;br /&gt;| orders     |         22626304 |&lt;br /&gt;| order_line |        314572800 |&lt;br /&gt;| stock      |        381648896 |&lt;br /&gt;| warehouse  |            16384 |&lt;br /&gt;| NULL       |        960446464 |&lt;br /&gt;+------------+------------------+&lt;br /&gt;10 rows in set (0.54 sec)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;5. Run benchmark&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;shell&gt; export USE_PGPOOL=0&lt;br /&gt;shell&gt; export LD_LIBRARY_PATH=/usr/local/mysql/lib&lt;br /&gt;shell&gt; scripts/run_workload.sh -c 16 -d 600 -n -w 10 -s 10 -u user -x password&lt;br /&gt;&lt;br /&gt;The benchmark is done using run_workload.sh script. This script accepts the following option.&lt;br /&gt;&lt;br /&gt;-c ... thread concurrency&lt;br /&gt;-d ... duration&lt;br /&gt;-n ... no thinking time. do not miss this option if you want to see the database performance ;)&lt;br /&gt;-w ... number of warehouse&lt;br /&gt;-s ... delay before starting each new thread in millisecond&lt;br /&gt;- u ... MySQL user name&lt;br /&gt;-x ... MySQL password&lt;br /&gt;-l ... MySQL TCP/IP port number&lt;br /&gt;-H ... MySQL server hostname&lt;br /&gt;&lt;br /&gt;Then, the benchmark runs during the period specified by -d option, and generates data under ./output directory. You may see the error like following. This error indicates that no data is written in the ./output directory. It is not a Test::Parser::Dbt2.pm problem at all!&lt;br /&gt;&lt;br /&gt;Can't use an undefined value as an ARRAY reference at /usr/local/share/perl/5.10.0/Test/Parser/Dbt2.pm line 521.&lt;br /&gt;&lt;br /&gt;One reason why this error appears is that libmysqlclient.so wasn't loaded correctly. Do not forget specifying the right PATH to the library before the benchmark.&lt;br /&gt;&lt;br /&gt;shell&gt; export LD_LIBRARY_PATH=/usr/local/mysql/lib&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Enjoy!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1528820875793831782-5807854002268188738?l=samurai-mysql.blogspot.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/Jyu2LYd1XG8fmIp70aTiN9DPbKM/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/Jyu2LYd1XG8fmIp70aTiN9DPbKM/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/Jyu2LYd1XG8fmIp70aTiN9DPbKM/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/Jyu2LYd1XG8fmIp70aTiN9DPbKM/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/SamuraiMysql/~4/XrneF4RLO9E" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://samurai-mysql.blogspot.com/feeds/5807854002268188738/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://samurai-mysql.blogspot.com/2009/03/settingup-dbt-2.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/5807854002268188738?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1528820875793831782/posts/default/5807854002268188738?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/SamuraiMysql/~3/XrneF4RLO9E/settingup-dbt-2.html" title="Settingup DBT-2" /><author><name>Mikiya Okuno</name><uri>http://www.blogger.com/profile/15996977664356830358</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="http://bp2.blogger.com/_3l-X4JQ1EX4/R_1oue3iv-I/AAAAAAAAAB8/wQIXtJx_O5I/S220/exotic_animal_avatar_0235.jpg" /></author><thr:total>8</thr:total><feedburner:origLink>http://samurai-mysql.blogspot.com/2009/03/settingup-dbt-2.html</feedburner:origLink></entry></feed>

