<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6415786925319620734</id><updated>2026-03-04T15:08:50.361-05:00</updated><category term="Database scalability"/><category term="Database"/><category term="Scale out"/><category term="MySQL"/><category term="Scalability"/><category term="Sharding"/><category term="big data"/><category term="cloud"/><category term="OLTP"/><category term="ScaleBase"/><category term="Parallelism"/><category term="Database Grid"/><category term="Oracle"/><category term="analytics"/><category term="Cluster"/><category term="Data warehouse"/><category term="NoSQL"/><category term="AWS"/><category term="data distribution"/><category term="Amazon"/><category term="Columnar Storage"/><category term="Distributed database"/><category term="ARM"/><category term="Abstraction Layer"/><category term="EC2"/><category term="Latency"/><category term="MongoDB"/><category term="Partitioning"/><category term="PostgreSQL"/><category term="RDS"/><category term="Real Application Cluster"/><category term="Replication"/><category term="Single Master Replication"/><category term="multi master replication"/><category term="virtualization"/><category term="ACID"/><category term="ALL_ROWS"/><category term="CAP"/><category term="Cassandra"/><category term="Concurrency"/><category term="FIRST_ROWS"/><category term="Facebook"/><category term="HBase"/><category term="Hadoop"/><category term="Memcached"/><category term="Migration"/><category term="Multi-tenant"/><category term="Performance"/><category term="Pinterest"/><category term="Porting"/><category term="RAC"/><category 
term="SLA"/><category term="Security"/><category term="Throughput"/><category term="elasticity"/><category term="map reduce"/><title type='text'>The Database Scalability Blog - Doron Levari</title><subtitle type='html'>I&#39;ve lived around databases all my life, 21st century is challenging for them: big data, throughput, complexity, virtualization, global distribution - it&#39;s all scalability.&#xa;I&#39;m the founder and CTO of &lt;a href=&quot;http://www.scalebase.com&quot;&gt;ScaleBase&lt;/a&gt;, solving this problem is a workoholic&#39;s heaven, so I&#39;m having great time!&#xa;&lt;br&gt;&#xa;My agenda is to stay technical, no marketing and sales BS, give my summarized set of views and opinions to urgent topics, events and latest news in database scalability.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default?start-index=26&amp;max-results=25'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>30</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-944869758009673365</id><published>2015-05-18T16:45:00.000-04:00</published><updated>2015-05-18T16:45:05.740-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Amazon"/><category scheme="http://www.blogger.com/atom/ns#" term="AWS"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="MongoDB"/><category scheme="http://www.blogger.com/atom/ns#" term="Multi-tenant"/><category scheme="http://www.blogger.com/atom/ns#" term="Security"/><title type='text'>Attending MongoDB World in NYC?</title><content type='html'>Hi folks,&lt;br /&gt;
&lt;br /&gt;
It&#39;s been a long time since I last wrote.&lt;br /&gt;
The last several months were exciting times for me on many fronts.&lt;br /&gt;
I took some time to stay away, catch my breath, retune and refocus.&lt;br /&gt;
&lt;br /&gt;
I have taken a super-exciting opportunity with &lt;b&gt;Cisco&lt;/b&gt;: I joined a startup within the big corp, running separately, running fast, located in Cambridge, MA, at the heart of where all the innovation is happening.&lt;br /&gt;
&lt;br /&gt;
We&#39;re developing a new paradigm in the management of security appliances, from the cloud with a lot of analytics, wisdom and science around it.&lt;br /&gt;
&lt;br /&gt;
I get to do a lot of creating, architecting and engineering around these same things I love, data and databases, and with an awesome group of super-talented people around me. Living the dream! :)&lt;br /&gt;
&lt;br /&gt;
If you are in New York at the beginning of June, come over to the MongoDB World conference!&lt;br /&gt;
And while you&#39;re there, make sure you come and hear what I have to say in my session &quot;&lt;b&gt;Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS Application&lt;/b&gt;&quot;&lt;br /&gt;
&lt;br /&gt;
Monday, June 1, at 11:40.&lt;br /&gt;
&lt;br /&gt;
Use the code &lt;b&gt;SpeakerFriend&lt;/b&gt; for a 15% discount at &lt;a href=&quot;http://www.mongodbworld.com/&quot;&gt;http://www.mongodbworld.com&lt;/a&gt;!!&lt;br /&gt;
&lt;br /&gt;
See you at #MongoDBWorld!!&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;http://mongodbworld.com/mongodbworldog.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://mongodbworld.com/mongodbworldog.png&quot; height=&quot;320&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/944869758009673365/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2015/05/attending-mongodb-world-in-nyc.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/944869758009673365'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/944869758009673365'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2015/05/attending-mongodb-world-in-nyc.html' title='Attending MongoDB World in NYC?'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-3311858259945120322</id><published>2014-09-12T12:04:00.000-04:00</published><updated>2014-09-12T12:04:22.881-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ACID"/><category scheme="http://www.blogger.com/atom/ns#" term="CAP"/><category scheme="http://www.blogger.com/atom/ns#" term="Cassandra"/><category scheme="http://www.blogger.com/atom/ns#" term="HBase"/><category scheme="http://www.blogger.com/atom/ns#" term="NoSQL"/><title type='text'>Differences between NoSQL databases</title><content type='html'>Just sharing an answer I gave today to a question in Quora: &lt;a href=&quot;http://www.quora.com/Whats-the-difference-between-the-different-NoSQL-databases&quot;&gt;http://www.quora.com/Whats-the-difference-between-the-different-NoSQL-databases&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
I think the question is relevant and the other answers were very relevant too. This was my humble addition to the thread:&lt;br /&gt;
&lt;br /&gt;
I think the answers above are very good. In my view, the right NoSQL database for you is the one that best fits your requirements in:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Data representation: as said above, key-value, document, graph, etc.&lt;/li&gt;
&lt;li&gt;Data usage pattern: OLTP (high concurrency and throughput, many queries and updates) vs. analytics (low concurrency, few big queries, no updates)&lt;/li&gt;
&lt;li&gt;Data availability and consistency: this is the main topic I wish to add&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
While all relational databases provide the virtues of ACID (Atomicity, Consistency, Isolation, Durability) for transactions and data, few NoSQLs provide full ACID. Most instead offer interesting tradeoffs around the CAP theorem (&lt;a href=&quot;http://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;http://en.wikipedia.org/wiki/CAP_theorem&lt;/a&gt;). Since you can&#39;t have all 3 of consistency, availability and partition tolerance, different databases give different combinations. For example, of 2 NoSQLs from Apache, HBase provides CP and Cassandra provides AP (&lt;a href=&quot;http://wiki.apache.org/cassandra/ArchitectureOverview&quot;&gt;http://wiki.apache.org/cassandra/ArchitectureOverview&lt;/a&gt;).&lt;br /&gt;
&lt;br /&gt;
Hope that helped.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/3311858259945120322/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2014/09/differences-between-nosql-databases.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/3311858259945120322'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/3311858259945120322'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2014/09/differences-between-nosql-databases.html' title='Differences between NoSQL databases'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-4197658846655957247</id><published>2014-05-20T14:08:00.001-04:00</published><updated>2014-05-20T14:08:20.584-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="AWS"/><category scheme="http://www.blogger.com/atom/ns#" term="EC2"/><category scheme="http://www.blogger.com/atom/ns#" term="RDS"/><category scheme="http://www.blogger.com/atom/ns#" term="SLA"/><title type='text'>Kudos to RDS&#39;s SLA, proving the point of the public cloud</title><content type='html'>If you go and spin a new RDS server, you&#39;ll see this new page added before the wizard:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKRkkrogJOTLprZaAq50VSTylyFM1f43M2axwPZQtx96HNg9MiE8HnY_oaxx1hb20fyyqS_7UElHCxp_e4YAB9X8PPOp17uzv1hx9UrrPbnAmpKD-ntrx1QVnx0Q9Mz7syK9YAz3IoyoPd/s1600/2014-05-20_1005.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKRkkrogJOTLprZaAq50VSTylyFM1f43M2axwPZQtx96HNg9MiE8HnY_oaxx1hb20fyyqS_7UElHCxp_e4YAB9X8PPOp17uzv1hx9UrrPbnAmpKD-ntrx1QVnx0Q9Mz7syK9YAz3IoyoPd/s1600/2014-05-20_1005.png&quot; height=&quot;212&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
My perception over the last few months is that AWS has improved RDS availability, especially with Multi-AZ, and is pushing it more aggressively.&lt;br /&gt;
&lt;br /&gt;
An availability factor of &quot;three and a half nines&quot; (99.95%, roughly 4.4 hours of downtime per year) is very good. It usually comes with a very high price tag (hardware, software &amp;amp; labor) and is usually a dream for small and medium IT organizations.&lt;br /&gt;
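&lt;br /&gt;
As a quick sanity check of the arithmetic, here is a small Python sketch. It assumes that &quot;three and a half nines&quot; means 99.95% and a 365-day year; the function name is made up for illustration. Under those assumptions 99.95% allows about 4.4 hours of downtime per year, while roughly 8.8 hours per year corresponds to plain &quot;three nines&quot; (99.9%).&lt;br /&gt;
&lt;pre&gt;
```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years (an assumption)


def downtime_hours_per_year(availability_percent):
    """Hours of allowed downtime per year for a given availability percentage."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


# Compare a few common availability levels side by side.
for label, pct in [
    ("three nines", 99.9),
    ("three and a half nines", 99.95),
    ("four nines", 99.99),
]:
    print(f"{label} ({pct}%): {downtime_hours_per_year(pct):.2f} hours/year")
```
&lt;/pre&gt;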
&lt;br /&gt;
By offering it at a low utility price, 25%-33% higher than the corresponding EC2 machine, RDS is a real bargain for everyone, making it harder to stay out of the public cloud.&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/4197658846655957247/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2014/05/kudos-to-rdss-sla-proving-point-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4197658846655957247'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4197658846655957247'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2014/05/kudos-to-rdss-sla-proving-point-of.html' title='Kudos to RDS&#39;s SLA, proving the point of the public cloud'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKRkkrogJOTLprZaAq50VSTylyFM1f43M2axwPZQtx96HNg9MiE8HnY_oaxx1hb20fyyqS_7UElHCxp_e4YAB9X8PPOp17uzv1hx9UrrPbnAmpKD-ntrx1QVnx0Q9Mz7syK9YAz3IoyoPd/s72-c/2014-05-20_1005.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-3840495586867498662</id><published>2014-05-03T21:18:00.001-04:00</published><updated>2014-05-04T12:00:57.159-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="MongoDB"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="NoSQL"/><category scheme="http://www.blogger.com/atom/ns#" term="Oracle"/><category scheme="http://www.blogger.com/atom/ns#" term="PostgreSQL"/><title type='text'>Eventual 
consistency of NoSQL marketing</title><content type='html'>Yesterday I learnt an important lesson about an important difference between NoSQL and MySQL, at least when it comes to the marketing and hype.&lt;br /&gt;
&lt;br /&gt;
I saw a marketing tweet from one of the NoSQL leaders:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpN7Kj6AMUYMqlDeL5R4xPvu-cmC5f2HvqeWrvBwa-r95ZLfiKwL574GLE-iJsFN19oMmq0pldYmc-NHn81pFseyKnvmYgDM7e4lLXcRyHUOM2VxINaZleU5P3MaUyuYlLfncV3jkWsA0o/s1600/Tweet.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpN7Kj6AMUYMqlDeL5R4xPvu-cmC5f2HvqeWrvBwa-r95ZLfiKwL574GLE-iJsFN19oMmq0pldYmc-NHn81pFseyKnvmYgDM7e4lLXcRyHUOM2VxINaZleU5P3MaUyuYlLfncV3jkWsA0o/s1600/Tweet.png&quot; height=&quot;190&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Most people would apparently just draw their conclusion from the tweet&#39;s text. I, however, actually clicked the link, and couldn&#39;t believe my eyes:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2hwdoiBz3FACiKdEInlt17Wto1cBCZIYz7bJp7emcgfWjaWZniin8cBCOYvwbtM7wusAAl6JgiOc3G4pi9fCyrEEThqUsNHOHE7G9otIRGrHsIZlxq5XcejpLFEMH3l2vyMZHkh5yq2ue/s1600/DBEngines.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2hwdoiBz3FACiKdEInlt17Wto1cBCZIYz7bJp7emcgfWjaWZniin8cBCOYvwbtM7wusAAl6JgiOc3G4pi9fCyrEEThqUsNHOHE7G9otIRGrHsIZlxq5XcejpLFEMH3l2vyMZHkh5yq2ue/s1600/DBEngines.png&quot; height=&quot;215&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I guess that in NoSQL, when it comes to the integrity of data as well as hype - it is eventually consistent...&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/3840495586867498662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2014/05/eventual-consistency-of-nosql-marketing.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/3840495586867498662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/3840495586867498662'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2014/05/eventual-consistency-of-nosql-marketing.html' title='Eventual consistency of NoSQL marketing'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpN7Kj6AMUYMqlDeL5R4xPvu-cmC5f2HvqeWrvBwa-r95ZLfiKwL574GLE-iJsFN19oMmq0pldYmc-NHn81pFseyKnvmYgDM7e4lLXcRyHUOM2VxINaZleU5P3MaUyuYlLfncV3jkWsA0o/s72-c/Tweet.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-4055981035350617493</id><published>2014-05-01T00:22:00.000-04:00</published><updated>2014-05-01T00:22:25.830-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="NoSQL"/><title type='text'>Explaining the case for MySQL</title><content type='html'>My faithful readers, please spare 10 mins of your time, and read Baron&#39;s excellent post: &lt;a 
href=&quot;https://vividcortex.com/blog/2014/04/30/why-mysql&quot;&gt;https://vividcortex.com/blog/2014/04/30/why-mysql&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Nuff said.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Since I can&#39;t really shut up, and only if you would like my (humble) take on this, here it is in short:&lt;br /&gt;
&lt;br /&gt;
Every technology/platform/framework I choose will end up surprising me, limiting me in things that should be easy, and throwing many painful challenges at me if and when I need to do things that are closer to the platform&#39;s &quot;edges&quot;. This is true for everything, including Rails, JEE, Hibernate, MongoDB and MySQL.&lt;br /&gt;
&lt;br /&gt;
I&#39;ve learned that the more mature, generically-capable, transparent and ecosystem-rich a solution is, the fewer painful surprises hit me at the worst times, and the more successful I am in my job.&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/4055981035350617493/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2014/05/explaining-case-for-mysql.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4055981035350617493'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4055981035350617493'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2014/05/explaining-case-for-mysql.html' title='Explaining the case for MySQL'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-6380675543418054517</id><published>2014-04-09T11:02:00.002-04:00</published><updated>2014-04-09T11:02:33.887-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="Migration"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="Oracle"/><category scheme="http://www.blogger.com/atom/ns#" term="Porting"/><title type='text'>Porting from Oracle to MySQL</title><content type='html'>A potential customer asked my about porting her application from Oracle Database to MySQL.&lt;br /&gt;
&lt;br /&gt;
I always try to start with the &quot;why&quot; (a dear friend bought me this book, recommended: http://www.amazon.com/Start-Why-Leaders-Inspire-Everyone/dp/1591846447).&lt;br /&gt;
&lt;br /&gt;
She said &quot;cloud!&quot;. I said &quot;OK!&quot;.&lt;br /&gt;
&lt;br /&gt;
I did some quick research, found many things scattered all over the place, gathered them into a nice email I sent back to her, and then thought I&#39;d post it here and make it public, as it might be useful for us all. If you feel that I missed something, add comments or send feedback.&lt;br /&gt;
&lt;br /&gt;
These are the leading tools to do the actual migration of the data structure, data export/import, sprocs, triggers, etc.:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;MySQL Workbench has a migration feature: http://www.mysql.com/products/workbench/migrate/&lt;/li&gt;
&lt;li&gt;SQLyog can be used to migrate: http://tkurek.blogspot.com/2013/04/migrate-oracle-to-mysql.html &amp;nbsp;(already in the conversation in the second comment there)&lt;/li&gt;
&lt;li&gt;Navicat can be used to migrate: http://www.navicat.com/products/navicat-for-mysql&lt;/li&gt;
&lt;li&gt;Tungsten supports Oracle-to-MySQL replication: http://www.continuent.com/downloads/software&lt;/li&gt;
&lt;li&gt;Focused data migrators:&lt;/li&gt;
&lt;ol&gt;
&lt;li&gt;http://www.ispirer.com/products/oracle-to-mysql-migration&lt;/li&gt;
&lt;li&gt;https://www.youtube.com/watch?v=IW3vKHWJljY&lt;/li&gt;
&lt;li&gt;http://www.slideshare.net/Tess98/oracle-to-mysql-migration-presentation&lt;/li&gt;
&lt;li&gt;http://www.dbload.com/&lt;/li&gt;
&lt;li&gt;http://dbconvert.com/convert-oracle-to-mysql-pro.php&lt;/li&gt;
&lt;li&gt;http://www.spectralcore.com/omegasync/&lt;/li&gt;
&lt;/ol&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;br /&gt;
The way I see it, migrating the data is 15% of a database porting project. Most of the effort goes into (partial list):&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Porting drivers and driver behavior in the app code&lt;/li&gt;
&lt;li&gt;Porting SQL commands all around the app code&lt;/li&gt;
&lt;ol&gt;
&lt;li&gt;Conversion of non-standard SQL flavor&lt;/li&gt;
&lt;li&gt;Workarounds for restrictions and unsupported commands&lt;/li&gt;
&lt;/ol&gt;
&lt;li&gt;Ecosystem, monitoring, tuning, tools, scripts, hardware best practices, ops skills, dev skills&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
All of this comes way before the actual migration of the data on D-day.&lt;br /&gt;
&lt;br /&gt;
There are a lot of services and some tools. Services-wise, I see:&lt;br /&gt;
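&lt;br /&gt;
To give a feel for the &quot;conversion of non-standard SQL flavor&quot; item above, here is a toy Python sketch. It is only an illustration under loud assumptions: the rule list and the function name are made up, it covers just two well-known Oracle-isms (NVL and SYSDATE) with their rough MySQL counterparts (IFNULL and NOW()), and it uses naive regular expressions that would also rewrite text inside string literals. A real port needs a proper SQL parser and many more rules.&lt;br /&gt;
&lt;pre&gt;
```python
import re

# Toy, hand-picked rewrite rules (hypothetical, for illustration only).
REWRITES = [
    # Oracle NVL(a, b) is roughly MySQL IFNULL(a, b).
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "IFNULL("),
    # Oracle SYSDATE is roughly MySQL NOW().
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "NOW()"),
]


def oracle_to_mysql(sql):
    """Apply each rewrite rule in order and return the converted statement."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql


print(oracle_to_mysql("SELECT NVL(nickname, 'n/a'), SYSDATE FROM dual"))
```
&lt;/pre&gt;
Even this tiny example hints at why the SQL scattered around the app code, and not the data copy, is where most of the porting effort goes.&lt;br /&gt;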
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Pythian: http://www.percona.com/live/mysql-conference-2012/sessions/oracle-mysql-migration&lt;/li&gt;
&lt;li&gt;Baron (Percona): http://www.xaprb.com/blog/2009/03/13/50-things-to-know-before-migrating-oracle-to-mysql/&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
I bet the big SIs (Accenture et al) are strong in this game, as those would be the default go-to service providers for the Oracle shops.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/6380675543418054517/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2014/04/porting-from-oracle-to-mysql.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6380675543418054517'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6380675543418054517'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2014/04/porting-from-oracle-to-mysql.html' title='Porting from Oracle to MySQL'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-1212720029702884797</id><published>2014-03-13T16:01:00.001-04:00</published><updated>2014-03-13T16:01:27.456-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Amazon"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="data distribution"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Distributed database"/><category scheme="http://www.blogger.com/atom/ns#" term="EC2"/><category scheme="http://www.blogger.com/atom/ns#" term="elasticity"/><category scheme="http://www.blogger.com/atom/ns#" term="Replication"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category 
scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="ScaleBase"/><title type='text'>How Elasticity is dictated by Data Model?</title><content type='html'>I talk a lot about &quot;Elasticity&quot; and &quot;Data Model&quot;, a prospect asked me today &quot;what makes you think they are related?&quot;.&lt;br /&gt;
&lt;br /&gt;
Not only are they related, the relation between them holds a big part of the substance of ScaleBase, the technology I&#39;ve been working on for the last 5 years...&lt;br /&gt;
&lt;br /&gt;
Elasticity is the ability to grow or shrink in accordance with demand.&lt;br /&gt;
The cloud makes it very easy to spin up more machines on demand and kill them a day later, paying by the hour, only for real usage. This alone offers fantastic elasticity. Remember that AWS&#39;s EC2 stands for &quot;Elastic Compute Cloud&quot;.&lt;br /&gt;
&lt;br /&gt;
Volatile/transient/stateless servers, such as application servers and web servers, are easier to make elastic. Just spinning up another same-image server behind a round-robin load balancer solves 80% of the problem.&lt;br /&gt;
&lt;br /&gt;
Data is harder to &quot;elastify&quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Data can be &lt;b&gt;replicated &lt;/b&gt;across multiple identical servers behind the same round-robin load balancer, but data-replication multiplies data size (bad ROI) and cannot scale writes and updates to the data.&amp;nbsp;&lt;/li&gt;
&lt;li&gt;The only way to scale data is to have it &lt;b&gt;distributed &lt;/b&gt;across multiple non-identical servers.&amp;nbsp;&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
New challenges:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;How would all data consumers (apps, tools) know where the data they look for resides?&amp;nbsp;&lt;/li&gt;
&lt;li&gt;If every access needs data from several (or all) of the servers, load will end up multiplied rather than distributed = no scalability.&lt;/li&gt;
&lt;li&gt;OK, not all or most, but a minority of accesses do need data from several (or all) of the servers. How can this data be found quickly on all of them and aggregated?&amp;nbsp;&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Challenge 1 is the simplest: just have an index expressing &quot;I want to distribute my data by profile_id&quot; and &quot;put profiles 1-1000 on db1 and 1001-1500 on db2&quot;, and then force all data consumers to check this index before every data access.&lt;br /&gt;
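&lt;br /&gt;
A minimal sketch of such an index, in Python, could look like the following. The names SHARD_DIRECTORY and shard_for are hypothetical and the ranges are just the ones from the example above; this is not how any particular product implements it.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical directory: each entry maps a range of profile_id values to a database.
SHARD_DIRECTORY = [
    (range(1, 1001), "db1"),     # profiles 1-1000
    (range(1001, 1501), "db2"),  # profiles 1001-1500
]


def shard_for(profile_id):
    """Return the name of the database that holds the given profile_id."""
    for id_range, db_name in SHARD_DIRECTORY:
        if profile_id in id_range:
            return db_name
    raise KeyError(f"no shard configured for profile_id {profile_id}")


# Every data consumer would call this before touching the data.
print(shard_for(42))
print(shard_for(1200))
```
&lt;/pre&gt;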
&lt;br /&gt;
Challenges 2 and 3 are where the data model kicks in. For NoSQLs, where the data model is a document, complete and self-contained, challenges 2 and 3 hardly exist.&lt;br /&gt;
For SQL databases, the relational data model takes challenges 2 and 3 to the extreme.&lt;br /&gt;
&lt;br /&gt;
A carefully crafted &lt;b&gt;data distribution policy&lt;/b&gt; and the ability to do &lt;b&gt;real-time data aggregation&lt;/b&gt; are crucial for successfully scaling a relational database.&lt;br /&gt;
&lt;br /&gt;
In our profiles distribution example, identifying that &quot;a profile&quot; is actually a chunk of related data from 100 tables in a complex, multi-level, deep hierarchy is a hard task.&lt;br /&gt;
ScaleBase Analysis Genie simplifies the authoring of a data distribution policy that makes sure that related data is stored together on the same server, solving challenge 2.&lt;br /&gt;
&lt;br /&gt;
ScaleBase Controller employs multi-threaded, massively parallel execution and advanced result aggregation, supporting all SQL aspects including GROUP BY, ORDER BY, HAVING, UNION, JOIN and SUBSELECT, to solve challenge 3.&lt;br /&gt;
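&lt;br /&gt;
To illustrate the general idea behind challenge 3 (and explicitly not ScaleBase&#39;s actual implementation), here is a small scatter-gather sketch in Python. Everything in it is an assumption made for illustration: the two in-memory lists stand in for database servers, and the function names are invented. Each &quot;server&quot; computes a partial SUM ... GROUP BY in its own thread, and the partial results are then merged and sorted in one place.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two database servers holding (country, amount) rows.
SHARDS = {
    "db1": [("US", 10), ("IL", 5), ("US", 7)],
    "db2": [("IL", 3), ("UK", 4)],
}


def partial_sum_by_country(rows):
    """What each server computes locally: SUM(amount) grouped by country."""
    totals = Counter()
    for country, amount in rows:
        totals[country] += amount
    return totals


def scatter_gather():
    """Run the partial aggregation on all shards in parallel, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(partial_sum_by_country, SHARDS.values()))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds the per-country sums together
    # The final ORDER BY total DESC has to happen after the merge.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


print(scatter_gather())
```
&lt;/pre&gt;
Note that grouping and sorting can only be finalized after the merge, which is why cross-server queries are so much harder than single-server ones.&lt;br /&gt;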
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
See here for more info:&amp;nbsp;&lt;a href=&quot;http://www.scalebase.com/products/product-architecture&quot;&gt;http://www.scalebase.com/products/product-architecture&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/1212720029702884797/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2014/03/how-elasticity-is-dictated-by-data-model.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/1212720029702884797'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/1212720029702884797'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2014/03/how-elasticity-is-dictated-by-data-model.html' title='How Elasticity is dictated by Data Model?'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total><georss:featurename>Newton Centre, MA 02459, USA</georss:featurename><georss:point>42.3028239 -71.1864397</georss:point><georss:box>42.2088859 -71.347801199999992 42.3967619 -71.0250782</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-8184916403577093084</id><published>2013-11-14T10:48:00.001-05:00</published><updated>2013-11-14T10:48:52.899-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Amazon"/><category scheme="http://www.blogger.com/atom/ns#" term="AWS"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="PostgreSQL"/><category scheme="http://www.blogger.com/atom/ns#" term="RDS"/><title type='text'>Will AWS plans for PostgreSQL RDS help it finally pick up?</title><content type='html'>&lt;div&gt;
&lt;div&gt;
&quot;Amazon to add Postgres to its most-favored database list&quot; says GigaOM:&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;a href=&quot;http://gigaom.com/2013/11/12/amazon-to-add-postgres-to-its-most-favored-database-list/&quot;&gt;http://gigaom.com/2013/11/12/amazon-to-add-postgres-to-its-most-favored-database-list/&lt;/a&gt;&lt;div&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;To many this is no-brainer. Amazon wants to support the databases that its developer audiences want to use. This is simply a case of Amazon responding to user demand and oh-by-the-way making its cloud infrastructure more attractive to a specific target audience. Some say Postgres has gained traction since Oracle’s acquisition of MySQL via its Sun buyout a few years back.&quot;&lt;/blockquote&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Some people I know said &quot;yea, the writing was on the wall...&quot;. Well, was it?? Really?&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
AWS finally got the time to &quot;plan&quot; for supporting Postgres now? After supporting MySQL, Oracle and SQL Server for almost 3 years?! The writing was on the wall? Where can I find a wall this old?&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
PostgreSQL has not picked up.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
This is why it is a distant 4th on Amazon&#39;s list. The writer of the text above also makes a clear effort not to pick a side here... &quot;&lt;u&gt;to many&lt;/u&gt; this is a no-brainer&quot; or &quot;&lt;u&gt;some say&lt;/u&gt; Postgres has gained traction&quot;.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
It has been around for ages, through many &quot;oh! it&#39;s now happening!&quot; events, such as the acquisition of MySQL by Sun, then by Oracle...&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Technically, PostgreSQL&#39;s few superior capabilities, especially around online schema modifications (which are getting more important these days!), probably could not change its fate, and it&#39;s still being held back by too many inferior capabilities around performance, robustness, ecosystem...&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
So - with plans for RDS, will Postgres now pick up?&amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Feel free to share your thoughts...&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/8184916403577093084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2013/11/will-aws-plans-for-postgresql-rds-help.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/8184916403577093084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/8184916403577093084'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2013/11/will-aws-plans-for-postgresql-rds-help.html' title='Will AWS plans for PostgreSQL RDS help it finally pick up?'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-9054575007256977182</id><published>2013-04-24T16:34:00.000-04:00</published><updated>2013-04-24T16:34:08.008-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="analytics"/><category scheme="http://www.blogger.com/atom/ns#" term="Concurrency"/><category scheme="http://www.blogger.com/atom/ns#" term="data distribution"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Distributed database"/><category scheme="http://www.blogger.com/atom/ns#" term="Parallelism"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'> Concurrency is not parallelism</title><content type='html'>No so new, but still good 
piece of reading:&amp;nbsp;&lt;a href=&quot;http://blog.golang.org/2013/01/concurrency-is-not-parallelism.html&quot;&gt;http://blog.golang.org/2013/01/concurrency-is-not-parallelism.html&lt;/a&gt;&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;Concurrency is the composition of &lt;u&gt;independently&lt;/u&gt; executing processes, while Parallelism is the simultaneous execution of (possibly &lt;u&gt;related&lt;/u&gt;) computations&quot;&lt;/blockquote&gt;
As I wrote several times in the &lt;a href=&quot;http://database-scalability.blogspot.com/2012/12/database-performance-ferrari-and-truck.html&quot; target=&quot;_blank&quot;&gt;past&lt;/a&gt;, in OLTP throughput is king, and concurrency is the main thing that is put to the test.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;Concurrency&lt;/u&gt;&lt;/b&gt; is where Facebook has a million &quot;Like&quot;s every second, each &quot;Like&quot; is&amp;nbsp;&lt;u&gt;independent&lt;/u&gt;, and they need to be processed concurrently.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;Parallelism&lt;/u&gt;&lt;/b&gt; is where there are only a few concurrent activities, say a few analytic reports running in Oracle Exadata, Vertica or Greenplum. Every report is sliced into many &lt;u&gt;related&lt;/u&gt; computations that execute simultaneously.&lt;br /&gt;
&lt;br /&gt;
Are these the same?&lt;br /&gt;
&lt;br /&gt;
From 50,000 feet, we see many things running at the same time, in parallel, concurrently, maybe even distributed. But we need to be accurate: there is a huge difference, and it is in the source. How many &quot;original&quot; transactions did we have to process? A million &quot;Like&quot;s vs. a few big analytic reports. In both cases I see a million operations coming out of them at the back, but:&lt;br /&gt;
In the &quot;Like&quot;s use case - those are the real transactions, concurrently running, distributed.&lt;br /&gt;
In the report use case - those are a million pieces of the same initial single job.&lt;br /&gt;
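&lt;br /&gt;
The difference in the source can be sketched in a few lines of Python. This is only an illustration under stated assumptions: process_like, sum_slice, run_likes and run_report are hypothetical names, a thread pool stands in for the database back end, and a simple sum stands in for an analytic report.&lt;br /&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def process_like(like_id):
    # Hypothetical stand-in for one independent OLTP transaction:
    # its result depends on no other "Like".
    return ("like", like_id, "ok")


def sum_slice(rows):
    # Hypothetical stand-in for one slice of a single report:
    # the partial result only matters once combined with the others.
    return sum(rows)


def run_likes(like_ids, workers=8):
    # Concurrency: N original transactions in, N independent results out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_like, like_ids))


def run_report(rows, slices=8, workers=8):
    # Parallelism: 1 original job in, cut into related pieces, 1 result out.
    size = max(1, -(-len(rows) // slices))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_slice, chunks))
```

Calling run_likes on 100 ids gives back 100 separate results, while run_report on any number of rows gives back exactly one: same busy back end, very different number of original transactions.&lt;br /&gt;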
&lt;br /&gt;
Important! Not to be confused! Big difference! One is great for throughput scalability and one is not. More in my next post.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/9054575007256977182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2013/04/concurrency-is-not-parallelism.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/9054575007256977182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/9054575007256977182'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2013/04/concurrency-is-not-parallelism.html' title=' Concurrency is not parallelism'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-3235856840831072862</id><published>2013-04-03T19:19:00.000-04:00</published><updated>2013-04-03T19:19:00.905-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="data distribution"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Latency"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>MySQL thread pool and scalability examples </title><content type='html'>Nice article about SimCity outage and ways to defend 
databases:&amp;nbsp;&lt;a href=&quot;http://www.mysqlperformanceblog.com/2013/03/16/simcity-outages-traffic-control-and-thread-pool-for-mysql/&quot;&gt;http://www.mysqlperformanceblog.com/2013/03/16/simcity-outages-traffic-control-and-thread-pool-for-mysql/&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
The graphs showing throughput with and without the thread pool come from a benchmark performed by Oracle, published here:&lt;br /&gt;
&lt;a href=&quot;http://www.mysql.com/products/enterprise/scalability.html&quot;&gt;http://www.mysql.com/products/enterprise/scalability.html&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
The main takeaway is this graph (all rights reserved to Oracle, &lt;a href=&quot;http://www.mysql.com/common/images/enterprise/MySQL_Threadpool_Benchmark_RW.png&quot; target=&quot;_blank&quot;&gt;picture original URL&lt;/a&gt;):&lt;br /&gt;
&lt;img alt=&quot;20x Better Scalability: Read/Write&quot; height=&quot;480&quot; src=&quot;http://www.mysql.com/common/images/enterprise/MySQL_Threadpool_Benchmark_RW.png&quot; width=&quot;640&quot; /&gt;&lt;br /&gt;
Scalability is where throughput can grow and grow, as demand grows. I need to get more from the database, the question is: &quot;can it scale to give it to me?&quot;. Scalability is where the response time remains &quot;acceptable&quot; while the throughput grows and grows.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;
Every database has a &quot;knee point&quot;.&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;In the &lt;span style=&quot;color: red;&quot;&gt;best case scenario&lt;/span&gt;, at this knee point, throughput will flatten into a plateau. At the same point, BTW, response time will start climbing past the acceptable limit.&lt;/li&gt;
&lt;li&gt;In a &lt;span style=&quot;color: blue;&quot;&gt;worse case scenario&lt;/span&gt;, at this knee point, throughput, instead of flattening into a plateau, will take a plunge. At the same point, BTW, response time will start climbing fast, through the roof.&lt;/li&gt;
&lt;/ol&gt;
Actually, the red best case scenario is pretty bad... There&#39;s NO scalability there, throughput has a hard limit! It&#39;s around 6,500 transactions per second. I need to do more on my DB, there are additional connections - but the DB is not giving even 1 inch of additional throughput. It doesn&#39;t scale.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
The thread pool feature is no more than a defense mechanism. It doesn&#39;t break the scalability limit of a single machine, rather its job is to defend the database from death.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Real scalability is when the throughput graph neither drops nor flattens - it goes up and up and up with a stable response time. This can be achieved only by Scale Out. Say you get 7,500 TPS from 1 database with 32 connections; add another database and the straight line going up will reach, say, 14,000. A system with 3 databases can support 96 connections and 21,000 TPS... and on and on it goes...&amp;nbsp;&lt;/div&gt;
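&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
A toy model makes the plateau and the scale-out line concrete. It is a sketch under loud assumptions, not a benchmark: single_db_tps and scale_out_tps are hypothetical helpers, the knee is assumed to sit at 32 connections, every database is assumed to peak at a round 7,000 TPS, and load is assumed to spread perfectly evenly.&lt;/div&gt;

```python
def single_db_tps(connections, knee=32, peak_tps=7000.0):
    # Best-case single machine: throughput grows linearly up to the
    # knee point, then stays on a flat plateau (assumed numbers).
    return peak_tps * min(connections, knee) / knee


def scale_out_tps(connections, databases, knee=32, peak_tps=7000.0):
    # Scale-out: connections are spread evenly (an idealization) and
    # each database reaches its own knee independently of the others.
    per_db_connections = connections / databases
    return databases * single_db_tps(per_db_connections, knee, peak_tps)
```

&lt;div&gt;
In this model, single_db_tps stops growing once connections pass 32, no matter how many more are added, while scale_out_tps keeps climbing as long as databases are added together with the connections.&lt;/div&gt;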
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Data needs to be distributed across those databases, so the load can be distributed as well. Maintaining this distributed data on the scaled-out database is the key... I&#39;ll touch on that in future posts.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/3235856840831072862/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2013/04/mysql-thread-pool-and-scalability.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/3235856840831072862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/3235856840831072862'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2013/04/mysql-thread-pool-and-scalability.html' title='MySQL thread pool and scalability examples '/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-6718704017445317573</id><published>2013-03-26T19:02:00.000-04:00</published><updated>2013-03-26T19:02:11.344-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="data distribution"/><category scheme="http://www.blogger.com/atom/ns#" term="Distributed database"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="NoSQL"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>They say: &quot;Relational Databases Aren&#39;t Dead&quot;</title><content type='html'>This is a good read, claiming: &quot;&lt;b&gt;&lt;u&gt;Relational Databases Aren&#39;t Dead. 
Heck, They&#39;re Not Even Sleeping&lt;/u&gt;&lt;/b&gt;&quot;,&amp;nbsp;&lt;a href=&quot;http://readwrite.com/2013/03/26/relational-databases-far-from-dead&quot;&gt;http://readwrite.com/2013/03/26/relational-databases-far-from-dead&lt;/a&gt;. A key quote:&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;While not comprehensive, the uses for NoSQL databases center around the acquisition of fast-growing data or data that does not easily fit within uniform structures.&quot;&lt;/blockquote&gt;
&lt;br /&gt;
There were 2 parts in the statement about NoSQL&#39;s uses. I&#39;ll start with the latter:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;&quot;data that does not easily fit within uniform structures&quot;&lt;/u&gt;&lt;/b&gt; - NoSQL is probably the right choice, although I always encourage thinking and architecting in advance. Also, online structure changes do exist in the RDBMS world, and recently in MySQL too: &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl.html&quot;&gt;http://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl.html&lt;/a&gt;...&lt;br /&gt;
I would definitely warn about the caveats of NoSQL when it comes to actually using and querying the data that is so easily stored there...&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;&quot;acquisition of fast-growing data&quot;&lt;/u&gt;&lt;/b&gt; - is no longer a no-go for RDBMS and MySQL databases. Distributed RDBMS solutions do exist today, and they can extract performance and scalability from good old MySQL itself.&lt;br /&gt;
&lt;br /&gt;
What do you think?&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/6718704017445317573/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2013/03/they-say-relational-databases-arent-dead.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6718704017445317573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6718704017445317573'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2013/03/they-say-relational-databases-arent-dead.html' title='They say: &quot;Relational Databases Aren&#39;t Dead&quot;'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-9148381314499484368</id><published>2013-01-04T09:56:00.000-05:00</published><updated>2013-01-04T09:56:08.990-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database Grid"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Partitioning"/><category scheme="http://www.blogger.com/atom/ns#" term="ScaleBase"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><category scheme="http://www.blogger.com/atom/ns#" term="Single Master Replication"/><title type='text'>Partial 
partitioning and sharding</title><content type='html'>I came across this:&amp;nbsp;&lt;a href=&quot;http://stackoverflow.com/questions/14136633/difference-between-partial-replication-and-sharding&quot;&gt;http://stackoverflow.com/questions/14136633/difference-between-partial-replication-and-sharding&lt;/a&gt;&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
I was wondering if sharding is an alternate name for partial replication or not. What I have figured out that --&lt;ul&gt;
&lt;li&gt;Partial Repl. – each data item has only copies at some but not all of the nodes (‘Sharding’?)&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Pure Partial Repl. – has only copies of a subset of the data item but no node contains a full copy of the database&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Hybrid Partial Repl. – a set of nodes are full replicas and another set of nodes are partial replicas&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;br /&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
I thought it was a good topic, so I wrote a really nice answer, but there was a problem when I pressed &quot;post this answer&quot;, probably some error or mistake on their side. Anyway - this is what I have to say about partial partitioning and sharding:&lt;/div&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
Partial replication is an interesting approach, in which you distribute the data by replicating from a master to slaves, each of which contains a portion of the data. Eventually you get an array of smaller, read-only DBs, each containing a portion of the data. Reads can very well be distributed and parallelized.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;b&gt;&lt;u&gt;But what about the writes?&amp;nbsp;&lt;/u&gt;&lt;/b&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Those are still clogged in 1 big fat lazy master database. Tasks such as buffer management, locking, thread locks/semaphores, and recovery are the real bottleneck of OLTP; they make writes impossible to scale... As I wrote in many previous posts, for example here: &lt;a href=&quot;http://database-scalability.blogspot.com/2012/08/scale-up-partitioning-scale-out.html&quot;&gt;http://database-scalability.blogspot.com/2012/08/scale-up-partitioning-scale-out.html&lt;/a&gt;.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Sharding is where every piece of data lives &lt;u&gt;in one place&lt;/u&gt; only, within an array of DBs. Each database is the &lt;u&gt;complete owner&lt;/u&gt; of its data: data is read from there, data is written there. This way, both reads and writes are distributed and parallelized, and real scale-out can be achieved.&amp;nbsp;&lt;/div&gt;
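&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The &quot;one place, complete owner&quot; rule can be sketched with a tiny key router. This is an illustration only, not how ScaleBase or any specific product does it: ShardedStore is a hypothetical class, plain dictionaries stand in for the databases, and hash-modulo placement is just one assumed distribution policy among many.&lt;/div&gt;

```python
import hashlib


class ShardedStore:
    """Toy sharded store: each key has exactly one owning shard."""

    def __init__(self, shard_count):
        # Each dict stands in for one independent database.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        # Stable hash of the key, reduced to a shard index (assumed policy).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def write(self, key, value):
        # Writes go to the owning shard only; there is no central master.
        self.shards[self.shard_for(key)][key] = value

    def read(self, key):
        # Reads go to the very same owner, so they see the latest write.
        return self.shards[self.shard_for(key)].get(key)
```

&lt;div&gt;
Note that with plain modulo placement, changing the number of shards moves most keys to a different owner, which is a good part of why maintaining sharded data is the hard bit.&lt;/div&gt;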
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The idea behind sharding is great, the ultimate scaling solution (ask Facebook, Google, Twitter and all the other big guys), but it&#39;s a mess to handle and maintain. It&#39;s hard as hell if you do it yourself. &lt;a href=&quot;http://www.scalebase.com/&quot; target=&quot;_blank&quot;&gt;ScaleBase&lt;/a&gt; provides an automatic, transparent scale-out machine that does all that, so you won&#39;t have to...&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/9148381314499484368/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2013/01/partial-partitioning-and-sharding.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/9148381314499484368'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/9148381314499484368'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2013/01/partial-partitioning-and-sharding.html' title='Partial partitioning and sharding'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-1269759140669561140</id><published>2012-12-21T22:44:00.000-05:00</published><updated>2012-12-21T22:44:07.316-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><title type='text'>The battle on the OS... </title><content type='html'>I came across this piece:&amp;nbsp;&lt;a href=&quot;http://www.zdnet.com/windows-has-fallen-behind-apple-ios-and-google-android-7000008699/&quot;&gt;http://www.zdnet.com/windows-has-fallen-behind-apple-ios-and-google-android-7000008699/&lt;/a&gt;, and I&#39;m all nostalgic now...&lt;br /&gt;
&lt;br /&gt;
I had an Atari, my first computer was an Apple IIc, and a bunch of my friends had a Commodore 64 (I envied them, it was much cooler than the Apple!) and an Amiga (wow!!).&lt;br /&gt;
&lt;br /&gt;
Then came almost 2 lost decades in which people &lt;b&gt;knew&lt;/b&gt;* that Wintel was the only thing out there, and couldn&#39;t believe there was or ever would be anything else. Just like in the quote from MIB below...&lt;br /&gt;
&lt;br /&gt;
Thank you smartphone, and thank you RIM, Apple and then Google. You made the world a better place.&lt;br /&gt;
I know you&#39;re not doing anything for us, you&#39;re doing it for your own good, to generate value and yield for your&amp;nbsp;stockholders, while pumping&amp;nbsp;ridiculous&amp;nbsp;paychecks to the executives...&lt;br /&gt;
&lt;br /&gt;
Still, you managed to make the world a better, more interesting place to live in, and more challenges to cope with... Big Data, OLTP velocity, app and data sprawl. More challenges for people like me to seek a solution for! You made scalability a big challenge! So, thank you.&lt;br /&gt;
&lt;br /&gt;
BTW, as it turns out, I have a Wintel laptop and a Nokia Windows (splendid!) phone, but I&#39;m doing it out of open-mindedness, celebrating plurality!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* - The&amp;nbsp;unforgettable&amp;nbsp;&lt;a href=&quot;http://www.imdb.com/title/tt0119654/quotes&quot; target=&quot;_blank&quot;&gt;quote&lt;/a&gt; from &quot;Men In Black&quot;:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;...Fifteen hundred years ago everybody &lt;u&gt;knew&lt;/u&gt; the Earth was the center of the universe. Five hundred years ago, everybody &lt;u&gt;knew&lt;/u&gt; the Earth was flat, and fifteen minutes ago, you &lt;u&gt;knew&lt;/u&gt; that humans were alone on this planet. Imagine what you&#39;ll &lt;u&gt;know&lt;/u&gt; tomorrow&quot;&lt;/blockquote&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/1269759140669561140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/12/the-battle-on-os.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/1269759140669561140'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/1269759140669561140'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/12/the-battle-on-os.html' title='The battle on the OS... '/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-8858289137598099929</id><published>2012-12-17T14:24:00.000-05:00</published><updated>2012-12-17T14:24:16.054-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="analytics"/><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Latency"/><category scheme="http://www.blogger.com/atom/ns#" term="Performance"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Throughput"/><title type='text'>Database Performance, a Ferrari and a truck</title><content type='html'>In the last days I got several queries, from colleagues and customers, about one thing I thought 
was a given and well known, but I found out otherwise: &quot;&lt;b&gt;&lt;u&gt;What is database performance?&lt;/u&gt;&lt;/b&gt;&quot;. Is it speed? Is it throughput? What are the metrics and how do you measure?&lt;br /&gt;
&lt;br /&gt;
I tried to refer to an existing link, but then had to write it up myself. The nearest thing to describing what I think &quot;Database Performance&quot; really is, is this; it&#39;s not bad, yet I was able to make it even simpler for my esteemed colleagues and more esteemed customers.&lt;br /&gt;
&lt;br /&gt;
Database performance, in essence, is derived from 2 major metrics:&lt;br /&gt;
Latency: the time we wait for an operation to finish. Measured in milliseconds (ms) or any other time unit.&lt;br /&gt;
Throughput: the number of transactions/commands per time unit, usually per second or per minute.&lt;br /&gt;
&lt;br /&gt;
In the classic world of Data Warehouse and Analytics, throughput is usually a non-issue and latency is king. As the database grows larger and larger, complex analytic queries take longer and longer to finish, and the demand is &quot;I need speed!&quot;.&lt;br /&gt;
&lt;br /&gt;
In the world of OLTP, throughput is the important measure. TPC-C benchmarks, for example, measure only throughput (New Order Transactions per Minute). Oracle managed to reach 30,249,688 NO Transactions Per Minute, nice job, but we as readers of the results have no way to know whether a single transaction took 1ms and they managed to squeeze many of those in parallel into 1 minute to reach this number, or maybe the scenario transaction took exactly 1 minute and Oracle managed to perform 30,249,688 such transactions in parallel. The truth is somewhere in the middle, between 1 millisecond and 1 minute...&lt;br /&gt;
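&lt;br /&gt;
The two extremes can be put into numbers with Little&#39;s law (average number of transactions in flight = throughput x latency). The sketch below is only back-of-the-envelope arithmetic: in_flight is a hypothetical helper, the throughput figure is the one quoted above, and both latencies are the assumed extremes, not measured values.&lt;br /&gt;

```python
TPM = 30_249_688  # New Order transactions per minute, as quoted above


def in_flight(tpm, latency_seconds):
    # Little's law: average concurrency = throughput * latency.
    transactions_per_second = tpm / 60.0
    return transactions_per_second * latency_seconds


fast = in_flight(TPM, 0.001)  # assumed extreme: every transaction takes 1 ms
slow = in_flight(TPM, 60.0)   # assumed extreme: every transaction takes 1 minute
```

The same throughput number is consistent with roughly five hundred transactions in flight at 1ms each, or with about thirty million in flight at 1 minute each, which is why throughput alone says nothing about latency.&lt;br /&gt;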
&lt;br /&gt;
In OLTP the latency should be bearable (for some it&#39;s 50ms, for some it&#39;s 500ms) and stable, while throughput must grow and grow as the number of users/sites/devices/accounts/profiles grows and grows.&lt;br /&gt;
&lt;br /&gt;
Another key word is predictability. In my OLTP I need predictable, good enough, bearable, constant latency. I can&#39;t afford a 50ms transaction taking 1 minute every once in a while. I need transaction latency to be some X I can live with, and I need it constant and predictable - while throughput is growing.&lt;br /&gt;
&lt;br /&gt;
Not a popular comparison, but very very relevant: A Ferrari and a truck. Both have 500 horsepower.&lt;br /&gt;
A Ferrari will take you to 200 miles per hour! A truck, however, will drive a good legal 70, and she&#39;ll go the same 70 miles per hour with 100 pounds, 1 ton or 20 tons. Constant, stable, predictable. Yea, I&#39;d like to have a Ferrari for my spare time, or to ace a benchmark, but when it comes to backend server infrastructure they&#39;re more like a truck to me... and they deliver...&lt;br /&gt;
&lt;br /&gt;
Life&#39;s not fair sometimes... at least one of these has definitely got the looks:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhPxa3vpnSW4HeBU_ziEPwSKGdTyW7h_AEntQ16x-GwNBqSxtFWfaco6WTS11k0Am_8z5JO6qEE0I9H7vYf1baCpBii0G9Uo459hi7NWgkuf-VU-PsqEhYdaSHfepL94D4ElclaV84ETTk/s1600/FerrariTruck.png&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; float: left; margin-bottom: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;132&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhPxa3vpnSW4HeBU_ziEPwSKGdTyW7h_AEntQ16x-GwNBqSxtFWfaco6WTS11k0Am_8z5JO6qEE0I9H7vYf1baCpBii0G9Uo459hi7NWgkuf-VU-PsqEhYdaSHfepL94D4ElclaV84ETTk/s400/FerrariTruck.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/8858289137598099929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/12/database-performance-ferrari-and-truck.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/8858289137598099929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/8858289137598099929'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/12/database-performance-ferrari-and-truck.html' title='Database Performance, a Ferrari and a truck'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhPxa3vpnSW4HeBU_ziEPwSKGdTyW7h_AEntQ16x-GwNBqSxtFWfaco6WTS11k0Am_8z5JO6qEE0I9H7vYf1baCpBii0G9Uo459hi7NWgkuf-VU-PsqEhYdaSHfepL94D4ElclaV84ETTk/s72-c/FerrariTruck.png" height="72" width="72"/><thr:total>0</thr:total><georss:featurename>Newtonville, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-6338328962199855628</id><published>2012-10-03T15:18:00.000-04:00</published><updated>2012-10-03T15:18:26.640-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="Data warehouse"/><category 
scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database Grid"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Oracle"/><category scheme="http://www.blogger.com/atom/ns#" term="Real Application Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><title type='text'>&quot;(Cloud) is complete gibberish. It&#39;s insane. When is this idiocy going to stop?&quot;</title><content type='html'>&lt;br /&gt;
This is Larry Ellison speaking at Oracle OpenWorld in September 2008. Only&amp;nbsp;4 years ago.&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;The computer industry is the only industry that is more fashion-driven than women&#39;s fashion. Maybe I&#39;m an idiot, but I have no idea what anyone is talking about. What is it? It&#39;s complete gibberish. It&#39;s insane. When is this idiocy going to stop?&quot;&lt;/blockquote&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;We&#39;ll make cloud computing announcements. I&#39;m not going to fight this thing. But I don&#39;t understand what we would do differently in the light of cloud.&quot;&lt;/blockquote&gt;
The above, along with additional&amp;nbsp;gems,&amp;nbsp;are all here:&amp;nbsp;&lt;a href=&quot;http://news.cnet.com/8301-13953_3-10052188-80.html&quot;&gt;http://news.cnet.com/8301-13953_3-10052188-80.html&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Yesterday Larry stood on that same stage and announced Oracle 12c. The c stands for... Cloud!&lt;br /&gt;
&lt;br /&gt;
And what makes Oracle 12c cloud ready?&lt;br /&gt;
12c is a &quot;container database.&quot; Its function is to hold lots of other databases, keeping their data separate but allowing them to share underlying hardware resources like memory or file storage. This way,&amp;nbsp;12c can be used by software-as-a-service tech companies that need a way to let multiple customers access a single database. It&#39;s also geared toward large enterprises that may have hundreds of Oracle databases. It would let them consolidate their databases onto less hardware, saving them money and making all of those databases easier to manage.&lt;br /&gt;
&lt;br /&gt;
So, in two short words: multi-tenancy.&lt;br /&gt;
&lt;br /&gt;
Oracle is still shared-everything on big boxes, and it allows virtualization through internal division and allocation of those shared resources to multiple smaller &quot;virtual databases&quot;. Indeed, it&#39;s great for consolidation and also for multi-tenancy.&lt;br /&gt;
&lt;br /&gt;
Cloud? IMHO, cloud is multi-tenancy, but also scale (out) and elasticity. Amazon calls its cloud compute service EC2, and the E stands for Elastic. The new &quot;DB made for the cloud&quot; brings no news about scale-out or about elasticity. Maybe we&#39;ll need to wait for Oracle 13e... :)&lt;br /&gt;
&lt;br /&gt;
Also there&#39;s a new Exadata database machine, called&amp;nbsp;X3... Yet another bigger box to do all the above. They say it&#39;s Oracle&#39;s answer to SAP HANA.&lt;br /&gt;
&lt;br /&gt;
And finally, we have a new player in cloud services... Oracle! They&#39;ll have a public cloud offering (like Amazon, Rackspace, HP) and also a private cloud, which&amp;nbsp;is a replica of Oracle&#39;s public cloud that is put in the customer&#39;s own data center. Oracle would still own the hardware and be responsible for running it, securing it and updating it. &quot;as a Service&quot; on your premises. It&#39;s interesting and even rhymes...&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
So it seems someone regained his composure...&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0KxVJKKNIo9xnfPXFBf4BG4dADtxyMPqPmX9PBvw9ZGLJNdky9oOR-sVO9ghjTC1jgwezjpwBj7PCZS8KFQUITPI0JrBLjOJ1trMgn1mOXfI-EdZpvcisfQcmhZidfJ1nqaKc0rtjD6Cs/s1600/LarryCloud.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;240&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0KxVJKKNIo9xnfPXFBf4BG4dADtxyMPqPmX9PBvw9ZGLJNdky9oOR-sVO9ghjTC1jgwezjpwBj7PCZS8KFQUITPI0JrBLjOJ1trMgn1mOXfI-EdZpvcisfQcmhZidfJ1nqaKc0rtjD6Cs/s320/LarryCloud.png&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Ref:&amp;nbsp;&lt;a href=&quot;http://www.businessinsider.com/larry-ellison-just-took-on-amazon-with-a-new-cloud-service-2012-9&quot;&gt;http://www.businessinsider.com/larry-ellison-just-took-on-amazon-with-a-new-cloud-service-2012-9&lt;/a&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/6338328962199855628/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/10/cloud-is-complete-gibberish-its-insane.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6338328962199855628'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6338328962199855628'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/10/cloud-is-complete-gibberish-its-insane.html' title='&quot;(Cloud) is complete gibberish. It&#39;s insane. When is this idiocy going to stop?&quot;'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0KxVJKKNIo9xnfPXFBf4BG4dADtxyMPqPmX9PBvw9ZGLJNdky9oOR-sVO9ghjTC1jgwezjpwBj7PCZS8KFQUITPI0JrBLjOJ1trMgn1mOXfI-EdZpvcisfQcmhZidfJ1nqaKc0rtjD6Cs/s72-c/LarryCloud.png" height="72" width="72"/><thr:total>0</thr:total><georss:featurename>Newtonville, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-6488283200167532240</id><published>2012-09-28T11:50:00.001-04:00</published><updated>2012-09-28T11:50:07.450-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Abstraction Layer"/><category scheme="http://www.blogger.com/atom/ns#" 
term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database Grid"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="Pinterest"/><category scheme="http://www.blogger.com/atom/ns#" term="ScaleBase"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>Being successful like Pinterest without its DB adventures...</title><content type='html'>I just came across this: &quot;Scaling Pinterest and adventures in database sharding&quot; &amp;nbsp;(&lt;a href=&quot;http://gigaom.com/data/scaling-pinterest-and-adventures-in-database-sharding/&quot;&gt;http://gigaom.com/data/scaling-pinterest-and-adventures-in-database-sharding/&lt;/a&gt;)&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;Pinterest has learned about scaling the way most popular sites do — the architecture works until one day it doesn’t&quot;&lt;/blockquote&gt;
Pinterest found out that &quot;&lt;u&gt;the architecture&lt;/u&gt;&quot; is not scalable and they turned to &lt;u&gt;development&lt;/u&gt; of a Scale Out mechanism also called Sharding.&lt;br /&gt;
&lt;br /&gt;
I find it amazing that sharding, or in other words, the idea
of &quot;scale out by splitting and parallelizing data across shared-nothing
commodity-hardware&quot; is not supplied &quot;out of the box&quot; by &quot;the architecture&quot; (such as database, load-balancer, any other IT stuff). I&#39;m wondering&amp;nbsp;&lt;b&gt;&lt;u&gt;who was the one that decided that an&amp;nbsp;IT issue like scale-out should be&amp;nbsp;outsourced from the database to the application developers?&lt;/u&gt;&lt;/b&gt;...&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;o:p&gt;&lt;br /&gt;&lt;/o:p&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;o:p&gt;Amazing!!&lt;/o:p&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;o:p&gt;&lt;br /&gt;&lt;/o:p&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;o:p&gt;When was the last time you heard about a PHP or Ruby developer who wrote code to enable Scale Out? NEVER! &lt;/o:p&gt;Scale Out in the application layer is enabled easily by a magical box called a load balancer, and you can get one from F5 or wherever for a low four-digit sum in USD. Commodity!&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
But to scale the database? To enjoy the obvious advantages of &quot;scale out by splitting and parallelizing data across shared-nothing commodity-hardware&quot;? For this, the world still thinks developers need to stop investing effort in innovation, a better product, a&amp;nbsp;competitive&amp;nbsp;business. Instead they need to harness their how-databases-really-work skills to write band-aid code to scale the DB.&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
Amazing... As you know, I took it personally, and I have been working on this paradox every day by building a complete, automatic, out-of-the-box&amp;nbsp;&quot;scale-out
machine&quot; that we like to call &lt;a href=&quot;http://www.scalebase.com/&quot;&gt;ScaleBase&lt;/a&gt;. I think the Pinterest story is great, with a great outcome,
but that&#39;s not always the case with this complex matter. A generic,
repeatable, IT-level solution for Scale Out can make it much easier for all the
other &quot;Pinterests&quot; out there to be just as successful, make the right choice, and enjoy the
great benefits, without the tremendous effort and labor of home-grown sharding.&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMNoK-vGFIBvXYi328ASiy4uKnVUoKHhjial_5fJSIbMWvZIytsXngsTcNPNL9_qBk3rH-GeQu3lWNki2w0NHliVSApo110fs-xB1mWxNZw2xnk4VNWfyf97IL9uMR0ZmObJlRuWqb9NjA/s1600/PLDB.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;309&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMNoK-vGFIBvXYi328ASiy4uKnVUoKHhjial_5fJSIbMWvZIytsXngsTcNPNL9_qBk3rH-GeQu3lWNki2w0NHliVSApo110fs-xB1mWxNZw2xnk4VNWfyf97IL9uMR0ZmObJlRuWqb9NjA/s320/PLDB.png&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/6488283200167532240/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/09/being-successful-like-pinterest-without.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6488283200167532240'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/6488283200167532240'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/09/being-successful-like-pinterest-without.html' title='Being successful like Pinterest without its DB adventures...'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMNoK-vGFIBvXYi328ASiy4uKnVUoKHhjial_5fJSIbMWvZIytsXngsTcNPNL9_qBk3rH-GeQu3lWNki2w0NHliVSApo110fs-xB1mWxNZw2xnk4VNWfyf97IL9uMR0ZmObJlRuWqb9NjA/s72-c/PLDB.png" height="72" width="72"/><thr:total>2</thr:total><georss:featurename>Newtonville, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-5954728418251390592</id><published>2012-08-31T12:41:00.000-04:00</published><updated>2012-08-31T12:41:06.608-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="Columnar Storage"/><category 
scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database Grid"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Facebook"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>Facebook makes big data look... big!</title><content type='html'>Oh I love these things:&amp;nbsp;&lt;a href=&quot;http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/&quot;&gt;http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Every day there are 2.5B content items shared and 2.7B &quot;Like&quot;s. I care less about the &lt;a href=&quot;http://en.wikipedia.org/wiki/Garbage_in,_garbage_out&quot;&gt;GiGo&lt;/a&gt; content itself, but the metadata, connections, and relations are kept transactionally in a relational database. These 2 use-cases generate 5.2B transactions on the database, and since there are only 86400 seconds in a day, we get over 60000 write transactions per second on the database from these 2 use-cases alone, not to mention all the other use-cases, such as new profiles, emails, queries...&lt;br /&gt;
&lt;br /&gt;
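For anyone who wants to check the back-of-the-envelope math, here is a minimal Python sketch. It uses only the two daily counts quoted above and assumes, as I do, that every share and every Like is exactly one write transaction; the variable names are mine.&lt;br /&gt;

```python
# Daily counts quoted in the TechCrunch article.
shares_per_day = 2_500_000_000
likes_per_day = 2_700_000_000

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

# Assumption: one share or one Like equals one write transaction.
writes_per_day = shares_per_day + likes_per_day
writes_per_second = writes_per_day / SECONDS_PER_DAY

print(f"{writes_per_day:,} writes a day is about {writes_per_second:,.0f} writes a second")
```

This is an average over the whole day, so peak hours would presumably be well above it.&lt;br /&gt;
&lt;br /&gt;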
And what&#39;s the size of the new data, on top of all the existing data that cannot be deleted so easily? (Remember why? Get a hint here:&amp;nbsp;&lt;a href=&quot;http://database-scalability.blogspot.com/2012/08/twitter-and-new-big-data-lifecycle.html&quot;&gt;http://database-scalability.blogspot.com/2012/08/twitter-and-new-big-data-lifecycle.html&lt;/a&gt;.) A total of 500+TB is added every day. I&#39;ll&amp;nbsp;exaggerate&amp;nbsp;and assume 98% of it is pictures and other GiGo content, which still leaves us with roughly 10TB of new data daily. There were times when Oracle called a DB of over 1TB a VLDB, and here we have 10TB, every day.&lt;br /&gt;
&lt;br /&gt;
So how does FB handle all this? They have a scaled-out grid of tens of thousands of MySQL servers.&lt;br /&gt;
&lt;br /&gt;
The size alone is not the entire problem. For size, enough juice, memory, MPP, and columnar storage will do the trick.&lt;br /&gt;
But if we add this throughput of hundreds of thousands of transactions per second, it&#39;ll rip the guts out of any single database engine. Remember that the engine translates every write operation into at least 4 internal operations (table, index(es), undo, log) and also needs to do &quot;buffer management, locking, thread locks/semaphores, and recovery tasks&quot;. Can&#39;t happen.&lt;br /&gt;
&lt;br /&gt;
The only way to handle such data size &lt;b&gt;&lt;u&gt;and&lt;/u&gt;&lt;/b&gt; such load is scale-out: divide the big problem into 20000 smaller problems. This is what FB is doing with their cluster of tens of thousands of MySQL servers.&lt;br /&gt;
&lt;br /&gt;
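To make the idea of dividing one big problem into many small ones a bit more concrete, here is a minimal Python sketch of hash-based shard routing. It is an illustration only and not a description of how Facebook routes its data: the function name shard_for, the choice of CRC32, and the made-up user IDs are all my assumptions. Only the 20000 figure comes from the text above.&lt;br /&gt;

```python
import zlib
from collections import Counter

NUM_SHARDS = 20000  # figure from the post; everything else here is hypothetical


def shard_for(user_id):
    """Map a key to a shard number with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS


# Route 200,000 made-up user IDs and see how the work spreads over the shards.
load = Counter(shard_for(user_id) for user_id in range(200_000))
print("shards that received keys:", len(load))
print("keys on the busiest shard:", max(load.values()))
```

Each write touches only the shard its key maps to, so no single engine has to absorb the whole write stream.&lt;br /&gt;
&lt;br /&gt;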
You&#39;re probably thinking &quot;naaa it&#39;s not my problem&quot;, &quot;hey how many Facebooks are out there?&quot;. Take a look and try to put yourself, your organization, on the chart below:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX4JhwIKt0vWNqLbpH0teo0CBxVAdbPx4Qx7fkohI13Rt-BfN4LsqEBW0Xeer9JaeIBW4MGrHhnAFoh0sZhPP-TGjd3Y4dlIw05Zh8mdH4LuPg5juSRznfW1677chUd5YqOgQhKCSJPJ_H/s1600/RoundGraph.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;371&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX4JhwIKt0vWNqLbpH0teo0CBxVAdbPx4Qx7fkohI13Rt-BfN4LsqEBW0Xeer9JaeIBW4MGrHhnAFoh0sZhPP-TGjd3Y4dlIw05Zh8mdH4LuPg5juSRznfW1677chUd5YqOgQhKCSJPJ_H/s400/RoundGraph.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Where are you today? Where will you be in 1 year? In 5 years? Things today go wild, and they go wild faster than ever. Big data is everywhere. New social apps aren&#39;t afraid of &quot;what if no one shows up to my party?&quot;; rather, they&#39;re afraid of &quot;what if EVERYBODY shows up?&quot;&lt;br /&gt;
&lt;br /&gt;
You don&#39;t need to be Facebook to need a good solution to scale out your database</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/5954728418251390592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/08/facebook-makes-big-data-look-big.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/5954728418251390592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/5954728418251390592'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/08/facebook-makes-big-data-look-big.html' title='Facebook makes big data look... big!'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX4JhwIKt0vWNqLbpH0teo0CBxVAdbPx4Qx7fkohI13Rt-BfN4LsqEBW0Xeer9JaeIBW4MGrHhnAFoh0sZhPP-TGjd3Y4dlIw05Zh8mdH4LuPg5juSRznfW1677chUd5YqOgQhKCSJPJ_H/s72-c/RoundGraph.png" height="72" width="72"/><thr:total>7</thr:total><georss:featurename>Newton, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-2589120332000830377</id><published>2012-08-28T11:14:00.002-04:00</published><updated>2012-08-28T11:14:49.562-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Abstraction Layer"/><category scheme="http://www.blogger.com/atom/ns#" 
term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database Grid"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Oracle"/><category scheme="http://www.blogger.com/atom/ns#" term="Partitioning"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="ScaleBase"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><category scheme="http://www.blogger.com/atom/ns#" term="virtualization"/><title type='text'>Scale Up, Partitioning, Scale Out</title><content type='html'>On the 8/16 I conducted a webinar titled: &quot;Scale Up vs. Scale Out&quot; (&lt;a href=&quot;http://www.slideshare.net/ScaleBase/scalebase-webinar-816-scaleup-vs-scaleout&quot;&gt;http://www.slideshare.net/ScaleBase/scalebase-webinar-816-scaleup-vs-scaleout&lt;/a&gt;):&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;iframe allowfullscreen=&quot;allowfullscreen&quot; frameborder=&quot;0&quot; height=&quot;421&quot; marginheight=&quot;0&quot; marginwidth=&quot;0&quot; scrolling=&quot;no&quot; src=&quot;http://www.slideshare.net/slideshow/embed_code/14086055&quot; style=&quot;border-width: 1px 1px 0; border: 1px solid #CCC; margin-bottom: 5px;&quot; width=&quot;512&quot;&gt; 
&lt;/iframe&gt; 
&lt;br /&gt;
&lt;div style=&quot;margin-bottom: 5px;&quot;&gt;
&lt;strong&gt; &lt;a href=&quot;http://www.slideshare.net/ScaleBase/scalebase-webinar-816-scaleup-vs-scaleout&quot; target=&quot;_blank&quot; title=&quot;ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut&quot;&gt;ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;http://www.slideshare.net/ScaleBase&quot; target=&quot;_blank&quot;&gt;ScaleBase&lt;/a&gt;&lt;/strong&gt; 
&lt;/div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The webinar was successful: we had many&amp;nbsp;attendees&amp;nbsp;and great participation in questions and answers&amp;nbsp;throughout&amp;nbsp;the session and at the end.&amp;nbsp;Only after the webinar did it occur to me that one specific graphic was missing from the webinar deck. It&amp;nbsp;occurred&amp;nbsp;to me after answering several&amp;nbsp;audience&amp;nbsp;questions about &quot;the difference between partitioning and sharding&quot; or &quot;why partitioning doesn&#39;t qualify as scale-out&quot;.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
If I were giving the webinar today, I would definitely include the following picture, describing the core difference between Scale Up, Partitioning, and Scale Out:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXGAq8jTyrsMsIRq_uNrG3LnLcEEusBsD4dDe6go2BLijSgbF3WdJlADARyHB5N7radRCet_d8ZsTNsxTo-KqUTeZW1PsiGlrrRcZDw9ubLXCyRCkaNdsz4pk-tfiO5NewCPDxPu8zKL1u/s1600/ScaleUpScaleOut.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;246&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXGAq8jTyrsMsIRq_uNrG3LnLcEEusBsD4dDe6go2BLijSgbF3WdJlADARyHB5N7radRCet_d8ZsTNsxTo-KqUTeZW1PsiGlrrRcZDw9ubLXCyRCkaNdsz4pk-tfiO5NewCPDxPu8zKL1u/s400/ScaleUpScaleOut.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
In the (poor) graphic above, the black server box is the database server machine, the good old cylinder is the disk or storage device, and the colorful square thingy stands for the database engine. Believe it or not, it is a real, complete&amp;nbsp;architecture&amp;nbsp;chart of the Oracle 10gR2 SGA, shrunk&amp;nbsp;to a small scale. Yes, all databases, including Oracle and MySQL, are complex beasts; a lot of stuff is going on inside the database engine for every command.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
If my DB looks like the &quot;starting point&quot;, then I&#39;m either really small or in&amp;nbsp;really&amp;nbsp;bad shape by now.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
Partitioning works wonders as data grows towards being &quot;big data&quot;. It optimizes data placement on separate files or disks, and it keeps every partition&amp;nbsp;optimized,&amp;nbsp;&quot;thin&quot;, and less fragmented than you would expect from a gigantic, busy,&amp;nbsp;monolithic&amp;nbsp;table. Still, although the data is split across files, we&#39;re &quot;stuck&quot; with a&amp;nbsp;busy&amp;nbsp;monolithic database engine that relies on the&amp;nbsp;&quot;computing power&quot; of a single box.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
While we distributed the data, we didn&#39;t distribute the &quot;compute&quot;.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
When there is a heavy join operation, there is one&amp;nbsp;busy&amp;nbsp;monolithic database engine to collect data from all partitions and process this join.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
When there are 10000 concurrent transactions to handle right here and now, there is&amp;nbsp;one&amp;nbsp;busy&amp;nbsp;monolithic database engine to do all database-engine activities such as buffer management, locking, thread locks/semaphores, and recovery tasks. Buffers, locking queues, transaction queues... are still the same for all partitions.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
This is where Scale-out differs from partitioning. It&amp;nbsp;enables distribution and&amp;nbsp;parallelism of the &lt;u&gt;data&lt;/u&gt; as well as the all-important &lt;u&gt;compute&lt;/u&gt;:&amp;nbsp;it brings the compute closer to the data and lets several database engines process different sets of data, each handling a different share of the overall session&amp;nbsp;concurrency.&lt;br /&gt;
&lt;br /&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
You can think of it as one step forward from partitioning, and it comes with great results. It&#39;s not a simple step, though: an abstraction layer is required to present the database grid to the application as one database, the same as it is used to.&amp;nbsp;&lt;/div&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
In further posts I&#39;ll go into more&amp;nbsp;detail on this &quot;Scale Out Abstraction Layer&quot;, and also on &lt;a href=&quot;http://www.scalebase.com/&quot;&gt;ScaleBase&lt;/a&gt;, which is a provider of such a layer.&amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
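&lt;br /&gt;
As a rough illustration of what such an abstraction layer does, here is a toy Python sketch. It is not ScaleBase code and not how any real product works: the ToyGrid class, its method names, and the use of plain dictionaries as stand-ins for database engines are all assumptions made for the example.&lt;br /&gt;

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class ToyGrid:
    """Hypothetical facade that makes several engines look like one database."""

    def __init__(self, num_shards):
        # Each dict stands in for an independent database engine with its own data.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        index = zlib.crc32(str(key).encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, row):
        # Single-key write: only one engine does any work.
        self._shard_for(key)[key] = row

    def get(self, key):
        # Single-key read: routed to the same engine that took the write.
        return self._shard_for(key).get(key)

    def count_where(self, predicate):
        # Cross-shard query: every engine scans only its own slice, and the
        # facade merges the partial counts. Threads here only illustrate the
        # fan-out; real engines would run on separate machines.
        def scan(shard):
            return sum(1 for row in shard.values() if predicate(row))

        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            return sum(pool.map(scan, self.shards))


grid = ToyGrid(num_shards=4)
for user_id in range(1000):
    grid.put(user_id, {"id": user_id, "active": user_id % 2 == 0})

print(grid.get(42))
print(grid.count_where(lambda row: row["active"]))
```

The application only ever talks to the grid object. Which engine holds which row, and how a query that spans engines gets merged, stays hidden behind it.&lt;br /&gt;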
</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/2589120332000830377/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/08/scale-up-partitioning-scale-out.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/2589120332000830377'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/2589120332000830377'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/08/scale-up-partitioning-scale-out.html' title='Scale Up, Partitioning, Scale Out'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXGAq8jTyrsMsIRq_uNrG3LnLcEEusBsD4dDe6go2BLijSgbF3WdJlADARyHB5N7radRCet_d8ZsTNsxTo-KqUTeZW1PsiGlrrRcZDw9ubLXCyRCkaNdsz4pk-tfiO5NewCPDxPu8zKL1u/s72-c/ScaleUpScaleOut.png" height="72" width="72"/><thr:total>0</thr:total><georss:featurename>Newton, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-7341255878497392045</id><published>2012-08-06T13:50:00.000-04:00</published><updated>2012-08-06T13:50:37.265-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="analytics"/><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category 
scheme="http://www.blogger.com/atom/ns#" term="Data warehouse"/><category scheme="http://www.blogger.com/atom/ns#" term="Database Grid"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Parallelism"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="ScaleBase"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>Twitter and the new big data lifecycle</title><content type='html'>Recently I came across this fine&amp;nbsp;&lt;a href=&quot;http://bits.blogs.nytimes.com/2012/07/24/twitter-is-working-on-a-way-to-retrieve-your-old-tweets&quot;&gt;article&lt;/a&gt;&amp;nbsp;in The New York Times: &quot;&lt;b&gt;&lt;u&gt;Twitter Is Working on a Way to Retrieve Your Old Tweets&lt;/u&gt;&lt;/b&gt;&quot;. Dick Costolo, Twitter’s chief executive, said:&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;It’s a different way of architecting search, going through all tweets of all time. You can’t just put three engineers on it.&quot;&lt;/blockquote&gt;
Mr. Costolo is right, and he pointed the spotlight at a very important change we&#39;re experiencing today, in these interesting times. The word is&amp;nbsp;&lt;b&gt;&lt;u&gt;expectations&lt;/u&gt;&lt;/b&gt;&amp;nbsp;and those are changing fast!&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;Not so long ago, Big Data was a synonym for Analytics, Data Warehouse, Business Intelligence. Traditionally, operational (OLTP) apps held limited amounts of data, only the &quot;current&quot; data relevant for the ongoing operations. A cashier in a supermarket would hold only recent transactions, to enable lookup of a charge made 10 minutes ago if I need to return an item or dispute the charge while at the cashier. When I come back to the store the day after, I won&#39;t go to the cashier; I should go to &quot;customer service&quot;, where a different application with a different database will handle my returns or disputes. A dispute after several months will not be handled by the customer service in the store, but by&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&quot;the chain&#39;s dispute department&quot;, using a different,&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;3rd app with a 3rd cumulative, aggregated DB. And on and on it goes.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
In this simplified example, the organization invested many resources in 3 different DBs and apps aggregating different levels of data, each providing similar functionality with only marginal additions. Why?&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;Data volume and concurrency.&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;At the cashiers, the only place where new data is really generated, there is also the highest concurrency. Globally, many thousands of items are &quot;beeped&quot; and sold through the cashiers every minute, yet the data is kept small: it is generated and then extracted shortly afterwards. The customer service reps handle tens of customers a minute over larger data, and &quot;the chain&#39;s dispute department&quot; oversees the biggest data, but handles 1 or 2 cases an hour, and might also execute more &quot;analytic-style&quot; queries to determine the nature of a dispute...&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
This was, in a nutshell, the &quot;lifecycle of the data&quot; in the old world. But&lt;span style=&quot;background-color: white;&quot;&gt;&amp;nbsp;today, everything changes - it&#39;s all online, right here, right now!&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;Enormous&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&amp;nbsp;amounts of (big) data are generated and also searched and analyzed at the same time. Everything is online, here and now. Every tweet (millions a day) is reported instantly to hundreds of followers, participates in saved searches, and is analyzed by&amp;nbsp;numerous&amp;nbsp;robots and engines throughout the web, and also by Twitter itself. The same goes for every search or e-mail I send in Google and for every status or &quot;like&quot; in Facebook that is reported to my hundreds of friends and also analyzed at the same time, here and now.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;Hey, it&#39;s their way to make money, to push the right ads at the right time.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;And now we learn that users&amp;nbsp;&lt;b&gt;&lt;u&gt;expect&lt;/u&gt;&lt;/b&gt;&amp;nbsp;to see online data that is &quot;old&quot; in the terminology of the old days. I want to see statuses, likes and tweets from 2 and 4 months ago, in the same interface and with the same experience I&#39;m used to. Don&#39;t send me to the &quot;customer service department&quot;!&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
The bottom line: it requires scale. Scale your online database to handle online data volumes and throughput, as well as older data,&amp;nbsp;&lt;b&gt;&lt;u&gt;on the same grid&lt;/u&gt;&lt;/b&gt;, without&amp;nbsp;interference,&amp;nbsp;&lt;b&gt;&lt;u&gt;with the same applications&lt;/u&gt;&lt;/b&gt;. This is what scale out is all about. Think outside the (one database server) box. If you have 10 databases for the current data, you can have 10 more with older data, and 100 more with even-older data and so on. Giving a transparent unified view to (or virtualizing) this database grid is the&amp;nbsp;&lt;a href=&quot;http://www.scalebase.com/&quot;&gt;solution&lt;/a&gt;&amp;nbsp;that occupies most of my time, and it&#39;s the missing link to making database scale-out a commodity.</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/7341255878497392045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/08/twitter-and-new-big-data-lifecycle.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/7341255878497392045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/7341255878497392045'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/08/twitter-and-new-big-data-lifecycle.html' title='Twitter and the new big data lifecycle'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total><georss:featurename>Newton, MA 02460, USA</georss:featurename><georss:point>42.3467771
-71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-953914713481895351</id><published>2012-07-10T16:28:00.000-04:00</published><updated>2012-07-10T16:28:00.502-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="analytics"/><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="Data warehouse"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Hadoop"/><category scheme="http://www.blogger.com/atom/ns#" term="map reduce"/><category scheme="http://www.blogger.com/atom/ns#" term="Parallelism"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><title type='text'>So now Hadoop&#39;s days are numbered?</title><content type='html'>Earlier this week we all read GigaOM&#39;s &lt;a href=&quot;http://gigaom.com/cloud/why-the-days-are-numbered-for-hadoop-as-we-know-it/&quot;&gt;article&lt;/a&gt; with this title:&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;Why the days are numbered for Hadoop as we know it&quot;&lt;/blockquote&gt;
I know GigaOM like to provoke scandals sometimes, we all remember some other unforgettable &lt;a href=&quot;http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/&quot;&gt;piece&lt;/a&gt;, but there is something behind it...&lt;br /&gt;
&lt;br /&gt;
Hadoop today (after SOA not so long ago) is one of the worst cases of an abused buzzword ever known to man. It&#39;s everything, everywhere, can cure illnesses and do &quot;big-data&quot; at the same time! Wow!&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google&#39;s MapReduce and Google File System (GFS) papers.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
My take from the article is this:&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;Hadoop is a foundation, low-level platform. I used the word &quot;platform&quot; just because of a lack of a better word. Wait there is a great word that captures it all!&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;This word is &lt;b&gt;&lt;u&gt;Assembler&lt;/u&gt;&lt;/b&gt;.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;When computers began, 70 years ago or so, Assembly was the mother of all programming languages, and the Assembler made it work on real-world computers, in silicon and copper. In the world of Big Data, map-reduce, massive distribution and parallelism are the mother of all living things (Assembly). And Hadoop enables it all to actually run in the real world (Assembler)...&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;Like Assembler, the Hadoop core is far from being really usable. &amp;nbsp;Doing something real, good, working and repeatable with it requires skills that only a few people can really master (like good Assembler programmers back in the 1960s).&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHfscznqsboGkpK_TagaXpo3stT31q22noh2JmjLe_RkxZggyOpuP02adiYLta38rYgPMhUSibkc8uE0io-cDtDRVpxemJcCTE2Y8dLb9JxLT84-Lmo0gLDfzwDm_dn7BtOMWyd8RO0bk_/s1600/punchCard.gif&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;181&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHfscznqsboGkpK_TagaXpo3stT31q22noh2JmjLe_RkxZggyOpuP02adiYLta38rYgPMhUSibkc8uE0io-cDtDRVpxemJcCTE2Y8dLb9JxLT84-Lmo0gLDfzwDm_dn7BtOMWyd8RO0bk_/s400/punchCard.gif&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;While I consider myself lucky to have had the chance to actually punch cards with brilliant(?) Assembler code, &lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;many of today&#39;s brightest minds in Silicon Valleys around the world never wrote one opcode. They&#39;re all using PHP, Ruby, Java and node.js, which are great &quot;wrappers&quot; around good old Assembly that bring programming, innovation and &lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;disruptiveness to the masses, and make the whole world a better place. It&#39;s how it should be.&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;Hadoop will die only if data and big data die. Nonsense. Data is by far the most important asset organizations have. Facebook as well as Bank of America will be worth a fraction of their value in minutes if they lose the same fraction of their data. Neither will be able to compete if it can&#39;t be intelligent and analyze its data, which multiplies every (low number of) days/weeks/months. The d&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;ata makes a business intelligent and &lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;Hadoop helps exactly there.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;Hadoop is the Assembler of all analytical big data processing, ETL and queries. The potential around it and its ecosystem is literally unlimited, tons of&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;innovation and&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;disruptiveness are poured by s&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;tartups and communities all over, like Splunk, HBase, Cloudera, Hive, Hadapt, and many many more. And we&#39;re just in the &quot;FORTRAN&quot; phase...&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/953914713481895351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/07/so-now-hadoops-days-are-numbered.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/953914713481895351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/953914713481895351'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/07/so-now-hadoops-days-are-numbered.html' title='So now Hadoop&#39;s days are numbered?'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHfscznqsboGkpK_TagaXpo3stT31q22noh2JmjLe_RkxZggyOpuP02adiYLta38rYgPMhUSibkc8uE0io-cDtDRVpxemJcCTE2Y8dLb9JxLT84-Lmo0gLDfzwDm_dn7BtOMWyd8RO0bk_/s72-c/punchCard.gif" height="72" 
width="72"/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-5984306368204261103</id><published>2012-06-28T17:57:00.000-04:00</published><updated>2012-06-28T17:57:04.975-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ARM"/><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Parallelism"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>ARM based data center. Inspiring.</title><content type='html'>In a previous &lt;a href=&quot;http://database-scalability.blogspot.com/2012/05/scale-out-your-db-on-arm-based-servers.html&quot;&gt;post&lt;/a&gt; I wrote about ARM based servers. Since then, and thanks to all the comments and responses I got, I looked more into this ARM thing and it&#39;s absolutely fascinating...&lt;br /&gt;
&lt;br /&gt;
Look at this beauty (taken from the site of &lt;a href=&quot;http://www.calxeda.com/&quot;&gt;Calxeda&lt;/a&gt;, the&amp;nbsp;manufacturer):&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBRRp0YkQgMCcN6DbmW9TUYxv4zNeReNit6EecGns-kYBS4oXCYhdWambb1ZDcucAnxElBR0_e4eVo4uL_psqoIKGDLAQSnVumy2q2y5RTYT17h7ZBfN9MpWmbC2EAAXQ2Nn2X_FFaqfKu/s1600/ARM1.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;195&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBRRp0YkQgMCcN6DbmW9TUYxv4zNeReNit6EecGns-kYBS4oXCYhdWambb1ZDcucAnxElBR0_e4eVo4uL_psqoIKGDLAQSnVumy2q2y5RTYT17h7ZBfN9MpWmbC2EAAXQ2Nn2X_FFaqfKu/s400/ARM1.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
What is it? A chip? A server? No, it&#39;s a cluster of 4 servers...&lt;br /&gt;
&lt;br /&gt;
And this:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirjofPMV6RVeA0D6d_YsiUJ1OhJXdra_ARYE1HfKAmUD39fc8fUvjlmFlsjo8kunp-nfqVPzY2h74StT82GpsrysMtW2Ooc7IysaBkv4fgqlactV7odYElxNcSpgzRYnbmtgy_31vQYN_d/s1600/ARM2.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;244&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirjofPMV6RVeA0D6d_YsiUJ1OhJXdra_ARYE1HfKAmUD39fc8fUvjlmFlsjo8kunp-nfqVPzY2h74StT82GpsrysMtW2Ooc7IysaBkv4fgqlactV7odYElxNcSpgzRYnbmtgy_31vQYN_d/s640/ARM2.png&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
is&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;HP Redstone Server, 288 chips, 1,152 cores (Calxeda quad-core SoC) in a 4U server “Dramatically reducing the cost and complexity of cabling and switching”. Calxeda is talking about:&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;“Cut energy and space by 90%”, and&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;“10x the performance at the same power, the same space” and it&#39;s just the beginning...&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
And &lt;a href=&quot;http://blogs.arm.com/smart-connected-devices/746-arm-in-servers-taming-big-data-with-calxeda-isc12&quot;&gt;this&lt;/a&gt; is from the last couple of days... From&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;ISC&#39;12 (International Supercomputing Conference): &quot;ARM in Servers – Taming Big Data with Calxeda&quot;:&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style=&quot;background-color: white;&quot;&gt;In the case of data intensive computing, re-balancing or ‘right-sizing’ the solution to eliminate bottlenecks can significantly improve overall efficiency&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;background-color: white;&quot;&gt;By combining a quad-core ARM® Cortex™-A series processor with topology agnostic integrated fabric interconnect (providing up to 50Gbits of bandwidth at latencies less than 200ns per hop), they can eliminate network bottlenecks and increase scalability&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
You still can&#39;t go to the store and buy a 4U ARM-based database server that performs 10x and uses 1/10 of the power (combine them and it&#39;s an order of magnitude of 100x...). It&#39;s not now, maybe not tomorrow, but it&#39;s not sci-fi. And technologies will have to adapt to this world of &quot;multiple machines, shared nothing, commodity hardware&quot;. I think databases will be the hardest tech to adapt; the only way is to distribute the data wisely and then&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;distribute the processing, sometimes&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;parallelizing&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&amp;nbsp;processing and access to&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;harness those thousands of cores.&lt;/span&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/5984306368204261103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/06/arm-based-data-center-inspiring.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/5984306368204261103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/5984306368204261103'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/06/arm-based-data-center-inspiring.html' title='ARM based data center. 
Inspiring.'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBRRp0YkQgMCcN6DbmW9TUYxv4zNeReNit6EecGns-kYBS4oXCYhdWambb1ZDcucAnxElBR0_e4eVo4uL_psqoIKGDLAQSnVumy2q2y5RTYT17h7ZBfN9MpWmbC2EAAXQ2Nn2X_FFaqfKu/s72-c/ARM1.png" height="72" width="72"/><thr:total>7</thr:total><georss:featurename>Newton, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 -71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-2143895078465882782</id><published>2012-06-20T10:46:00.002-04:00</published><updated>2012-06-20T10:46:27.094-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Memcached"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="NoSQL"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Parallelism"/><category scheme="http://www.blogger.com/atom/ns#" term="Replication"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Single Master Replication"/><title type='text'>The catch-22 of read/write splitting</title><content 
type='html'>&lt;span style=&quot;background-color: white;&quot;&gt;In my previous &lt;a href=&quot;http://database-scalability.blogspot.com/2012/06/why-shared-storage-db-clusters-dont.html&quot;&gt;post&lt;/a&gt; I covered the shared-disk paradigm&#39;s pros and cons, and the conclusion is that it cannot really qualify as a scale-out solution when it comes to massive OLTP, big data, big session counts and a mixture of reads and writes.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;Read/Write splitting&lt;/u&gt;&lt;/b&gt; is&amp;nbsp;achieved when&amp;nbsp;numerous replicated database servers are used for reads. This way the system can scale to cope with an increase in concurrent load. This solution&amp;nbsp;&lt;span style=&quot;background-color: white;&quot;&gt;qualifies as a scale-out solution as it allows&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;expansion beyond the boundaries of one DB:&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;DB machines are shared-nothing, and each can be added as a slave to the replication &quot;group&quot; w&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;hen required.&lt;/span&gt;&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoP_7sS0J3LWcuh-3uBFm121374cLhV1sgSAX5SkEJHkrJdIVMkJs3TGmOyhM6kFN48flE161z0VAncKBW8-HZoMGwL3gX8HwCr2Ws2yUjOjNY-SIN9FMjyNBZFcSZybN0g-M5f6Eug0ud/s1600/R-W-Split.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;173&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoP_7sS0J3LWcuh-3uBFm121374cLhV1sgSAX5SkEJHkrJdIVMkJs3TGmOyhM6kFN48flE161z0VAncKBW8-HZoMGwL3gX8HwCr2Ws2yUjOjNY-SIN9FMjyNBZFcSZybN0g-M5f6Eug0ud/s320/R-W-Split.png&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;And, as a fact, read/write splitting is very popular and widely used by lots of high-traffic applications such as popular web sites, blogs, mobile apps, online games and social applications.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
However, today&#39;s extreme challenges of big data, increased load and advanced requirements expose vulnerabilities and flaws in this solution. Let&#39;s summarize them here:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&lt;u&gt;All writes go to the master node = bottleneck&lt;/u&gt;: While reading sessions are distributed across several database servers (replication slaves), writing sessions all go to the same primary/master server, which therefore remains a bottleneck. All of them consume the DB&#39;s resources for our well-known &quot;buffer management, locking, thread locks/semaphores, and recovery tasks&quot;.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Scaled sessions&#39; load, not big data&lt;/u&gt;&lt;span style=&quot;background-color: white;&quot;&gt;: I can take my X&amp;nbsp;reading&amp;nbsp;sessions and spread them over my 5 replication slaves, giving each only X/5 sessions to handle; however, my giant DB will have to be &lt;/span&gt;&lt;b&gt;replicated as a whole&lt;/b&gt;&lt;span style=&quot;background-color: white;&quot;&gt; to all servers. Prepare lots of disks...&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&lt;u&gt;Scale? Yes. Query performance? No&lt;/u&gt;: Queries on each read-replica need to cope with the entire data of the database. No parallelism, no smaller data sets to handle.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Replication lag&lt;/u&gt;&lt;span style=&quot;background-color: white;&quot;&gt;: Async replication will always introduce lag. Be prepared for a lag between the reads and the writes.&lt;/span&gt;&lt;/li&gt;
&lt;li style=&quot;direction: ltr;&quot;&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&lt;u&gt;Reads after write&lt;/u&gt;&amp;nbsp;will show missing data. The transaction is not yet committed, so it&#39;s not written to the log, not propagated to the slave machine, and not applied at the slave DB.&amp;nbsp;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
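To make the trade-offs above concrete, here is a minimal Python sketch of application-level read/write splitting. Everything in it is an assumption for illustration: the class name &lt;code&gt;ReadWriteRouter&lt;/code&gt;, the server names and the &lt;code&gt;pinned_reads&lt;/code&gt; knob are hypothetical, and pinning a session&#39;s next few reads to the master is just one simple way to hide replication lag, not how any specific product does it.&lt;br /&gt;
&lt;pre&gt;
```python
import itertools


class ReadWriteRouter:
    """Hypothetical sketch: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas, pinned_reads=1):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # simple round-robin
        self.pinned_reads = pinned_reads  # assumed knob, not a real product setting
        self._pinned = {}  # session id to number of reads still pinned to master

    def route(self, session_id, sql):
        """Return the name of the server that should run this statement."""
        # Simplification: anything that is not a plain SELECT counts as a write.
        verb = sql.split()[0].upper()
        if verb != "SELECT":
            # Every write lands on the single master: the bottleneck.
            self._pinned[session_id] = self.pinned_reads
            return self.master
        remaining = self._pinned.get(session_id, 0)
        if remaining:
            # Read-after-write: stay on the master so the session sees its own data.
            self._pinned[session_id] = remaining - 1
            return self.master
        return next(self._replicas)


router = ReadWriteRouter("master", ["replica1", "replica2"])
for statement in ("SELECT 1", "INSERT INTO t VALUES (1)", "SELECT 1", "SELECT 1"):
    print(statement, "goes to", router.route("session-1", statement))
```
&lt;/pre&gt;
Note that every write still lands on the single master, and the pinned reads add to its load, which is exactly the bottleneck described in the list above.&lt;br /&gt;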
&lt;div&gt;
Above all, databases suffer from writes made by many concurrent sessions. Database engines themselves become the bottleneck because of their &quot;&lt;span style=&quot;background-color: white;&quot;&gt;buffer management, locking, thread locks/semaphores, and recovery tasks&quot;. Reads are a secondary concern. BTW, read performance and scale can be gained very well with good, smart caching, using a &lt;a href=&quot;http://nosql-database.org/&quot;&gt;NoSQL&lt;/a&gt; store such as &lt;a href=&quot;http://memcached.org/&quot;&gt;Memcached&lt;/a&gt; in the app, in front of the RDBMS.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;In modern applications we see more and more&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;reads being avoided, while&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;writes, which cannot be avoided or cached, storm the DB.&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div style=&quot;direction: ltr;&quot;&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;span style=&quot;background-color: white;&quot;&gt;R/W splitting is usually implemented today inside the application code; it&#39;s&amp;nbsp;easy to start, then becomes hard... I&amp;nbsp;&lt;/span&gt;recommend using a specialized COTS product that does it 100 times better and may eliminate some or all of the limitations above (&lt;a href=&quot;http://www.scalebase.com/&quot;&gt;ScaleBase&lt;/a&gt; is one solution that offers that, among other things).&lt;/span&gt;&lt;/div&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;span style=&quot;background-color: white;&quot;&gt;This is&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;read/write splitting&#39;s catch-22.&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;It&#39;s an OK scale-out solution and relatively easy to implement, b&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;ut improvements in caching systems, changing requirements in online applications, big data and big concurrency&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;&amp;nbsp;are&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: white;&quot;&gt;rapidly driving it towards its fate: to become less and less relevant, and to play only a partial role in a complete scale-out plan.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;
&lt;span style=&quot;background-color: white;&quot;&gt;In a complete scale-out solution, where data is&amp;nbsp;distributed&amp;nbsp;(not replicated)&amp;nbsp;throughout&amp;nbsp;a grid of shared-nothing databases, read/write splitting will play its part, but only a minor one. Will get to that in next posts.&lt;/span&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/2143895078465882782/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/06/catch-22-of-readwrite-splitting.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/2143895078465882782'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/2143895078465882782'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/06/catch-22-of-readwrite-splitting.html' title='The catch-22 of read/write splitting'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoP_7sS0J3LWcuh-3uBFm121374cLhV1sgSAX5SkEJHkrJdIVMkJs3TGmOyhM6kFN48flE161z0VAncKBW8-HZoMGwL3gX8HwCr2Ws2yUjOjNY-SIN9FMjyNBZFcSZybN0g-M5f6Eug0ud/s72-c/R-W-Split.png" height="72" width="72"/><thr:total>2</thr:total><georss:featurename>Newton, MA 02460, USA</georss:featurename><georss:point>42.3467771 -71.2072321</georss:point><georss:box>42.323306099999996 -71.2467141 42.3702481 
-71.167750099999992</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-4174296551262073025</id><published>2012-06-07T09:56:00.002-04:00</published><updated>2012-06-07T09:56:26.516-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Oracle"/><category scheme="http://www.blogger.com/atom/ns#" term="RAC"/><category scheme="http://www.blogger.com/atom/ns#" term="Real Application Cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><category scheme="http://www.blogger.com/atom/ns#" term="virtualization"/><title type='text'>Why shared-storage DB clusters don&#39;t scale</title><content type='html'>Yesterday I was asked by a customer for the reason why he had failed to achieve scale with a state-of-the-art &quot;shared-storage&quot; cluster. &quot;It&#39;s a scale-out to 4 servers, but with a shared disk. And I got, after tons of work and efforts, 130%&amp;nbsp;throughput, not even close to the expected 400%&quot; he said.&lt;br /&gt;
&lt;br /&gt;
Well, scale-out cannot be achieved with shared storage, and the word &quot;&lt;u&gt;shared&lt;/u&gt;&quot; is the key. Scale-out is done with absolutely&amp;nbsp;&lt;u&gt;nothing&lt;/u&gt; shared, or a &quot;shared-nothing&quot; architecture. This is what makes it&amp;nbsp;linear and unlimited.&amp;nbsp;Any shared resource creates a tremendous burden on each and every database server in the cluster.&lt;br /&gt;
&lt;br /&gt;
In a&amp;nbsp;&lt;a href=&quot;http://database-scalability.blogspot.com/2012/05/were-in-big-data-business.html&quot;&gt;previous post&lt;/a&gt;, I identified database engine activities such as buffer management, locking, thread locks/semaphores, and recovery tasks as the main bottleneck in an OLTP database handling a mixture of reads and writes.&amp;nbsp;No matter which database engine you pick, Oracle, MySQL, any of them, it does all of the above.&amp;nbsp;&quot;&lt;b&gt;&lt;u&gt;The database engine itself becomes the bottleneck!&lt;/u&gt;&lt;/b&gt;&quot; I wrote.&lt;br /&gt;
&lt;br /&gt;
With a shared disk there is a single shared copy of my big data, and the database engine still has to maintain &quot;buffer management, locking, thread locks/semaphores, and recovery tasks&quot;. But now - with a twist! Now all of the above needs to be done &quot;globally&quot; between all participating servers, through network adapters and cables, introducing latency. Every database server in the cluster needs to update all other nodes for every&amp;nbsp;&quot;buffer management, locking, thread locks/semaphores, and recovery tasks&quot; operation it performs on a block of data. Here is the number of &quot;conversation paths&quot; between the 4 nodes:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUeQ-24WlxF08bgT8nXxweWLXxcyuV5MjOKRPr95VP9FChN4zn9m-ks4VJ-ZcVn4E3g5Vi6t9b6uwxpO7dfjqlH6gNF1lzPEpgoXP2gnDHKUQJIDHa8QIffP2RrDqPVhz-3glZ-eidCQBu/s1600/RAC.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUeQ-24WlxF08bgT8nXxweWLXxcyuV5MjOKRPr95VP9FChN4zn9m-ks4VJ-ZcVn4E3g5Vi6t9b6uwxpO7dfjqlH6gNF1lzPEpgoXP2gnDHKUQJIDHa8QIffP2RrDqPVhz-3glZ-eidCQBu/s1600/RAC.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
Blue dashed lines are data access, red lines are communications between nodes. Every node must notify, and be notified by, all other nodes. It&#39;s a complete graph: with 4 nodes there are 6 edges. With 10 nodes you&#39;ll find 45 red lines here (&lt;i&gt;n(n - 1)/2&lt;/i&gt;, a&amp;nbsp;&lt;a href=&quot;http://en.wikipedia.org/wiki/Complete_graph&quot;&gt;reminder&lt;/a&gt; from Computer Science courses...). Imagine the noise, the latency, &lt;u&gt;for every&lt;/u&gt;&amp;nbsp;&quot;buffer management, locking, thread locks/semaphores, and recovery tasks&quot;. Shared-storage becomes shared-everything.&amp;nbsp;Node A makes an update to block X - in a 10-node cluster, the other 9 machines need to acknowledge.&amp;nbsp;I wanted to scale, but instead I multiplied the initial problem.&lt;br /&gt;
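&lt;br /&gt;
To put numbers on that graph, here is a minimal Python sketch of the &lt;i&gt;n(n - 1)/2&lt;/i&gt; formula. The function name conversation_paths is mine, purely for illustration, and it assumes every node talks directly to every other node, as in the diagram:&lt;br /&gt;
&lt;pre&gt;
```python
def conversation_paths(nodes):
    """Edges in a complete graph: every node paired with every other node."""
    return nodes * (nodes - 1) // 2


for nodes in (2, 4, 10, 20):
    print(nodes, "nodes:", conversation_paths(nodes), "conversation paths")
```
&lt;/pre&gt;
Doubling the cluster from 10 to 20 nodes takes the paths from 45 to 190, so the chatter grows much faster than the capacity.&lt;br /&gt;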
&lt;br /&gt;
And I didn&#39;t even mention the fact that the shared disk might become a SPOF and a bottleneck.&lt;br /&gt;
And I didn&#39;t even mention the limitations when you want to take this to the cloud or to virtualization.&lt;br /&gt;
And I didn&#39;t mention the tons of money this toy costs. I prefer buying 2 of &lt;a href=&quot;http://www.ferrari.com/English/GT_Sport%20Cars/CurrentRange/458-Italia/Pages/458-Italia.aspx&quot;&gt;these&lt;/a&gt;, one for me and one for a good friend, and hit the road together...&lt;br /&gt;
&lt;br /&gt;
Shared-disk solutions give a very limited answer to OLTP scale. If the data is not distributed, the computing resources required to handle that data are not distributed, and thus reduced, either. On the contrary, they are multiplied and consume all available resources on all machines.&lt;br /&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
Real scale-out is achieved by distributing the data in a&amp;nbsp;&lt;u&gt;shared-nothing&lt;/u&gt; architecture: every database node is &lt;u&gt;independent&lt;/u&gt; and owns its data, with no&amp;nbsp;duplication and no notifications, ownership transfers or acknowledgements over any network. If data is distributed correctly, concurrent sessions will also be distributed across servers, and each node runs extremely fast on its small data and small load, with its own small &quot;buffer management, locking, thread locks/semaphores, and recovery tasks&quot;.&lt;/div&gt;
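&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;
A minimal sketch of what &quot;distributing the data&quot; can look like, assuming a plain hash-modulo scheme. This is only one of many possible distribution policies, not a description of any specific product, and shard_for and the customer IDs are hypothetical names I made up for the example:&lt;br /&gt;
&lt;pre&gt;
```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to one shard with a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Hypothetical customer IDs: each one always lands on the same single node,
# so a transaction for that customer can start and finish on one database.
for customer_id in (1001, 1002, 1003, 1004):
    print(customer_id, "goes to shard", shard_for(customer_id, 4))
```
&lt;/pre&gt;
The point is only that the owner of a row is computed, not negotiated: no node has to ask any other node before touching its own data.&lt;/div&gt;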
&lt;div class=&quot;MsoNormal&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;MsoNormal&quot;&gt;
My customer responded &quot;Makes perfect sense! Tell me more about Scale-Out and distribution...&quot;.&amp;nbsp;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/4174296551262073025/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/06/why-shared-storage-db-clusters-dont.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4174296551262073025'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4174296551262073025'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/06/why-shared-storage-db-clusters-dont.html' title='Why shared-storage DB clusters don&#39;t scale'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUeQ-24WlxF08bgT8nXxweWLXxcyuV5MjOKRPr95VP9FChN4zn9m-ks4VJ-ZcVn4E3g5Vi6t9b6uwxpO7dfjqlH6gNF1lzPEpgoXP2gnDHKUQJIDHa8QIffP2RrDqPVhz-3glZ-eidCQBu/s72-c/RAC.png" height="72" width="72"/><thr:total>7</thr:total><georss:featurename>Newton, MA, USA</georss:featurename><georss:point>42.3028239 -71.1864397</georss:point><georss:box>42.2558489 -71.2654037 42.3497989 -71.1074757</georss:box></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-4514760756447002376</id><published>2012-05-30T13:39:00.000-04:00</published><updated>2012-05-30T13:39:32.990-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ARM"/><category 
scheme="http://www.blogger.com/atom/ns#" term="big data"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="MySQL"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><title type='text'>Scale-out your DB on ARM-based servers</title><content type='html'>Today, I think we witnessed a small sign for a big revolution...&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.pcworld.com/businesscenter/article/256383/dell_reaches_for_the_cloud_with_new_prototype_arm_server.html&quot;&gt;http://www.pcworld.com/businesscenter/article/256383/dell_reaches_for_the_cloud_with_new_prototype_arm_server.html&lt;/a&gt;
&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&quot;Dell announced a prototype low-power server with ARM processors, following a growing demand by Web companies for custom-built servers that can scale performance while reducing financial overhead on data centers&quot;&lt;/blockquote&gt;
In short, ARM (see Wikipedia definition &lt;a href=&quot;http://en.wikipedia.org/wiki/ARM_architecture&quot;&gt;here&lt;/a&gt;) is an architecture standard for processors. ARM processors are slower than good old x86 processors from Intel and AMD, but have power-efficiency, density and price attributes that intrigue customers, especially in these days of green data centers where carbon emissions are carefully measured, and of course, cost-saving economics.&lt;br /&gt;
&lt;br /&gt;
Take iPhones and iPads for example: those&amp;nbsp;amazing machines do fast real-time calculations with their&amp;nbsp;relatively&amp;nbsp;powerful ARM processors (Apple &lt;a href=&quot;http://en.wikipedia.org/wiki/Apple_A4&quot;&gt;A4&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Apple_A5&quot;&gt;A5&lt;/a&gt;, A5X), yet are extremely efficient with regard to power and stay relatively cool. See picture (credits to &lt;a href=&quot;http://en.wikipedia.org/wiki/Apple_A5&quot;&gt;Wikipedia&lt;/a&gt;) of the newest Apple A5X chip, used in the new iPad:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsIpnGHLn_LTLCtvfFeBz54QLprMqBkiZdbFqdm4Labq1ddtWmuFul9TKi9mjgBwY93gdZ0HkCNwwv0W-fW3kzY8Mg9h7JOU_7c7m0lSvOunzZhyphenhypheno8Npzl3wHBVAte_-4MvnnzH1FvLMVG/s1600/240px-Apple_A5X_Chip.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsIpnGHLn_LTLCtvfFeBz54QLprMqBkiZdbFqdm4Labq1ddtWmuFul9TKi9mjgBwY93gdZ0HkCNwwv0W-fW3kzY8Mg9h7JOU_7c7m0lSvOunzZhyphenhypheno8Npzl3wHBVAte_-4MvnnzH1FvLMVG/s1600/240px-Apple_A5X_Chip.jpg&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Today, when the truly big web and cloud players build their data centers, the question is not &quot;how big are your servers?&quot; but rather &quot;how many servers do you carry?&quot;. Ask&amp;nbsp;Facebook, Google, Netflix, and more to come... For those guys, no single server can be big enough anyway, so they&#39;re built from the ground up for scaling out to numerous servers and performing small tasks concurrently. Familiar with Google&#39;s &lt;a href=&quot;http://en.wikipedia.org/wiki/MapReduce&quot;&gt;MapReduce&lt;/a&gt;, and with &lt;a href=&quot;http://en.wikipedia.org/wiki/Hadoop&quot;&gt;Hadoop&lt;/a&gt;? So - why not parallelize on ARM-based servers? Can you imagine 20 iPads, occupying the same space as a&amp;nbsp;1U rack-mounted &quot;pizza&quot; server, but with 5x the parallel computing power, 20x the electrical power efficiency, and 100x the cooling cost efficiency?&lt;br /&gt;
&lt;br /&gt;
So what does all this have to do with &lt;a href=&quot;http://database-scalability.blogspot.com/&quot;&gt;database scalability&lt;/a&gt;, you ask?&lt;br /&gt;
&lt;br /&gt;
I don&#39;t agree with this quote from the article: &quot;But ARM still cannot match up chips from Intel and AMD for resource-heavy tasks such as databases.&quot;&lt;br /&gt;
&lt;br /&gt;
Oh, I remember those days when, in every data center in every organization I consulted for, I saw the same picture... All machines were nice, neatly organized, tagged, blades, standard racks, mostly virtualized... But the DB? No... Those servers were&amp;nbsp;non-standard, the biggest, the ugliest, the most expensive in both capex and opex.&amp;nbsp;Why? It&#39;s the DB! It&#39;s special! It has needs!! Specialized HW, specialized storage, $$$. Those days are over, as organizations&#39; need to save has arrived at the shores of the sacred database. Today, harder&amp;nbsp;questions are being asked,&amp;nbsp;more&amp;nbsp;is being done in commoditization, more databases are virtualized, and then there is the cloud...&lt;br /&gt;
&lt;br /&gt;
Big Data is everywhere: on the web, in the cloud, in the enterprise. Databases must scale out or else explode, it&#39;s a hard fact. Databases can be scaled with &lt;a href=&quot;http://database-scalability.blogspot.com/2012/05/were-in-big-data-business.html&quot;&gt;smart distribution and parallelism&lt;/a&gt;, and then they can use commodity hardware and can be easily virtualized and cloud-ified. If distribution is done correctly, the whole is greater than the sum of its parts, and the parts in this case can be low end... The lowest of the low... Database machines can definitely be ARM-based servers, each holding a portion of the data, attracting a portion of the concurrent sessions, and contributing to the overall processing.&lt;br /&gt;
&lt;br /&gt;
A database on an iPad? Naa, I prefer breaking another record in Fruit Ninja.&lt;br /&gt;
A database on 20 ARM-based servers? If it&#39;s 5x faster and costs 60x less in electricity and cooling - then yes, definitely.</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/4514760756447002376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/05/scale-out-your-db-on-arm-based-servers.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4514760756447002376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/4514760756447002376'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/05/scale-out-your-db-on-arm-based-servers.html' title='Scale-out your DB on ARM-based servers'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsIpnGHLn_LTLCtvfFeBz54QLprMqBkiZdbFqdm4Labq1ddtWmuFul9TKi9mjgBwY93gdZ0HkCNwwv0W-fW3kzY8Mg9h7JOU_7c7m0lSvOunzZhyphenhypheno8Npzl3wHBVAte_-4MvnnzH1FvLMVG/s72-c/240px-Apple_A5X_Chip.jpg" height="72" width="72"/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6415786925319620734.post-1059728717075182111</id><published>2012-05-21T14:12:00.003-04:00</published><updated>2012-05-21T14:14:56.776-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ALL_ROWS"/><category scheme="http://www.blogger.com/atom/ns#" term="analytics"/><category scheme="http://www.blogger.com/atom/ns#" term="big data"/><category 
scheme="http://www.blogger.com/atom/ns#" term="Columnar Storage"/><category scheme="http://www.blogger.com/atom/ns#" term="Data warehouse"/><category scheme="http://www.blogger.com/atom/ns#" term="Database"/><category scheme="http://www.blogger.com/atom/ns#" term="Database scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="FIRST_ROWS"/><category scheme="http://www.blogger.com/atom/ns#" term="OLTP"/><category scheme="http://www.blogger.com/atom/ns#" term="Oracle"/><category scheme="http://www.blogger.com/atom/ns#" term="Parallelism"/><category scheme="http://www.blogger.com/atom/ns#" term="Scalability"/><category scheme="http://www.blogger.com/atom/ns#" term="Scale out"/><category scheme="http://www.blogger.com/atom/ns#" term="Sharding"/><title type='text'>Scaling OLTP is nothing like scaling Analytics</title><content type='html'>We&#39;re in the big data business. OLTP applications and Analytics.&lt;br /&gt;
&lt;br /&gt;
Scaling OLTP applications is nothing like scaling Analytics, as I posted here:&amp;nbsp;&lt;a href=&quot;http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html&quot;&gt;http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html&lt;/a&gt;. OLTP is a mixture of reads and writes, heavy session concurrency and also growing amounts of data.&lt;br /&gt;
&lt;br /&gt;
In my previous post,&amp;nbsp;&lt;a href=&quot;http://database-scalability.blogspot.com/2012/05/scale-differences-between-oltp-and.html&quot;&gt;http://database-scalability.blogspot.com/2012/05/scale-differences-between-oltp-and.html&lt;/a&gt;, I mentioned that Analytics can be scaled using: columnar storage, RAM and query parallelism.&lt;br /&gt;
&lt;br /&gt;
Columnar storage cannot be used for OLTP: while it makes read scans better, it hurts writes, especially INSERTs. The same goes for RAM; the approach of “let’s put everything in memory” is also problematic for writes that should be Durable (the D in ACID). There are databases that reach Durability by writing to the memory of at least 2 machines, and I&#39;ll get to that in a later post, but in the simpler view, RAM is great for reads (Analytics) and very limited for writes (OLTP).&lt;br /&gt;
&lt;br /&gt;
Query parallelism, which worked for Analytics, is limited for OLTP, mostly because of high concurrency and writes. OLTP is a mixture of reads and writes, and write ratios today reach 50% and more. Every write operation is eventually at least 5 operations for the database, including table, index(es), rollback segment, transaction log, row-level locking and more. Now multiply that by 1000 concurrent transactions and 1TB of data. &lt;u&gt;&lt;b&gt;The database engine itself becomes the bottleneck!&lt;/b&gt;&lt;/u&gt; It puts so many resources into buffer management, locking, thread locks/semaphores, and recovery tasks that no resources are left for handling query data!&lt;br /&gt;
&lt;br /&gt;
3 reasons why naïve parallelism is not a magic bullet&amp;nbsp;for OLTP:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Parallel query within the same database server will just turn the hard-to-manage 1000 concurrent transactions into an impossible 1,000,000 concurrent sub-transactions… Good luck with that…&amp;nbsp;&lt;/li&gt;
&lt;li&gt;Parallelizing a query across several database servers is a step in a good direction. However, it can’t scale: if I have 10 servers and each one of my 1000 concurrent transactions needs to gather data from all servers in parallel, how many concurrent transactions will I have on each server? That’s right, 1000. What did I solve? Can I scale to 2000 concurrent transactions? All my servers will die together. In that case, what if I scale to 20 servers instead of 10? Then I’ll have 20 servers with 2000 concurrent transactions each… that will all die together.&amp;nbsp;&lt;/li&gt;
&lt;li&gt;OLTP operations are not good candidates for parallelism:&lt;/li&gt;
&lt;ol&gt;
&lt;li&gt;Scans (full table/index scans and range scans) are parallelized all the time in Analytics but are rare in OLTP. In OLTP most accesses are short, pinpointed, index-based small-range and unique scans. Oracle’s optimizer mode FIRST_ROWS (OLTP) will almost always prefer index access, while ALL_ROWS (Analytics) will have a hard time giving up its favored full table scan.&amp;nbsp;
So what exactly do we parallelize in OLTP? An index rebuild once a day (scan...)?&lt;/li&gt;
&lt;li&gt;1000 concurrent 1-row INSERT commands a second - is a valid OLTP scenario. What exactly do I parallelize?&lt;/li&gt;
&lt;/ol&gt;
&lt;/ol&gt;
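The arithmetic in bullet 2 can be sketched in a few lines of Python. The function names are mine, and the perfectly even spread in the second function is an assumption (real workloads are never that tidy):&lt;br /&gt;
&lt;pre&gt;
```python
def per_server_scatter_gather(transactions, servers):
    """Every transaction queries every server, so each server sees them all.

    The server count is deliberately unused: adding servers changes nothing.
    """
    return transactions


def per_server_routed(transactions, servers):
    """Each transaction runs on one server; assumes a perfectly even spread."""
    return -(-transactions // servers)  # ceiling division


for transactions, servers in ((1000, 10), (2000, 20)):
    print(servers, "servers:",
          per_server_scatter_gather(transactions, servers), "each when fanned out,",
          per_server_routed(transactions, servers), "each when routed")
```
&lt;/pre&gt;
Fanned out, 20 servers each carry 2000 transactions. Routed to a single server each, the same 2000 transactions become 100 per server.&lt;br /&gt;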
Parallelism cannot be the one and only complete solution. It serves a minor role in the solution, whose key factor&amp;nbsp;is: &lt;b&gt;&lt;u&gt;distribution&lt;/u&gt;&lt;/b&gt;.&lt;br /&gt;
&lt;br /&gt;
OLTP databases can scale only by smart distribution of not just the data but also the concurrent sessions among numerous database servers. It’s all in the &lt;u&gt;distribution of the data&lt;/u&gt;: if data is distributed in a smart way, concurrent sessions will also be distributed across servers.&lt;br /&gt;
&lt;br /&gt;
Go from 1 big fat database server dealing with 1TB of data and 1000 concurrent transactions to 10 databases, each dealing with an easy 100GB and 100 concurrent transactions. I&#39;ll hit the jackpot if I manage to keep the databases isolated,&amp;nbsp;&lt;u&gt;shared nothing&lt;/u&gt;, processing-wise, not only cables-wise. Best are &lt;u&gt;transactions that start and finish on a single database&lt;/u&gt;.&lt;br /&gt;
&lt;br /&gt;
And if I’m lucky and my business is booming, I can scale:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Data grew from 1TB to 1.5TB? Add more database servers.&lt;/li&gt;
&lt;li&gt;Concurrent sessions grew from 1000 to 1500? Add more database servers.&lt;/li&gt;
&lt;li&gt;Parallel query/update? Sure! If a session does need to scan data from all servers, or needs to perform an index rebuild, it can run in parallel on all servers and will take a fraction of the time.&lt;/li&gt;
&lt;/ol&gt;
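As a back-of-the-envelope sketch of that growth, here is how the server count could be estimated in Python. The per-server targets of 100GB and 100 concurrent sessions are just the example numbers from above, not a recommendation, and servers_needed is a hypothetical helper:&lt;br /&gt;
&lt;pre&gt;
```python
import math


def servers_needed(data_gb, sessions, gb_per_server=100, sessions_per_server=100):
    """Servers required so neither data nor sessions exceed the per-server target."""
    by_data = math.ceil(data_gb / gb_per_server)
    by_sessions = math.ceil(sessions / sessions_per_server)
    return max(by_data, by_sessions)


print(servers_needed(1000, 1000))  # the 1TB / 1000 sessions starting point
print(servers_needed(1500, 1500))  # after the growth described above
```
&lt;/pre&gt;
Whichever dimension grows faster, data or sessions, dictates how many servers to add.&lt;br /&gt;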
Ask Facebook (&lt;a href=&quot;http://www.google.com/finance?q=fb&quot;&gt;FB&lt;/a&gt;, as of today... ). Each of their tens of thousands of databases handles a fraction of the data, in such a way that only a fraction of all sessions access it at any point in time.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Each of the databases is still doing hard work on every update/insert/delete and on&amp;nbsp;buffer management, locking, thread locks/semaphores, and recovery tasks. What can we do? It&#39;s OLTP, it&#39;s read/write,&amp;nbsp;it&#39;s ACID... It&#39;s heavy! I trust every one of my DBs to do what it does best, I just give it the optimal data size and session concurrency to do that.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div&gt;
Let&#39;s summarize here:&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-f5sZL6PRWhcz_bU-xz_ixy4wcT78KV1tO8bNVChz-m8yn1dzOIj5fEah5Xmju0tA_AqOlFnKZ27TcpXWiZjZ-PCkNvza1CLdpgN7RP5gVfkjXYZAsH3otF2WWwC46x1NWaWf0EYVRuh4/s1600/oltp-analytics.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;168&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-f5sZL6PRWhcz_bU-xz_ixy4wcT78KV1tO8bNVChz-m8yn1dzOIj5fEah5Xmju0tA_AqOlFnKZ27TcpXWiZjZ-PCkNvza1CLdpgN7RP5gVfkjXYZAsH3otF2WWwC46x1NWaWf0EYVRuh4/s640/oltp-analytics.png&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
In my next post I&#39;ll dive deeper into implementation caveats (shared disk, shared memory, sharding) and pitfalls, dos and don&#39;ts...&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;
Stay tuned, join those who subscribed and get automatic updates, get involved!&lt;/div&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-scalability.blogspot.com/feeds/1059728717075182111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://database-scalability.blogspot.com/2012/05/were-in-big-data-business.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/1059728717075182111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6415786925319620734/posts/default/1059728717075182111'/><link rel='alternate' type='text/html' href='http://database-scalability.blogspot.com/2012/05/were-in-big-data-business.html' title='Scaling OLTP is nothing like scaling Analytics'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/15220942770776878197</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-f5sZL6PRWhcz_bU-xz_ixy4wcT78KV1tO8bNVChz-m8yn1dzOIj5fEah5Xmju0tA_AqOlFnKZ27TcpXWiZjZ-PCkNvza1CLdpgN7RP5gVfkjXYZAsH3otF2WWwC46x1NWaWf0EYVRuh4/s72-c/oltp-analytics.png" height="72" width="72"/><thr:total>1</thr:total></entry></feed>