<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Nati Shalom's Blog</title>
    
    
    <link rel="alternate" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/" />
    <id>tag:typepad.com,2003:weblog-1309908</id>
    <updated>2012-01-16T13:47:15+01:00</updated>
    <subtitle>Thoughts on Scalability, NoSQL, Big Data, DevOps, Cloud, PaaS </subtitle>
    <generator uri="http://www.typepad.com/">TypePad</generator>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/NatiShalom" /><feedburner:info uri="natishalom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://hubbub.api.typepad.com/" /><entry>
        <title>GigaSpaces Cloudify &amp; VMware CloudFoundry: The New PaaS Jailbreakers</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/PpZ-XgXeM9M/there-are-currently-two-main-approaches-for-developing-and-managing-application-in-the-cloudpaas-paas-takes-a-developer-ap.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2012/01/there-are-currently-two-main-approaches-for-developing-and-managing-application-in-the-cloudpaas-paas-takes-a-developer-ap.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0168e5542ac2970c</id>
        <published>2012-01-16T13:47:15+01:00</published>
        <updated>2012-01-16T13:47:15+01:00</updated>
        <summary>I was reading Krishnan Subramanian's post, Two Events That “Clouded” Our Thinking In 2011. The thing that caught my attention was Krishnan's comments on why PaaS is a superior alternative to DevOps: The shift in the thinking about the enterprise...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Azure" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="CloudStack" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="DevOps" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="EC2" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Java/J2EE" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JavaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JClouds" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Multi-tenancy" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="OpenStack" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Rackspace" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="SaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;I was reading &lt;a href="http://www.cloudave.com/author/krishnan/"&gt;Krishnan &lt;/a&gt;&lt;a href="http://www.cloudave.com/author/krishnan/"&gt;Subramanian&lt;/a&gt;'s post, &lt;a href="http://www.cloudave.com/16494/two-events-that-clouded-our-thinking-in-2011/" rel="bookmark" title="Two Events That “Clouded” Our Thinking In 2011"&gt;Two Events That “Clouded” Our Thinking In 2011&lt;/a&gt;. The thing that caught my attention was Krishnan's comments on why PaaS is a superior alternative to DevOps: &lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The shift in the thinking about the enterprise cloud consumption also poured water into the “DevOps” concept advocated by vendors and pundits with their foot in the IaaS world. When organizations embrace PaaS instead of infrastructure services, we don’t need the DevOps marriage and the associated cultural change (believe me, this cultural change is giving sleepless nights to many IT managers and some consultants are even making money helping organizations realize this cultural change). With PaaS, organizations can keep the existing distinction between the Ops and Dev teams without worrying about the cultural change. In fact, with cloud computing, the role of the Ops is not going away but it stays in the background offering an interface which developers can manage themselves.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Krishnan's view reflects a common position in the debate between the two main paradigms for developing and managing applications on the cloud:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;PaaS&lt;/strong&gt; -- PaaS takes a developer- and application-driven approach. A PaaS platform provides generic application containers to run your code. The PaaS platform deals with all the operational aspects needed to run your code, such as deployment, scaling, and fail-over.&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000088;"&gt;&lt;strong&gt;&lt;span style="color: black;"&gt;DevOps&lt;/span&gt;&lt;/strong&gt; -- &lt;span style="color: black;"&gt;DevOps takes a more operations-driven approach. With DevOps, you get tools to automate your operational environment through scripts and recipes, and keep full visibility and control over the underlying infrastructure.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;The Difference Between PaaS and DevOps&lt;/h2&gt;&#xD;
&lt;p&gt;Both PaaS and DevOps aim toward the same goal -- reducing the complexity of managing and deploying applications on the cloud. But they take fairly different approaches to delivering on that promise.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://www.linkedin.com/profile/view?id=93550&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=RKgJ&amp;amp;locale=en_US&amp;amp;srchid=372c8304-ab92-4983-bb97-38a8ace29e08-0&amp;amp;srchindex=1&amp;amp;srchtotal=60&amp;amp;goback=%2Efps_PBCK_*1_Christopher_Keene_*1_*1_*1_*1_*2_*1_Y_*1_*1_*1_false_1_R_*1_*51_*1_*51_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2&amp;amp;pvs=ps&amp;amp;trk=pp_profile_name_link"&gt;Christopher Keene&lt;/a&gt; summarized it in his blog post &lt;a href="http://www.keeneview.com/2011/08/devops-and-paas-friend-or-foe.html"&gt;DevOps and PaaS -- Friend or Foe?&lt;/a&gt; as the difference between developers and SysAdmins:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Developers may ask: "if I have a self-service portal for deploying applications (aka PaaS), do I need SysAdmins at all?"&lt;/li&gt;&#xD;
&lt;li&gt;SysAdmins may ask: "isn't PaaS just a monstrous black box that prevents me from provisioning the specific services we need to deploy real-world apps?" ... The typical SysAdmin thinks that they can get to 75% of PaaS functionality with DevOps tools like &lt;a href="https://github.com/opscode/chef"&gt;Chef&lt;/a&gt; without giving up any systems architecture flexibility.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;In one of my earlier posts on the subject, &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html"&gt;Productivity vs. Control tradeoffs in PaaS&lt;/a&gt;, I tried to outline the main limitations of most of the current "blackbox PaaS" implementations:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;I thought that Carlos Ble's post &lt;a href="http://www.carlosble.com/2010/11/goodbye-google-app-engine-gae/"&gt;Goodbye Google App Engine (GAE)&lt;/a&gt; is a good example that illustrates why the initial perception behind GAE as a simple platform that provides extreme productivity can be completely wrong.&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;...developing on GAE introduced such design complexity that working around it pushes us 5 months behind schedule.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Part of the reason that brought Carlos through that experience IMO is that in the course of trying to make GAE extremely productive the owner made the platform too opinionated, to the point where you lose all the potential productivity gains by trying to adopt their model. In addition, with a platform like GAE you have very little freedom to leverage existing frameworks such as your own database, or messaging system, or any other third-party service that can in itself be a huge contributor to productivity.&lt;/p&gt;&#xD;
&lt;p&gt;Instead, you're completely dependent on the platform provider's stack and pace of development, and that in itself can work against agility and productivity in yet another dimension. In this specific example, Carlos couldn’t use a specific version of a Python library that would have made his productivity higher, and instead had to work around issues that were already solved elsewhere. This is a good example how the lack of flexibility leads to poorer productivity even in the case of simple applications.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h2&gt;Putting DevOps &amp;amp; PaaS Together&lt;/h2&gt;&#xD;
&lt;div&gt;It looks like more people in the industry have come to recognize that rather than looking at DevOps and PaaS as two competing paradigms, it might be best to combine the two, as &lt;a href="http://www.linkedin.com/profile/view?id=93550&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=RKgJ&amp;amp;locale=en_US&amp;amp;srchid=372c8304-ab92-4983-bb97-38a8ace29e08-0&amp;amp;srchindex=1&amp;amp;srchtotal=60&amp;amp;goback=%2Efps_PBCK_*1_Christopher_Keene_*1_*1_*1_*1_*2_*1_Y_*1_*1_*1_false_1_R_*1_*51_*1_*51_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2&amp;amp;pvs=ps&amp;amp;trk=pp_profile_name_link"&gt;Christopher Keene&lt;/a&gt; pointed out in his post:&lt;/div&gt;&#xD;
&lt;div&gt; &lt;/div&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;div&gt;What if you could get a PaaS that wasn't a black box, enabling developers to deploy apps easily while still giving SysAdmins the ability to provision any services they needed (a la &lt;a href="http://www.cloudfoundry.org/"&gt;Cloud Foundry&lt;/a&gt;)?&lt;/div&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;div&gt;I also came to that realization myself, as I pointed out in my &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/12/2012-cloud-paas-nosql-predictions.html"&gt; 2012 predictions&lt;/a&gt;: &lt;/div&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;div&gt;In 2012, we’ll see many of the DevOps tools, such as Chef and Puppet, integrated into application platforms, making it easier to deploy complex applications onto the cloud. In the same way, we’re going to see more Application Platforms adopting the automation and recipe model from the DevOps world into the application platform. The latter have the potential to transform the opinionated PaaS offerings as we know them today, with Heroku and GAE leading that trend, into a more open PaaS offering that better fits the way users develop apps today, and provide more freedom to choose your own stack, cloud, and application blueprint.&lt;/div&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h2&gt;&#xD;
&lt;div&gt;What Makes Cloudify and CloudFoundry PaaS Jailbreakers?&lt;/div&gt;&#xD;
&lt;/h2&gt;&#xD;
&lt;div&gt;The CIO Magazine article &lt;span style="font-family: Arial, sans-serif; line-height: 28px; font-size: 10pt;"&gt;&lt;a href="http://www.cio.com.au/article/409290/cloud_computing_disrupts_vendor_landscape/"&gt;Cloud computing disrupts the vendor landscape&lt;/a&gt; &lt;/span&gt;&lt;span style="font-family: Arial, sans-serif; line-height: 28px; font-size: 10pt;"&gt;defines a new class of &lt;/span&gt;PaaS platforms that I think captures the definition of a DevOps PaaS:&lt;/div&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;div&gt;A PaaS that allows developers to use whatever tool they want to build their cloud applications and the platform tackles the deployment, scaling and management of these apps in the cloud data center.&lt;/div&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;div&gt;VMware CloudFoundry is one of the more notable references in that category. Quoting &lt;a href="http://www.linkedin.com/profile/view?id=93550&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=RKgJ&amp;amp;locale=en_US&amp;amp;srchid=372c8304-ab92-4983-bb97-38a8ace29e08-0&amp;amp;srchindex=1&amp;amp;srchtotal=60&amp;amp;goback=%2Efps_PBCK_*1_Christopher_Keene_*1_*1_*1_*1_*2_*1_Y_*1_*1_*1_false_1_R_*1_*51_*1_*51_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2&amp;amp;pvs=ps&amp;amp;trk=pp_profile_name_link"&gt;Christopher Keene&lt;/a&gt;:&lt;/div&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;div&gt;CloudFoundry runs anywhere, including on your laptop. CloudFoundry's service container concept is particularly strong, kind of an appliance on steroids.&lt;/div&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;div&gt;&#xD;
&lt;p&gt;These ideas were the founding concept behind &lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/Cloudify+Documentation+Home"&gt;Cloudify&lt;/a&gt;, i.e. putting DevOps &amp;amp; PaaS together in a single framework. As with CloudFoundry, Cloudify enables you to break away from the "blackbox PaaS". However, even though CloudFoundry is significantly more open than most other PaaS alternatives, at its core it is still based on the "my way or the highway" approach (aka opinionated architecture), which forces you to fit into a specific blueprint that is mandated by the platform. Cloudify, on the other hand, pushes the envelope even further by adopting the concept of &lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/The+Application+Recipe"&gt;recipes&lt;/a&gt; that was first introduced by DevOps frameworks such as Chef and Puppet. It introduces more application-driven recipes through a Domain Specific Language that extends the Groovy language.&lt;/p&gt;&#xD;
&lt;p&gt;The Cloudify &lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/The+Application+Recipe"&gt;recipes&lt;/a&gt; give you the full power to plug in any application stack on any cloud (including a non-virtualized environment) and manage it in much the same way you would manage it in your own datacenter or on your own machines. You can also call Chef and Puppet from within the recipes. All this, without hacking the framework itself. As with other similar DSLs, the Cloudify DSL was designed to express even the most complex application management tasks, such as &amp;lt;recovery from a data center failure&amp;gt;, in a single line and avoid all the verbose scripts and API calls that are often involved when you work at the infrastructure level.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="font-family: verdana, geneva; font-size: 10pt;"&gt;&lt;strong id="internal-source-marker_0.9225309891626239" style="font-weight: normal;"&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;All this makes &lt;/span&gt;&lt;/strong&gt;&lt;strong id="internal-source-marker_0.9225309891626239" style="font-weight: normal;"&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Cloudify an even more open alternative that fits a large variety of the current enterprise application stacks, including:&lt;/span&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;div style="background-color: transparent;"&gt;&lt;strong id="internal-source-marker_0.9225309891626239" style="font-weight: normal;"&gt; &lt;ol style="font-family: Times; font-size: medium;"&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;JEE applications&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Big Data applications&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Multi-tier applications&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Native applications (C++,..)&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;.Net, Ruby, Python, PHP applications&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Multi-site applications&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Low-latency applications (that can't run on VMs)&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;/ol&gt;&#xD;
&lt;p&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;It also makes Cloudify more open for special customization of the existing stack, like in the case of:&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;ol style="font-family: Times; font-size: medium;"&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;An application that needs a certain version of MySQL (not the one that comes with the framework)&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;An application that needs to run on Red Hat (not Ubuntu), or even more interesting -- a case where there are multiple applications, each needing a different OS, served at the same time.&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;/ol&gt;&#xD;
&lt;p&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Cloudify also provides a more advanced level of control geared for mission-critical apps, including:&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;ol style="font-family: Times; font-size: medium;"&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Monitoring the application stack and topology&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Adding custom application metrics&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="color: #000000; background-color: transparent; font-weight: normal; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-family: verdana, geneva; font-size: 10pt;"&gt;Adding custom SLAs&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;/ol&gt;&#xD;
&lt;p&gt;&lt;span style="font-family: verdana, geneva; font-size: 10pt;"&gt;It can also work on a wide variety of cloud environments, including Microsoft Azure and non-virtualized environments.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;/strong&gt;&#xD;
&lt;p&gt;&lt;span style="font-family: verdana, geneva; font-size: 10pt;"&gt;&lt;strong id="internal-source-marker_0.9225309891626239" style="font-weight: normal;"&gt;One of the great powers of the &lt;/strong&gt;&lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/The+Application+Recipe"&gt;recipe&lt;/a&gt;&lt;strong style="font-weight: normal;"&gt; is that it is a great collaboration tool. Once you develop a recipe, it is very easy to share it with different groups. These can be internal groups like development, QA, and operations, where the recipe provides a programmatic definition of their environment. A recipe can also be shared between the product team and the pro-services team, where your sales and pro-services can easily install and update product versions in a consistent way, as well as reproduce customer scenarios and share them with the support team. Recipes are also a great tool for collaboration through a wider community network, where users can collaborate by sharing common recipes and best practices over the web.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h2&gt;Quick Introduction to Cloudify Recipes&lt;/h2&gt;&#xD;
&lt;p&gt;Below is a short snippet that shows a simple application recipe for a typical Java-based web application, with JBoss as the web container and MySQL as the database. The application recipe describes the services that comprise the application, and their dependencies. The details on how to run MySQL and JBoss are provided in a separate recipe for each of the individual services. A more detailed description of what a service recipe looks like can be seen &lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/The+Service+Recipe"&gt;here&lt;/a&gt;. &lt;/p&gt;&#xD;
&lt;pre class="code-java" style="margin-top: 10px; margin-right: 0px; margin-bottom: 10px; margin-left: 0px; text-align: left; overflow-x: auto; overflow-y: auto; font-family: 'Courier New', Courier, monospace; line-height: 1.3; padding: 0px;"&gt;application {&#xD;
       name = &lt;span class="code-quote" style="color: #009100; background-color: inherit;"&gt;"simple app"&lt;/span&gt;&#xD;
&#xD;
       service {&#xD;
              name = &lt;span class="code-quote" style="color: #009100; background-color: inherit;"&gt;"mysql-service"&lt;/span&gt;&#xD;
       }&#xD;
&#xD;
       service {&#xD;
              name = &lt;span class="code-quote" style="color: #009100; background-color: inherit;"&gt;"jboss-service"&lt;/span&gt;&#xD;
              dependsOn = [&lt;span class="code-quote" style="color: #009100; background-color: inherit;"&gt;"mysql-service"&lt;/span&gt;]&#xD;
       }&#xD;
}&lt;/pre&gt;&#xD;
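&lt;p&gt;One way to read the dependsOn entry in the recipe above is as an ordering constraint: a service should come up only after the services it depends on. The short Python sketch below illustrates that reading. It is not Cloudify code and makes no claim about how Cloudify resolves dependencies internally; the start_order function and the dict layout are hypothetical names chosen for the example.&lt;/p&gt;
&lt;pre&gt;
```python
# Illustrative only: this is NOT Cloudify code. It models an application
# recipe as a plain dict (service name mapped to its dependsOn list) and
# derives an order in which every service follows its dependencies.

def start_order(services):
    """Return service names ordered so that dependencies come first."""
    order = []
    visiting = set()  # services on the current dependency path

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError("circular dependency involving " + name)
        if name not in services:
            raise KeyError("unknown service " + name)
        visiting.add(name)
        for dependency in services[name]:
            visit(dependency)
        visiting.discard(name)
        order.append(name)

    for name in services:
        visit(name)
    return order


# Mirrors the recipe above: jboss-service depends on mysql-service.
simple_app = {
    "jboss-service": ["mysql-service"],
    "mysql-service": [],
}
print(start_order(simple_app))  # ['mysql-service', 'jboss-service']
```
&lt;/pre&gt;
&lt;p&gt;The same idea extends to larger applications: adding a service with its own dependsOn list changes the computed order without touching the other entries, and a circular dependency is reported as an error.&lt;/p&gt;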
&lt;p&gt;To get a taste, you can try one of the available recipes, such as Cassandra, MongoDB, Tomcat, JBoss, or Solr, just as a simple way to &lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/Cloudify+2.0+Early+Access+Download"&gt;try out&lt;/a&gt; these products on your own desktop or on any of the supported clouds, without the hassle that is often involved in doing so and without any direct relation to Cloudify per se.&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;h2&gt;References&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: arial, helvetica, sans-serif; font-size: 10pt;"&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html"&gt;Productivity vs. Control tradeoffs in PaaS&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: arial, helvetica, sans-serif; font-size: 10pt;"&gt;&lt;a href="http://www.keeneview.com/2011/08/devops-and-paas-friend-or-foe.html"&gt;DevOps and PaaS - Friend or Foe?&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: arial, helvetica, sans-serif; font-size: 10pt;"&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/01/paas-shouldnt-be-built-in-silos.html"&gt;PaaS shouldn’t be built in Silos&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: arial, helvetica, sans-serif; font-size: 10pt;"&gt;&lt;a href="http://www.cloudave.com/16494/two-events-that-clouded-our-thinking-in-2011/" rel="bookmark" title="Two Events That “Clouded” Our Thinking In 2011"&gt;Two Events That “Clouded” Our Thinking In 2011&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="line-height: 28px; font-family: arial, helvetica, sans-serif; font-size: 10pt;"&gt;&lt;a href="http://www.cio.com.au/article/409290/cloud_computing_disrupts_vendor_landscape/"&gt;Cloud computing disrupts the vendor landscape&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: arial, helvetica, sans-serif; font-size: 10pt;"&gt;&lt;a href="http://www.gigaspaces.com/wiki/display/CLOUD/Cloudify+Documentation+Home"&gt;Cloudify docs&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2012/01/there-are-currently-two-main-approaches-for-developing-and-managing-application-in-the-cloudpaas-paas-takes-a-developer-ap.html</feedburner:origLink></entry>
    <entry>
        <title>Realtime Analytics for Big Data: A Facebook Case Study</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/tqh5SOUf21c/realtime-analytics-for-big-data-a-facebook-case-study.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2012/01/realtime-analytics-for-big-data-a-facebook-case-study.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0168e542168a970c</id>
        <published>2012-01-09T22:04:25+01:00</published>
        <updated>2012-01-09T22:04:25+01:00</updated>
        <summary>via www.youtube.com</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cassandra" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="http://www.youtube.com/embed/viPRny0nq3o?fs=1&amp;amp;feature=oembed" width="560"&gt;&lt;/iframe&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;small&gt;via &lt;a href="http://www.youtube.com/watch?v=viPRny0nq3o&amp;amp;feature=player_embedded"&gt;www.youtube.com&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2012/01/realtime-analytics-for-big-data-a-facebook-case-study.html</feedburner:origLink></entry>
    <entry>
        <title>2012 Cloud, PaaS, NoSQL Predictions</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/hFF-dEPai40/2012-cloud-paas-nosql-predictions.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/12/2012-cloud-paas-nosql-predictions.html" thr:count="2" thr:updated="2012-01-12T22:14:08+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef015393b2c3ee970b</id>
        <published>2011-12-13T08:47:45+01:00</published>
        <updated>2011-12-13T08:49:15+01:00</updated>
        <summary>2011 is coming to its end and now is a good time to start planning for 2012. I thought that a good start would be to look at my 2011 predictions and see if my previous (and first) attempt to predict...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloudify" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef01675eb08bf5970b-pi" style="float: left;"&gt;&lt;img alt="1372170_green_and_blue cop2y" class="asset  asset-image at-xid-6a00d835457b7453ef01675eb08bf5970b" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef01675eb08bf5970b-120wi" style="margin: 0px 5px 5px 0px;" title="1372170_green_and_blue cop2y"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br&gt;2011 is coming to its end and now is a good time to start planning for 2012. I thought that a good start would be to look at &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/12/2011-cloud-paas-nosql-predictions.html"&gt;my 2011 predictions&lt;/a&gt; and see whether my previous (and first) attempt to predict something in that turbulent environment held any water...so, here is a quick recap of 2011.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;Recap of 2011&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Private vs. Public Cloud&lt;/strong&gt; - As I noted in my recent post &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/11/public-vs-priavate-clouds-again-its-not-about-the-cost.html"&gt;Public vs. Private Clouds&lt;/a&gt;, I felt that during 2011 the debate around public vs. private cloud would become less interesting, as most of the industry has started to accept the fact that there is a need for both environments, and the important issue would become how to make them work well together. The most interesting development in that regard was Rackspace’s &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/11/public-vs-priavate-clouds-again-its-not-about-the-cost.html"&gt;recent announcement&lt;/a&gt; about their plan to support OpenStack based private clouds, which shows that even public cloud providers have fully embraced this idea.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;OpenStack&lt;/strong&gt; is evolving from a movement into a viable reality - the momentum around OpenStack has gone through ups and downs throughout the year as happens with every new technology. However looking back, it appears that 2011 was a fairly successful year for OpenStack with its first &lt;a href="http://www.internap.com/press-release/internap-announces-world%E2%80%99s-first-commercially-available-openstack-cloud-compute-service/"&gt;public cloud available already&lt;/a&gt; in the market.  &lt;a href="http://jbgeorge.net/2011/07/26/this-just-in-dell-announces-the-dell-openstack-cloud-solution/"&gt;Dell&lt;/a&gt; and &lt;a href="http://www.theregister.co.uk/2011/09/13/hewlett_packard_cloud_part_2/"&gt;HP&lt;/a&gt; have started to offer the OpenStack based cloud to their customers, as has Citrix.  Rackspace announced their plan to provide official support including for those who want to build their own OpenStack environment…that's quite big considering the short timeframe from when the technology was first introduced…still there is a long way to go but the future looks promising - check out &lt;a href="http://vmblog.com/archive/2011/11/30/73-of-survey-participants-considering-an-openstack-deployment-for-cost-savings-and-to-avoid-vendor-lock-in.aspx"&gt;this survey&lt;/a&gt; in that regard.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;PaaS&lt;/strong&gt; adoption has been happening at a slower pace than expected, although the trend remains consistent.  For PaaS startups 2011 was a fairly significant year with the acquisition of Heroku by Salesforce.  Amazon, Redhat and VMware joined the PaaS arena: Amazon with Elastic Beanstalk, Redhat with their OpenShift initiative, and VMware with CloudFoundry, adding to its previous acquisition of SpringSource vFabric. This was a fairly significant year for us at GigaSpaces as we launched a new product in this same domain that aims to completely change the way PaaS is being thought of today (stay tuned…).&lt;/p&gt;&#xD;
&lt;p&gt;Google has no doubt been the disappointment of the year, literally killing Google App Engine (GAE) as we knew it (amongst many other things) with their &lt;a href="http://highscalability.com/blog/2011/9/7/what-google-app-engine-price-changes-say-about-the-future-of.html"&gt;new pricing model&lt;/a&gt;. &lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Big Data&lt;/strong&gt; has gone real time.  Facebook made a big announcement on how they moved their batch-oriented analytics system to real time analytics (see my previous posts on this subject &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html"&gt;here&lt;/a&gt; and &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html"&gt;here&lt;/a&gt;).  Twitter announced the launch of a new &lt;a href="http://www.readwriteweb.com/archives/twitter_to_launch_real-time_analytics_dashboard_so.php"&gt;Real Time Analytics dashboard&lt;/a&gt;; both join Google and Yahoo, who have already started to make this shift.  Google has also been transforming their &lt;a href="http://analytics.blogspot.com/2011/09/whats-happening-on-your-site-right-now.html?utm_source=mandatory&amp;amp;utm_medium=email&amp;amp;utm_campaign=v5default"&gt;web analytics framework into real time&lt;/a&gt;. As I noted in my 2011 predictions, the entire debate around NoSQL and SQL didn't make sense, and indeed we’ve seen quite a few announcements from both &lt;a href="http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language"&gt;Cassandra&lt;/a&gt; and &lt;a href="http://gigaom.com/cloud/couchbase-2-0-unql-sql-nosql/"&gt;Couchbase&lt;/a&gt; about SQL-like query support.&lt;/p&gt;&#xD;
&lt;p&gt;In Memory Data Grids have also taken a similar approach: at GigaSpaces, we’ve launched our &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Introduction+to+XAP+JPA"&gt;JPA support&lt;/a&gt;, and other Data Grid implementations such as Infinispan and Gemfire seem to be heading in that same direction, each adding different levels of SQL support. The interesting development in this regard is that we were able to prove that you could actually mix and match Document/Schemaless APIs with SQL APIs and have the flexibility to choose the right language for the job (see the online demo &lt;a href="http://www.youtube.com/user/gigaspacestv#p/u/14/jC57mId3SMg"&gt;Same Data Any API&lt;/a&gt;).&lt;/p&gt;&#xD;
&lt;p&gt;All in all I think that I came fairly close - don't you think…?&lt;/p&gt;&#xD;
&lt;p&gt;Ok that gives me enough confidence to try the same thing for 2012. &lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;&lt;br&gt;&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;strong&gt;2012 predictions&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Cloud&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;iCloud everywhere&lt;/strong&gt; - IMO the biggest shift in Cloud is the fact that it’s going to become pretty much invisible to many of the end users as new mobile devices, operating systems and applications start to be designed with cloud support in mind. Apple iCloud and DropBox mark the beginning of this trend. Using cloud for collaboration and synchronization is definitely a killer app for many of the consumer based apps. I expect that in 2012 we’re going to continue to see a big push of many SaaS-based offerings in that space toward rich client support that uses the cloud as a backend and leverages the power of the new generation of advanced mobile devices. The difference is that those clients won’t be just another frontend for the same web UI, but something that will run almost entirely on the mobile device and will use more generic cloud services for synchronization and collaboration. This will create the need for more generic cloud services such as database as a service and other middleware services that can interact directly from mobile applications.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Moving from Amazon-centric clouds to Cloud Mashups&lt;/strong&gt; – In 2011 we started to see new kinds of clouds pop up. Literally every hardware vendor (IBM, Dell, HP,..), telco (ATT, Verizon, KT), and software provider (Oracle, Microsoft) is either developing or already offers something in this space. Each one tries to maintain a unique position to compete with Amazon, whether through SLAs, locality, security, or being more open through the support of OpenStack. In 2012, this movement is going to become even stronger as many of the players that made the investment during 2011 come out full speed ahead.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Microsoft finally gets it with Azure&lt;/strong&gt; - Microsoft has been around for a while with Azure, with somewhat marginal success mostly around its .NET user base, an approach that is too narrow a play when it comes to cloud.  Their cloud strategy is coming into focus with the offering of a more ubiquitous cloud supporting technologies that were previously unheard of on a MSFT cloud platform, such as Java and PHP, and it wouldn't be too far-fetched to assume that they will be supporting Linux applications in the cloud as well. &lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Cost-driven Application Management&lt;/strong&gt; - One of the things that is still fairly hard to measure in the cloud is cost, and more specifically how each component of our application and architecture contributes to cost.  This is especially true in current market conditions, which are going to put even more pressure on cost savings. Cost-driven application design patterns will start to emerge, and will become an integral part of any design for cloud applications just as scalability and performance are today. A new form of Cost Driven Application Management (CDA) will start to emerge to provide better insight into how our application behaves from a cost analysis perspective - &lt;a href="http://www.typepad.com/site/blogs/6a00d835457b7453ef00d835457b7553ef/post/6a00d835457b7453ef015393b2c3ee970b/www.newvem.com"&gt;Newvem&lt;/a&gt; is a new startup in that space that has already launched its private beta.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Mission Critical Apps move into the Cloud&lt;/strong&gt; - As the industry matures there is no reason why we should draw the line for cloud adoption at simple apps. The challenge will be mostly around performance, latency, and ensuring continuous availability. A new class of middleware and application platforms that are designed specifically for cloud environments will become more popular to help in that transition. On the other hand, Java and JEE specifically will finally become more cloud ready as I noted in an earlier post - &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/09/java-and-center-stage-meet-us-at-javaone-2011.html"&gt;Java and the Center Stage&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Network Gets into Cloud&lt;/strong&gt; &lt;strong&gt;API Stack &lt;/strong&gt;- While compute and storage have become virtualized to fit into the cloud, we haven't seen much advancement on the network layer. Many of the networking providers are now launching APIs to enable better control over the cloud network. Alcatel recently announced an interesting cloud proposition in this domain specifically targeting telcos.  The idea is to use the network as a vehicle for making distributed data centers look like one big cloud, making it possible to better leverage existing assets and offer SLA driven compute resources based on latency, location etc. Other cloud providers are also starting to open their network APIs starting from the Load Balancer down to the core switch. This opens up a new set of opportunities for integrating these network APIs with the upper layer of the application stack.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;More OpenStack Clouds&lt;/strong&gt; - 2011 was just the beginning of that trend; 2012 will see more public and private cloud providers offering support for OpenStack APIs, with RackSpace, Dell, and HP already making public announcements in this area. The interesting question in this regard is how Citrix will play out its CloudStack acquisition alongside its OpenStack strategy. &lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;PaaS&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;DevOps and PaaS Converge into App DevOps PaaS&lt;/strong&gt; - One of the topics that drew a lot of my personal interest last year was the DevOps movement. For odd reasons, most of that movement was driven by Ops and less by Devs.  In 2012, we’ll see many of the DevOps tools such as Chef and Puppet integrated into application platforms, making it easier to deploy complex applications onto the cloud. In the same way, we’re going to see more Application Platforms adopting the automation and recipe model from the DevOps world. The latter has the potential to transform the opinionated PaaS offerings as we know them today, with Heroku and GAE leading that trend, into a more open PaaS offering that better fits the way users develop apps today and gives more freedom to choose your own stack, cloud, and application blueprint.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Beyond Google App Engine, Heroku&lt;/strong&gt; - Heroku established itself as one of the early PaaS providers in the market and is now expanding their offering to Java. CloudFoundry, DotCloud and others are slightly different but still follow the same path.  In 2012 you should expect more choices of completely different PaaS platforms, ranging from JEE PaaS offerings from Redhat, IBM, and Oracle, to private PaaS offerings which essentially are frameworks to build your own PaaS, DevOps PaaS offerings (see note above), as well as vertical PaaS for specific industries. Magento announced their plan to provide &lt;a href="http://www.magentocommerce.com/blog/comments/magento-go/"&gt;PaaS for eComm&lt;/a&gt; and it wouldn’t be crazy to assume that others will follow that same path. &lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;BigData&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Not only Hadoop Centric&lt;/strong&gt; - During 2010 and to a lesser degree 2011 Big Data discussions were pretty much centered around Hadoop.  NoSQL solutions such as Cassandra and Mongo are gaining fast adoption mainly due to the operational and development complexity that comes with Hadoop. That movement is going to continue at an even greater pace, as Hadoop gets fragmented between many vendors and frameworks such as EMC, MapR, Cloudera, Yahoo, IBM each claiming to own their own Hadoop distro. With new funding in the hands of many of the NoSQL startups I’d expect to see more complete solution stacks targeted at Big Data.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;In Memory Data-Grid and NoSQL Become Integrated - &lt;/strong&gt;During the early days of the NoSQL movement it wasn't clear how the two technologies fit together. As I noted in my previous post &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html"&gt;here&lt;/a&gt;, it actually makes more sense to integrate the two technologies in the context of real time analytics or real time data processing for Big Data.  Indeed during 2011 I started to see more case studies showing the use of the two technologies together, as with Facebook and Twitter. MemBase is also a good example of that approach with their &lt;a href="http://www.readwriteweb.com/cloud/2011/02/nosql-consolidation-couchone-a.php"&gt;announcement&lt;/a&gt; earlier this year about integrating Memcached and CouchDB into a single product. At GigaSpaces we added built-in integration for &lt;a href="http://www.gigaspaces.com/wiki/display/SBP/Cassandra+Mirror+Service"&gt;Cassandra&lt;/a&gt; and MongoDB, as noted here, and plan to invest more in that direction during 2012.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;A New Class of Big Data Application Platforms will Address the Development and Operational Complexity of Big Data Applications&lt;/strong&gt; - As Big Data applications become more mainstream we start to hit the next level of complexity: development and operational complexity. Clearly plugging NoSQL into your architecture may address your scalability requirements, but at the same time it’s going to make your development and management experience more complex.  That is not because the products themselves are complex, but mostly because it is less obvious how to build and design the application around these new technologies. As in previous years, the goal of application platforms is to ease that task by putting together an integrated stack that makes it easier to develop Big Data applications, as I noted in my post on &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/09/big-data-application-platform.html"&gt;Big Data Application Platforms&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://cloudcomputing.sys-con.com/node/2040343"&gt;The Future of Cloud Computing: Industry Predictions for 2012 | Cloud Computing Journal&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.readwriteweb.com/cloud/2011/02/nosql-consolidation-couchone-a.php"&gt;NoSQL Consolidation: CouchOne and Membase Merge to Form Couchbase&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://gevaperry.typepad.com/main/2011/10/the-future-of-clouds.html"&gt;10 Predictions About Cloud Computing&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/12/2011-cloud-paas-nosql-predictions.html"&gt;2011 Cloud, PaaS, NoSQL Predictions&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/11/public-vs-priavate-clouds-again-its-not-about-the-cost.html"&gt;Public vs Private clouds (Again!)- it's not about the cost&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.magentocommerce.com/blog/comments/magento-go/"&gt;Introducing Magento Go and the Magento Go Developer Platform. The Next Evolution in eCommerce&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html"&gt;Real Time Analytics for Big Data: An Alternative Approach&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/09/big-data-application-platform.html"&gt;Big Data Application Platform&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/09/java-and-center-stage-meet-us-at-javaone-2011.html"&gt;Java and the Center Stage&lt;/a&gt;.&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.youtube.com/user/gigaspacestv#p/u/14/jC57mId3SMg"&gt;Same Data Any API&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=hFF-dEPai40:PMn5UfsPN4s:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=hFF-dEPai40:PMn5UfsPN4s:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/hFF-dEPai40" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/12/2012-cloud-paas-nosql-predictions.html</feedburner:origLink></entry>
    <entry>
        <title>Making Cloud Portability a Practical Reality</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/tu4yB2UI2AM/making-cloud-portability-a-practical-reality.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/12/making-cloud-portability-a-practical-reality.html" thr:count="2" thr:updated="2012-01-01T23:03:02+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0162fd8baeef970d</id>
        <published>2011-12-08T21:49:35+01:00</published>
        <updated>2011-12-09T05:06:58+01:00</updated>
        <summary>In one of my previous posts Five Misconceptions On Cloud Portability I argued that: The term "cloud portability" is often considered a synonym for "Cloud API portability," which implies a series of misconceptions. If we break away from dogma, we...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloudify" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;In one of my previous posts &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/11/five-misconceptions-on-cloud-portability.html"&gt; Five Misconceptions On Cloud Portability&lt;/a&gt; I argued that:&lt;/p&gt;&#xD;
&lt;p style="padding-left: 30px;"&gt;The term "cloud portability" is often considered a synonym for "Cloud API portability," which implies a series of misconceptions. If we break away from dogma, we can find that what we really looking for in cloud portability is Application portability between clouds which can be a vastly simpler requirement, as we can achieve application portability without settling on a common Cloud API. ..&lt;/p&gt;&#xD;
&lt;p&gt;The following presentation shows how I could use the ideas from this post and provide a practical cloud portability solution today using &lt;a href="http://www.gigaspaces.com/cloudify"&gt;Cloudify&lt;/a&gt; and JClouds. &lt;/p&gt;&#xD;
&lt;p style="text-align: center;"&gt; &lt;/p&gt;&#xD;
&lt;p style="text-align: center;"&gt; &lt;iframe allowfullscreen="allowfullscreen" frameborder="0" height="344" src="http://www.youtube.com/embed/fqksVT8M7mE?fs=1&amp;amp;feature=oembed" width="459"&gt;&lt;/iframe&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;small&gt;via &lt;a href="http://www.youtube.com/watch?v=fqksVT8M7mE&amp;amp;feature=digest_wed"&gt;www.youtube.com&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=tu4yB2UI2AM:cpvAKnvrRKQ:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=tu4yB2UI2AM:cpvAKnvrRKQ:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/tu4yB2UI2AM" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/12/making-cloud-portability-a-practical-reality.html</feedburner:origLink></entry>
    <entry>
        <title>Public vs Private clouds (Again!)- it's not about the cost </title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/xa8zabJSDTc/public-vs-priavate-clouds-again-its-not-about-the-cost.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/11/public-vs-priavate-clouds-again-its-not-about-the-cost.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0162fcbb5cf6970d</id>
        <published>2011-11-22T18:50:26+01:00</published>
        <updated>2011-11-22T21:06:49+01:00</updated>
        <summary>I was reading Geva Perry's post, 10 Predictions about Cloud Computing .. As always I enjoyed reading Geva's thoughts and for the most part completely agree with him. The thing that caught my attention is Geva's point on private...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;I was reading Geva Perry's post, &lt;a href="http://gevaperry.typepad.com/main/2011/10/the-future-of-clouds.html"&gt;10 Predictions about Cloud Computing&lt;/a&gt;  ..&lt;/p&gt;&#xD;
&lt;p&gt; &lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0154373a43a0970c-pi" style="display: inline;"&gt; &lt;/a&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0154373a4421970c-pi" style="display: inline;"&gt; &lt;/a&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef01539366af2b970b-pi" style="float: left;"&gt;&lt;/a&gt; &lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0162fcbc21e8970d-pi" style="float: left;"&gt; &lt;/a&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef01539366c8b4970b-pi" style="float: left;"&gt;&lt;img alt="PublicVSPrivate" class="asset  asset-image at-xid-6a00d835457b7453ef01539366c8b4970b" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef01539366c8b4970b-320wi" style="margin: 0px 5px 5px 0px;" title="PublicVSPrivate"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;As always I enjoyed reading Geva's thoughts and for the most part completely agree with him. The thing that caught my attention is Geva's point on private vs. public cloud. Geva reiterated why the economic benefits of public cloud would turn private cloud into a small niche. Quoting Geva:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;"Internal clouds will be niche. In the long-run, Internal Clouds (clouds operated in a company's own data centers, aka "private clouds") don't make sense. The economies of scale, specialization (an aspect of economies of scale, really) and outsourcing  benefits of public clouds are so overwhelming that it will not make sense for any one company to operate its own data centers."&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Last week I had an interesting discussion with &lt;a href="http://www.linkedin.com/profile/view?id=916527&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=gAWT&amp;amp;locale=en_US&amp;amp;srchid=67840c87-6d9d-4da9-9c67-35bcff14f076-0&amp;amp;srchindex=1&amp;amp;srchtotal=1&amp;amp;goback=%2Efps_PBCK_Yoav+Abrahami_*1_*1_*1_*1_*1_*1_*2_*1_Y_*1_*1_*1_false_1_R_*1_*51_*1_*51_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2&amp;amp;pvs=ps&amp;amp;trk=pp_profile_name_link" target="_self"&gt;Yoav Abrahami&lt;/a&gt; from Wix, which made me rethink this entire argument. Wix runs their own managed data center with a hosting provider. They were able to build a pretty agile environment, which enables them to update their software a few times a day and in this way continually grow their business in this model. If they want extra capacity it takes them a couple of hours to get the hardware that they need. They pay a monthly fee and they don't need to commit three years in advance for their resources.  The example of Wix is interesting as it doesn't fall exactly into the public cloud definition, yet they don't carry a large part of the cost associated with operating a data center, which makes the economic argument somewhat less relevant.&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt;This brings me to the first point:&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="color: #0060bf;"&gt;&lt;strong&gt;What makes a cloud  private or public?&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;For most people, public cloud is synonymous with Amazon or Rackspace and private cloud is everything else.  This basically means that if you don't pay by the hour and can't spawn thousands of machines with an API call, you're not considered a public cloud - is that really the right place to draw the line?&lt;/p&gt;&#xD;
&lt;p&gt;I beg to differ - the main thing that brought us to cloud in the first place is agility at a reasonable cost.  Agility in that context is our ability to launch new products and new features, and to meet a growing demand continuously. Previously, it was IT that stood in the way of the business, and cloud (public &amp;amp; private) emerged as a significantly better way to run IT. Wix is one good example of what some would consider a private cloud, yet they were able to serve the agility and cost requirements of their business.  The thought that the only way to meet our business agility and cost requirements is through public cloud is wrong, and there are many examples like Wix that prove otherwise. This brings me to the next point...&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="color: #0060bf;"&gt;&lt;strong&gt;Its not about cost but ownership, and privacy.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;The point in my previous argument is that the cost of running your own managed data center today can be fairly low, and even lower than running on some of the known public clouds, because you can optimize for a specific use case and workload. beaTunes (a reference from my previous post) is another good example in this regard. They claim that since they switched to EQ8 they were able to cut their infrastructure costs in half. Here are the exact numbers:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Amazon: “two servers $380/month. Add roughly $220/month for backup snapshots and EBS for the (pretty large) database and your total yearly costs are $7,200”&lt;br&gt;&lt;br&gt;EQ8: “two 24GB RAM, i7-920 Quad-Core, 2x1.5TB HDD (Software-RAID 1) servers for $3,530/year at a hosting company that you can actually call”&lt;br&gt;&lt;br&gt;Source: &lt;a href="http://blog.beatunes.com/2011/07/goodbye-amazon-ec2-see-you-later-cloud.html"&gt;Goodbye Amazon EC2 See you later cloud&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0154373c0b89970c-pi" style="float: right;"&gt;&lt;img alt="Buy-or-lease-a-car" class="asset  asset-image at-xid-6a00d835457b7453ef0154373c0b89970c" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef0154373c0b89970c-120wi" style="margin: 0px 0px 5px 5px;" title="Buy-or-lease-a-car"&gt;&lt;/img&gt;&lt;/a&gt;The explanation that I have for that, is that with public clouds you're limited in the type of compromises that you can make, and therefore carry a fairly large cost just to be able to live to the promise of meeting huge peak loads, a high degree of failures, and more. If you design for a specific work load you could come up with different structure that fits a potentially more relaxed set of assumptions that could fit your business better and can result in a lower cost and even better performance and SLA. The other part of the explanation is that there is more than one mean to gain from the economy of scale even in the case of private clouds such as leasing the resources of our infrastrcture and using hosted data centers. A good analogy to that is leasing versus renting a car. If we run a fleet we don't have to buy the cars of the entire fleet, instead, we could lease those cars from a third party provider which uses the economy of scale to manage the cost and risk through his entire customer base.  By leasing the car we still maintain pretty much the same degree of control as we would if we would own the car ourselves, but we have a more economimcal model to do so. &lt;/p&gt;&#xD;
&lt;p&gt;This brings me to the main point of this post. The important difference between private and public cloud is less about cost and more about ownership and flexibility. I'll use another transportation analogy to illustrate my point.  The cost of using public transportation is significantly lower than owning our own private car. If I only use the cost argument I could easily argue that owning a private car doesn't make any economic sense. Still, most of us wouldn't give up our private car, not because it is less expensive, but because we want the flexibility and control over when and where we go, and are willing to pay extra to accommodate these needs.  Public cloud in this example is analogous to public transportation: the fact that you share the same infrastructure with many users spreads the cost across all of them, and thus the cost per user is significantly reduced, while at the same time it means that you agree to compromise on some sort of least common denominator, which is also an attribute of that cost benefit.&lt;/p&gt;&#xD;
&lt;p&gt;If we agree on this observation, then it's clear that neither private nor public clouds will ever go away. Instead, we'll be moving away from the old way of running data centers toward a more modern data center, a hybrid data center that can serve the need for both control and cost. In both cases, controlling the cooling, real estate, and electricity doesn't make any sense, and indeed all of that will be outsourced completely if that's not already the case. The main thing that will make private cloud different is who controls where and how our data is maintained, where and how our services will run, what type of infrastructure (compute, network, storage) we choose to run our business on, and most importantly how much we're willing to share that environment with other users to reduce the cost. Even with that, it's hard to draw a clear line between private and public cloud, as it may very well be the case that what's right in a particular case is a mashup between the two. For example, imagine the case where we keep our data in a private cloud but run the compute nodes on a public cloud. Is that a private or a public cloud... who cares?&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Final note&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Many use the term private cloud to point to the old (and pretty clueless) way of running a data center. I argued that private cloud, as the name suggests, represents a business need to maintain a high degree of privacy and control over its data and resources by ensuring that they are not shared with others. The key is the sharing aspect. The resources may very well be hosted somewhere else, but the key is that they remain private. These needs will never cease to exist, in the same way that we have, and will continue to have, public and private transportation, or the option to buy or lease a car.  The old way of running data centers must change, and is changing, to meet the cost and agility demands of the business, and is adopting many of the features of public cloud. With that, the cost differences between private and public clouds become less significant, and for some scenarios a private cloud has proven to be even cheaper (assuming that you can design for a particular workload and use case). All this makes the cost argument an even less significant factor in choosing between the two models.&lt;/p&gt;&#xD;
&lt;p&gt;Public cloud providers would also need to adapt to meet the demand of private cloud. Rackspace recently announced a &lt;a href="http://www.readwriteweb.com/cloud/2011/11/is-rackspace-ready-to-support.php"&gt;new offering&lt;/a&gt; in that space around OpenStack.&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Rackspace's announcement is another sign that any player that wants to compete in the IaaS market needs to offer some way to have a private and public cloud. Any offering with only half that is going to be left wanting.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;In addition to that, Rackspace is connecting their hosting business, which some could view as another form of private cloud, with their public cloud offering, which will give their customers an even greater degree of flexibility. Amazon has for a while now provided the Virtual Private Cloud (VPC) service, which enables them to offer resources from the public cloud as just another node in the customer's private cloud. Amazon VPC doesn't provide the full degree of privacy and isolation of network and resources, but it is definitely another form of connecting private with public cloud offerings, and an indication of how Amazon has adapted to this demand. There are still many questions to be answered and challenges ahead, namely how to combine the two models well both technically and economically, but the main point is that the entire debate around private versus public cloud, and who's going to win, should become moot. It's now time to focus our energy on how we combine the two.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://gevaperry.typepad.com/main/2011/10/the-future-of-clouds.html"&gt;10 Predictions about Cloud Computing&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://blog.beatunes.com/2011/07/goodbye-amazon-ec2-see-you-later-cloud.html"&gt;Goodbye Amazon EC2 See you later cloud&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.readwriteweb.com/cloud/2011/11/is-rackspace-ready-to-support.php"&gt;Is Rackspace Ready to Support Private Clouds?&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;Amazon VPC&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p style="color: #008; text-align: right;"&gt;&lt;small&gt;&lt;em&gt;Powered by&lt;/em&gt; &lt;a href="http://www.qumana.com/"&gt;Qumana&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=xa8zabJSDTc:MqQXJmI-kMI:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=xa8zabJSDTc:MqQXJmI-kMI:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/xa8zabJSDTc" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/11/public-vs-priavate-clouds-again-its-not-about-the-cost.html</feedburner:origLink></entry>
    <entry>
        <title>Five Misconceptions on Cloud Portability </title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/tQAPAXu8K2U/five-misconceptions-on-cloud-portability.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/11/five-misconceptions-on-cloud-portability.html" thr:count="2" thr:updated="2011-12-05T00:06:04+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef015436e8c695970c</id>
        <published>2011-11-15T18:30:07+01:00</published>
        <updated>2011-12-05T00:04:09+01:00</updated>
        <summary>Three years ago when we started working on the first generation of our PaaS offering, cloud portability seemed to be pretty much “mission impossible.” At the time, we made a conscious decision to focus only on Amazon for our first...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JClouds" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="OpenStack" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Rackspace" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Three years ago when we started working on the first generation of our PaaS offering, cloud portability seemed to be pretty much “mission impossible.” At the time, we made a conscious decision to focus only on Amazon for our first generation PaaS as practically it was the only cloud in town.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Now, there are many different public and private cloud offerings: platforms like GoGrid, VMWare, Citrix/Xen, and Cisco UCS, with recent additions being OpenStack, Cloud.com, and Microsoft Azure. Frameworks like JClouds have been developed to make cloud portability an easier goal to reach.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;As a result, cloud portability is not only a possibility, but it's easily done with the right constraints in mind.&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Having said that, there are still too many options from the standpoints of API standardization, portable virtual machines, abstraction frameworks, orchestration frameworks, etc. None of them fully address the challenge and therefore a solution has to combine these options into one cohesive unit. Finding the right combination of features is a pretty tricky challenge and involves lots of trial and error, as the chances that you’ll pick the wrong ones are still pretty high, as we experienced ourselves.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;In this and the next few posts, I want to share some of our experiences from developing &lt;span style="color: #0021e0;"&gt;&lt;a href="http://www.gigaspaces.com/cloudify"&gt;Cloudify&lt;/a&gt;&lt;/span&gt;, our unified cloud deployment/management tool. I will start by covering five common misconceptions people have.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h2&gt;Five Common Misconceptions on Cloud Portability&lt;/h2&gt;&#xD;
&lt;h3&gt;1. Cloud portability = Cloud API portability&lt;/h3&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;This is perhaps the most important observation in this entire discussion. When people think of Cloud portability, the immediate thing that comes to mind is Cloud API portability - a standard API, or a common abstraction, that maps all the different APIs into one common façade, right? Well... wrong.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;What you're really looking for is the ability to run your application on different cloud platforms without any code change.  Having a common or standard API is one way to achieve that goal but not necessarily the most practical one, given the speed at which cloud APIs evolve. This brings me to the first observation: &lt;strong&gt;Application Portability != Cloud API portability&lt;/strong&gt;. Let me explain:&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt; &lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef015393158083970b-pi" style="display: inline;"&gt;&lt;img alt="Screen Shot 2011-11-15 at 3.50.05 AM" border="0" class="asset  asset-image at-xid-6a00d835457b7453ef015393158083970b image-full" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef015393158083970b-800wi" title="Screen Shot 2011-11-15 at 3.50.05 AM"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Most applications don’t need to interact with the cloud-specific API from the application itself.  Most of the cloud API deals with things that happen outside of the application code, like starting new virtual machines or services, or providing elasticity. In many cases, the systems management tools are the main cloud-specific aspect of a given platform; most platforms provide support for managed middleware like MySQL (RDBMS), Tomcat, memcached, or even map/reduce (through Hadoop, for example). The mechanisms for using these are standard and common, but the APIs for managing the services are not. (There are exceptions, such as SimpleDB and SQS, which do use proprietary and localized APIs. I believe that over time, you'll see less and less of this as there will be enough nonproprietary options to restrict their use.)&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The problem of dealing with application portability between clouds is vastly different from the problem of dealing with cloud API portability. Application portability is easy; cloud API portability is not.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;I’ll spend more time on how we can use this realization to simplify the task of achieving cloud portability in a follow-up post.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h3&gt;2. The main incentive for Cloud Portability is - Avoiding Vendor lock-in&lt;/h3&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Vendor lock-in is one of the major barriers to cloud adoption as indicated in a recent survey by 451, GigaOM and North Bridge Venture - &lt;a href="http://blogs.the451group.com/opensource/2011/06/22/future-of-cloud-survey-shows-significance-of-open-source/"&gt;Future of cloud survey shows significance of open source&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;span style="font-family: Arial;"&gt;Interoperability and vendor lock-in were ranked the next most significant challenges, both with 25% of response. Though we saw vendor lock-in fade a bit as a concern for customers two to three years ago, it has risen again as a major issue in cloud computing. We believe this is in part because of the early nature of cloud computing and a desire from users to avoid getting stuck with a cloud vendor, framework or technology.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Many (myself included) view avoiding vendor lock-in as the main driver for cloud portability. If we can run our application on any cloud without any code change, we can in fact avoid vendor lock-in. &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;This brings me to the second realization. &lt;strong&gt;Cloud portability is more about business agility than it is about vendor lock-in&lt;/strong&gt;. Let me explain:&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;It is not very likely that you would switch entirely from one cloud provider to another, at least not frequently. If you are going to change cloud providers, it's going to be more of a one-off event than something that you would do on a regular basis. &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;On the other hand, it is probably more likely that you would use cloud portability to &lt;strong&gt;choose the right cloud for the job&lt;/strong&gt;. A common example would be to run your test on the cloud and production on your own private cloud. Another example would be to run your demos and trials on the “cheapest” cloud and production with a cloud provider that offers a higher service level. A more advanced scenario would be to use  cloud portability for cloud bursting.  &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;In other words, cloud portability is about business flexibility. It gives us more choices among the cost, SLA, and security tradeoffs of the different cloud offerings and product lines. Portability between our local data center and the cloud would also help make the transition to the cloud smoother.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h3&gt;3. Cloud portability isn’t for startups&lt;/h3&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;If you're a startup, when the discussion on cloud portability comes up the immediate reaction is often “hmm.. interesting but not for me.” Being  part of a startup company myself, I think that I can relate to that reaction.  As I mentioned before, when I faced the choice for dealing with cloud portability a few years back, cloud portability was one of the first items that I took off of my list of TODOs to meet our time-to-market goal, and settled for Amazon.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Recently, I came across several cases that show that dealing with cloud portability is often forced on us even when we don’t plan for it.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The first case is &lt;a href="http://mixpanel.com/" target="_self"&gt;MixPanel&lt;/a&gt;, a web analytics company that switched from Rackspace to Amazon:&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;When MixPanel started up, cost mattered, as the founders were paying out of their own pockets. Every dollar counted, and therefore the right choice at the time was Slicehost.  MixPanel later switched from Slicehost to Linode, which was cheaper, and then to Rackspace due to a Ycombinator deal. As the business grew, they switched to Amazon, which offered more features and greater flexibility.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="margin-left: 36.0pt; mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;“First was &lt;strong&gt;Slicehost&lt;/strong&gt; back when everything was on a single 256MB instance.. Second was &lt;strong&gt;Linode&lt;/strong&gt; because it was cheaper (money mattered to me at that point). Lastly, we moved over to the &lt;strong&gt;Rackspace&lt;/strong&gt; Cloud because they cut a deal with Ycombinator... Even with all the lock in we have with Rackspace (we have 50+ boxes)..it’s really not about the money but about the features and the product offering, here’s why we’re moving”&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="margin-left: 36.0pt; mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;source: &lt;a href="http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/"&gt;Amazon vs Rackspace&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The second case is &lt;a href="http://www.beatunes.com/" target="_self"&gt;BeaTunes&lt;/a&gt; that switched from Amazon to EQ8 Hetzner&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;In the case of BeaTunes, the trigger for switching off of Amazon was an &lt;a href="http://aws.amazon.com/ebs/" target="_self"&gt;Elastic Block Store&lt;/a&gt; (EBS) crash. During that time, they had to rely on the basic support that Amazon provided. Given that BeaTunes is a small startup, they had almost no leverage to escalate their issue fast enough for the business need and had to hope that their particular issue would hit enough users in the forum to attract the attention of Amazon's support staff. &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;They ended up choosing EQ8. Being a smaller hoster than Amazon made EQ8 a better fit for BeaTunes, a small startup itself. Even a small startup can run mission critical software, and when there is a failure, it is even more important to get a timely response as your entire future may rely on this. Unlike big companies, you often don’t have the means to survive such failures easily. It also happened that by switching to EQ8 BeaTunes could choose better hardware configuration that fit their needs and at half the price of the original services!&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="margin-left: 36.0pt; mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Amazon: “two servers $380/month. Add roughly $220/month for backup snapshots and EBS for the (pretty large) database and your total &lt;strong&gt;yearly costs are $7,200&lt;/strong&gt;”&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="margin-left: 36.0pt; mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;EQ8: “two 24GB RAM, i7-920 Quad-Core, 2x1.5TB HDD (Software-RAID 1) servers for &lt;strong&gt;$3,530/year&lt;/strong&gt; at a hosting company that you can actually call”&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="margin-left: 36.0pt; mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Source: &lt;a href="http://blog.beatunes.com/2011/07/goodbye-amazon-ec2-see-you-later-cloud.html"&gt;Goodbye Amazon EC2 See you later cloud&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;These are only two samples, representative of a much bigger trend in the industry at large.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The interesting thing in both MixPanel's and BeaTunes' histories is that their choice of cloud changed over time due to changes in their maturity as a company, as well as changes in the business requirements which involve cost, support level, flexibility and feature set, etc. At each point in time, the &lt;strong&gt;right cloud&lt;/strong&gt; happened to be a different cloud. &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Personally, I found the supportability issue in the BeaTunes case quite interesting, as quite often we tend to go for the brand, thinking that it’s the “safest bet,” where perhaps the right choice would be the provider that fits our size (and stage).&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The main point in both examples is that even during a relatively short period of time, the two startups found themselves switching from one cloud to another. MixPanel went through four providers during the company's lifetime.  &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The thought that you can avoid such a move between cloud platforms is likely to be too optimistic, even if you’re a startup. Given that there are more options for dealing with cloud portability today, and the effort is not as great as it was a few years back, I would encourage every startup that is expecting rapid growth to re-examine its deployments and plan for cloud portability rather than wait to be forced to make the switch when it is least prepared to do so.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h3&gt;&lt;span style="font-family: Arial;"&gt;4. Cloud portability = Compromising on the least common denominator &lt;/span&gt;&lt;/h3&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;By definition, standards tend to lag behind implementations. Standards are therefore a compromise on the least common denominator. In a dynamic environment, such as the one we're experiencing with cloud today, compromising on the least common denominator is a choice that may come at the great cost of “reinventing” things that are already available outside of the standard or common API.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;If we think of application portability, we don’t need to compromise on the least common denominator as most of the interaction with the cloud API happens outside of our application code anyway, to handle things like provisioning, setup, installation, scaling, monitoring, etc.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;There are orchestration tools such as &lt;a href="http://www.opscode.com/chef/" target="_self"&gt;Chef&lt;/a&gt; and &lt;a href="http://puppetlabs.com/" target="_self"&gt;Puppet&lt;/a&gt; that can provide higher levels of abstraction for automating those processes between clouds without relying on a common cloud API. Frameworks like &lt;a href="http://www.jclouds.org/" target="_self"&gt;JClouds&lt;/a&gt; provide common abstractions to the common set of APIs (such as Compute and Storage APIs) while still allowing the application to interact with the underlying cloud-specific APIs and thus minimizing the areas of differences between clouds that would require specific handling.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h3&gt;&lt;span style="font-family: Arial;"&gt;5. The effort for achieving cloud portability far exceeds the value&lt;/span&gt;&lt;/h3&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Many would argue that cloud portability comes with a cost and rightly so. Indeed, there is no such thing as free lunch, and cloud portability is no exception.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;However, the cost isn’t as great as many of us think, especially now that there are more tools and frameworks available.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;In most cases, the effort to achieve cloud portability is far less than it used to be, which makes it a more valuable priority that requires less investment.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h2&gt;Final notes&lt;/h2&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;The term "cloud portability" is often considered a synonym for "Cloud &lt;strong&gt;API&lt;/strong&gt; portability," which implies a series of misconceptions.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;If we break away from dogma, we find that what we're really looking for in cloud portability is application portability between clouds, which can be a vastly simpler requirement, as we can achieve application portability without settling on a common Cloud API.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;As in the case of MixPanel and BeaTunes, the right cloud for the job may vary over time. The right cloud when we create a new startup can be different from the right cloud when we grow, and if we're very successful, we may find out that managing our own cloud is the right choice. &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;If we focus only on what’s needed to ensure application portability between clouds, we may find that cloud portability can be easier than it seems at first glance. If done correctly, it can result in greater flexibility for our businesses:&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Choose the right cloud for the Job&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Reduce vendor lock-in&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Enable advanced deployments such as a hybrid cloud, and cloudbursting&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p class="MsoNormal" style="mso-pagination: none; mso-layout-grid-align: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;In the next post I’ll touch on what it takes to turn cloud portability into a practical reality.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p style="color: #000088; text-align: left;"&gt;&lt;strong&gt;&lt;span style="color: black; font-family: Arial;"&gt;References&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: Arial;"&gt;&lt;a href="http://blogs.the451group.com/opensource/2011/06/22/future-of-cloud-survey-shows-significance-of-open-source/"&gt;Future of cloud survey shows significance of open source&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: Arial;"&gt;&lt;a href="http://blog.beatunes.com/2011/07/goodbye-amazon-ec2-see-you-later-cloud.html"&gt;Goodbye Amazon EC2 See you later cloud&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;span style="font-family: Arial;"&gt;&lt;a href="http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/"&gt;Amazon vs Rackspace&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.gigaspaces.com/cloudify" target="_self"&gt;Cloudify - cloud application management&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.jclouds.org" target="_self"&gt;JClouds.org - cloud portability framework&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/11/five-misconceptions-on-cloud-portability.html</feedburner:origLink></entry>
    <entry>
        <title>Behind the Curtains of Cloudify, GigaSpaces' new product</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/x9wx0wx_tq0/behind-the-curtains-of-the-new-gigaspaces-cloudify-product-1.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/10/behind-the-curtains-of-the-new-gigaspaces-cloudify-product-1.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef015436569bd1970c</id>
        <published>2011-10-24T14:46:39+02:00</published>
        <updated>2011-10-25T23:21:06+02:00</updated>
        <summary>Announcing a new product is a major and exciting milestone in a company's lifecycle. Our last announcement was the launch of XAP a few years ago - the Xtreme Application Platform aimed to provide a complete end to end scalability...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JClouds" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Announcing a new product is a major and exciting milestone in a company's lifecycle. Our last announcement was the launch of &lt;a href="http://www.gigaspaces.com/xap" target="_self"&gt;XAP&lt;/a&gt;&amp;nbsp;a few years ago - the Xtreme Application Platform, aimed at providing a complete end-to-end scalability solution across the entire application stack, meaning Data, Web, Messaging, everything.&lt;/p&gt;
&lt;p&gt;Today is one of those exciting days, as we are announcing a new product, &lt;a href="http://www.gigaspaces.com/cloudify" target="_self"&gt;Cloudify&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As its name suggests, Cloudify is primarily targeted at making the process of bringing new and existing applications to the cloud extremely simple, to the point that it requires no code or architecture changes to deploy on any distributed platform.&lt;/p&gt;
&lt;p&gt;In this post, I would like to share some of the motivations that led us to come out with Cloudify and provide a bit of an insight into &lt;a href="http://blog.gigaspaces.com/2011/07/06/cloudify-for-azure-on-board-enterprise-java-apps-to-the-azure-in-a-snap/" target="_self"&gt;how Cloudify is being used already&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The motivation behind Cloudify&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The first thing that led us to Cloudify was the need to&amp;nbsp;&lt;strong&gt;reduce the barrier of entry&lt;/strong&gt; for applications that want to leverage the cost effectiveness and agility of the cloud, but are not yet ready to make the investment in re-writing their existing application, or switch to a new platform for that purpose.&lt;/p&gt;
&lt;p&gt;In many situations, it was apparent that reducing the time it takes to offer those applications through the cloud was a much bigger concern than making the applications completely elastic. There are also those who are launching new applications but are not yet ready to make the switch to the cloud.&lt;/p&gt;
&lt;p&gt;These organizations are looking for ways to "future proof" their applications so that when they &lt;em&gt;are&lt;/em&gt;&amp;nbsp;ready to make the switch to cloud-based deployment, they can migrate with a simple configuration change. ISVs in particular need this flexibility so that they can continue to serve their existing deployments while providing new cloud-enabled product offerings, without the overhead of maintaining two completely&amp;nbsp;separate product lines and without disrupting their current business.&lt;/p&gt;
&lt;p&gt;The second motivation was around &lt;strong&gt;openness&lt;/strong&gt; and &lt;strong&gt;reducing the lock-in concerns&lt;/strong&gt;. Too many cloud solutions and platforms introduce a high degree of lock-in, starting from the infrastructure and extending through the API and the specific language and stack that is forced upon users.&lt;/p&gt;
&lt;p&gt;Big players such as Oracle and VMware are starting to offer their own data center infrastructure and tie this even further into their platforms, leaving their customers with an&amp;nbsp;even greater lock-in risk. As Oracle and VMware expand their stacks, more and more of their ecosystem partners are finding themselves in direct competition with little option of fighting back.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;As many data centers become cloud-enabled through private cloud initiatives, it becomes more apparent that the right way to leverage cloud infrastructure is through some sort of hybrid cloud. A common use case for mission-critical applications would be to run testing on the public cloud, for example, and the production application on a&amp;nbsp;private cloud. Another use case could be to use the private cloud for running steady workloads and burst to the public cloud during peak loads.&lt;/p&gt;
&lt;p&gt;Today, however, the private and public clouds look fairly different. Moving an application from one cloud environment to the other is still a fairly complex process. That brings me to the third motivation: the need to &lt;strong&gt;provide a consistent cloud story&lt;/strong&gt; behind both the private and public cloud.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cloudify to the rescue&lt;/strong&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The heart of Cloudify consists of two main parts: a recipe model, which is a Groovy-based DSL (Domain Specific Language) specifically designed to provide cloud semantics to any application, and an orchestration engine that is responsible for interpreting this recipe and executing it.&lt;/p&gt;
&lt;p&gt;With these two components, we are able to &lt;strong&gt;reduce the barrier of entry &lt;/strong&gt;for Cloudifying an application to the point where &lt;strong&gt;all you need is a simple start and stop script&lt;/strong&gt;! You can automate the deployment of your application on any cloud, include monitoring, and even add more advanced capabilities such as elasticity and continuous availability. More advanced monitoring and SLA management can be added easily at any stage through a set of plug-ins. Below is a snippet&amp;nbsp; of this recipe:&lt;/p&gt;
&lt;p&gt;&lt;a style="display: inline;" href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c31fd970b-pi"&gt;&lt;/a&gt;&lt;a style="display: inline;" href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c3277970b-pi"&gt;&lt;img class="asset  asset-image at-xid-6a00d835457b7453ef0153928c3277970b" title="Cloudify-Recipe-1" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c3277970b-320wi" alt="Cloudify-Recipe-1" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a style="display: inline;" href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c32f1970b-pi"&gt;&lt;img class="asset  asset-image at-xid-6a00d835457b7453ef0153928c32f1970b" title="Cloudify-Recipe-2" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c32f1970b-320wi" alt="Cloudify-Recipe-2" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a style="display: inline;" href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c3331970b-pi"&gt;&lt;img class="asset  asset-image at-xid-6a00d835457b7453ef0153928c3331970b" title="Cloudify-Recipe-3" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef0153928c3331970b-320wi" alt="Cloudify-Recipe-3" /&gt;&lt;/a&gt;&lt;/p&gt;
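&lt;p&gt;To make the split between a recipe and the engine that interprets it more concrete, here is a minimal sketch in Python. It is not the Cloudify DSL (which is Groovy-based and shown in the screenshots above); the &lt;code&gt;Recipe&lt;/code&gt; and &lt;code&gt;Orchestrator&lt;/code&gt; names, the script paths, and the &lt;code&gt;depends_on&lt;/code&gt; field are all hypothetical and exist only for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical stand-in for a recipe: a service name plus lifecycle commands."""
    name: str
    start: str                      # command that starts the process
    stop: str                       # command that stops the process
    instances: int = 1              # how many copies to run
    depends_on: list = field(default_factory=list)


class Orchestrator:
    """Toy engine that interprets recipes and records the commands it would run."""

    def __init__(self):
        self.log = []

    def install(self, recipes):
        # Start a service only after everything it depends on has started.
        started = set()
        pending = list(recipes)
        while pending:
            ready = [r for r in pending if set(r.depends_on).issubset(started)]
            if not ready:
                raise ValueError("unresolvable dependencies")
            for recipe in ready:
                for index in range(recipe.instances):
                    self.log.append(f"{recipe.name}[{index}]: {recipe.start}")
                started.add(recipe.name)
                pending.remove(recipe)
        return self.log


# Illustrative recipes; the script names are made up.
mysql = Recipe("mysql", start="./mysql_start.sh", stop="./mysql_stop.sh")
tomcat = Recipe("tomcat", start="./tomcat_start.sh", stop="./tomcat_stop.sh",
                instances=2, depends_on=["mysql"])

if __name__ == "__main__":
    for line in Orchestrator().install([tomcat, mysql]):
        print(line)
```

&lt;p&gt;The point of the sketch is that the engine only needs start and stop commands per process, so the same loop would work whatever language or stack the process is written in.&lt;/p&gt;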
&lt;p&gt;Cloudify's orchestration works at a process level.&lt;/p&gt;
&lt;p&gt;This means that deployment, installation, and scaling can be supported in the same way no matter what the deployment stack is, whether it's based on Java, .Net, C++, Ruby, PHP, or anything else.&lt;/p&gt;
&lt;p&gt;Further, Cloudify comes with pre-integrated recipes that support the likes of Tomcat, JBoss, MySQL, Spring, Cassandra, MongoDB, HSQL, ActiveMQ, and Solr, as well as a set of pre-baked application blueprints for Web Applications and Big Data Analytics applications.&lt;/p&gt;
&lt;p&gt;This makes Cloudify &lt;strong&gt;open to any application stack. &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Plus, it actually makes many of the popular OSS stacks run better. The Groovy-based DSL and the planned integration with other automation engines such as &lt;a href="http://www.opscode.com/chef/" target="_self"&gt;Chef&lt;/a&gt; make it fairly easy to extend the platform and plug in your application or stack of choice. We invested a lot of effort in Cloudify to make the process of developing and testing recipes very simple, to the point where you can test an entire application stack on a local Cloudify-managed cloud.&lt;/p&gt;
&lt;p&gt;The third point relates to DevOps.&lt;/p&gt;
&lt;p&gt;Many of the existing automation and orchestration services have been designed primarily for operations teams. They are often not even accessible or available in the development environment.&lt;/p&gt;
&lt;p&gt;With Cloudify, we realized that the right way to make an application ready for the cloud is to empower its developers with tools that fit their development environment: Local-Cloud, a simple DSL, and a set of IDE integrations, along with the Cloudify Application Management and Monitoring system, which was designed to fit into both the development and operational environments.&lt;/p&gt;
&lt;p&gt;With these, we can provide a &lt;strong&gt;smooth transition from the development environment to the production cloud environment &lt;/strong&gt;through a single operation. The same abstraction enables us to move between different cloud environments, whether they happen to be public or private clouds. &amp;nbsp;This allows us to choose the right cloud for the job.&lt;/p&gt;
&lt;p&gt;For example, we can run our testing in the cloud and the production application locally, or vice versa. We can choose a cloud resource based on cost and/or locality; for example, we may want to use a "low cost" cloud for demos, and for production deployment, a cloud that provides the&amp;nbsp;best service levels. All this brings a&amp;nbsp;&lt;strong&gt;consistent cloud story&lt;/strong&gt; between local, private, and public clouds. &amp;nbsp;&amp;nbsp;&lt;/p&gt;
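&lt;p&gt;As a rough illustration of choosing the right cloud for the job, the sketch below picks the cheapest target that still meets a required service level. The cloud names, prices, and availability figures are invented for the example, and &lt;code&gt;choose_cloud&lt;/code&gt; is a hypothetical helper, not a Cloudify API.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloud:
    name: str
    hourly_cost: float    # assumed price per machine-hour
    availability: float   # assumed service level, e.g. 0.999


def choose_cloud(clouds, min_availability=0.0):
    """Return the cheapest cloud that meets the required service level."""
    eligible = [c for c in clouds if c.availability >= min_availability]
    if not eligible:
        raise ValueError("no cloud meets the requested service level")
    return min(eligible, key=lambda c: c.hourly_cost)


# Made-up catalogue covering local, public, and private targets.
CLOUDS = [
    Cloud("local", hourly_cost=0.00, availability=0.90),
    Cloud("budget-public", hourly_cost=0.05, availability=0.99),
    Cloud("premium-private", hourly_cost=0.40, availability=0.9999),
]

demo_target = choose_cloud(CLOUDS, min_availability=0.95)
production_target = choose_cloud(CLOUDS, min_availability=0.999)
```

&lt;p&gt;With these assumed numbers, a demo lands on the low-cost public cloud while production lands on the cloud with the best service level, and the application itself does not change between the two.&lt;/p&gt;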
&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;iframe width="460" height="315" src="http://www.youtube.com/embed/W3kG84vvnfY" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;


&lt;p&gt;&lt;strong&gt;What does this mean for XAP users?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As mentioned above, XAP provides an end to end scale-out middleware stack.&lt;/p&gt;
&lt;p&gt;Cloudify was designed to manage XAP middleware services as a first-class citizen within the Cloudify environment.&amp;nbsp;That means XAP users now get significantly better automation, orchestration, management, and monitoring services than with previous releases. Cloudify comes with specific support for running XAP middleware components, including DataGrid, Messaging, Processing, Map/Reduce, et al.&lt;/p&gt;
&lt;p&gt;The combination of Cloudify and XAP also opens the opportunity to integrate XAP with other middleware services, such as Tomcat and JBoss on the web container side and MongoDB and Cassandra on the database side, managing the &lt;em&gt;entire&lt;/em&gt; stack through a consistent provisioning and management experience.&amp;nbsp;It also completely automates the process of deploying complex applications, such as Big Data applications that span multiple sites.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How can Cloudify users benefit from XAP?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One point that needs to be made is that Cloudify and XAP protect you from external services. For example, Amazon provides an excellent feature set: RDS, SQS, SimpleDB, Elastic Caching, and more, but using these features ties you to Amazon. With Cloudify and XAP, you get each of these feature sets, along with multi-site replication and Xtreme Transaction Processing, which protects you from being locked in to cloud-specific services.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Who is integrating with Cloudify&amp;nbsp;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Even though Cloudify wasn't officially released until today, interest in the product has been enormous and brought many new prospects and existing customers to try out earlier versions. Here are a&amp;nbsp;few of the interesting use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A large Telco ISV plans to use Cloudify to move many of their existing network applications and services into a virtualized environment.&lt;/li&gt;
&lt;li&gt;A leading bank that is already using XAP plans to use Cloudify to manage their entire application stack and build their own private PaaS.&lt;/li&gt;
&lt;li&gt;A large ISV plans to use Cloudify to build a fully managed Big Data stack that integrates with Cassandra.&lt;/li&gt;
&lt;li&gt;A large system integrator is using our Integration platform&amp;nbsp;&lt;em&gt;&lt;strong&gt;Cloudify for Azure &lt;/strong&gt;&lt;/em&gt;to bring Java applications into &lt;a href="http://www.microsoft.com/windowsazure/" target="_self"&gt;Azure&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A large provider in the financial sector plans to use Cloudify to move their entire portfolio of&amp;nbsp;products that were built through acquisitions onto a common platform that can provide consistent management across the stack, reduce the cost of managing their existing applications, and reduce the time it takes them to launch new features and applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Final notes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Today many organizations and ISVs wanting to leverage the value of the cloud are left with a fairly limited number of&amp;nbsp;choices. They can either use one of the public PaaS platforms and give up control over the language, data center, and stack, or they can build their own platform and face the complexity, cost, and, more importantly, time that is often involved in building such a platform.&lt;/p&gt;
&lt;p&gt;Many of those organizations need a platform that they can use to Cloudify their applications and build their own PaaS without the complexity involved and without giving up control.&lt;/p&gt;
&lt;p&gt;ISVs and SaaS providers need such a platform to embed as part of their own product distribution or service.&lt;/p&gt;
&lt;p&gt;According to recent Forrester research (&lt;a href="http://blogs.forrester.com/stefan_ried/11-04-21-sizing_the_cloud" target="_self"&gt;Sizing The Cloud&lt;/a&gt;),&amp;nbsp;the demand for such middleware platforms is going to grow substantially over the coming years.&lt;/p&gt;
&lt;div&gt;
&lt;blockquote&gt;By 2020, the middleware virtualization market will be the largest private cloud segment in terms of total revenues, at $9 billion.&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;“a new set of technologies that allow virtualization of large-scale, server-based enterprise applications are starting to become more important. Application-aware virtualization (AAV) will be the primary driver of future growth in this category”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Cloudify is positioned specifically to fill that void.&lt;/p&gt;
&lt;p&gt;In addition to all that, Cloudify will come with a &lt;strong&gt;feature-rich free edition&lt;/strong&gt;. Unlike many of the alternative offerings in this space, we designed the free edition to serve real enterprise application workloads and not just demo or development environments. (We'll share more in a few weeks' time as we complete this beta stage.)&lt;/p&gt;
&lt;p&gt;The current beta version already comes with support for Tomcat, JBoss, MySQL, Spring, Cassandra, MongoDB, HSQL, and ActiveMQ. It also comes with examples that show full application stack deployments, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Web Application deployment stacks including the traditional PetClinic and SpringTravel applications, demonstrating an elastic deployment of a common web application stack.&lt;/li&gt;
&lt;li&gt;A Real Time Analytics for Big Data example, demonstrating how you can process large volumes of events in real time and store them in a NoSQL backend.&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Trying out these examples is an extremely simple process. &amp;nbsp;You don't need to download or install a full virtual machine image or integrate with a particular cloud. All you need is to download Cloudify, start the Cloudify local-cloud and execute a single command &amp;lt;install-application ..&amp;gt; and you're good to go.&lt;/p&gt;
&lt;p&gt;The beta is available for download at &lt;a href="http://www.gigaspaces.com/cloudify"&gt;gigaspaces.com/cloudify&lt;/a&gt;&amp;nbsp;(the link also includes an online demo for those interested).&lt;/p&gt;
&lt;p&gt;On a more personal note - with Cloudify, I feel that we're able to take a lot of the experience that we've gained over the years in dealing with large-scale mission-critical applications and come out with something that integrates all this into a simple product. I believe that the days in which you had to "outsource" your entire application to someone else's PaaS or SaaS just to mitigate&amp;nbsp;the complexity will soon be gone. &lt;strong&gt;Cloudify brings the power back to where it belongs - to the application owner!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I look forward to hearing your feedback, receiving&amp;nbsp;your input, and getting the ball rolling...&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;References:&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://blog.gigaspaces.com/2011/07/06/cloudify-for-azure-on-board-enterprise-java-apps-to-the-azure-in-a-snap/" target="_self"&gt;Cloudify for Azure – On-Board Enterprise Java Apps to the Azure Cloud in a Snap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://blogs.forrester.com/stefan_ried/11-04-21-sizing_the_cloud" target="_self"&gt;Cloudify Product Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.gigaspaces.com/themes/tendu/engine/swf/player.swf?url=../../data/video/AzureISVTeamDemo.mp4" target="_self"&gt;Cloudify online demo (on Azure)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/div&gt;
</content>


        

    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/10/behind-the-curtains-of-the-new-gigaspaces-cloudify-product-1.html</feedburner:origLink><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="enclosure" href="http://feedproxy.google.com/~r/NatiShalom/~5/yBvRA1kA5MA/player.swf" length="0" type="video/vnd.objectvideo" /><feedburner:origEnclosureLink>http://www.gigaspaces.com/themes/tendu/engine/swf/player.swf?url=../../data/video/AzureISVTeamDemo.mp4</feedburner:origEnclosureLink></entry>
    <entry>
        <title>Java and Center Stage: Meet us at JavaOne 2011</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/hVZeLLxPw6g/java-and-center-stage-meet-us-at-javaone-2011.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/09/java-and-center-stage-meet-us-at-javaone-2011.html" thr:count="1" thr:updated="2011-10-16T18:24:21+02:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef015435cd523b970c</id>
        <published>2011-09-30T19:02:28+02:00</published>
        <updated>2011-10-01T13:26:35+02:00</updated>
        <summary>It seems that now Java is taking center stage in the Cloud world, with more application platforms such as SalesForce/Heroku, VMware, Redhat/OpenShift, Cloud.com, JClouds, and obviously GigaSpaces providing a rich set of offerings that are based on...
</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cassandra" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Datacenter" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JavaOne" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JClouds" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Monitoring" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Multi-tenancy" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="multitenancy" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Operations" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef015435ceb4af970c-pi"&gt;&lt;img align="right" alt="image" height="159" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef015391fb37f7970b-pi" style="margin: 0px 0px 0px 9px; display: inline; float: right;" title="image" width="240"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Next week, I’m going to be at JavaOne. Unlike last year, when I expressed &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/09/my-javaone-2010-plans.html"&gt;lots of skepticism&lt;/a&gt; about the way Oracle drives the Java community, it seems that this year things are getting back on course. The skepticism has not vanished completely, but there is a stronger sentiment that things are settling down and taking a more positive course.&lt;/p&gt;&#xD;
&lt;p&gt;It seems that now Java is taking center stage in the Cloud world, with more application platforms such as SalesForce/Heroku, VMware, Redhat/OpenShift, Cloud.com, JClouds, and obviously &lt;a href="http://gigaspaces.com/xap" target="_self"&gt;GigaSpaces&lt;/a&gt; providing a rich set of offerings that are based on Java.&lt;/p&gt;&#xD;
&lt;p&gt;One of the most interesting things in my opinion is that the combination of Cloud and Java brings new opportunities that until recently were unheard of:&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Heroku, which was very much Ruby-centric, &lt;a href="http://cornishhostingcompany.co.uk/blog/cloud-app-platform-heroku-now-supports-java/3812/"&gt;added support for Java&lt;/a&gt;. &lt;/li&gt;&#xD;
&lt;li&gt;GigaSpaces recently announced our partnership with Microsoft to &lt;a href="http://www.gigaspaces.com/azure" target="_self"&gt;bring Java to Microsoft's Azure community&lt;/a&gt;. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;What’s interesting in those two examples is that the Cloud forces us to break away from the language wall. It brings communities that were living in completely isolated silos much closer together.&lt;/p&gt;&#xD;
&lt;p&gt;This forces a very different way of thinking about Java - and about the next generation Java platform specifically.&lt;/p&gt;&#xD;
&lt;p&gt;The new platform should:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Be open to other languages (through integration points like Protobuf, Thrift, REST, and more, as well as the ability to &lt;em&gt;run&lt;/em&gt; those languages) &lt;/li&gt;&#xD;
&lt;li&gt;Embrace NoSQL implementations as an integral part of the platform &lt;/li&gt;&#xD;
&lt;li&gt;Add support for dynamic data structures such as JSON, and schema-free document models &lt;/li&gt;&#xD;
&lt;li&gt;Bring Dev and Ops closer by better exposing operational aspects as part of the development framework &lt;/li&gt;&#xD;
&lt;li&gt;Make cloud deployment a first-class citizen &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;It seems that Java EE 7 is taking the right path in that direction. From &lt;a href="http://java.sys-con.com/node/1794962" target="_self"&gt;Java EE 7 and Cloud Computing&lt;/a&gt;:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;The Java EE platform architecture is taking into account the operational impact of the cloud, more specifically by defining new roles, such as a PaaS administrator. &lt;/li&gt;&#xD;
&lt;li&gt;The Java EE platform may also establish a set of constraints for PaaS-specific features, such as multi-tenancy, that deployments may have to obey. Applications may also be able to identify themselves as designed for cloud environments. &lt;/li&gt;&#xD;
&lt;li&gt;All resource manager-related APIs, such as JPA, JDBC and JMS, will be updated to enable multi-tenancy. The programming model may be refined as well by introducing connectionless versions of the major APIs. &lt;/li&gt;&#xD;
&lt;li&gt;Java EE will define a descriptor for application metadata to enable developers to describe certain characteristics of their applications that are essential for the purpose of running them in a PaaS environment. These may include being multitenancy-enabled, resources sharing, quality of service information, dependencies between applications. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;What This Means&lt;/h2&gt;&#xD;
&lt;p&gt;Java is regaining some of the luster it lost over the last few years. It had come to be perceived as vaguely pedestrian and ordinary, while languages like Ruby and Python were seen as more dynamic environments for rapid and scalable deployments.&lt;/p&gt;&#xD;
&lt;p&gt;Now we're seeing Java as the &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/Lingua_franca" target="_self"&gt;lingua franca&lt;/a&gt;&lt;/em&gt; of enterprise development. With so many environments being heterogeneous, a homogeneous platform becomes a very desirable resource, and the JVM is able to run many languages in an integrated environment: Java, Ruby, Python, Scala, and Groovy, to name only a few, and many of these can run as compiled code &lt;em&gt;and&lt;/em&gt; as scripts.&lt;/p&gt;&#xD;
&lt;p&gt;Even beyond the JVM's support for multiple programming languages, you have very broad support for almost any remote procedure call mechanism you can think of. The old J2EE approach, where you defined your requirement and there was a specific way to fulfill it (which meant that every architecture was more or less nudged to the middle), is no longer a real limit.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;If you can do it, you can do it in Java, without Java forcing you to sacrifice in the process.&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;This is a very powerful concept.&lt;/p&gt;&#xD;
&lt;p&gt;It puts your organization back under your control. This is desirable because &lt;em&gt;you&lt;/em&gt; know your product and architecture better than a community process does. You know whether you need to support REST here, and SOAP there; you know you need to use NoSQL in this area and MySQL in that area.&lt;/p&gt;&#xD;
&lt;h2&gt;A Remaining Challenge&lt;/h2&gt;&#xD;
&lt;p&gt;Java EE 7 may be becoming &lt;em&gt;aware&lt;/em&gt; of the new cloud-based environment, but it's still only one aspect of application deployment and design. What it does &lt;em&gt;not&lt;/em&gt; address is the entire application environment.&lt;/p&gt;&#xD;
&lt;p&gt;This is where applications such as &lt;a href="http://www.gigaspaces.com/cloudify" target="_self"&gt;Cloudify&lt;/a&gt; come in. Cloudify treats every artifact involved with your deployment as a managed element.&lt;/p&gt;&#xD;
&lt;p&gt;Consider: when you deploy an enterprise application, you're not just deploying some operational code. That code has external requirements, like a database (or NoSQL warehouse); it might also rely on an Apache-based load balancer, or perhaps another independently-deployed artifact.&lt;/p&gt;&#xD;
&lt;p&gt;That's quite a burden on operations, because all of these are separately deployed and managed.&lt;/p&gt;&#xD;
&lt;p&gt;Cloudify, however, centralizes the management of each element your application is composed of, over the entire application's lifecycle, and provides a bridge to whatever environment you wish to use.&lt;/p&gt;&#xD;
&lt;p&gt;For example, you could have a MySQL database, used by a Java application hosted by a single Tomcat instance, fronted by an Apache httpd server, hosted on an internal Linux server. Cloudify can deploy and monitor each of those elements: the MySQL database, the schema and initial data; Tomcat; the Java application; Apache httpd, and the server upon which it all runs.&lt;/p&gt;&#xD;
&lt;p&gt;Now let's look at a larger deployment: a set of MySQL servers, a cluster of GigaSpaces XAP nodes, an instance of GlassFish hosting some web services, six .Net-based web services, and five httpd servers being load-balanced, deployed on a platform such as &lt;a href="http://www.microsoft.com/windowsazure/" target="_self"&gt;Microsoft Azure&lt;/a&gt;. From Cloudify's perspective, &lt;strong&gt;the deployment and management process is the same&lt;/strong&gt;. You have more artifacts to deploy, and you're telling Cloudify to deploy on a different platform... that's it.&lt;/p&gt;&#xD;
&lt;p&gt;Cloudify can make sure each resource is running optimally, as well. If you can measure an attribute - such as CPU load, or disk space, or free memory - you can tell Cloudify to assert that some headroom exists, and specify what to do if a threshold is exceeded.&lt;/p&gt;&#xD;
&lt;p&gt;For example, if you have a CPU running at 100% for an extended period of time, Cloudify can automatically deploy another instance of the component consuming the CPU, and configure load balancing so your application is more scalable, &lt;strong&gt;without human intervention&lt;/strong&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;Likewise, if you have four servers running a given process, all at 0% CPU load, you can reduce the number of containers with that component so that you're not overallocating resources, saving energy and money.&lt;/p&gt;&#xD;
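&lt;p&gt;The scale-out and scale-in behavior described above can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the function name, the 90% and 10% thresholds, and the instance limits are hypothetical, and this is not how Cloudify itself is configured.&lt;/p&gt;

```python
def desired_instances(current, cpu_samples, high=0.90, low=0.10,
                      min_instances=1, max_instances=10):
    """Return the instance count a simple threshold rule would ask for.

    cpu_samples holds recent average CPU readings (0.0 to 1.0) for the
    component. Requiring every sample to cross the threshold approximates
    a condition that holds for an extended period of time.
    """
    if not cpu_samples:
        return current
    if all(sample >= high for sample in cpu_samples):
        return min(current + 1, max_instances)   # sustained load: scale out
    if all(low >= sample for sample in cpu_samples):
        return max(current - 1, min_instances)   # sustained idle: scale in
    return current                               # mixed readings: hold steady
```

&lt;p&gt;For instance, one instance pinned at 100% CPU would grow to two, and four idle instances would shrink to three, one step at a time, so a single noisy reading does not cause a change.&lt;/p&gt;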
&lt;p&gt;The key thought here:&lt;/p&gt;&#xD;
&lt;h2&gt;You know how to solve your problem. We can help.&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e8bef29cb970d-pi"&gt;&lt;img align="right" alt="2011-09-30 22 15 01" border="0" height="128" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef015435ceb509970c-pi" style="background-image: none; margin: 0px 0px 0px 6px; padding-left: 0px; padding-right: 0px; display: inline; float: right; padding-top: 0px; border: 0px;" title="2011-09-30 22 15 01" width="171"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/h2&gt;&#xD;
&lt;p&gt;We'll be on the floor at JavaOne 2011, in booth #5002. We'll be happy to show you both &lt;a href="http://www.gigaspaces.com/xap" target="_self"&gt;XAP&lt;/a&gt;, our industry-leading application platform, and &lt;a href="http://www.gigaspaces.com/cloudify" target="_self"&gt;Cloudify&lt;/a&gt;, our deployment solution for the cloud-enabled world.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="font-size: 8pt;"&gt;(This post was co-written with Joseph Ottinger, who unfortunately won't be able to make it to JavaOne this year. But he says that the best people in GigaSpaces &lt;em&gt;will&lt;/em&gt; be there, so you're not missing out!)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/hVZeLLxPw6g" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/09/java-and-center-stage-meet-us-at-javaone-2011.html</feedburner:origLink></entry>
    <entry>
        <title>Big Data Application Platform</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/mJZsfPckMUs/big-data-application-platform.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/09/big-data-application-platform.html" thr:count="6" thr:updated="2012-01-12T22:16:53+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0153913d1ca1970b</id>
        <published>2011-09-02T16:23:16+02:00</published>
        <updated>2011-09-03T00:54:10+02:00</updated>
        <summary>It's time to think of the architecture and application platforms surrounding "Big Data" databases. Big Data is often centered around new database technologies mostly from the emerging NoSQL world. The main challenge that these databases solve is how to handle...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Hbase" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Java/J2EE" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Risk Analysis" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;It's time to think of the architecture and application platforms surrounding "Big Data" databases. Big Data is often centered &lt;a href="http://www.flickr.com/photos/alphachimpstudio/5022256242/sizes/m/in/photostream/"&gt;&lt;img align="right" alt="" height="271" src="http://farm5.static.flickr.com/4130/5022256242_8623087f8a.jpg" style="margin: 5px 0px 2px 3px; display: inline; float: right;" width="323"&gt;&lt;/img&gt;&lt;/a&gt;around new database technologies mostly from the emerging NoSQL world.  The main challenge that these databases solve is how to handle massive amounts of data at a reasonable cost and without sacrificing performance. Distributed databases emerged to address this challenge, and today we're seeing a &lt;a href="http://www.dzone.com/links/r/nosql_job_trends_august_2011.html"&gt;high adoption rate&lt;/a&gt; and quite impressive success stories such as the &lt;a href="http://techblog.netflix.com/2011/01/nosql-at-netflix.html" target="_self"&gt;Netflix use of the Cassandra/DataStax solution&lt;/a&gt;.  All of this indicates the speed at which this market is evolving.&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt; &lt;span style="font-size: 20px; font-weight: bold;"&gt;The need for a Big Data Application Platform&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Application platforms provide a framework for making the development of applications simpler. They do this by carving out the generic parts of applications such as security, scalability, and reliability (which &lt;em&gt;are&lt;/em&gt; attributes of a 'good' application) from the parts of the applications that are specific to our business domain.&lt;/p&gt;&#xD;
&lt;p&gt;Most of the existing application platforms, such as Java EE and Ruby on Rails, were designed with centralized relational databases in mind. Clearly, that model doesn’t fit the Big Data world well, simply because it wasn’t designed to deal with massive amounts of data in the first place. In addition, frameworks such as Hadoop are considered too complex, as noted by &lt;strong&gt;&lt;a href="http://www.linkedin.com/profile/view?id=230799&amp;amp;authType=name&amp;amp;authToken=v_aU&amp;amp;locale=en_US&amp;amp;pvs=pp&amp;amp;trk=ppro_viewmore"&gt;Mike Gilpin&lt;/a&gt;&lt;/strong&gt;, VP/Research Director at Forrester Research, in his post, &lt;a href="http://blogs.computerworlduk.com/app-dev-and-programme-management/2011/05/big-data-technology-getting-hotter-but-still-too-hard-for-most-developers/index.htm"&gt;"Big Data" technology: getting hotter, but still too hard&lt;/a&gt;:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;"Big Data" also matters to application developers - at least, to those who are building applications in domains where "Big Data" is relevant. These include smart grid, marketing automation, clinical care, fraud detection and avoidance, criminal justice systems, cyber-security, and intelligence.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;One "big question" about "Big Data": What’s the right development model? Virtually everyone who comments on this issue points out that today’s models, such as those used with Hadoop, are too complex for most developers. It requires a special class of developers to understand how to break their problem down into the components necessary for treatment by a distributed architecture like Hadoop. For this model to take off, we need simpler models that are more accessible to a wider range of developers - while retaining all the power of these special platforms.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Other existing models for handling Big Data such as Data Warehouse don’t cut it either, as noted in &lt;a href="http://www.linkedin.com/profile/view?id=123279&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=9Tzt&amp;amp;locale=en_US&amp;amp;srchid=a8bc0aea-cd89-44fe-9deb-88c0a1d74c15-0&amp;amp;srchindex=1&amp;amp;srchtotal=2&amp;amp;goback=%2Efps_PBCK_Dan+Woods+forbes_*1_*1_*1_*1_*1_*1_*2_*1_Y_*1_*1_*1_false_1_R_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2&amp;amp;pvs=ps&amp;amp;trk=pp_profile_name_link"&gt;Dan Woods&lt;/a&gt;' post on Forbes, &lt;a href="http://www.forbes.com/sites/ciocentral/2011/07/21/big-data-requires-a-big-new-architecture/"&gt;Big Data Requires a Big, New Architecture&lt;/a&gt;:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;...to take maximum advantage of big data, IT is going to have to press the re-start button on its architecture for acquiring and understanding information. IT will need to construct a new way of capturing, organizing and analyzing data, because big data stands no chance of being useful if people attempt to process it using the traditional mechanisms of business intelligence, such as a data warehouses and traditional data-analysis techniques.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;To effectively write Big Data applications, we need an Application Platform that brings together the different patterns and tools used by pioneers in that space, such as Google, Yahoo, and Facebook, in one framework, and makes them simple enough that any organization can use them without a huge investment.&lt;/p&gt;&#xD;
&lt;p&gt;Here's my personal view of what that platform could look like, based on my experience covering the NoSQL space for a while now and on my experience with GigaSpaces.&lt;/p&gt;&#xD;
&lt;h2&gt;Characteristics of Big Data Application Platform&lt;/h2&gt;&#xD;
&lt;p&gt;As with any Application Platform, a Big Data application platform needs to support all the functionality expected from any application platform, such as scalability, availability, security, etc.&lt;/p&gt;&#xD;
&lt;p&gt;Big Data Application platforms are unique in the sense that they need to be able to handle massive amounts of data, and therefore need to come with built-in support for things like Map/Reduce, integration with external NoSQL databases, parallel processing, and data distribution services. On top of that, they should make the use of those new patterns simple from a development perspective.&lt;/p&gt;&#xD;
&lt;p&gt;Below is a more concrete list of the specific characteristics and features that define what a Big Data Application Platform ought to be. I've tried to point to the specific Java EE equivalent API and how it would need to be extended to support Big Data applications.&lt;/p&gt;&#xD;
&lt;h3&gt;Support Batch and Real Time analytics&lt;/h3&gt;&#xD;
&lt;p&gt;Most of the existing application platforms were designed for handling transactional web applications and have little support for business analytics applications.  Hadoop has become the de facto standard for handling batch processing; Real Time analytics, however, is done through other means outside of the Hadoop framework, mostly through an event processing framework, as I outlined in detail in my previous post &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html"&gt;Real Time Analytics for Big Data: An Alternative Approach&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;A Big Data Application Platform would need to bring Big Data application development closer to mainstream development by providing a built-in stack that includes integration with Big Data databases from the NoSQL world, Map/Reduce frameworks such as Hadoop, distributed processing, etc.&lt;/p&gt;&#xD;
&lt;p&gt;It also needs to extend the existing Transaction processing and Event Processing semantics that come with Java EE to handle Real Time analytics that fits into the Big Data world, as outlined in the references below:&lt;/p&gt;&#xD;
&lt;h3&gt;Bringing Big Data Applications Closer to Mainstream Development Practices&lt;/h3&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h4&gt;Domain model and Data access API&lt;/h4&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Writing a Big Data application is quite different from writing a typical &lt;a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete"&gt;CRUD&lt;/a&gt; application against a centralized relational database. The main differences are in the design of our data domain model, and in the API and query semantics that we'll use to access and process that data.&lt;/p&gt;&#xD;
&lt;p&gt;Mapping has proven to be a fairly effective approach for bridging the impedance mismatch between different data models and APIs.  A good reference in that regard is the use of O/R mapping tools such as Hibernate for bridging a similar impedance mismatch between Object and Relational models.&lt;/p&gt;&#xD;
&lt;p&gt;Abstraction frameworks such as Spring have also proven to be quite a useful means of plugging in different data sources that don’t comply with a common API.&lt;/p&gt;&#xD;
&lt;p&gt;To bring the development of Big Data applications closer to mainstream development, a Big Data application platform needs to come with similar mapping and abstraction tools that make the transition from the existing APIs significantly smoother.&lt;/p&gt;&#xD;
&lt;p&gt;Today we already have various mapping and abstraction frameworks available.&lt;/p&gt;&#xD;
&lt;h5&gt;For Big Data Mapping tools:&lt;/h5&gt;&#xD;
&lt;p&gt;For batch processing, we're already seeing the increased adoption of frameworks such as &lt;a href="http://wiki.apache.org/hadoop/Hive"&gt;Hive&lt;/a&gt;, which provides an SQL-like façade for handling complex batch processing with Hadoop.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://en.wikibooks.org/wiki/Java_Persistence/What_is_JPA%3F"&gt;JPA&lt;/a&gt; provides a more standard JEE abstraction that fits Real Time Big Data applications. Google App Engine uses &lt;a href="http://www.datanucleus.org/products/accessplatform/appengine/support.html"&gt;DataNucleus&lt;/a&gt; on top of Bigtable; at GigaSpaces we came up with a &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/JPA+Support"&gt;high performance JPA&lt;/a&gt; abstraction for our In-Memory Data Grid using OpenJPA; and Red Hat came up with Hibernate OGM (Object-Grid Mapping). &lt;/p&gt;&#xD;
&lt;p&gt;In addition to these, we're seeing that the existing mapping tools add extensions to support data partitioning semantics through annotations as in the case of &lt;a href="http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Advanced_JPA_Development/Data_Partitioning"&gt;EclipseLink&lt;/a&gt; and others.&lt;/p&gt;&#xD;
&lt;h5&gt;For Big Data Spring Abstraction:&lt;/h5&gt;&#xD;
&lt;p&gt;SpringSource came out with an interesting and even higher-level abstraction known as &lt;a href="http://www.springsource.org/spring-data"&gt;Spring Data&lt;/a&gt; that makes it possible to map data stores of all kinds into one common abstraction through annotations and a plug-in approach. The abstraction is currently leaky, but it shows progress in the space.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h4&gt;Business logic&lt;/h4&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;In Java EE, business logic refers to the parts of our application that are responsible for processing the data. Because Java EE data often lives in a centralized database, a SessionBean was often mapped to a single instance per user session.&lt;/p&gt;&#xD;
&lt;p&gt;With Big Data, the common pattern for processing data is Map/Reduce. Map/Reduce was designed to handle the processing of massive amounts of data by moving the processing logic to the data and running that logic in parallel on all nodes.  Having said that, developing parallel processing code is considered fairly complex.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;A Big Data Application Platform would need to make &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2008/10/is-mapreduce-going-to-main-stream.html"&gt;Map/Reduce&lt;/a&gt; and parallel execution simple. One way of doing that is by mapping these semantics onto existing programming models. One example would be to extend the current SessionBean model to support this sort of semantics (as with the &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Event+Driven+Remoting"&gt;GigaSpaces Remoting&lt;/a&gt; semantics), which makes parallel processing look like a standard remote method invocation. Another way is to provide more native parallel execution semantics by extending existing semantics such as &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/JPA+Support#JPASupport-NativeQueryExecution"&gt;createNativeQuery&lt;/a&gt; in JPA, or the &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Task+Execution+over+the+Space#TaskExecutionovertheSpace-j.u.cExecutorService"&gt;executor API&lt;/a&gt; in Java. &lt;/p&gt;&#xD;
&lt;p&gt;Interestingly enough, while writing this post I came across &lt;a href="http://www.tricode.nl/author/rvbreukelen/"&gt;Robin van Breukelen&lt;/a&gt;'s post, "&lt;a href="http://www.tricode.nl/distributed-fork-join/"&gt;Distributed Fork Join&lt;/a&gt;," which shows how you can use the latest Java 7 fork/join API in a similar context.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
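&lt;p&gt;To make the Map/Reduce idea concrete, here is a minimal Python sketch of the pattern: a map step is sent to each data partition in parallel, and the partial results are then merged in a reduce step. It uses threads within a single process as a stand-in for nodes; the function names map_partition and map_reduce are made up for this example and do not come from Hadoop, GigaSpaces, or any other product mentioned here.&lt;/p&gt;&#xD;

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Map step: runs against one partition and returns a partial word count."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def map_reduce(partitions):
    """Send the map step to every partition in parallel, then reduce."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    total = Counter()
    for partial in partials:      # reduce step: merge the partial counts
        total.update(partial)
    return total
```

&lt;p&gt;From the caller's point of view, map_reduce looks like an ordinary function call, which is the same property the remoting approach above aims for: the parallelism is hidden behind a familiar invocation model.&lt;/p&gt;&#xD;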
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h4&gt;Support new semantics that fits the dynamic web era&lt;/h4&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Clearly, SQL is a great query language (or else it wouldn't have taken over for database access!) but SQL also has its limits. When we're writing web content, we often deal with dynamic data structures that continuously evolve. When the amount of this data gets massive, as with Big Data, it becomes almost impossible to manage those changes through a standard SQL schema evolution model.&lt;/p&gt;&#xD;
&lt;p&gt;To address this need, Big Data platforms need to add support for schemaless semantics as a first-class citizen. That often means that the data mapping layer mentioned earlier would need to be extended to support document semantics. Examples include &lt;a href="http://blog.mongodb.org/post/119945109/why-schemaless"&gt;MongoDB&lt;/a&gt;, &lt;a href="http://www.couchbase.com/"&gt;CouchBase&lt;/a&gt;, &lt;a href="http://www.datastax.com/docs/0.8/data_model/index" target="_self"&gt;Cassandra&lt;/a&gt; and the new &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Document+%28Schema-Free%29+API"&gt;GigaSpaces Document API&lt;/a&gt; (which, as it turns out, &lt;a href="https://github.com/Gigaspaces/bestpractices/blob/master/mirror-parent/mongodb/mongodb-common/src/main/java/org/openspaces/bestpractices/mirror/mongodb/common/MongoEDS.java" target="_self"&gt;maps to MongoDB&lt;/a&gt; and &lt;a href="http://www.gigaspaces.com/wiki/display/SBP/Cassandra+Mirror+Service" target="_self"&gt;Cassandra&lt;/a&gt; very easily).&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h4&gt;Provide built-in semantics for handling of the tradeoffs between consistency, availability, scalability rather than trying to force a least common denominator as with XA and JTA&lt;/h4&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The JEE application platform forced a fairly strict consistency model through JTA, distributed transactions, and so forth. Unlike Java EE, Big Data Application platforms need to support more relaxed versions of those semantics that allow trade-offs between consistency, scalability, and performance. A good example of what that configuration may look like is covered in the &lt;a href="http://www.datastax.com/docs/0.8/consistency/index" target="_self"&gt;Cassandra/DataStax&lt;/a&gt; documentation and in one of &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/11/nocap-part-iii-gigaspaces-clustering-explained.html" target="_self"&gt;my previous posts&lt;/a&gt; on the subject of the CAP theorem.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
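&lt;p&gt;One common way to express this kind of tunable trade-off in replicated stores is the quorum rule of thumb: with N replicas, a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) is greater than N. The Python sketch below only illustrates that arithmetic; the function names are made up for this example and are not part of Cassandra or any other product.&lt;/p&gt;&#xD;

```python
def quorum(replicas):
    """Smallest majority of the replica set."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replicas
```

&lt;p&gt;With three replicas, writing and reading at quorum (2 + 2 &gt; 3) gives overlapping reads and writes, while writing and reading a single replica (1 + 1) trades that guarantee for lower latency and higher availability.&lt;/p&gt;&#xD;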
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h4&gt;Provide built in support for In-Memory data store to gain best performance and latency&lt;/h4&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;RAM-based devices provide the best performance/latency results. Big Data platforms need to provide seamless integration between RAM and disk-based devices, where data that is written to RAM is synced to disk asynchronously. In addition, they need to provide common abstractions that enable users to use the same data access API for both devices, and thus make it easier to choose the right tool for the job without changing the application code.&lt;/p&gt;&#xD;
&lt;p&gt;Good starting points in that regard are the JPA abstractions now being supported by In-Memory Data Grids and NoSQL data stores, as well as the Spring Data abstraction noted above.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
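&lt;p&gt;The RAM-first, disk-later idea can be sketched as a write-behind store. The Python example below is a simplified illustration, not an actual product API: the WriteBehindStore class is hypothetical, a plain dictionary stands in for the disk-based store, and flush is called explicitly where a real system would drain the queue in the background.&lt;/p&gt;&#xD;

```python
from collections import deque


class WriteBehindStore:
    """Hypothetical store: reads and writes hit RAM, disk is updated later."""

    def __init__(self, disk):
        self.memory = {}
        self.disk = disk            # any dict-like stand-in for a disk store
        self.pending = deque()      # keys written in RAM but not yet on disk

    def put(self, key, value):
        self.memory[key] = value    # fast path: RAM only
        self.pending.append(key)

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        return self.disk.get(key)   # same API regardless of where data lives

    def flush(self):
        """Drain queued writes to disk; a real system would do this in the background."""
        while self.pending:
            key = self.pending.popleft()
            self.disk[key] = self.memory[key]
```

&lt;p&gt;The caller uses the same put and get calls whether the data currently lives in RAM or only on disk, which is the "same data access API for both devices" property described above.&lt;/p&gt;&#xD;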
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h4&gt;Provide built-in support for event driven data distribution using pub/sub model&lt;/h4&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The current Java EE frameworks provide limited support for event processing through message-driven beans (MDBs) and JMS. For Big Data applications, we need to add data awareness to the current MDB model so that it becomes easy to route messages based on data affinity and the content of the message. We also need to provide more fine-grained semantics for triggering events based on data operations (delete, update, ...) as well as on content, as with CEP (Complex Event Processing). &lt;/p&gt;&#xD;
&lt;p&gt;In addition, we need to provide higher-level abstractions that use the underlying pub/sub support to enable data synchronization and locality. A good example is the use of LocalCache and LocalView: LocalCache is useful for random access patterns because it caches the data that was used most recently, while LocalView provides a means to cache a specific segment of data that is known in advance. Both LocalCache and LocalView use the underlying pub/sub support to keep the local copy continuously synchronized, by tracking changes in the Big Data pool and updating the local copy implicitly.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
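&lt;p&gt;The following Python sketch illustrates the LocalView idea in miniature: a local copy of a predefined data segment that is kept in sync through change notifications. The DataPool and LocalView classes here are hypothetical stand-ins written for this example; they borrow the LocalView name from the text but are not the GigaSpaces API.&lt;/p&gt;&#xD;

```python
class DataPool:
    """Hypothetical stand-in for the shared Big Data pool with pub/sub."""

    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, key, value):
        self.data[key] = value
        for notify in self.subscribers:   # publish the change event
            notify(key, value)


class LocalView:
    """Local copy of a segment chosen in advance, kept in sync by events."""

    def __init__(self, pool, predicate):
        self.predicate = predicate
        self.local = {k: v for k, v in pool.data.items() if predicate(k, v)}
        pool.subscribe(self.on_change)

    def on_change(self, key, value):
        if self.predicate(key, value):
            self.local[key] = value
        else:
            self.local.pop(key, None)     # entry left the segment
```

&lt;p&gt;Reads against the local dictionary never leave the process, while writes to the pool reach the view implicitly through the subscription. A LocalCache would differ mainly in how entries are chosen: by recent use, not by a predicate known in advance.&lt;/p&gt;&#xD;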
&lt;h3&gt;Built in support for public/private cloud&lt;/h3&gt;&#xD;
&lt;p&gt;Big Data applications tend to consume lots of compute and storage resources.  There are a growing number of cases where the use of the cloud enables significantly better economics for running Big Data applications. To take advantage of those economics, Big Data Application Platforms need to come with built-in support for public/private clouds, including seamless transition between the various cloud platforms through integration with frameworks such as &lt;a href="www.jclouds.org"&gt;JClouds&lt;/a&gt;. Cloud Bursting provides a hybrid model for using cloud resources as spare capacity to handle load. To handle Cloud Bursting effectively with Big Data, we'll have to make the data available to both the public and private sides of the cloud with reasonable latency, which often requires other services such as data replication.&lt;/p&gt;&#xD;
&lt;h3&gt;Open &amp;amp; Consistent management and orchestration across the stack&lt;/h3&gt;&#xD;
&lt;p&gt;A typical Big Data application stack includes multiple layers such as the database itself, the web tier, the processing tier, the caching layer, the data synchronization and distribution layer, reporting tools, etc. One of the biggest challenges is that each of those layers comes with different management, provisioning, monitoring and troubleshooting tools. Big Data applications tend to be fairly complex by their very nature; the lack of consistent management, monitoring and orchestration across the stack makes the maintenance and management of this sort of application significantly harder.&lt;/p&gt;&#xD;
&lt;p&gt;In most of the existing Java EE management layers, the management application assumed control of the entire stack. With Big Data applications, that assumption doesn’t apply. The stack can vary quite significantly between different application layers; therefore, the management layer of a Big Data Application Platform needs to be more open, so that it can host different databases, web containers, etc., and provide consistent management and monitoring across the entire stack.&lt;/p&gt;&#xD;
&lt;h2&gt;Final words&lt;/h2&gt;&#xD;
&lt;p&gt;&lt;img align="left" alt="" src="http://siliconangle.com/files/2011/05/cloud-big-data-signpost.jpg" style="margin: 0px 19px 0px 0px; display: inline; float: left;"&gt;&lt;/img&gt;Java EE application servers played an important role in bringing the development of database-centric web applications closer to the mainstream. Other frameworks such as Spring and Ruby on Rails later emerged to increase the development productivity of those applications. Big Data Application Platforms have a similar purpose: they should provide the framework for making the development, maintenance and management of Big Data Applications simpler. In a way, you could think of Big Data Application platforms as a natural evolution of the current application platforms.&lt;/p&gt;&#xD;
&lt;p&gt;With the current shift of Java EE application platforms toward PaaS, we're going to see even stronger demand for running Big Data applications in cloud-based environments due to the inherent economic and operational benefits. Compared to the current PaaS model, moving data to the cloud is more complex and would require more advanced support for data replication across sites, cloud bursting, etc.&lt;/p&gt;&#xD;
&lt;p&gt;The good news is that Big Data Application platforms &lt;em&gt;are&lt;/em&gt; being implemented with these goals in mind, and you can already see migration yielding the benefits one should expect.&lt;/p&gt;&#xD;
&lt;h1&gt;References&lt;/h1&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;p&gt;&lt;a href="http://www.dzone.com/links/r/nosql_job_trends_august_2011.html"&gt;NoSQL Job Trends – August 2011&lt;/a&gt; (Dzone)&lt;/p&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
&lt;p&gt;&lt;a href="http://blogs.computerworlduk.com/app-dev-and-programme-management/2011/05/big-data-technology-getting-hotter-but-still-too-hard-for-most-developers/index.htm"&gt;"Big Data" technology: getting hotter, but still too hard&lt;/a&gt; (Forrester)&lt;/p&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.forbes.com/sites/ciocentral/2011/07/21/big-data-requires-a-big-new-architecture/"&gt;Big Data Requires a Big, New Architecture&lt;/a&gt; (Forbes)&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;span style="font-size: 15px;"&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html" target="_self"&gt;Real Time analytics for Big Data: Facebook's New Realtime Analytics System&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/mJZsfPckMUs" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/09/big-data-application-platform.html</feedburner:origLink></entry>
    <entry>
        <title>Real Time Analytics for Big Data: An Alternative Approach</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/eA1CZK6Hgbw/real-time-analytics-for-big-data-an-alternative-approach.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html" thr:count="13" thr:updated="2012-01-11T02:53:38+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef015433afa04f970c</id>
        <published>2011-07-14T17:26:26+02:00</published>
        <updated>2011-12-08T22:28:21+01:00</updated>
        <summary>Lately, we've been talking to various clients about realtime analytics, and with convenient timing Todd Hoff wrote up how Facebook's realtime analytics system was designed and implemented (See previous review on that regard here). They had some assumptions in design...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cassandra" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Hbase" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="latency" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="predictable analytics" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="real-time" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="scalability" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Lately, we've been talking to various clients about realtime analytics, and with convenient timing Todd Hoff wrote up how Facebook's realtime analytics system was designed and implemented (See previous review on that regard &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html" target="_self"&gt;here&lt;/a&gt;).&lt;/p&gt;&#xD;
&lt;p&gt;They had some assumptions in design that centered around the reliability of in-memory systems and database neutrality that affected what they did: for memory, that transactional memory was unreliable, and for the database, that HBase was the only targeted data store.&lt;/p&gt;&#xD;
&lt;p&gt;What if those assumptions are changed? We can see reliable transactional memory in the field, as a requirement for any in-memory data grid, and certainly there are more databases than HBase; given database and platform neutrality, and reliable transactional memory, how could you build a realtime analytics system?&lt;/p&gt;&#xD;
&lt;p&gt;Joseph Ottinger and I discussed this, and this is what we came up with.&lt;/p&gt;&#xD;
&lt;h1&gt;A Summary of History&lt;/h1&gt;&#xD;
&lt;p&gt;To understand what a new design might look like, it’s often useful to consider a previous design. This is a very short summary of Facebook’s realtime analytics system.&lt;/p&gt;&#xD;
&lt;p&gt;First, it’s based on a system of key/value pairs, where the key might be a URL and the value is a counter. Thus, there’s a requirement for atomic, transactional updates to a very simple piece of data. The difficulties come from scale, not from the focus of the system.&lt;/p&gt;&#xD;
&lt;p&gt;The process flow is fairly simple:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;A user creates an event by performing some action on the website. This generates an AJAX request, sent to a service. &lt;/li&gt;&#xD;
&lt;li&gt;Scribe is used to write the events into logs, stored on HDFS. &lt;/li&gt;&#xD;
&lt;li&gt;PTail is used to consolidate the HDFS logs. &lt;/li&gt;&#xD;
&lt;li&gt;Puma takes the consolidated logs from PTail and stores them into HBase in groupings that represent roughly 1.5 seconds’ worth of events. &lt;/li&gt;&#xD;
&lt;li&gt;HBase serves as the long-term repository for analytics data. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;There are some open questions around how PTail and Puma scale, and some of the notes on their use point to limits – for example, one concern is that an in-memory hash table will fill up, which sounds like a fairly serious limitation to have to keep in mind.&lt;/p&gt;&#xD;
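&lt;p&gt;To make the counter-batching idea concrete, here is a minimal sketch of the pattern described above. It is not Puma's actual code: the class name, the &lt;code&gt;max_keys&lt;/code&gt; cap, and the plain dictionary standing in for HBase are all assumptions made for illustration.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
from collections import Counter


class BatchingCounter:
    """Aggregates events per key in memory, then flushes each batch to a store."""

    def __init__(self, store, max_keys):
        self.store = store        # long-term store: a dict here, standing in for HBase
        self.max_keys = max_keys  # hypothetical cap on distinct keys held in memory
        self.pending = Counter()

    def record(self, key):
        """Count one event; flush early if the in-memory table is full."""
        self.pending[key] += 1
        if len(self.pending) == self.max_keys:
            self.flush()

    def flush(self):
        """Apply the batched increments to the store and start a new batch."""
        for key, count in self.pending.items():
            self.store[key] = self.store.get(key, 0) + count
        self.pending.clear()


store = {}
counter = BatchingCounter(store, max_keys=2)
for url in ["/a", "/a", "/b", "/a"]:
    counter.record(url)
counter.flush()  # end of the batch window
print(store)
```
&lt;/pre&gt;&#xD;
&lt;p&gt;The early flush is where the hash table concern shows up: the batch is bounded by how many distinct keys fit in memory, not only by the time window.&lt;/p&gt;&#xD;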
&lt;h1&gt;A Potential for Improvement&lt;/h1&gt;&#xD;
&lt;p&gt;There are lots of areas in which you can see potential improvements, if the assumptions are changed. As a contrast to Facebook's working system:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;We can simplify the design. If memory can be seen as transactional - and it can - we can use events as they are, without transforming them as they proceed along our analytics workflow. This makes the design much simpler to implement and test, and performance improves as well.&lt;/li&gt;&#xD;
&lt;li&gt;We can strengthen the design. With a polling semantic, such systems are brittle, relying on systems that pull data in order to generate realtime analytics data. We should be able to reduce the fragility of the system, even while making it faster.&lt;/li&gt;&#xD;
&lt;li&gt;We can strengthen the implementation. With batching subsystems, there are limits that shouldn’t exist. For example, one concern in Facebook's implementation is the use of an in-memory hash table that stores intermediate data; the in-memory aspect isn’t a concern until you realize that the batch sizes are chosen partially to make sure that this hash table doesn’t overflow available space.&lt;/li&gt;&#xD;
&lt;li&gt;We can allow deployments to change databases based on their requirements. There's nothing wrong with HBase, but it's got specific characteristics that aren't appropriate for all enterprises. We can design a system which you’d be able to deploy on various and flexible platforms, and we can migrate the underlying long-term data store to a different database if needed.&lt;/li&gt;&#xD;
&lt;li&gt;We can consolidate the analytics system so that management is easier and unified. While there are system management standards like SNMP that allow management events to be presented in the same way no matter the source, having so many different pieces means that managing the system requires an encompassing understanding, which makes maintenance and scaling more difficult.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;What we want to do, then, is create a general model for an application that can accomplish the same goals as Facebook’s realtime analytics system, while leveraging the capabilities that in-memory data grids offer where available, potentially offering improvement in the areas of scalability, manageability, latency, platform neutrality, and simplicity, all while increasing ease of data access.&lt;/p&gt;&#xD;
&lt;p&gt;That sounds like quite a tall order, but it’s doable.&lt;/p&gt;&#xD;
&lt;p&gt;The key is to remember that at heart, realtime analytics represent an events system. Facebook’s entire architecture is designed to funnel events through various channels, such that they can safely and sequentially manage event updates.&lt;/p&gt;&#xD;
&lt;p&gt;Therefore, they receive a massive set of events that “look like” marbles, which they line up in single file; they then sort the marbles by color, you might say, and for each color they create a bundle of sticks; the sticks are lit on fire, and when the heat goes up past a certain temperature, steam is generated, which turns a turbine.&lt;/p&gt;&#xD;
&lt;p&gt;It’s a real-life Rube Goldberg machine, which is admirable in that it works, but much of it is still unnecessary if the assumptions about memory ("unreliable") and database ("HBase is the only target that counts") are changed. Looking at the analogy from the previous paragraph, there’s no need to change a marble into anything. The marble is enough.&lt;/p&gt;&#xD;
&lt;h1&gt;A Plan for Implementation&lt;/h1&gt;&#xD;
&lt;p&gt;Our design for implementation is built around putting data and messaging together. A data grid is a perfect mechanism for this, as long as it provides some basic features: transactional operations, push and pull semantics, and data partitioning.&lt;/p&gt;&#xD;
&lt;p&gt;A data grid &lt;em&gt;does&lt;/em&gt; provide those basic features, or else it's not really much of a data grid; it'd be more of a cache otherwise.&lt;/p&gt;&#xD;
&lt;p&gt;With a data grid, then, the events come in as individual messages. When the user chooses an operation on the web site, an asynchronous operation would write the event, just as Facebook does today. However, instead of filtering and batching the events into various forms, the events are dispatched to waiting processes that perform many transactional updates in parallel.&lt;/p&gt;&#xD;
&lt;p&gt;There’s a danger that those updates might be slower than the generated events, if each event is processed sequentially. That said, this isn’t as much a problem as one might think; if data partitioning is used, then event handlers can receive partitioned events, which localizes updates and speeds them up dramatically.&lt;/p&gt;&#xD;
&lt;p&gt;In fact, you can still use batching to process events as a group; since the events would be partitioned coming in, the batch process would still be updating local data very quickly, which would be faster than individual event processing, even while retaining simplicity.&lt;/p&gt;&#xD;
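&lt;p&gt;As a rough illustration of partitioned event handling, the sketch below routes each event to the partition that owns its key, and shows both the single-event and the batched path. It runs in one process and uses hypothetical names throughout; in a real data grid each partition would live in its own process and apply its updates transactionally.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
import zlib
from collections import Counter


def partition_for(key, partitions):
    """Route a key with a stable hash, so the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % partitions


class PartitionedCounters:
    def __init__(self, partitions):
        self.partitions = [Counter() for _ in range(partitions)]

    def handle(self, key):
        """Process a single event against the partition that owns its key."""
        self.partitions[partition_for(key, len(self.partitions))][key] += 1

    def handle_batch(self, keys):
        """Group a batch by partition first, then apply one local update per partition."""
        grouped = {}
        for key in keys:
            index = partition_for(key, len(self.partitions))
            grouped.setdefault(index, Counter())[key] += 1
        for index, counts in grouped.items():
            self.partitions[index].update(counts)

    def total(self, key):
        """Read a counter from the only partition that can hold it."""
        return self.partitions[partition_for(key, len(self.partitions))][key]


grid = PartitionedCounters(partitions=4)
grid.handle("/a")
grid.handle_batch(["/a", "/b", "/a"])
print(grid.total("/a"), grid.total("/b"))
```
&lt;/pre&gt;&#xD;
&lt;p&gt;Because every update for a key touches exactly one partition, handlers for different partitions never contend with each other, which is what lets the updates run in parallel.&lt;/p&gt;&#xD;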
&lt;p&gt;With this design, there is no overflow condition, because a system that’s designed to scale in and out, as most data grids are, will repartition to maintain even usage. If a data grid can’t provide this feature intrinsically, of course some management will be necessary, but finding data grids with this feature isn’t very difficult.&lt;/p&gt;&#xD;
&lt;p&gt;One other advantage of data grids is support for asynchronous write-through, often called write-behind. With it, updates to the data grid are written asynchronously to a backend data store – which could be HBase (as used by Facebook), Cassandra, a relational database such as MySQL, or any other data medium you choose for long-term storage, should you need that.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e89d7445a970d-pi" style="display: inline;"&gt;&lt;img alt="RealTimeAnalytics_big_data" border="0" class="asset  asset-image at-xid-6a00d835457b7453ef014e89d7445a970d image-full" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e89d7445a970d-800wi" title="RealTimeAnalytics_big_data"&gt;&lt;/img&gt;&lt;/a&gt; &lt;br&gt;&lt;br&gt;&lt;/p&gt;&#xD;
&lt;p&gt;The memory system and the database - the external data store - work together. The in-memory solution is ideal for the realtime aspects, the events that affect &lt;em&gt;now&lt;/em&gt;. The external data storage solution is designed to handle long-term data, for which speed is not as much of an issue.&lt;/p&gt;&#xD;
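&lt;p&gt;The split between memory and the external store can be sketched as follows. This is an assumption-laden toy, not how any particular data grid implements it: updates land in memory first and a background thread persists them through a pluggable &lt;code&gt;backend_write&lt;/code&gt; callable (a hypothetical name), so the backend could be swapped without touching the event path. A real grid would also replicate the in-memory state, which this sketch does not.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
import queue
import threading


class WriteBehindCounters:
    """In-memory counters whose updates are persisted by a background thread."""

    def __init__(self, backend_write):
        self.backend_write = backend_write  # callable(key, value); stand-in for any database
        self.counts = {}
        self.lock = threading.Lock()
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def increment(self, key):
        """Update memory first, then queue the new value for persistence."""
        with self.lock:
            self.counts[key] = self.counts.get(key, 0) + 1
            self.pending.put((key, self.counts[key]))

    def _drain(self):
        """Persist queued values in order until the shutdown marker arrives."""
        while True:
            item = self.pending.get()
            if item is None:  # shutdown marker
                self.pending.task_done()
                return
            self.backend_write(*item)
            self.pending.task_done()

    def close(self):
        """Flush everything still queued, then stop the worker."""
        self.pending.put(None)
        self.pending.join()
        self.worker.join()


database = {}  # plays the role of HBase, Cassandra, MySQL, ...
counters = WriteBehindCounters(database.__setitem__)
for url in ["/a", "/b", "/a"]:
    counters.increment(url)
counters.close()
print(database)
```
&lt;/pre&gt;&#xD;
&lt;p&gt;The caller of &lt;code&gt;increment&lt;/code&gt; never waits on the database, which is the headroom the buffered write gives a slower backend.&lt;/p&gt;&#xD;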
&lt;h1&gt;A Discussion of Strengths&lt;/h1&gt;&#xD;
&lt;p&gt;The key concept here is that event handling is the lever that can move the realtime analytics mountain. By providing a simple, scalable publisher/subscriber model, you simplify design; by using a platform that supports data partitioning, transactional updates, and write-through capabilities, you gain scalability.&lt;/p&gt;&#xD;
&lt;p&gt;The data grid’s flexible query API means that event handlers can react as soon as matching data becomes available.&lt;/p&gt;&#xD;
&lt;p&gt;For a call center, for example, you want to immediately identify signals that show that the caller should be handled differently; imagine an ecommerce site that was able to determine immediately if a user was losing interest, and thus could respond appropriately, before the customer moves on.&lt;/p&gt;&#xD;
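&lt;p&gt;Here is a minimal sketch of that react-on-data idea, assuming a hypothetical &lt;code&gt;EventSpace&lt;/code&gt; class rather than any real data grid API: subscribers register a predicate, and every write that matches is pushed to them immediately.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
class EventSpace:
    """A tiny in-memory store that pushes matching writes to subscribers."""

    def __init__(self):
        self.entries = []
        self.subscribers = []

    def subscribe(self, matches, callback):
        """Register a callback for every future entry where matches(entry) is true."""
        self.subscribers.append((matches, callback))

    def write(self, entry):
        """Store the entry, then notify each subscriber whose predicate matches it."""
        self.entries.append(entry)
        for matches, callback in self.subscribers:
            if matches(entry):
                callback(entry)


space = EventSpace()
alerts = []

# Hypothetical signal: a shopper who has abandoned a cart three times.
space.subscribe(
    lambda e: e["type"] == "cart_abandoned" and e["count"] == 3,
    alerts.append,
)

space.write({"type": "page_view", "user": "u1", "count": 1})
space.write({"type": "cart_abandoned", "user": "u1", "count": 3})
print(alerts)
```
&lt;/pre&gt;&#xD;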
&lt;p&gt;With external processes and a long funnel for data, immediate-response capabilities are very difficult to implement, not just because of latency but because the data transformations tend to homogenize the data, instead of allowing rich expressions and flexible event types.&lt;/p&gt;&#xD;
&lt;p&gt;The data grid also has much richer support in terms of client applications. Instead of applications going through an API that focuses on a specific phase of the data’s life (for example, an API focused on HBase), you can focus on a generic API that can capture events at any point in their lifecycle, and from anywhere. An external monitoring process, then, can have the same immediate, partition-aware access to data that the integrated message-handling system does; adding features and analysis is just a matter of connecting a client to the data grid.&lt;/p&gt;&#xD;
&lt;p&gt;Here we have a quick demo that shows much of this in motion. We have a market analysis application, deployed into GigaSpaces XAP via our new cloud deployment tool, Cloudify; it uses an event-driven system to display realtime data, with a write-through to Cassandra on the back-end. The design is very simple, and demonstrates the principles we've discussed here - and can scale up and down depending on demand.&lt;/p&gt;&#xD;
&lt;div class="wlWriterEditableSmartContent" id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:e9f98245-a26d-41b1-8587-7d4a08bd85fc" style="margin: 0px; display: inline; float: none; padding: 0px;"&gt;&#xD;
&lt;div id="9b305514-6440-4357-89d6-e7eb0da53482" style="margin: 0px; padding: 0px; display: inline;"&gt;&#xD;
&lt;div&gt;&#xD;
&lt;object height="330" width="550"&gt;&#xD;
&lt;param name="movie" value="http://www.youtube.com/v/rt7-UnmliTE?hl=en&amp;amp;hd=1"&gt;&lt;/param&gt;&lt;embed height="330" src="http://www.youtube.com/v/rt7-UnmliTE?hl=en&amp;amp;hd=1" type="application/x-shockwave-flash" width="550"&gt;&lt;/embed&gt;&#xD;
&lt;/object&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;h2&gt;Final words&lt;/h2&gt;&#xD;
&lt;p&gt;As realtime analytics &lt;a href="http://gigaom.com/cloud/big-data-in-real-time-is-no-fantasy/" target="_self"&gt;gets into mainstream application design&lt;/a&gt;, it becomes important to draw a blueprint for how to build this sort of system without reinventing the wheel.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html" target="_self"&gt;Todd Hoff&lt;/a&gt; (HighScalability) and &lt;a href="http://www.facebook.com/video/video.php?v=707216889765&amp;amp;oid=9445547199&amp;amp;comments" target="_self"&gt;Alex Himel&lt;/a&gt; (Facebook) provided a fairly detailed description of their solution and, even more importantly, shared the rationale that made them do things in certain ways.&lt;/p&gt;&#xD;
&lt;p&gt;One main difference in the assumptions that led to the different implementation strategies is in the reliability of memory for event processing, and in the use of passive data storage.&lt;/p&gt;&#xD;
&lt;p&gt;Another difference is that we had to think of the solution as an easily cloneable one, and therefore a lot of attention was put on the simplicity of the runtime, packaging and management of the solution.&lt;/p&gt;&#xD;
&lt;p&gt;Yet another difference is that we couldn’t decide on a specific database, as there isn’t a "one size fits all" solution – for certain customers, SQL would still be the preferred choice, and the fact that we can buffer the writes to the database gives them more headroom while still allowing them to scale on writes.&lt;/p&gt;&#xD;
&lt;p&gt;I hope that this will lead to a constructive dialogue on the various tradeoffs, which will serve the entire industry.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://mrkn.co/forums/java/general/562.html#" target="_self"&gt;Video: Realtime Analytics for Big Data: A Facebook Case Study&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html</feedburner:origLink></entry>
    <entry>
        <title>Real Time analytics for Big Data: Facebook's New Realtime Analytics System</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/zRkBdGzkQvs/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html" thr:count="5" thr:updated="2011-12-25T19:10:55+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0154335fa212970c</id>
        <published>2011-07-08T14:49:34+02:00</published>
        <updated>2011-12-08T22:24:30+01:00</updated>
        <summary>Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. As usual, Todd did an excellent job of summarizing this video from Alex Himel, Engineering Manager at Facebook. In this first post, I’d...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Architecture" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="EDA" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="real-time" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="scalability" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;img align="right" alt="" src="http://t2.gstatic.com/images?q=tbn:ANd9GcSRHQP8LrGqDlUad0ulsubQvqgQqfAOhnZqtnIyfZdlCbT4vx1J" style="display: inline; float: right;"&gt;&lt;/img&gt;Recently, I was reading Todd Hoff's write-up on &lt;a href="http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html"&gt;Facebook's real-time analytics system&lt;/a&gt;. As usual, Todd did an excellent job of summarizing &lt;a href="http://www.facebook.com/video/video.php?v=707216889765&amp;amp;oid=9445547199&amp;amp;comments"&gt;this video&lt;/a&gt; from &lt;a href="http://www.facebook.com/ahimel"&gt;Alex Himel&lt;/a&gt;, Engineering Manager at Facebook.&lt;/p&gt;&#xD;
&lt;p&gt;In &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html"&gt;this first post&lt;/a&gt;, I’d like to summarize the case study and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Real Time Analytics for Big Data that might be easier to implement, using Facebook's experience as a starting point and guide, as well as the experience gathered through recent work with a few GigaSpaces customers. The &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html"&gt;second post&lt;/a&gt; provides a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="font-size: 20px; font-weight: bold;"&gt;The Business Drive for real time analytics: Time is money&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;The main drive for many of the innovations around realtime analytics has to do with competitiveness and cost, just as with most other advances.&lt;/p&gt;&#xD;
&lt;p&gt;For example, during the past few years financial organizations realized that intra-day risk analysis of their customers' portfolios translated to increased profit, as they could react faster to profit and loss events.&lt;/p&gt;&#xD;
&lt;p&gt;The same applies to many of the online ecommerce and social sites. Knowing what your users are doing on your site in real time, and matching what they do with more targeted information, translates into a better conversion rate and better user satisfaction, which means more money in the end.&lt;/p&gt;&#xD;
&lt;p&gt;Todd provides similar reasoning to describe the motivation behind Facebook's new system:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Content producers can see what people like, which will enable content producers to generate more of what people like, which raises the content quality of the web, which gives users a better Facebook experience.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h2&gt;Why now?&lt;/h2&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Real time analytics goes mainstream&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;The massive transition to online and social applications makes it possible to track user patterns like never before. The quality of the data that providers track is closely correlated with their business success: for example, e-commerce customers want to know what their friends think about products or services, right in the middle of their shopping experience. If sites cannot keep up with their thousands of users in real-time, they can lose their customers to sites that can.&lt;/p&gt;&#xD;
&lt;p&gt;So while risk analytics in the financial industry was still a fairly small niche of the analytics market, the demand for real time analytics in social, eCommerce and SaaS applications brought it into mainstream businesses under massive load.&lt;/p&gt;&#xD;
&lt;p&gt;No one has time for batch processing anymore.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Technology advancement &lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Newer infrastructures and technologies like tera-scale memory, NoSQL, parallel processing platforms, and cloud computing provide new ways to process massive amounts of data in less time and at lower cost. As most of the current analytics systems weren’t built to take advantage of these new technologies and capabilities, they haven't been able to adapt to real time requirements without massive changes.&lt;/p&gt;&#xD;
&lt;h2&gt;&lt;strong&gt;Hadoop Map/Reduce doesn’t fit the real time business&lt;/strong&gt;&lt;/h2&gt;&#xD;
&lt;p&gt;One of the hottest trends in the analytics space is the use of &lt;a href="http://hadoop.apache.org/" target="_blank"&gt;Hadoop&lt;/a&gt; as almost the de-facto standard for many batch processing analytics applications. While Hadoop (and Map/Reduce in general) does a pretty good job of processing massive logs and data through parallel batch processing, it wasn’t designed to serve the real time part of the business.&lt;/p&gt;&#xD;
&lt;p&gt;Strong evidence for that can be seen from the moves of those who were known as the “poster children” for Map/Reduce: Google and Yahoo have both looked beyond Map/Reduce for real-time workloads. Google has moved to &lt;a href="http://www.theregister.co.uk/2010/09/24/google_percolator/"&gt;Google Percolator&lt;/a&gt; for its indexing service. Yahoo came up with a new service, &lt;a href="http://blog.programmableweb.com/2010/12/31/yahoos-open-sourced-s4-could-be-a-real-time-cloud-platform/"&gt;S4&lt;/a&gt;, which was designed specifically for real time processing.&lt;/p&gt;&#xD;
&lt;p&gt;It is therefore not surprising that Facebook reached the same conclusion with regard to Hadoop:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Hadoop/Hive..Not realtime. Many dependencies. Lots of points of failure. Complicated system. Not dependable enough to hit realtime goals.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h4&gt; &lt;span style="font-size: 20px;"&gt;Facebook's Real Time Analytics system&lt;/span&gt;&lt;/h4&gt;&#xD;
&lt;p&gt;According to Todd, Facebook evaluated a fairly long list of alternatives, including MySQL, an in-memory cache, and Hadoop Hive/Map Reduce. I highly recommend reading the full details in &lt;a href="http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html" target="_blank"&gt;Todd's post&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;I tried to outline Facebook's architecture, based on Todd's summary, in the diagram below:&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e898cf875970d-pi"&gt;&lt;img alt="image" border="0" height="208" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e898cf8a0970d-pi" style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="image" width="308"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Every user activity triggers an asynchronous event through AJAX – this event is logged in a tail log using Scribe. PTail is used to aggregate the individual logs into a consolidated stream. The stream is batched in 1.5 sec groupings by Puma, which stores the event batch into HBase. The real time logs are kept for a certain period of time and then get cleared from the system.&lt;/p&gt;&#xD;
&lt;p&gt;Obviously this description is a fairly simplistic view – the full details are provided in the original post.&lt;/p&gt;&#xD;
&lt;h2&gt;&lt;span style="font-size: 15px;"&gt;Evaluating the Facebook Architecture&lt;/span&gt;&lt;/h2&gt;&#xD;
&lt;p&gt;Facebook's reasoning behind their technology evaluation seems acceptable for the most part, although there are some obvious concerns.&lt;/p&gt;&#xD;
&lt;p&gt;There were two things that caught my attention:&lt;/p&gt;&#xD;
&lt;h4&gt;Memory Counters&lt;/h4&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;(Facebook) felt in-memory counters, for reasons not explained, weren't as accurate as other approaches. Even a 1% failure rate would be unacceptable. Analytics drive money so the counters have to be highly accurate.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;It sounds to me as though the evaluation was based on memcached. By default, it's not highly available; a failure would result in loss of data. Obviously, that doesn’t apply to some other memory-based solutions such as In Memory Data Grids (for example, &lt;a href="http://gigaspaces.com/" target="_blank"&gt;GigaSpaces&lt;/a&gt; and &lt;a href="http://www.oracle.com/technetwork/middleware/coherence/index.html" target="_blank"&gt;Coherence&lt;/a&gt;, both of which were designed for high resiliency).&lt;/p&gt;&#xD;
&lt;h4&gt;Cassandra vs HBase&lt;/h4&gt;&#xD;
&lt;p&gt;The choice of &lt;a href="http://hbase.apache.org/" target="_blank"&gt;HBase&lt;/a&gt; over &lt;a href="http://cassandra.apache.org/" target="_blank"&gt;Cassandra&lt;/a&gt; was also very interesting, mainly because Cassandra was developed by another Facebook team to address write scalability. The choice had to do with the write rate differences between the two alternatives at the time of evaluation:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;HBase seemed a better solution based on availability and the write rate. Write rate was the huge bottleneck being solved.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;&lt;a href="http://www.linkedin.com/in/erichauser"&gt;Eric Hauser&lt;/a&gt; posted a comment on this analysis which seems to indicate that this issue has since been addressed in Cassandra:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;When Facebook engineers started the project 6 months ago, Cassandra did not have distributed counters which is now committed in trunk. Twitter is making a large investment in Cassandra for real-time analytics (see Rainbird). Write rate should be less of a bottleneck for Cassandra now that counter writes are spread out across multiple hosts. For HBase, every counter is still bound by the performance of single region server? A performance comparison of the two would be interesting&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Eric's comment is indicative of how dynamic the NoSQL space is. I’d be interested in how different the technology selection would be now.&lt;/p&gt;&#xD;
&lt;h3&gt;The Architecture&lt;/h3&gt;&#xD;
&lt;p&gt;There are a few common principles that drive the architecture for this type of system:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Event logging needs to be extremely fast, to minimize the latency impact on the site.&lt;/li&gt;&#xD;
&lt;li&gt;Events need to be reliable, otherwise the entire system's accuracy is questionable and the data is devalued.&lt;/li&gt;&#xD;
&lt;li&gt;The real time data in the form of logs is kept for a certain period of time (x hours or x days)&lt;/li&gt;&#xD;
&lt;li&gt;Write scalability is key.&lt;/li&gt;&#xD;
&lt;li&gt;Post-processing can happen in batches; the size of the batch depends on how *real-time* we need this data to be.&lt;/li&gt;&#xD;
&lt;li&gt;Writes to a backend database need to be done asynchronously.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Facebook seems to follow those same principles in their architecture while running the system at a fairly impressive scale.&lt;/p&gt;&#xD;
&lt;p&gt;There are a few questions that are still open as I don’t have full visibility into their system:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Are PTail and Puma centralized components? If so, don't they pose a potential bottleneck? Based on Todd's summary and Alex's presentation, it seems that the way Facebook scaled their system is by splitting the PTail stream by categories of events so each type can be handled by a different data center.&lt;/li&gt;&#xD;
&lt;li&gt;Puma batches event logs in memory before it writes them into HBase – what happens if Puma fails before the batch is written?&lt;/li&gt;&#xD;
&lt;li&gt;The solution seems to be limited to handling simple counters. This seems to be a fairly severe limitation, as many systems need to produce more complex relationships even during the real time part of the system. This is also acknowledged in the planned future enhancements:&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;"(We need to know) how many people across&lt;strong&gt; &lt;/strong&gt;a time window liked a URL. Easy to do in MapReduce, hard to do with a naive counter solution.”&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;At one point, it is noted that Facebook chose not to rely on memory for counters. However, throughout the description it seems that there is still strong reliance on keeping data within memory boundaries:&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;“(We) write extremely lean log lines. The more compact the log lines &lt;strong&gt;the more can be stored in memory&lt;/strong&gt;..”&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;“(We) batch for 1.5 seconds on average. Would like to batch longer but they have so many URLs that they &lt;strong&gt;run out of memory when creating a hashtable&lt;/strong&gt;”&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Looking back – wouldn’t it be better to store the data in-memory in the first place? Why add the extra architecture components, if you're able to make memory work for you? This is a crucial question, of course, because it focuses attention on memory availability. As mentioned, though, there are in-memory data grids that are designed for just this kind of situation.&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;It is noted that Puma writes its batches to HBase sequentially:&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;“(We) wait for last flush to complete for starting new batch to avoid lock contention issues...”&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;What if the rate at which batches are flushed is lower than the rate at which events arrive?&lt;/p&gt;&#xD;
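&lt;p&gt;To show why the time-window question quoted above is hard for a plain counter, here is a small sketch of what answering it requires. It is only an illustration with hypothetical names, not anything from Facebook's system: instead of a single number per URL, the individual like events have to be kept so that old ones can expire and repeat users can be deduplicated.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
from collections import defaultdict, deque


class WindowedLikes:
    """Keeps like events per URL and counts distinct users over a sliding window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # url mapped to (timestamp, user) pairs, oldest first

    def like(self, url, user, timestamp):
        """Record a like; timestamps are assumed to arrive in non-decreasing order per URL."""
        self.events[url].append((timestamp, user))

    def distinct_users(self, url, now):
        """Drop events that fell out of the window, then count the distinct users left."""
        cutoff = now - self.window_seconds
        likes = self.events[url]
        while likes and cutoff >= likes[0][0]:
            likes.popleft()
        return len({user for _, user in likes})


window = WindowedLikes(window_seconds=60)
window.like("/a", "u1", 0)
window.like("/a", "u2", 30)
window.like("/a", "u1", 50)
window.like("/a", "u3", 70)
first = window.distinct_users("/a", now=80)    # events at 30, 50 and 70 remain
second = window.distinct_users("/a", now=100)  # events at 50 and 70 remain
print(first, second)
```
&lt;/pre&gt;&#xD;
&lt;p&gt;The memory cost grows with the number of events in the window rather than the number of URLs, which is one reason a naive counter cannot simply be extended to cover this case.&lt;/p&gt;&#xD;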
&lt;h2&gt;Realtime.Analytics.next&lt;/h2&gt;&#xD;
&lt;p&gt;Interestingly enough, I was asked to draw up a solution for a similar challenge for a voice recording system: collect lots of data from various sources and process it in real time. The good news is that it's doable, as shown by Facebook's success.&lt;/p&gt;&#xD;
&lt;p&gt;The better news is that it's actually fairly easy. We can add rich query capabilities, elastic scaling, database and platform neutrality, evolution of data, and more without making things unnecessarily difficult - it's not so much that this is the next generation of realtime analytics as that Map/Reduce and the HBase approach used by Facebook are the &lt;em&gt;previous&lt;/em&gt; generation.&lt;/p&gt;&#xD;
&lt;p&gt;I'll describe how it can be done more simply in the &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html" target="_self"&gt;next post&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;h1&gt;References&lt;/h1&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://mrkn.co/forums/java/general/562.html#" target="_self"&gt;Video: Realtime Analytics for Big Data: A Facebook Case Study&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html" target="_self"&gt;Real Time Analytics for Big Data: An Alternative Approach&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://gigaom.com/cloud/big-data-in-real-time-is-no-fantasy/" target="_self"&gt;Big data in real time is no fantasy&lt;/a&gt; (GigaOm)&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html"&gt;Facebook's New Realtime Analytics System: HBase To Process 20 Billion Events Per Day&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/06/readwrite-scale-without-complete-re-write.html"&gt;Read/write scale without complete re-write&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/10/how-mapreduce-and-the-cloud-are-affecting-analytics-applications.html" target="_self"&gt;How Map/Reduce and the Cloud are Affecting Analytics Applications&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2008/10/is-mapreduce-going-to-main-stream.html" target="_self"&gt;Is MapReduce going mainstream?&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html"&gt;Memory is the New Disk for the Enterprise&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html</feedburner:origLink></entry>
    <entry>
        <title>Read/write scale without complete re-write</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/EIrvzmSQxvs/readwrite-scale-without-complete-re-write.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/06/readwrite-scale-without-complete-re-write.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef014e88dd9601970d</id>
        <published>2011-06-03T19:06:41+02:00</published>
        <updated>2011-06-07T15:32:03+02:00</updated>
        <summary>Last week I attended one of our partner events in Stockholm, where I presented the convergence of trends in the data scalability world – specifically the transition from NoSQL to NewSQL and the convergence of trends that brings the...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="scalability" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Last week I attended one of our partner events in Stockholm, where I presented the convergence of trends in the data scalability world – specifically the transition from &lt;a href="http://www.slideboom.com/presentations/363518" target="_self"&gt;NoSQL to NewSQL&lt;/a&gt;, which brings the existing SQL and new NoSQL worlds much closer together, as I noted in a previous post, "&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/07/yesql-an-overview-of-the-different-queries-semantics-in-the-post-only-sql-world.html"&gt;YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World&lt;/a&gt;."&lt;/p&gt;&#xD;
&lt;p&gt;&lt;img align="left" alt="Avanza Communications LLC" src="http://media02.linkedin.com/media/p/3/000/047/2b0/11cb12e.png" style="margin: 0px 4px 0px 0px; display: inline; float: left;"&gt;&lt;/img&gt;During the event, &lt;a href="http://www.linkedin.com/in/bodinger"&gt;Ronnie Bodinger&lt;/a&gt;, Head of IT at &lt;a href="http://www.linkedin.com/company/avanza-communications-llc?trk=ppro_cprof"&gt;Avanza Bank AB&lt;/a&gt;, gave an excellent talk on how they turned their existing online banking application into a new site that was designed for read/write scaling.&lt;/p&gt;&#xD;
&lt;h2&gt;Avanza's System Description:&lt;/h2&gt;&#xD;
&lt;p&gt;Avanza Bank is a Swedish bank that makes it easy for investors to make equity transactions and fund switches. It handles the most trades on the Stockholm Stock Exchange.&lt;/p&gt;&#xD;
&lt;p&gt;It prides itself on providing advanced tools for its investors through its online banking system.&lt;/p&gt;&#xD;
&lt;p&gt;The current online system is a typical web site based on Java/JSP and Spring.&lt;/p&gt;&#xD;
&lt;h3&gt;Scaling architecture of the existing site&lt;/h3&gt;&#xD;
&lt;p&gt;Most interactions with the current site are reads, so the main scaling challenge was handling concurrent read operations. Read scaling was addressed through a side-cache architecture, common in many existing LAMP + Memcached deployments, where the first query hits the database and subsequent queries hit the cache.&lt;/p&gt;&#xD;
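&lt;p&gt;A minimal sketch of the side-cache read path described above, written in Python for brevity. The class, the account keys and the dict standing in for Memcached are all hypothetical; this is an illustration of the pattern, not Avanza's code.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
class SideCache:
    """Cache-aside read path: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        # load_from_db is a hypothetical callable standing in for a SQL query.
        self.load_from_db = load_from_db
        self.cache = {}          # stands in for Memcached
        self.db_reads = 0        # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: no database work
        self.db_reads += 1
        value = self.load_from_db(key)  # cache miss: the first query hits the database
        self.cache[key] = value         # later queries for this key hit the cache
        return value

    def invalidate(self, key):
        # A write makes the cached copy stale, so it has to be dropped.
        self.cache.pop(key, None)


accounts = {"acct-1": 100, "acct-2": 250}   # hypothetical table contents
side_cache = SideCache(accounts.get)

# Read-mostly traffic: five reads cost a single database query.
for _ in range(5):
    side_cache.get("acct-1")
reads_without_writes = side_cache.db_reads

# Update-heavy traffic: every read follows a write, so every read misses.
for _ in range(5):
    accounts["acct-2"] += 1            # an update in the database...
    side_cache.invalidate("acct-2")    # ...forces the cache entry out
    side_cache.get("acct-2")
reads_with_writes = side_cache.db_reads - reads_without_writes
```
&lt;/pre&gt;&#xD;
&lt;p&gt;In the first loop the database is queried once for five reads; in the second loop, where each read follows an update, the database is queried on every read.&lt;/p&gt;&#xD;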
&lt;h2&gt;The New System&lt;/h2&gt;&#xD;
&lt;p&gt;The new site was designed to fit into the real-time and social era. This means that a lot of the traffic is now generated by user activity and not just by the site owner. These activities need to be presented to the bank's users in real time.&lt;/p&gt;&#xD;
&lt;h3&gt;Challenges&lt;/h3&gt;&#xD;
&lt;p&gt;The changes in the new site led to a significant shift in traffic and load behavior, which introduced a new class of scaling challenges:&lt;/p&gt;&#xD;
&lt;p&gt;Write scaling&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;When lots of updates are involved, the existing side-cache architecture leads to diminishing returns: the cache becomes stale quickly, so synchronizing it only adds overhead.&lt;/p&gt;&#xD;
&lt;p&gt;Using Oracle RAC with a high-end hardware platform didn’t prove itself either; it yielded a fairly expensive solution that didn’t meet the scaling requirements.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Unlike “Green Field” applications, Avanza has an existing online application (a “Brown Field”) that serves its current customers. That brings the following additional challenges:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Existing Data Model &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The entire data model of the application was designed for a relational model. Changing the data model or moving it to a new NoSQL architecture, which was considered, would involve a huge change that could turn into a years-long effort.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;p&gt;Legacy system&lt;/p&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;img align="left" alt="" height="94" src="http://t1.gstatic.com/images?q=tbn:ANd9GcR9OZ8FlKqzNgsO16CTQOK0rQXEbH8e9R29OOCOoYGhtoiYfbWK-Qyuds0" style="margin: 0px 7px 0px 0px; display: inline; float: left;" width="187"&gt;&lt;/img&gt;The online bank application consists of a large set of legacy applications and third-party services. Re-writing the existing infrastructure is either impossible (due to the dependency on third-party tools) or impractical.&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Complex environment &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;img align="left" alt="" height="108" src="http://perro.si/wp-content/uploads/2008/08/registration.gif" style="margin: 0px 8px 0px 0px; display: inline; float: left;" width="118"&gt;&lt;/img&gt;As is often the case, a large portion of the legacy applications weren’t designed for scale and weren’t built with a clear holistic architecture, as they were built up in layers over the years. This increases the complexity of scaling by quite a bit.&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Existing Skillset &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The existing development team already had fairly good knowledge of Java and Java EE. Changing the team and/or developing a completely new skillset is a huge barrier as the ramp-up time required to bring new developers up to speed with the complexity of the system can take years.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;&lt;span style="font-size: 20px; font-weight: bold;"&gt;The solution: Read/write scale without complete re-write&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;It was clear that meeting the new scaling challenges would involve changes to the existing application – the main question was how to scope that change so that it wouldn’t require a complete re-write. The second goal was to build the change in a way that would reduce the TCO of the current system.&lt;/p&gt;&#xD;
&lt;p&gt;To achieve those two goals the following approach was used:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Minimize the change by clearly identifying the scalability hotspots &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The areas of the application that need intensive write access are often small parts of the overall system. The first step is therefore to limit the change to the hot spots of the application and keep the rest of the application untouched. In the case of Avanza, the hot spots were identified in certain tables used by the online web application. Most of the backend systems were still accessing the database for reporting, synchronization and batch processing and could therefore remain unchanged.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Keep the database as is &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;One of the key pieces of the current design is the ability to address read/write scalability outside of the database context (see the next bullet). This makes it possible to keep the existing database and the schema of the data unchanged. In that way, all the rest of the systems continue to work with the database as if nothing changed.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Put an In Memory Data Grid as a front end to the database &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Scaling the application is done by front-ending the database with an &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/The+In-Memory+Data+Grid" target="_self"&gt;In-Memory-Data-Grid&lt;/a&gt; (IMDG). The IMDG contains all the hot tables or rows of the original database. The online web application accesses the IMDG instead of the database. The IMDG is distributed in nature, which allows scalability by distributing the load over a cluster of machines for both read and write operations.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
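&lt;p&gt;To illustrate how a distributed grid spreads both reads and writes over a cluster, here is a toy Python sketch that routes each key to a partition by hash. The class name, the CRC32 routing rule and the account keys are assumptions made for the example; they are not a description of how GigaSpaces routes data.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
import zlib


class PartitionedGrid:
    """Toy in-memory grid: each key is routed to one of several partitions."""

    def __init__(self, partition_count):
        # Each dict stands in for the memory of one machine in the cluster.
        self.partitions = [{} for _ in range(partition_count)]

    def partition_for(self, key):
        # A stable hash keeps a given key on the same partition every time.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def write(self, key, value):
        self.partitions[self.partition_for(key)][key] = value

    def read(self, key):
        return self.partitions[self.partition_for(key)].get(key)


grid = PartitionedGrid(partition_count=4)
for n in range(1000):
    grid.write("account-" + str(n), {"balance": n})

# Number of entries held by each partition; no single partition holds them all.
sizes = [len(p) for p in grid.partitions]
```
&lt;/pre&gt;&#xD;
&lt;p&gt;Because a read or write for a given key touches only one partition, adding partitions adds capacity for both kinds of operation.&lt;/p&gt;&#xD;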
&lt;ul&gt;&#xD;
&lt;li&gt;Use write-behind to reduce the synchronization overhead &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Updates from the IMDG to the underlying database are done asynchronously in batches through an &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Persistency" target="_self"&gt;internal queuing mechanism&lt;/a&gt; (redo-log).&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
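&lt;p&gt;The write-behind idea can be sketched in a few lines of Python. This is a simplified, single-threaded illustration under stated assumptions: the class, the batch size and the &lt;code&gt;persist&lt;/code&gt; callback are hypothetical, and a real grid would drain its redo log in the background rather than through an explicit call.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
from collections import deque


class WriteBehindStore:
    """Writes land in memory first and reach the database later, in batches."""

    def __init__(self, flush_to_db, batch_size):
        # flush_to_db is a hypothetical callable that persists a list of updates.
        self.flush_to_db = flush_to_db
        self.batch_size = batch_size
        self.memory = {}
        self.redo_log = deque()   # pending updates, oldest first

    def write(self, key, value):
        self.memory[key] = value              # the caller sees the update immediately
        self.redo_log.append((key, value))    # the database will see it later

    def flush(self):
        # Drain the log in batches; a real system would do this in the background.
        while self.redo_log:
            count = min(self.batch_size, len(self.redo_log))
            batch = [self.redo_log.popleft() for _ in range(count)]
            self.flush_to_db(batch)


database = {}    # stands in for the relational database
batches = []     # records the size of each batch sent to the database


def persist(batch):
    batches.append(len(batch))
    database.update(batch)    # applies the (key, value) pairs in order


store = WriteBehindStore(persist, batch_size=100)
for n in range(250):
    store.write("order-" + str(n), n)

pending_before_flush = len(store.redo_log)   # all writes are still only in memory
store.flush()
```
&lt;/pre&gt;&#xD;
&lt;p&gt;Here 250 writes reach the database as three batched calls instead of 250 individual ones, which is where the reduction in synchronization overhead comes from.&lt;/p&gt;&#xD;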
&lt;ul&gt;&#xD;
&lt;li&gt;Use O/R mapping to map the data back into its original format &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;In many cases, achieving the best scalability requires partitioning the data. This often involves changes to the data schema. Changing the data schema could break the entire system, including the areas that don’t suffer from the scalability bottleneck. To address this impedance mismatch, we scope the data schema changes to the IMDG only. The data is mapped from the IMDG schema to the original schema through standard O/R mapping tools such as Hibernate or OpenJPA.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Use standard Java API and framework to leverage existing skillset &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;One of the challenges with many of the new NoSQL database alternatives is that they often force a complete re-architecture. This comes with a fairly high cost of rebuilding skillsets across the organization, both for developing against new APIs and for managing the capacity and sizing of those new databases.&lt;/p&gt;&#xD;
&lt;p&gt;IMDGs such as GigaSpaces expose &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Modeling+and+Accessing+Your+Data" target="_self"&gt;standard APIs&lt;/a&gt;, such as &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/JPA+Support" target="_self"&gt;JPA&lt;/a&gt;. In addition, they allow organizations to extend the use of the existing database while removing a large part of the read/write load. Using both standard APIs and the existing database enables organizations to leverage their existing skillset and still meet their scalability requirements. It also enables them to make a smoother transition (through baby steps) to a completely new scale-out architecture by allowing a new scalable database to be plugged in at a later stage.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Use two parallel (old/new) sites to enable gradual transition &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Switching all the customers to a new system at once is often a bad idea. A better approach is to enable a gradual transition of selected customers to the new site. A common model for achieving that is to run two parallel sites. The challenge in doing so is the synchronization between the two sites. In the case of Avanza, they used the GigaSpaces Mirror service to sync all the changes from one site to the other and in that way keep the two sites up to date.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;The diagram below provides a visual summary of this approach:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef015432bd61cc970c-pi"&gt;&lt;img alt="image" border="0" height="244" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e88dd95da970d-pi" style="background-image: none; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="image" width="184"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h2&gt;The TCO angle&lt;/h2&gt;&#xD;
&lt;p&gt;The second goal in the project was to reduce the TCO of the current system.&lt;/p&gt;&#xD;
&lt;p&gt;This is achieved in the following way:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Use RAM for high performance access and disk for long term storage&lt;/strong&gt; &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;As I noted in a &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html"&gt;recent post&lt;/a&gt;, a RAM-based solution can be 10x – 100x cheaper than a disk-based solution for high performance applications.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;In addition to that, the price for RAM goes down continuously.&lt;/p&gt;&#xD;
&lt;p&gt;1 GB can cost as little as $1.9 a month; storing 1 TB of data completely in RAM can fit into a single rack with a total cost of $1.8k per month.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef01538eea20d3970b-pi"&gt;&lt;img alt="image" border="0" height="152" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e88dd95f0970d-pi" style="background-image: none; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;The optimal solution would therefore be to use RAM to manage data that needs high speed write/read access and disk-based storage for the long-term data that is accessed less frequently.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Use commodity Database and HW&lt;/strong&gt; – A single instance of an Oracle RAC deployment could reach $500k. Putting a data grid in front of the database removes the need for many of the high-end features available in the Oracle RAC database. It also removes the need for high-end hardware such as storage devices, InfiniBand networking, etc. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;In this specific case, it was possible to move the data into MySQL and use commodity Dell machines to run the entire relational data system.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;h2&gt;Final words&lt;/h2&gt;&#xD;
&lt;p&gt;Many existing applications are built with layers on top of layers on top of layers of development. Relational databases often sit at the heart of those systems. Scaling these systems is therefore an extremely challenging task. This leads a lot of organizations to take the easy route, simply paying more for more high-end hardware and databases. We have reached the point where this approach doesn’t cut it in many cases - and it's simply too expensive to maintain in the face of cheaper, more scalable alternatives.&lt;/p&gt;&#xD;
&lt;p&gt;In a world where the impact of software accretion is no longer tolerable, it's clear that a change is inevitable if we're to meet the new demands for scalability. The real question is how to make that change. The approach taken by Avanza Bank in their use of GigaSpaces is an excellent reference: it shows that they were able not only to meet the new scaling requirements fairly quickly through small, measurable changes, but also to reduce their cost of ownership significantly, as noted recently in &lt;a href="http://www.linkedin.com/in/bodinger"&gt;Ronnie Bodinger&lt;/a&gt;'s statement:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;em&gt;“Throwing out Oracle's sluggish databases increases Avanza performance of their business-critical systems, while reducing license costs significantly."&lt;/em&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;em&gt;..&lt;/em&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;em&gt;"Today we have a large Oracle Cluster, which costs a lot. The new system goes in a fraction of it."&lt;/em&gt;&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;You can read the full article &lt;a href="http://computersweden.idg.se/2.2683/1.355460/avanza-okar-prestanda-utan-oracle"&gt;here&lt;/a&gt; (note that the article is in Swedish; use Google Translate to read it in English).&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;A recording of Ronnie's presentation, captured by a camera in the audience, is available &lt;a href="http://www.youtube.com/embed/LFeTHknEmi4" target="_blank" title="Ronie Bodinger (Avanza) presentation on Avanza"&gt;here&lt;/a&gt;.  &lt;/p&gt;&#xD;
&lt;p&gt;Incidentally, I'd like to offer a sincere word of thanks to &lt;a href="http://www.so4it.com/home/home.html" target="_self"&gt;So4It&lt;/a&gt;,  who did an excellent job in working with Avanza and organizing the partner event.&lt;/p&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/06/readwrite-scale-without-complete-re-write.html</feedburner:origLink></entry>
    <entry>
        <title>GigaSpaces Citrix integration on top of OpenStack</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/GbhT3dw7lEs/citrix-openstack.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/05/citrix-openstack.html" thr:count="1" thr:updated="2011-11-10T10:51:19+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef014e8892cea4970d</id>
        <published>2011-05-25T19:27:00+02:00</published>
        <updated>2011-05-24T22:09:51+02:00</updated>
        <summary>In one of my previous posts (GigaSpaces OpenStack Explained) I made a reference to the joint work that we are doing with Citrix through the integration of our new PaaS API: The specific integration with NetScaler (load balancer) and...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Citrix" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;In one of my previous posts (&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/gigaspaces-openstack-explained.html"&gt;GigaSpaces OpenStack Explained&lt;/a&gt;) I made a reference to the joint work that we are doing with Citrix through the integration of our new PaaS API:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The specific integration with NetScaler (load balancer) and XenServer will be achieved through more open interfaces provided through the Citrix/OpenStack contribution, which means that OpenStack users can use those interfaces to plug in any hypervisor or load balancer.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;In this post I’d like to elaborate more specifically on the current and planned integration work.&lt;/p&gt;&#xD;
&lt;h2&gt;GigaSpaces Citrix integration on top of OpenStack&lt;/h2&gt;&#xD;
&lt;p&gt;The block diagram below describes the main layers that comprise the joint &lt;a href="http://www.gigaspaces.com/node/2003" target="_blank" title="GigaSpaces and Citrix OpenStack Partnership"&gt;GigaSpaces/Citrix&lt;/a&gt; integration.&lt;/p&gt;&#xD;
&lt;p&gt;The OpenStack layers (marked in bold-green) make these integration points more open, as we will be using the OpenStack Compute (Nova) and Load Balancer APIs instead of using the Citrix API directly.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef01538ea0ad69970b-pi"&gt;&lt;img alt="image" border="0" height="176" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e889442a3970d-pi" style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="image" width="300"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;h3&gt;OpenStack Compute (Nova)&lt;/h3&gt;&#xD;
&lt;p&gt;The GigaSpaces integration with OpenStack Nova API is done through the previously announced &lt;a href="https://github.com/jclouds/jclouds/tree/master/apis/nova/src/main/java/org/jclouds/openstack"&gt;JClouds provider&lt;/a&gt; contribution.&lt;/p&gt;&#xD;
&lt;p&gt;This integration enables you to run Citrix Xen VMs and manage them through the Citrix Management console. Similarly, you can use this same integration to plug in other VMs.&lt;/p&gt;&#xD;
&lt;h3&gt;OpenStack Load Balancer&lt;/h3&gt;&#xD;
&lt;p&gt;The OpenStack Load Balancer API is a slight variation on the current &lt;a href="http://docs.rackspacecloud.com/loadbalancers/api/v1.0/clb-devguide/content/ch04s01.html"&gt;Rackspace Load Balancer API&lt;/a&gt;. This integration follows the same path as the Nova integration, i.e. we will use the JClouds Load Balancer abstraction and plug in an OpenStack load balancer provider as one of the provider plug-ins.&lt;/p&gt;&#xD;
&lt;p&gt;This integration enables you to run the Citrix NetScaler load balancer through the OpenStack API, and thus lets you leverage the performance and scaling benefits of NetScaler while keeping an open interface for plugging in other load balancers.&lt;/p&gt;&#xD;
&lt;h2&gt;Citrix certified version of OpenStack&lt;/h2&gt;&#xD;
&lt;p&gt;OpenStack enables you to download the full source code of the project and build an Amazon-like cloud infrastructure in your local environment. As with many other open-source offerings (Linux is a good example), most enterprises don't have the skills or the time to go through the process of building their own cloud from source. These organizations need a pre-packaged version of OpenStack that comes with built-in support, production management tools, etc.&lt;/p&gt;&#xD;
&lt;p&gt;The Citrix certified version of OpenStack is geared specifically for this purpose.&lt;/p&gt;&#xD;
&lt;p&gt;The solution will be comprised of two primary components: a Citrix-certified version of OpenStack and a cloud-optimized version of XenServer.&lt;/p&gt;&#xD;
&lt;p&gt;The product will be sold with Rackspace Cloud Builders, who will provide deployment services, training, and ongoing support for customer clouds.&lt;/p&gt;&#xD;
&lt;p&gt;Customers can get started building their clouds today with the &lt;a href="http://www.citrix.com/olympus" target="_blank" title="Citrix Early Access Program"&gt;Early Access Program&lt;/a&gt;. The program will provide access to the software (Citrix), a reference architecture and PowerEdge C server platforms (Dell), and services (RAX Cloud Builders) they need to begin building their cloud. However, as Open Cloud and OpenStack are about openness, customers can work with any group of providers/partners to build their cloud, and will be supported in the Early Access Program. The end result is open source technologies delivering a massively scalable cloud operating system.&lt;/p&gt;&#xD;
&lt;h3&gt;&lt;span style="font-size: large;"&gt;Citrix OpenStack virtual appliance project&lt;/span&gt;&lt;/h3&gt;&#xD;
&lt;p&gt;This video demonstrates the deployment and management of an OpenStack cloud, with the software packaged as a virtual appliance. This allows service providers to track, install, and upgrade their cloud as a single virtual machine image, avoiding the complexity of deploying directly from packages. The video shows a complete installation from bare metal to working cloud, using Citrix's packaged solution. It then goes on to show how the cloud can be scaled up to add new nodes in one click.&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt;&#xD;
&lt;object data="http://www.citrix.com/tv/s/tv/players/ctv_viral_1_0.swf?ctv=3837&amp;amp;autoStart=false&amp;amp;height=412&amp;amp;width=486&amp;amp;hd=false" height="412" id="CustomCTVPlayer3837" type="application/x-shockwave-flash" width="475"&gt;&#xD;
&lt;param name="data" value="http://www.citrix.com/tv/s/tv/players/ctv_viral_1_0.swf?ctv=3837&amp;amp;autoStart=false&amp;amp;height=412&amp;amp;width=486&amp;amp;hd=false"&gt;&lt;/param&gt;&#xD;
&lt;param name="quality" value="high"&gt;&lt;/param&gt;&#xD;
&lt;param name="bgcolor" value="#ffffff"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowScriptAccess" value="always"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&#xD;
&lt;param name="wmode" value="opaque"&gt;&lt;/param&gt;&#xD;
&lt;param name="src" value="http://www.citrix.com/tv/s/tv/players/ctv_viral_1_0.swf?ctv=3837&amp;amp;autoStart=false&amp;amp;height=412&amp;amp;width=486&amp;amp;hd=false"&gt;&lt;/param&gt;&#xD;
&lt;param name="name" value="CustomCTVPlayer3837"&gt;&lt;/param&gt;&#xD;
&lt;param name="align" value="middle"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowfullscreen" value="true"&gt;&lt;/param&gt;&#xD;
&lt;/object&gt;&#xD;
&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;h2&gt;References&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h5&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/paas-on-openstack.html"&gt;PaaS on OpenStack&lt;/a&gt;&lt;/h5&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h5&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/gigaspaces-openstack-explained.html"&gt;GigaSpaces OpenStack Explained&lt;/a&gt;&lt;/h5&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h5&gt;&lt;a href="http://www.citrix.com/tv/#videos/3837"&gt;OpenStack virtual appliance project&lt;/a&gt;&lt;/h5&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/05/citrix-openstack.html</feedburner:origLink></entry>
    <entry>
        <title>PaaS on OpenStack</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/wL8ubAgHHoQ/paas-on-openstack.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/paas-on-openstack.html" thr:count="1" thr:updated="2011-05-06T13:31:56+02:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef015431fa13ce970c</id>
        <published>2011-04-27T13:31:55+02:00</published>
        <updated>2011-04-27T13:31:55+02:00</updated>
        <summary>In my last post (GigaSpaces OpenStack Explained) I introduced our plan to add support for OpenStack in our platform: One of the goals for our second-generation PaaS/SaaS enablement platform was to enable smooth migration between different cloud providers. We were...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Citrix" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JClouds" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="OpenStack" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;In my last post (&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/gigaspaces-openstack-explained.html"&gt;GigaSpaces OpenStack Explained&lt;/a&gt;) I introduced our plan to add support for OpenStack in our platform:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;One of the goals for our second-generation PaaS/SaaS enablement platform was to enable smooth migration between different cloud providers. We were able to achieve this goal through the use of our own abstraction (the Scaling Handler) and through the integration with the &lt;a href="http://www.jclouds.com/"&gt;JClouds&lt;img alt="" src="http://i.ixnp.com/images/v6.59/t.gif"&gt;&lt;/img&gt;&lt;/a&gt; project that provides common abstraction to most of the existing cloud providers. With that, we can ensure that any application can be moved from the likes of Amazon to OpenStack or to an organization's own private cloud with zero changes to the application code or configuration.&lt;/p&gt;&#xD;
&lt;p&gt;The only change involved is setting the user/key of the specific cloud.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e87659ca6970d-pi"&gt;&lt;img alt="image" border="0" height="122" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e87659cc2970d-pi" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;By adding support for OpenStack, we now enable users to safely move to an OpenStack-based cloud when they're ready and with little effort, yet they gain all the benefits that come with it in terms of cost, openness, etc.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Yesterday, I had a session during the &lt;a href="http://www.openstack.org/blog/2011/03/openstack-conference-design-summit-2011-sponsored-by-citrix/" target="_self"&gt;OpenStack Summit&lt;/a&gt; where I tried to present a more general view on how we should be thinking about PaaS in the context of OpenStack.&lt;/p&gt;&#xD;
&lt;p&gt;The key takeaways:&lt;/p&gt;&#xD;
&lt;p&gt;The main goal of PaaS is to drive productivity into the &lt;em&gt;process&lt;/em&gt; by which we can deliver new applications.&lt;/p&gt;&#xD;
&lt;p&gt;Most of the existing PaaS solutions take a fairly extreme approach to abstracting the underlying infrastructure. They therefore fit only a fairly small number of extremely simple applications and miss the real promise of PaaS.&lt;/p&gt;&#xD;
&lt;p&gt;Amazon's Elastic Beanstalk took a more bottom-up approach, giving us a better set of tradeoffs between abstraction and control, which makes it applicable to a larger set of applications.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;The fact that OpenStack is open source allows us to think differently about the things we can do at the platform layer.&lt;/strong&gt; We can create a tighter integration between the PaaS and IaaS layers and thus come up with a better set of tradeoffs in the way we drive productivity without giving up control. Specifically, that means that:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Anyone should be able to: &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Build their own PaaS in a snap&lt;/li&gt;&#xD;
&lt;li&gt;Run on any cloud (public/private)&lt;/li&gt;&#xD;
&lt;li&gt;Gain multi-tenancy, elasticity… without code changes.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;Provide a significantly higher degree of control without adding substantial complexity over our: &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Language choice&lt;/li&gt;&#xD;
&lt;li&gt;Operating System&lt;/li&gt;&#xD;
&lt;li&gt;Middleware stack&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;Should come pre-integrated with a popular stack: &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Spring, Tomcat, DevOps, NoSQL, Hadoop...&lt;/li&gt;&#xD;
&lt;li&gt;Designed to run the most demanding mission-critical apps&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;The slides below, from the presentation at the Summit, give a very high-level overview of the problems as well as of how OpenStack can help provide a solution:&lt;/p&gt;&#xD;
&lt;div style="text-align: left; width: 425px;"&gt;&lt;a href="http://www.slideboom.com/presentations/344639/PaaS-on-OpenStack" style="margin: 12px 0px 3px; display: block; font: 14px helvetica,arial,sans-serif; color: #0000cc; text-decoration: underline;" title="PaaS on OpenStack"&gt;PaaS on OpenStack&lt;/a&gt; &#xD;
&lt;object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,28,0" height="370" id="onlinePlayer344639" width="425"&gt;&#xD;
&lt;param name="movie" value="http://www.slideboom.com/player/player.swf?id_resource=344639"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowScriptAccess" value="always"&gt;&lt;/param&gt;&#xD;
&lt;param name="quality" value="high"&gt;&lt;/param&gt;&#xD;
&lt;param name="bgcolor" value="#ffffff"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&#xD;
&lt;param name="flashVars"&gt;&lt;/param&gt;&lt;embed allowfullscreen="true" allowscriptaccess="always" bgcolor="#ffffff" height="370" name="onlinePlayer344639" pluginspage="http://www.macromedia.com/go/getflashplayer" quality="high" src="http://www.slideboom.com/player/player.swf?id_resource=344639" type="application/x-shockwave-flash" width="425"&gt;&lt;/embed&gt;&#xD;
&lt;/object&gt;&#xD;
&lt;div style="font-family: tahoma,arial; height: 26px; font-size: 11px; padding-top: 2px;"&gt;View &lt;a href="http://www.slideboom.com" style="color: #0000cc;"&gt;more presentations&lt;/a&gt; or &lt;a href="http://www.slideboom.com/upload" style="color: #0000cc;"&gt;Upload&lt;/a&gt; your own.&lt;/div&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;p&gt;We’ve recorded a short demo that shows how all the different pieces actually work together in the context of OpenStack:&lt;/p&gt;&#xD;
&lt;p&gt;  &lt;iframe allowfullscreen="allowfullscreen" frameborder="0" height="390" src="http://www.youtube.com/embed/Rm8Ux5cyNcw" title="YouTube video player" width="425"&gt;&lt;/iframe&gt;&lt;/p&gt;&#xD;
&lt;p&gt;You should note that since the demo was mainly targeted at illustrating the OpenStack integration through JClouds/OpenStack provider, it doesn’t cover much of the feature set, such as Multi-tenancy, fine-grained monitoring, or fail-over, nor does it show deploying full-blown web-apps and big-data apps, etc.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span&gt;The actual code for the JClouds/OpenStack provider should be available through the &lt;a href="http://www.jclouds.org"&gt;JClouds&lt;/a&gt; project shortly.&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;h2&gt;Call for action&lt;/h2&gt;&#xD;
&lt;p&gt;Today, there is a lot of work and interesting innovation being done in the PaaS world by different providers. Unfortunately a lot of that work is done with very little collaboration. The OpenStack community can be a great environment to put all those great ideas into something meaningful and open.&lt;/p&gt;&#xD;
&lt;p&gt;I hope that our initial joint work with OpenStack, Citrix, JClouds, and GridDynamics can be a good start in that direction.&lt;/p&gt;&#xD;
&lt;p&gt;I’m trying to figure out what the right way should be to establish a more formalized Open-PaaS group as part of the OpenStack community. Any ideas/help would be greatly appreciated...&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=wL8ubAgHHoQ:uJQOj-SmEAA:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=wL8ubAgHHoQ:uJQOj-SmEAA:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/wL8ubAgHHoQ" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/04/paas-on-openstack.html</feedburner:origLink></entry>
    <entry>
        <title>GigaSpaces OpenStack Explained</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/pziUz_LYO5U/gigaspaces-openstack-explained.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/gigaspaces-openstack-explained.html" thr:count="4" thr:updated="2011-05-10T19:09:11+02:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0147e3e1eb46970b</id>
        <published>2011-04-11T13:08:08+02:00</published>
        <updated>2011-04-20T19:32:51+02:00</updated>
        <summary>One of the major concerns of many IT organizations is cloud vendor lock-in. This concern was expressed recently in "Banks fear cloud vendor lock-in," from IT Wire: The onset of cloud computing gives vendors the chance to lock customers in...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Citrix" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="JClouds" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="OpenStack" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;One of the major concerns of many IT organizations is cloud vendor lock-in. This concern was expressed recently in "&lt;a href="http://www.itwire.com/it-industry-news/strategy/45898-banks-fear-cloud-vendor-lock-in"&gt;Banks fear cloud vendor lock-in&lt;/a&gt;," from IT Wire:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The onset of cloud computing gives vendors the chance to lock customers in to their infrastructure, using proprietary protocols to ensure they’re on the monthly billing cycle as long as possible.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;The &lt;a href="http://openstack.org/"&gt;OpenStack&lt;/a&gt; project emerged with a mission to address this concern by creating a community-led open source project enabling any organization to create and offer cloud computing services running on standard hardware. &lt;/p&gt;&#xD;
&lt;p&gt;GigaSpaces joined the OpenStack project with the mission to enable any organization to build its own Platform-as-a-Service (PaaS), with its own choice of language and best-of-breed middleware stack.&lt;/p&gt;&#xD;
&lt;p&gt;In this post, I’ll try to provide more insight into our current and future plans around OpenStack, and more specifically our joint collaboration with the Citrix OpenCloud initiative.&lt;/p&gt;&#xD;
&lt;h2&gt;GigaSpaces OpenStack Explained&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;What does GigaSpaces' OpenStack support mean?&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;One of the goals for our second-generation PaaS/SaaS enablement platform was to enable smooth migration between different cloud providers. We were able to achieve this goal through the use of our own abstraction (the Scaling Handler) and through the integration with the &lt;a href="http://www.jclouds.com"&gt;JClouds&lt;/a&gt; project that provides a common abstraction to most of the existing cloud providers. With that, we can ensure that any application can be moved from the likes of Amazon to OpenStack or to an organization's own private cloud with zero changes to the application code or configuration.&lt;/p&gt;&#xD;
&lt;p&gt;The only change involved is setting the user/key of the specific cloud.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e87659ca6970d-pi"&gt;&lt;img alt="image" border="0" height="122" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e87659cc2970d-pi" style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;By adding support for OpenStack, we now enable users to safely move to an OpenStack-based cloud when they're ready and with little effort, yet they gain all the benefits that come with it in terms of cost, openness, etc.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Can I use the OpenStack integration outside of GigaSpaces' platform?&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Yes. The OpenStack integration was developed in close collaboration with GridDynamics and Adrian Cole from JClouds. We intend to make the integration with OpenStack available to the entire Java community through the open source JClouds project. In this way, any Java application can easily run on OpenStack.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;What specific application stack is currently supported?&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Today, there is a wide variety of tools and frameworks available throughout the ecosystems in Java, .Net, Ruby, and more. We feel that limiting the platform to a specific stack that we control -- as in the case of Google App Engine -- is too restrictive. &lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0147e3e53e2e970b-pi"&gt;&lt;img align="right" alt="image" border="0" height="196" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e608a2484970c-pi" style="background-image: none; margin: 4px 0px 5px 7px; padding-left: 0px; padding-right: 0px; display: inline; float: right; padding-top: 0px; border-width: 0px;" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;Instead, we wanted to create a foundation that enables users of the platform to easily host any service or framework they choose on the platform. We also wanted to do this in a way that provides consistent behavior in terms of deployment, monitoring, elasticity, and scaling. Therefore, we developed a Universal Service Manager and Service Orchestration framework. The framework makes it easier for users to plug in their own choice of services that comprise the application stack (Tomcat, MySQL, NoSQL through Cassandra, Hadoop, Ruby...).&lt;/p&gt;&#xD;
&lt;p&gt;In addition, we will provide a set of built-in recipes. At first we will be targeting Big Data stacks, which include integration with NoSQL and Hadoop, and also reporting tools to make the deployment of Big Data applications significantly simpler.&lt;/p&gt;&#xD;
&lt;p&gt;The integration with OpenStack will make it possible to bring these benefits to any application running on OpenStack.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;How is this related to the previously announced Open Elastic Platform with Citrix?&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The new platform builds on the collaboration &lt;a href="http://citrix.com/English/NE/news/news.asp?newsID=2304852" target="_blank"&gt;announced in October 2010&lt;/a&gt; by Citrix and GigaSpaces, highlighting the integration between &lt;a href="http://www.citrix.com/English/ps2/products/product.asp?contentID=21679&amp;amp;ntref=prod_top"&gt;Citrix® NetScaler®,&lt;/a&gt; &lt;a href="http://www.citrix.com/xenserver"&gt;Citrix® XenServer®&lt;/a&gt; -- both part of the Citrix OpenCloud solution -- and &lt;a href="http://www.gigaspaces.com/xap"&gt;GigaSpaces eXtreme Application Platform (XAP)&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;In this new release we are adding the following new developments:&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Greater Openness&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Openness at the IaaS layer -- The specific integration with NetScaler (load balancer) and XenServer would be done through more open interfaces provided through the Citrix/OpenStack contribution, which means that OpenStack users can use those interfaces to plug in any hypervisor or load balancer.&lt;/p&gt;&#xD;
&lt;p&gt;Openness at the application layer -- The previous version of GigaSpaces XAP offered a limited set of application stack support features, mostly geared to services that we control and mostly in Java. With the new platform we offer significantly greater flexibility on various fronts:&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt; &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Open to any application stack and language&lt;/strong&gt;: As I mentioned earlier, users can now easily host any application or service and build their stack of choice, while at the same time managing and controlling their application in a consistent way in terms of deployment, monitoring, elasticity, etc.&lt;/li&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Users can choose their container of choice&lt;/strong&gt;: The new platform will include support for Tomcat in addition to our existing Jetty support.&lt;/li&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Users can choose standard APIs to scale their data&lt;/strong&gt;: In most cases users looking for data scalability have had to rewrite their applications. This is still the case with most of the NoSQL and in-memory data grid solutions. Through standard JPA support in our platform, we can finally reduce that lock-in barrier.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Better Application Monitoring, Specifically Geared for PaaS&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;One of the challenges with many of the existing monitoring systems is that they were not geared for PaaS-based deployment. In this new offering we provide PaaS-driven monitoring that is tightly integrated with the platform, and which can interact with the platform to deal with failure or scaling events without any human intervention. In addition, it provides a holistic view that includes the application and infrastructure, as can be seen in these screenshots:&lt;/p&gt;&#xD;
&lt;p&gt; &lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef0147e3e53e67970b-pi"&gt;&lt;img alt="image" border="0" height="184" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e87659d63970d-pi" style="background-image: none; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Furthermore, the new monitoring was designed to integrate with the existing set of data-center monitoring tools.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e608a24e4970c-pi"&gt;&lt;img alt="image" border="0" height="54" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef0147e3e53e89970b-pi" style="background-image: none; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Better Performance and Scalability &lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;By leveraging the Citrix OpenCloud solution, GigaSpaces will provide tighter integration between the platform and the infrastructure layers to up-level services offered through OpenStack clouds. In this way, we leverage the years of experience of both platforms in delivering high performance and low latency to mission-critical applications and remove the hassle involved in tuning the entire stack. We believe that through this joint work we can bring a lot of that experience into the OpenStack community. &lt;/p&gt;&#xD;
&lt;p&gt;We can also now offer more fine-grained multi-tenancy support -- by combining process and VM-based multi-tenancy. This makes it possible to achieve significantly higher density and utilization of existing resources, and reduces the amount of resources -- and therefore the cost -- associated with serving a particular application load.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;What does this mean for Enterprises, ISV/SaaS, and Cloud/IaaS providers?&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Any user of this new offering will benefit from greater openness and reduced lock-in concerns. They won’t be relying on a single vendor for their future. This also means that they will have better control over their own stack and cost margins, and will therefore be able to offer prices that are competitive with other equivalent cloud providers. More specifically:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Enterprises &lt;/strong&gt;can use this offering to build their own Enterprise PaaS layer that is specifically geared for big-data analytics, e-commerce, and financial applications. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;SaaS ISVs&lt;/strong&gt; can use this offering to SaaS-enable their application and provide the same solution off- and on-premises. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Infrastructure/Cloud providers&lt;/strong&gt; can use this offering to offer an Amazon-like services stack, including RDS, SimpleDB, SQS, Cloudwatch, and Elastic Beanstalk as a pre-packaged solution. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="font-size: 20px; font-weight: bold;"&gt;Final Words&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;I’m very excited about all of these new developments. OpenStack fills in the missing piece in the cloud -- as its name suggests, it unlocks the cloud by providing a truly open cloud stack and provides an essential substrate that drives innovation and collaboration -- which couldn’t be done before. It's interesting to compare the level of effort that we had to invest in the past when we did the first integration with Citrix to the one with OpenStack. OpenStack enables us to work completely in parallel, without too much coordination, and still achieve substantial progress. Moreover, every delivery of our work can be relevant to a wider spectrum of users. It also enabled us to join forces with other members of the community, such as &lt;a href="http://www.linkedin.com/in/adrianforrestcole?goback=.nppvan_%2Fturgay" target="_self"&gt;Adrian Cole&lt;/a&gt; from JClouds and &lt;a href="http://www.griddynamics.com/" target="_self"&gt;GridDynamics&lt;/a&gt;, simply because we share a similar goal to support the OpenStack open cloud mission.&lt;/p&gt;&#xD;
&lt;p&gt;This is only the beginning. I hope that with this new development we can continue to work toward greater openness at the application platform layer together with Citrix and the rest of the OpenStack community.&lt;/p&gt;&#xD;
&lt;p&gt;I’m going to give a talk on the subject (PaaS on OpenStack) during the upcoming &lt;a href="http://www.openstack.org/blog/2011/03/openstack-conference-design-summit-2011-sponsored-by-citrix/"&gt;OpenStack Design Summit&lt;/a&gt; on April 26&lt;sup&gt;th&lt;/sup&gt;, 2011, in Santa Clara, California, where I hope to share some of these thoughts and hopefully get more work on this happening within the community.&lt;/p&gt;&#xD;
&lt;h2&gt;&lt;span style="font-size: 12pt;"&gt;&lt;strong&gt;Availability&lt;/strong&gt;&lt;/span&gt;&lt;/h2&gt;&#xD;
&lt;p&gt;I will also be announcing the first GigaSpaces code contribution to OpenStack at the Design Summit. Note that the integration with OpenStack will be made available to the entire Java community through the open source JClouds project. &lt;/p&gt;&#xD;
&lt;p&gt;You can also get an early look at our upcoming 2nd-generation CEAP release by registering for the &lt;a href="http://www.gigaspaces.com/registration_earlyaccess" target="_blank"&gt;Early Access Program&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;h2&gt;&lt;span style="font-size: 12pt;"&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/span&gt;&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://community.citrix.com/display/ocb/2011/04/11/Spring+is+Sprung"&gt;Spring is Sprung&lt;/a&gt; by &lt;a href="http://community.citrix.com/blogs/citrite/simoncr" target="_self"&gt;Simon Crosby&lt;/a&gt; (Citrix CTO)&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/01/paas-shouldnt-be-built-in-silos.html" target="_self"&gt;PaaS shouldn't be built in Silos&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.gigaspaces.com/paas_enablement_faq"&gt;GigaSpaces 2nd generation PaaS/SaaS enablement platform FAQ&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/gigaspaces-new-cloud-platform-sneak-preview.html"&gt;GigaSpaces CEAP sneak preview&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.gigaspaces.com/citrix"&gt;GigaSpaces/Citrix Open Elastic Platform&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=pziUz_LYO5U:WqV_0PtHbek:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=pziUz_LYO5U:WqV_0PtHbek:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/pziUz_LYO5U" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/04/gigaspaces-openstack-explained.html</feedburner:origLink></entry>
    <entry>
        <title>Multi-tenancy: emulation or the real thing? - My Take</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/Ys7B2fkMhtE/multi-tenancy-emulation-or-the-real-thingmy-take.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/multi-tenancy-emulation-or-the-real-thingmy-take.html" thr:count="7" thr:updated="2011-06-22T09:03:51+02:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0147e3d4f1c6970b</id>
        <published>2011-04-08T16:49:26+02:00</published>
        <updated>2011-04-08T16:49:45+02:00</updated>
        <summary>Phil Wainewright wrote an interesting take on ZDNet (Multi-tenancy: emulation or the real thing?) where he outlines different multi-tenancy levels: Hosting single-tenant applications on multi-tenant infrastructure Redistributing application processes across virtualized machines Injecting a user-specific column in the database driver...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Multi-tenancy" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="SaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;a href="http://www.linkedin.com/in/philwainewright"&gt;Phil Wainewright&lt;/a&gt; wrote an interesting take on ZDNet (&lt;a href="http://www.zdnet.com/blog/saas/multi-tenancy-emulation-or-the-real-thing/1286"&gt;Multi-tenancy: emulation or the real thing?&lt;/a&gt;) where he outlines different multi-tenancy levels:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Hosting single-tenant applications on multi-tenant infrastructure&lt;/strong&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Redistributing application processes across virtualized machines&lt;/strong&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Injecting a user-specific column in the database driver&lt;/strong&gt; &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Phil also addressed two approaches to achieving multi-tenancy: what he referred to as emulation, &lt;em&gt;multi-tenancy&lt;/em&gt; that is pre-baked into the infrastructure, as in the case of GigaSpaces, and the "&lt;em&gt;real thing&lt;/em&gt;," a custom build where multi-tenancy is handled at the application level, as in the case of Salesforce.&lt;/p&gt;&#xD;
&lt;p&gt;In his final takeaway, Phil suggested that to get the most out of multi-tenancy, you have to deal with it at the application level.&lt;/p&gt;&#xD;
&lt;h2&gt;The build vs buy dilemma&lt;/h2&gt;&#xD;
&lt;p&gt;Phil's point reminded me of the classic &lt;strong&gt;build vs buy dilemma&lt;/strong&gt; – at first sight you would think that a custom build that is tailored to your specific domain and application would always be better than a generic solution. But is it?&lt;/p&gt;&#xD;
&lt;p&gt;I addressed "build vs buy" in a previous post on the subject (&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2008/11/the-impact-of-cloud-computing-on-buy-vs-build-behaviour.html"&gt;The impact of cloud computing on build vs. buy behavior&lt;/a&gt;), using &lt;a href="http://en.wikipedia.org/wiki/Fred_Brooks"&gt;Fred Brooks&lt;/a&gt;'s article &lt;a href="http://en.wikipedia.org/wiki/No_Silver_Bullet"&gt;No Silver Bullet&lt;/a&gt;. There are many lessons that Fred points out in his article that are worth reading, but here's a relevant point I'd like to highlight:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;strong&gt;The hardest single part of building a software system is deciding precisely what to build.&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Fred suggests that we tend to justify custom builds without properly realizing the associated costs. In many cases, what we tend to refer to as “specific requirements” aren’t that specific after all - or to be more accurate, the differences don’t justify the costs.&lt;/p&gt;&#xD;
&lt;h2&gt;What to choose?&lt;/h2&gt;&#xD;
&lt;p&gt;In the context of multitenancy, we should consider the following things:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h3&gt;Does the difference justify the cost?&lt;/h3&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;strong&gt;How different would Salesforce multi-tenancy be compared to the multitenancy of any other SaaS or PaaS provider?&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;I would argue that most multitenancy providers are the same. Differences are minor; the goal is multitenancy, not a mix-and-match of potential features.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h3&gt;Complexity consideration&lt;/h3&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Dealing with data multi-tenancy at the application level can be error prone and can be significantly more complex than dealing with it at the infrastructure level, with serious effects. For example, I heard from one SaaS provider that a bug in their system led to one customer accidentally getting access to another customer's data. In addition to the complexity, there are things that just &lt;em&gt;can’t&lt;/em&gt; be dealt with properly at the application level – for example, in the case of Salesforce, a single tenant is bound to a single database instance and can’t scale beyond that single database.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h3&gt;Time to Market consideration&lt;/h3&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;When you launch a new SaaS or PaaS application, time to market often plays a critical role. Dealing with multi-tenancy at the application level requires lots of effort which can lead to a significant delay in time to market.&lt;/p&gt;&#xD;
&lt;p&gt;Joseph Ottinger points out that this effort can be addressed in lots of ways - perhaps by programming language choice, for example. But even these solutions carry costs, in terms of your team's capabilities or even in scalability. The language you choose, for example, might be able to help your time to market, but might cost you in performance, or even affect how your architecture is built. These choices can be made well, but you should go in with eyes open to what choices you're making.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
&lt;h3&gt;Cost consideration&lt;/h3&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;When we're dealing with multitenancy at the application level, we're often limited to coarse-grained multi-tenancy, as we don’t have full control over the underlying infrastructure. For example, with data multi-tenancy at the platform level we can easily scale our data services out and down based on the actual capacity that is being used – doing that at the application level would be much harder, maybe close to impossible. That means that with platform multitenancy we can achieve better density, and thus control the number of resources, and the cost, at each given point in time.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h2&gt;It’s not a mutually exclusive decision&lt;/h2&gt;&#xD;
&lt;p&gt;In the context of multi-tenancy, there are always going to be things that are better dealt with at the infrastructure level, such as data multi-tenancy, and things that are application-specific, such as the user interface layout.&lt;/p&gt;&#xD;
&lt;p&gt;By delegating the generic part of the challenges of multi-tenancy to the infrastructure layer, we can devote more time and effort to areas more specific to our business.&lt;/p&gt;&#xD;
&lt;h2&gt;The full story in 15 min&lt;/h2&gt;&#xD;
&lt;p&gt;Phil's article triggered a more in-depth discussion with &lt;a href="http://www.linkedin.com/in/uricohen"&gt;Uri Cohen&lt;/a&gt;, GigaSpaces product manager, which later led to a recorded interview (&lt;a href="http://blip.tv/file/4966084"&gt;15 Min on Multi-Tenancy&lt;/a&gt;) in which Uri addresses the following topics:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Multi-tenancy: emulation or the real thing? The classic build vs. buy dilemma &lt;/li&gt;&#xD;
&lt;li&gt;What is multitenancy anyway? &lt;/li&gt;&#xD;
&lt;li&gt;Why is it such a popular topic? &lt;/li&gt;&#xD;
&lt;li&gt;Is it a feature or best practice? &lt;/li&gt;&#xD;
&lt;li&gt;What is GigaSpaces Multi-tenancy? &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Enjoy..&lt;/p&gt;&#xD;
&lt;p&gt;  &lt;embed allowfullscreen="true" allowscriptaccess="always" height="330" src="http://blip.tv/play/AYKwmgMC" type="application/x-shockwave-flash" width="480"&gt;&lt;/embed&gt;&lt;/p&gt;&#xD;
&lt;h2&gt;References:&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.zdnet.com/blog/saas/multi-tenancy-emulation-or-the-real-thing/1286"&gt;Multi-tenancy: emulation or the real thing?&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2008/11/the-impact-of-cloud-computing-on-buy-vs-build-behaviour.html"&gt;The impact of cloud computing on build vs. buy behavior&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/03/multitenancy-does-it-have-to-be-that-hard.html"&gt;Multi-tenancy: does it have to be that hard?&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/04/multi-tenancy-emulation-or-the-real-thingmy-take.html</feedburner:origLink></entry>
    <entry>
        <title>Scaling Social Ecommerce: Architecture Case study</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/OmpDibaT2_U/scaling-searscom-social-ecommerce-architecture-case-study.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/04/scaling-searscom-social-ecommerce-architecture-case-study.html" thr:count="7" thr:updated="2011-11-14T10:21:56+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef014e6059fc0a970c</id>
        <published>2011-04-04T15:18:25+02:00</published>
        <updated>2011-04-05T17:11:50+02:00</updated>
        <summary>In February this year I had the opportunity to meet Tomer Gabel, an Application Engineer at SHC Israel, who gave an excellent talk at one of our customer road shows about Social E-Commerce and the scalability challenges associated with handling...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Grid" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;In February this year I had the opportunity to meet &lt;a href="http://www.linkedin.com/in/tomergabel"&gt;Tomer Gabel&lt;/a&gt;, an Application Engineer at SHC Israel, who gave an excellent talk at one of our &lt;a href="http://www.gigaspaces.com/content/gigaspaces-road-show-notes-events"&gt;customer road shows&lt;/a&gt; about Social E-Commerce and the scalability challenges associated with handling of social graphs. Tomer later joined me for a case study, &lt;a href="http://www.infoq.com/news/2011/03/sears-case-study"&gt;published on InfoQ&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;In this post I try to summarize the main takeaways from the interview. Enjoy.&lt;/p&gt;&#xD;
&lt;h2&gt;What is Social E-Commerce anyway?&lt;/h2&gt;&#xD;
&lt;p&gt;&lt;strong&gt;According to Wikipedia, &lt;a href="http://en.wikipedia.org/wiki/Social_shopping"&gt;Social Shopping&lt;/a&gt;&lt;/strong&gt; is a mechanism for &lt;a href="http://en.wikipedia.org/wiki/E-commerce"&gt;e-commerce&lt;/a&gt; where shoppers' friends become involved in the shopping experience. Social shopping attempts to use technology to mimic the social interactions found in physical malls and stores.&lt;/p&gt;&#xD;
&lt;h3&gt;The opportunity&lt;/h3&gt;&#xD;
&lt;p&gt;A recent study showed that over 92 percent of executives from leading retailers are focusing their marketing efforts on Facebook and subsequent applications. Furthermore, &lt;strong&gt;over 71 percent of users have confirmed they are more likely to make a purchase after “liking” a brand they find online. (&lt;a href="http://www.webhostingfan.com/2011/03/ecommerce-stores-with-a-social-networking-platform/"&gt;source&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;I recently came across an interesting analysis on the &lt;a href="http://www.exacttarget.com/Resources/SFF8.pdf"&gt;Subscribers, Fans, &amp;amp; Followers: The Social Break-Up&lt;/a&gt;. The report analyzed social behavior, and more specifically why consumers end brand relationships. Even though the focus of this post is not about &lt;em&gt;ending&lt;/em&gt; brand relationships, I thought that some of the statistics presented in this report are quite useful for quantifying where we are today with social e-commerce in terms of the market size and its potential.&lt;/p&gt;&#xD;
&lt;p&gt;Facebook statistics:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;73% of U.S. online consumers have created a profile on Facebook. &lt;/li&gt;&#xD;
&lt;li&gt;65% of U.S. online consumers are currently active on Facebook. &lt;/li&gt;&#xD;
&lt;li&gt;42% of U.S. online consumers (64% of those on Facebook) are “FANS” (have “liked” a company on Facebook). &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Twitter statistics:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;17% of U.S. online consumers have created a Twitter account. &lt;/li&gt;&#xD;
&lt;li&gt;9% of U.S. online consumers are currently active on Twitter. &lt;/li&gt;&#xD;
&lt;li&gt;5% of U.S. online consumers (56% of those on Twitter) are FOLLOWERS  (use Twitter and have “followed” at least one company). &lt;/li&gt;&#xD;
&lt;li&gt;71% of FOLLOWERS expect to receive marketing messages from companies through Twitter. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Clearly the adoption of social commerce is quite staggering.&lt;/p&gt;&#xD;
&lt;p&gt;The focus of this post isn’t to convince you that you should invest in social commerce, but to give context for one of the common scalability challenges associated with building a social e-commerce platform - the "Social Graph."&lt;/p&gt;&#xD;
&lt;h2&gt;The Social Graph challenge&lt;/h2&gt;&#xD;
&lt;p&gt;&lt;img align="right" alt="" height="231" src="http://www.toprankblog.com/wp-content/uploads/2010/08/fbscreen-socialgraph.png" style="margin: 2px 1px 9px 2px; display: inline; float: right;" width="235"&gt;&lt;/img&gt;&lt;/p&gt;&#xD;
&lt;p&gt;The &lt;a href="http://www.google.com/search?hl=en&amp;amp;defl=en&amp;amp;q=define:Social+Graph&amp;amp;sa=X&amp;amp;ei=JVKYTf-DGY6WswaC5dmtCA&amp;amp;ved=0CCAQkAE"&gt;Social Graph&lt;/a&gt; is “the global mapping of everybody and how they're related.”&lt;/p&gt;&#xD;
&lt;p&gt;Processing social networks is not an easy proposition:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Massive amounts of branching data &lt;/li&gt;&#xD;
&lt;li&gt;No data locality &lt;/li&gt;&#xD;
&lt;li&gt;Very few assumptions can be made about the data &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;In other words, to meet the capacity scaling demand you can’t store the data in a centralized location. That forces us to distribute the data. On the other hand, to access (or query) the data, we can’t assume that the data for our query is located in a single node. That forces us to look for the data on multiple nodes.&lt;/p&gt;&#xD;
&lt;p&gt;Let’s take a simple scenario to get some sense of the complexity of the problem:&lt;/p&gt;&#xD;
&lt;p&gt; &lt;a href="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e6059fbf4970c-pi"&gt;&lt;img align="right" alt="image" border="0" height="218" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef014e6059fbfc970c-pi" style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; float: right; padding-top: 0px; border-width: 0px;" title="image" width="244"&gt;&lt;/img&gt;&lt;/a&gt;Example:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Imagine every Facebook user (500 million) &lt;/li&gt;&#xD;
&lt;li&gt;Imagine each person is only connected to 100 others (conservative estimate) &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Query: How is user X connected with Y?&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;X has 100 friends &lt;/li&gt;&#xD;
&lt;li&gt;Each of them has 100 friends &lt;/li&gt;&#xD;
&lt;li&gt;10,001 nodes visited! &lt;/li&gt;&#xD;
&lt;li&gt;101 reads from the underlying storage system! &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;Clearly even if we can scale at the capacity level we can’t scale at the query level.&lt;/p&gt;&#xD;
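&lt;p&gt;As a rough illustration of the numbers above, here is a minimal Python sketch of the two-hop expansion. It is not taken from the case study: the CountingStore class, its friends_of method and the synthetic user ids are all hypothetical, and it assumes that every user has exactly 100 friends and that no two friend lists overlap.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
from collections import deque


class CountingStore:
    """Hypothetical storage layer that counts how many friend lists it loads."""

    def __init__(self, fan_out):
        self.fan_out = fan_out
        self.reads = 0

    def friends_of(self, user_id):
        # Each call stands in for one read from the underlying storage system.
        self.reads += 1
        first = user_id * self.fan_out + 1
        # Synthetic ids: every user gets fan_out friends that nobody else has.
        return range(first, first + self.fan_out)


def expand(store, start, max_hops):
    """Breadth-first expansion from start; maps each user seen to its hop count."""
    hops = {start: 0}
    frontier = deque([start])
    while frontier:
        user = frontier.popleft()
        if hops[user] == max_hops:
            continue  # do not load friend lists beyond the last hop
        for friend in store.friends_of(user):
            if friend not in hops:
                hops[friend] = hops[user] + 1
                frontier.append(friend)
    return hops


store = CountingStore(fan_out=100)
hops = expand(store, start=0, max_hops=2)
friends_of_friends = sum(1 for h in hops.values() if h == 2)

print(store.reads)         # 101 friend-list reads
print(friends_of_friends)  # 10000 users reached at the second hop
```
&lt;/pre&gt;&#xD;
&lt;p&gt;Even in this idealized setting, a single two-hop query costs 101 separate friend-list loads, and each of them may land on a different node.&lt;/p&gt;&#xD;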
&lt;h3&gt;The crux of the problem:&lt;/h3&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;High branch factor necessitates many loads to serve even a simple request &lt;/li&gt;&#xD;
&lt;li&gt;No data locality + high branch factor means very high random I/O &lt;/li&gt;&#xD;
&lt;li&gt;Traditional storage models (RDBMS, flat files etc.) are a poor fit &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;The Solution:&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Use Memory as the main storage           &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Random I/O access works much better on memory devices than on disk (See more details &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html"&gt;here&lt;/a&gt;) &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;Execute the code with the data - Using Real Time Map/Reduce           &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;To reduce the number of iterations required to execute a particular query we use the &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Task+Execution+over+the+Space"&gt;executor API&lt;/a&gt;. The executor API enables us to push the code to the data. By doing that we can execute fairly complex data processing on the data node at memory speed vs network speed. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;De-normalize the data           &#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;To reduce the amount of traversal access and network hops per query on the graph we need to copy elements of the graph into each node. For example the list of Friends and friends of friends (up to a certain degree) could be stored in each node and thus become available to any element of the graph  without the need to consult with other nodes. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
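&lt;p&gt;The de-normalization idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the data model used in the case study: the build_denormalized and connection functions, the record layout and the sample names are all hypothetical. Each user's record carries a copy of that user's friends and friends of friends, so a connection query up to two hops is answered from a single record.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
def build_denormalized(friends):
    """Copy each user's friends and friends of friends into that user's own record."""
    records = {}
    for user, direct in friends.items():
        second = set()
        for friend in direct:
            second.update(friends.get(friend, ()))
        # Keep only users that are exactly two hops away.
        second -= set(direct)
        second.discard(user)
        records[user] = {"friends": set(direct), "friends_of_friends": second}
    return records


def connection(records, x, y):
    """Answer 'how is x connected to y?' from x's record alone (a single read)."""
    record = records[x]
    if y in record["friends"]:
        return 1
    if y in record["friends_of_friends"]:
        return 2
    return None  # further away than the degree we chose to store


friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob"],
}
records = build_denormalized(friends)

print(connection(records, "ann", "bob"))  # 1
print(connection(records, "ann", "dan"))  # 2
print(connection(records, "cat", "dan"))  # None
```
&lt;/pre&gt;&#xD;
&lt;p&gt;The trade-off is the usual one for de-normalized data: reads become cheap and local, while every change to a friend list has to be propagated to all the records that hold a copy of it.&lt;/p&gt;&#xD;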
&lt;h2&gt;The operational perspective&lt;/h2&gt;&#xD;
&lt;p&gt;Scaling Social Ecommerce involves not just the application architecture but also the operational aspect. This is of particular importance as ecommerce and social sites tend to evolve quite rapidly, and therefore the time it takes to release a new feature from development to production is critical. This process is often referred to as Continuous Deployment, or in our specific case it can also be referred to as "Continuous Scaling."&lt;/p&gt;&#xD;
&lt;p&gt;There are basically two factors that are important to achieve continuous scaling/deployment:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Automation&lt;/strong&gt; – if the process of deployment and scaling involves lots of human intervention, then the time it takes to release a new feature gets significantly longer. It is therefore important that the entire process can be fully automated. In some cases, you may want to have manual checkpoints in the process, but even in that case the expectation is that manual intervention would involve clicking on a &amp;lt;continue&amp;gt; button and nothing else. To achieve this level of automation we need to interact with our application infrastructure through an API. This process of automation and integration between the development and operational environments is often referred to as DevOps. The GigaSpaces reference for the DevOps API is provided &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Administration+and+Monitoring+API+Security"&gt;here&lt;/a&gt;.&lt;/li&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Schema evolution&lt;/strong&gt; – There are many cases in which we would want to add to or change the data structures of our existing application as part of the upgrade process without bringing the system down. This is considered one of the more complex challenges to deal with in many of the existing databases. A &lt;a href="http://www.gigaspaces.com/wiki/display/XAP8/Document+%28Schema-Free%29+API"&gt;document model&lt;/a&gt; is schema-less and therefore better suited for this sort of rapid data change.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
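&lt;p&gt;To make the schema evolution point concrete, here is a small sketch that uses plain Python dictionaries as stand-ins for schema-less documents. It does not use the GigaSpaces document API; the users list, the liked_brands field and the helper function are hypothetical and only illustrate how a new field can be introduced without migrating existing data.&lt;/p&gt;&#xD;
&lt;pre&gt;
```python
# Hypothetical in-memory "collection" of schema-less documents (plain dicts).
users = [
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
]

# A new release starts writing an extra field; older documents are left as they are.
users.append({"id": 3, "name": "cat", "liked_brands": ["acme"]})


def liked_brands(doc):
    """Read the new field, falling back to an empty list for older documents."""
    return doc.get("liked_brands", [])


print([liked_brands(doc) for doc in users])  # [[], [], ['acme']]
```
&lt;/pre&gt;&#xD;
&lt;p&gt;Old and new documents live side by side, so the upgrade does not require taking the system down to alter a table definition.&lt;/p&gt;&#xD;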
&lt;p&gt; &lt;span style="font-size: 20px; font-weight: bold;"&gt;The full story&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;p&gt;Tomer did a great job describing in more detail the scalability challenges and the approach that they have taken at Sears to address those challenges, as well as providing specific insight into the performance figures that they achieved with their current system. The full interview recording is provided below:&lt;/p&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;p&gt;&lt;embed allowfullscreen="true" allowscriptaccess="always" height="300" src="http://blip.tv/play/AYKv1mMC" type="application/x-shockwave-flash" width="480"&gt;&lt;/embed&gt;&lt;/p&gt;&#xD;
&lt;p&gt;&lt;span style="font-size: 15px; font-weight: bold;"&gt;References&lt;/span&gt;&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt; &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/04/writing-your-own-scalable-twitter.html"&gt;Designing a Scalable Twitter&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html"&gt;Memory is the New Disk for the Enterprise&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/11/why-existing-databases-rac-are-so-breakable.html"&gt;Why Existing Databases (RAC) are So Breakable!&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/12/the-common-principles-behind-the-nosql-alternatives.html"&gt;The Common Principles Behind the NOSQL Alternatives&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/04/scaling-searscom-social-ecommerce-architecture-case-study.html</feedburner:origLink></entry>
    <entry>
        <title>When Big-IT meets Big-Web</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/8mlSc-v_vrM/when-big-it-meets-big-web.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/when-big-it-meets-big-web.html" thr:count="2" thr:updated="2011-03-16T22:45:05+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef014e86bcc75c970d</id>
        <published>2011-03-16T14:23:48+01:00</published>
        <updated>2011-03-16T14:23:48+01:00</updated>
        <summary>Last week I had the honor to join an interesting event held by Battery Venture in Stanford titled “BiG IT meets Big Web” which brought together people from the financial industry (JP Morgan, NYSE, BOFA, Barclays, for example) and the...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;a href="http://www.google.com/imgres?imgurl=http://www.jamminshirts.com/catalog/TIE%2520SHIRT.jpg&amp;amp;imgrefurl=http://www.jamminshirts.com/servlet/the-186/MYTHBUSTERS-TIE-WITH-POCKET/Detail&amp;amp;usg=__S_KNjjwWE9jMtMD67t9aPSC-jI0=&amp;amp;h=471&amp;amp;w=720&amp;amp;sz=43&amp;amp;hl=en&amp;amp;start=0&amp;amp;sig2=ntCoXoQw2XeM4uj6F88frA&amp;amp;zoom=1&amp;amp;tbnid=66YkkBkz16MApM:&amp;amp;tbnh=141&amp;amp;tbnw=215&amp;amp;ei=lvJ_TZr_NcPOtAaR3o3ECA&amp;amp;prev=/images%3Fq%3Dtshirt%2B-%2Btie%26hl%3Den%26biw%3D1280%26bih%3D675%26tbs%3Disch:1&amp;amp;itbs=1&amp;amp;iact=rc&amp;amp;dur=661&amp;amp;oei=lvJ_TZr_NcPOtAaR3o3ECA&amp;amp;page=1&amp;amp;ndsp=18&amp;amp;ved=1t:429,r:2,s:0&amp;amp;tx=91&amp;amp;ty=58"&gt;&lt;img align="left" alt="Black shirt (Big Web), Tie (Big IT)" border="0" height="244" src="http://natishalom.typepad.com/.a/6a00d835457b7453ef0147e33ce21f970b-pi" style="background-image: none; margin: 0px 9px 3px 0px; padding-left: 0px; padding-right: 0px; display: inline; float: left; padding-top: 0px; border-width: 0px;" title="Black shirt (Big Web), Tie (Big IT), Credit goes to Alex Benlik" width="207"&gt;&lt;/img&gt;&lt;/a&gt;Last week I had the honor to join an interesting event held by &lt;a href="http://www.battery.com/"&gt;Battery Ventures&lt;/a&gt; in Stanford titled “&lt;strong&gt;Big IT meets Big Web&lt;/strong&gt;” which brought together people from the financial industry (JP Morgan, NYSE, BOFA, Barclays, for example) and the Web-facing world (Twitter, Netflix, Nimbula, DataStax, Bing Mobile, and others).&lt;/p&gt;&#xD;
&lt;p&gt;As someone who happened to cover those two worlds for a while through my work at GigaSpaces, this was a unique opportunity to meet the two camps under the same roof for the first time.&lt;/p&gt;&#xD;
&lt;p&gt;The two opening presentations by &lt;a href="http://www.linkedin.com/profile/view?id=2817&amp;amp;authType=name&amp;amp;authToken=P-sd&amp;amp;pvs=pp&amp;amp;trk=ppro_viewmore"&gt;Brad Spiers&lt;/a&gt; (SVP President at Bank of America) and &lt;a href="http://www.linkedin.com/profile/view?id=1533913&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=c4mm&amp;amp;locale=en_US&amp;amp;srchid=99c46a4f-0573-4964-a2b9-bfddf3e3f373-0&amp;amp;srchindex=1&amp;amp;srchtotal=1&amp;amp;pvs=ps&amp;amp;pohelp=&amp;amp;goback=%2Efps_Adrian+Cockcroft+netflix_*1_*1_*1_*1_*1_*1_*51_*1_Y_*1_*1_*1_false_1_R_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2"&gt;Adrian Cockcroft&lt;/a&gt; (Cloud Architect at Netflix) were great examples of the difference in challenges and culture between the worlds.&lt;/p&gt;&#xD;
&lt;p&gt;Brad talked about risk management and Monte Carlo simulations as the core of their IT, and about DevOps as automation that is done through scripting by the operations team to ensure consistent deployment of their apps.&lt;/p&gt;&#xD;
&lt;p&gt;Adrian talked about their cloud strategy: running their entire data center on the cloud, how their developers also handle operations, and how they achieved continuous deployment to reduce their release cycles.&lt;/p&gt;&#xD;
&lt;p&gt;&lt;a href="http://www.linkedin.com/profile/view?id=177670&amp;amp;authType=NAME_SEARCH&amp;amp;authToken=VfY3&amp;amp;locale=en_US&amp;amp;srchid=5751d690-dab4-4829-9b9e-826007ba9952-0&amp;amp;srchindex=1&amp;amp;srchtotal=1&amp;amp;pvs=ps&amp;amp;pohelp=&amp;amp;goback=%2Efps_Adrian+Kunzle+_*1_*1_*1_*1_*1_*1_*51_*1_Y_*1_*1_*1_false_1_R_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2"&gt;Adrian Kunzle&lt;/a&gt;, MD, Head of Engineering &amp;amp; Architecture at JPMorgan Chase, was somewhat in the middle between the two camps and provided a good example of how Big IT can adopt some of the lessons from Big Web despite the unique challenges that the finance industry faces - i.e. being highly regulated with lots of legacy apps that are hard to change, etc.&lt;/p&gt;&#xD;
&lt;p&gt;I thought that the main lessons from these talks could provide an interesting view on how Big IT can adopt some of the ideas and lessons from the Big Web world.&lt;/p&gt;&#xD;
&lt;h2&gt;My Take: Bringing Big IT and Big Web closer&lt;/h2&gt;&#xD;
&lt;p&gt;The move to private cloud and a PaaS model could be a great opportunity to address two of the main challenges that are often thought of as conflicting: &lt;strong&gt;control&lt;/strong&gt; and &lt;strong&gt;agility&lt;/strong&gt;. A common Cloud/PaaS gives IT better governance and control of the stack that runs their applications as well as a way to measure and control their applications' SLA in a consistent way. On the other hand, it enables better agility as the cycle to deploy a new application and set up a new environment can be reduced significantly.&lt;/p&gt;&#xD;
&lt;p&gt;The list below summarizes the main points that make it possible to bring the two worlds closer together.&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;Monte Carlo simulations and risk management applications are not the only applications that a typical bank runs. There are still many applications that can take advantage of the agility that comes with the Cloud/PaaS model and they could serve as the early adopters within the bank. &lt;/li&gt;&#xD;
&lt;li&gt;Applications should be abstracted from the notion of public vs private cloud - i.e. they should be able to run in both environments in the same way. The choice of where to run should be determined based on business justification - costs, security, etc. &lt;/li&gt;&#xD;
&lt;li&gt;&lt;strong&gt;Continuous deployment&lt;/strong&gt; can be done by integrating the development and QA systems directly into the PaaS. As many of the banks are highly regulated, the actual rollout to production will still involve more control, but the cycle for rolling out up to the staging environment could be fully automated and the time to release a new feature could be shortened significantly. &lt;/li&gt;&#xD;
&lt;li&gt;Many of the Big IT organizations have consolidated their use of language to Java and .Net. Hence by supporting Java and .Net, it is possible to cover 90% of the total applications. &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;Final notes&lt;/h2&gt;&#xD;
&lt;p&gt;At the end of the day, after hearing the different views, it was clear that the difference in challenges and culture between the two industries is still fairly large. In many cases, and for good reasons, Big IT is highly regulated, many of its applications are highly transactional, and it doesn’t have the same level of flexibility in terms of consistency, latency and even the types of applications that it runs. &lt;strong&gt;Those differences have been the excuse for people in the two camps to dismiss each other's arguments and live in separate worlds in the way they operate and run their applications.&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;p&gt;We're reaching the point where it becomes clear that there is a direct link between the way you run your applications and the way you run your business.&lt;/p&gt;&#xD;
&lt;p&gt;The Big Web world had the luxury to start with a clean slate and quickly iterate and develop efficient models for running their applications (and their businesses). For different reasons (regulation, legacy applications) Big IT couldn’t move at the same speed, but there is no reason why it couldn’t adopt those same models, even if they wouldn’t apply to its entire IT organization.&lt;/p&gt;&#xD;
&lt;p&gt;On the other hand, Big IT learned how to deal with scaling under extreme latency and throughput requirements without compromising on consistency. Big Web could benefit from some of the patterns that address these sorts of extreme requirements without giving up on consistency or latency, as I already noted in &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/02/a-interesting-note-on-google-megastore-cap.html"&gt;An interesting note on Google Megastore &amp;amp; CAP&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;p&gt;Anyway, I came out of this day encouraged that the two worlds can actually converge despite the challenges and differences. This sort of meeting is definitely a good and important start in that direction.&lt;/p&gt;&#xD;
&lt;p&gt;In the next post I’ll cover the second part of that day – specifically the DevOps panel.&lt;/p&gt;&#xD;
&lt;h2&gt;References&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/02/a-interesting-note-on-google-megastore-cap.html"&gt;An interesting note on Google Megastore &amp;amp; CAP&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://perspectives.mvdirona.com/2010/02/24/ILoveEventualConsistencyBut.aspx"&gt;I Love Eventual Consistency but …&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html"&gt;Productivity vs. Control tradeoffs in PaaS&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/03/when-big-it-meets-big-web.html</feedburner:origLink></entry>
    <entry>
        <title>GigaSpaces New Cloud Platform - Sneak Preview</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/YjCNlCvzRxc/gigaspaces-new-cloud-platform-sneak-preview.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/gigaspaces-new-cloud-platform-sneak-preview.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef0147e315dc6b970b</id>
        <published>2011-03-08T23:14:44+01:00</published>
        <updated>2011-03-08T23:36:35+01:00</updated>
        <summary>The presentation below provides a sneak preview into one of the cool features of our upcoming Cloud Enabled Application Platform - Universal Service Manager (USM). The universal service manager handles the deployment, elasticity, continuous availability, scalability of existing...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Connect" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="NOSQL" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="SaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;The presentation below provides a sneak preview into one of the cool features of our upcoming &lt;a href="http://www.gigaspaces.com/paas-enablement" target="_self" title="PaaS/SaaS Enablement Platform"&gt;Cloud Enabled Application Platform&lt;/a&gt; - Universal Service Manager (USM).&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;The Universal Service Manager handles the deployment, elasticity, continuous availability, and scalability of existing applications on any cloud or local data center. In this specific presentation we're demonstrating how we can deploy a NoSQL service, Cassandra, using the USM, scale it, monitor it, handle failure scenarios, etc. All that without writing a single line of code and without any need for prior knowledge of GigaSpaces!&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;All you need to do is point to your Cassandra directory and write a few shell scripts for handling the pre- and post-deployment steps; the rest is taken care of by the platform.&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;We're showing a live demo of that feature at Cloud Connect, so if you're in the area stop by our &lt;a href="http://www.gigaspaces.com/content/cloud-connect-expo"&gt;CloudConnect booth (106)&lt;/a&gt;.&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;For those interested in trying it out themselves, we're going to make the first milestone available for public download by the end of this month.&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;You can also send an email to (pm at gigaspaces dot com) and register for updates on this new release!&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&#xD;
&lt;p&gt;&lt;a href="http://www.slideboom.com/presentations/317140/GigaSpaces-new-cloud-platfom---sneak-preview-I" style="font: 14px Helvetica,Arial,Sans-serif; color: #0000cc; display: block; margin: 12px 0 3px 0; text-decoration: underline;" title="GigaSpaces new cloud platfom - sneak preview I"&gt;GigaSpaces new cloud platfom - sneak preview I&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&#xD;
&lt;object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,28,0" height="370" id="onlinePlayer317140" width="425"&gt;&#xD;
&lt;param name="movie" value="http://www.slideboom.com/player/player.swf?id_resource=317140"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowScriptAccess" value="always"&gt;&lt;/param&gt;&#xD;
&lt;param name="quality" value="high"&gt;&lt;/param&gt;&#xD;
&lt;param name="bgcolor" value="#ffffff"&gt;&lt;/param&gt;&#xD;
&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&#xD;
&lt;param name="flashVars"&gt;&lt;/param&gt; &lt;embed allowfullscreen="true" allowscriptaccess="always" bgcolor="#ffffff" height="370" name="onlinePlayer317140" pluginspage="http://www.macromedia.com/go/getflashplayer" quality="high" src="http://www.slideboom.com/player/player.swf?id_resource=317140" type="application/x-shockwave-flash" width="425"&gt;&lt;/embed&gt;&#xD;
&lt;/object&gt;&#xD;
&lt;/div&gt;&#xD;
&lt;div style="width: 425px; text-align: left;"&gt;&#xD;
&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;View &lt;a href="http://www.slideboom.com" style="color: #0000cc;"&gt;more presentations&lt;/a&gt; or &lt;a href="http://www.slideboom.com/upload" style="color: #0000cc;"&gt;Upload&lt;/a&gt; your own.&lt;/div&gt;&#xD;
&lt;/div&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=YjCNlCvzRxc:vzyB7EUwDo8:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=YjCNlCvzRxc:vzyB7EUwDo8:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/YjCNlCvzRxc" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/03/gigaspaces-new-cloud-platform-sneak-preview.html</feedburner:origLink></entry>
    <entry>
        <title>Productivity vs. Control tradeoffs in PaaS</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/NatiShalom/~3/LyxMXS_bvYQ/productivity-vs-control-tradeoffs-in-paas.html" />
        <link rel="replies" type="text/html" href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html" thr:count="1" thr:updated="2011-03-12T21:42:25+01:00" />
        <id>tag:typepad.com,2003:post-6a00d835457b7453ef014e8682a66f970d</id>
        <published>2011-03-08T03:05:07+01:00</published>
        <updated>2011-03-07T18:16:38+01:00</updated>
        <summary>Gartner recently published an interesting paper: Productivity vs. Control: Cloud Application Platforms Must Split to Win. (The paper requires registration.) The paper does a pretty good job covering the evolution that is taking place in the PaaS market toward a...</summary>
        <author>
            <name>Nati Shalom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Agile" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Cloud Computing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="GigaSpaces" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="PaaS" />
        
        
<content type="html" xml:lang="he" xml:base="http://natishalom.typepad.com/nati_shaloms_blog/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;img align="right" alt="" src="http://t0.gstatic.com/images?q=tbn:ANd9GcSqAsK3JuM8GF_ES9dNHCGbJc4RiwBxF-i1MRgzLUJwDIHuzB8eaw" style="display: inline; float: right;"&gt;&lt;/img&gt;Gartner recently published an interesting paper: &lt;a href="http://www.gartner.com/DisplayDocument?id=1526216"&gt;Productivity vs. Control: Cloud Application Platforms Must Split to Win&lt;/a&gt;. (The paper requires registration.)&lt;/p&gt;&#xD;
&lt;p&gt;The paper does a pretty good job covering the evolution that is taking place in the PaaS market toward a more open platform, and compares the two main categories: aPaaS (essentially a PaaS running as a service) and CEAP (Cloud Enabled Application Platform), which is the *P* out of PaaS that gives you the platform to build your own PaaS in a private or public cloud.&lt;/p&gt;&#xD;
&lt;p&gt;According to Gartner, the main split between the two categories is Productivity vs. Control:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;The cloud application platform markets are splitting to support two different constituencies: mainstream application developments that are focused on fast time to deployment, and advanced projects requiring the full control of the underlying cloud application platform attributes.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;The paper also provides useful takeaways that follow that line of thought:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;The aPaaS market is now dominated by high-productivity offerings targeting mainstream, often opportunistically oriented, rapid application deployments, at times implemented by "citizen developers" without formal IT department approval. &lt;/li&gt;&#xD;
&lt;li&gt;High-control offerings are emerging to support the most-advanced, systematically oriented requirements. These projects often look primarily for a CEAP, rather than an aPaaS, to enable maximum control over the technology environment.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;My take:&lt;/h2&gt;&#xD;
&lt;p&gt;While I was reading through the paper I felt that something continued to bother me about this definition, even though I tend to agree with the overall observation. If I follow the logic of this paper, then I have to give up productivity to gain control. Hmm… that’s a hard choice.&lt;/p&gt;&#xD;
&lt;p&gt;The issue seems to be with the way we define productivity. Let me explain.&lt;/p&gt;&#xD;
&lt;h2&gt;Productivity – redefined&lt;/h2&gt;&#xD;
&lt;p&gt;The term "productivity" is pretty broad. After some brief research on the subject it became clear to me that even today we're lacking an agreed-upon model for measuring &lt;a href="http://en.wikipedia.org/wiki/Programming_productivity"&gt;programming productivity&lt;/a&gt;, as noted in the Wikipedia definition:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;A generally accepted working definition of programmer productivity needs to be established and agreed upon. Appropriate metrics need to established. Productivity needs to be viewed over the lifetime of code. Example: Programmer A writes code in a shorter interval than programmer B but programmer A's code is of lower quality and months later requires additional effort to match the quality of programmer B's code; in such a case, it is fair to claim that programmer B was actually more productive.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;During my short research I came across an &lt;a href="http://agile101.net/2009/09/01/measuring-programmer-productivity-a-scientific-study/"&gt;interesting study&lt;/a&gt; on the subject by Google. I thought that the study went a long way toward defining productivity. It does so by taking a specific challenge (HPC in this case) and measuring the actual progress students made in developing a certain task at different stages using various technologies.  For example, it measured how much time was spent developing a certain algorithm, and the time it took to debug it, optimize it, and get it working.  One of the interesting outcomes of this study, which seems to be generally applicable to many other technologies, is that we often spend more time debugging and optimizing code than on the actual development. It was also found that &lt;strong&gt;the biggest bang for the buck is gained by shortening the transition between the various development cycles&lt;/strong&gt; and less by reducing the time of each individual cycle. (I’ll get back to that later on in this post.)&lt;/p&gt;&#xD;
&lt;h3&gt;Which platform is more productive?&lt;/h3&gt;&#xD;
&lt;p&gt;In a Stack Overflow question comparing the RoR and Grails platforms (&lt;a href="http://stackoverflow.com/questions/853624/is-developer-productivity-higher-on-ruby-on-rails-or-grails"&gt;Is developer productivity higher on Ruby on Rails or Grails?&lt;/a&gt;) it became clear that the answer to the question is &lt;em&gt;always&lt;/em&gt; subjective, as it depends on your existing skillset, legacy, framework, etc. In other words, there is no clear winner and the answer can differ on a case-by-case basis.  This brings me to the point that it's going to be hard to claim that one platform is more productive than another without keeping the specific context in mind.&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;Also see Joseph Ottinger's post, "&lt;a href="http://www.enigmastation.com/?p=173" target="_self"&gt;It's not about C++ and Java performance&lt;/a&gt;," which addresses a series of benchmarks people were trying to use to assert that Java was better than C++ and vice versa.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;h3&gt;Defining Platform-Productivity&lt;/h3&gt;&#xD;
&lt;p&gt;The definition that seems to resonate best with me was provided by &lt;a href="http://stackoverflow.com/users/98487/cam-wolff"&gt;Cam Wolff&lt;/a&gt; in one of his comments on the Stack Overflow thread above:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;&lt;strong&gt;Productivity is measured by units of features being delivered (not lines of code).&lt;/strong&gt;&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Cam Wolff's comment in parentheses is even more interesting. In many of the recent language debates centered on productivity we seem to be zeroing in on the number of lines of code required to deliver a given functionality as the main measure of productivity. As I noted in the opening of this section, reducing the number of lines of code covers only a narrow aspect of productivity, and not necessarily the best one. &lt;/p&gt;&#xD;
&lt;p&gt;Measuring the time it takes to bring a new feature from development into production is where I believe we should put the focus when we measure productivity, as this is what really matters. By looking at that measure we may find that language choice actually makes only a marginal contribution to our overall productivity in comparison to how well we can shorten the time it takes to go through the QA and production cycles. We may also find that the right choice would actually be to combine different languages for different purposes.&lt;/p&gt;&#xD;
&lt;p&gt;In the case of a PaaS platform, that means that a platform that makes it extremely easy to write your own hello-world application wouldn’t be considered productive if it fails to take you effectively through the entire debugging and optimization cycles into production.&lt;/p&gt;&#xD;
&lt;h2&gt;Productivity vs. Control tradeoffs in PaaS&lt;/h2&gt;&#xD;
&lt;p&gt;I started this post by quoting Gartner on the choice between Productivity and Control when looking into the two main categories in the PaaS market.&lt;/p&gt;&#xD;
&lt;p&gt;I thought that &lt;strong&gt;Carlos Ble's post &lt;/strong&gt;&lt;a href="http://www.carlosble.com/2010/11/goodbye-google-app-engine-gae/"&gt;Goodbye Google App Engine (GAE)&lt;/a&gt; is a good example that illustrates why the initial perception of GAE as a simple platform that provides extreme productivity can be completely wrong.&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;..developing on GAE introduced such a design complexity that working around it pushes us 5 months behind schedule.&lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Part of what led Carlos to that experience, IMO, is that in the course of trying to make GAE extremely productive, its owner made the platform &lt;strong&gt;&lt;span style="text-decoration: underline;"&gt;too opinionated&lt;/span&gt;&lt;/strong&gt;, to the point where you lose all the potential productivity gains by trying to adapt to their model. In addition, with a platform like GAE you have very little freedom to leverage existing frameworks such as your own database, messaging system, or any other third-party service that can by itself be a huge contributor to productivity.&lt;/p&gt;&#xD;
&lt;p&gt;Instead, you're completely dependent on the platform provider's stack and pace of development, and that in itself can work against agility and productivity in yet another dimension. In this specific example Carlos couldn’t use a certain version of a Python library that would have made him more productive, and had to work around issues that were already solved elsewhere.  This is a good example of how a lack of flexibility leads to poorer productivity even in the case of simple applications.&lt;/p&gt;&#xD;
&lt;h3&gt;Amazon PaaS (BeanStalk) vs Google App Engine – Productivity + Control&lt;/h3&gt;&#xD;
&lt;p&gt;It's interesting to compare Google App Engine with Amazon Beanstalk. Amazon provides full control over almost every piece of their platform. You can choose your own operating system, integrate with an external database and other services of your choice, etc. At the same time, Amazon provides layers of abstraction on top of their infrastructure through services like RDS, SimpleDB, SQS, MapReduce, and Beanstalk. In addition, they have an ecosystem of PaaS providers such as Heroku, GigaSpaces and others that run on top of their IaaS services and complement their offering.&lt;/p&gt;&#xD;
&lt;h3&gt;Open productivity&lt;/h3&gt;&#xD;
&lt;p&gt;Amazon's approach is to provide what I would call &lt;strong&gt;Open Productivity&lt;/strong&gt;. It enables you to start with extreme productivity, giving up some degree of control at the earlier stages. Unlike GAE, you're not locked into any of those layers of abstraction, and you always have the choice to go another level down and pick your own database, OS, framework, etc.&lt;/p&gt;&#xD;
&lt;p&gt;Amazon's approach is pretty much aligned with the approach that we took at GigaSpaces in 2009 when we developed our first-generation PaaS, as I noted in this post: &lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/06/google-app-engine-plus-amazon-aws-best-of-both-worlds.html"&gt;Google App Engine plus Amazon AWS: Best of both worlds&lt;/a&gt;. In other words, you don’t always have to trade extreme productivity for control. If the platform architecture is layered correctly, it is possible to get a good degree of both productivity and control, as in the case of Amazon and GigaSpaces.  Choosing between degrees of productivity and control becomes a decision about which layer of abstraction to start with, and not necessarily which platform to choose.&lt;/p&gt;&#xD;
&lt;p&gt;Doing it the other way around - i.e. providing &lt;strong&gt;closed productivity&lt;/strong&gt; first - without a good degree of control could lead to high productivity at the beginning but end up with low productivity as the project evolves as noted in Carlos' experience. In addition to that it is often much harder to take a closed productivity platform and open its control levels at a later stage if the platform was not designed for it from the start.&lt;/p&gt;&#xD;
&lt;p&gt;The caveat though is that it often takes more time to start with an open platform and build the layer of abstraction on top. In my view, this leads to a better evolution as you grow with your users and follow their demand rather than the other way around.&lt;/p&gt;&#xD;
&lt;h2&gt;Bottom line – It’s not about productivity&lt;/h2&gt;&#xD;
&lt;p&gt;After working on this short analysis, I came to the conclusion that since productivity is a broad term, it would be wrong to think that our choice is to trade productivity for control when we select aPaaS or CEAP platforms.&lt;/p&gt;&#xD;
&lt;p&gt;To put it simply, the main difference between the current class of aPaaS and CEAP is that most aPaaS offerings are targeted at the long tail of web applications, whereas CEAP is targeted at the higher end of the spectrum. &lt;/p&gt;&#xD;
&lt;p&gt;All of the platforms try to provide extreme productivity for the type of applications that they are targeting.&lt;/p&gt;&#xD;
&lt;p&gt;A better way to make a platform choice would be based on the target applications that each platform is aimed at. For example, I would look at GAE for extremely small Java or Python applications (widgets) or prototypes, Heroku for a small Ruby-based app, and Force for an application that needs to integrate more tightly with the Salesforce ecosystem.&lt;/p&gt;&#xD;
&lt;p&gt;I would look at Beanstalk if I’m already running a simple Java application and don’t plan to switch off of Amazon anytime soon. I would choose a CEAP such as GigaSpaces in cases where I don’t want to be bound to a particular cloud provider and want the flexibility to run the same application on a variety of environments, private or public, with or even without virtualization, in Java or .Net.  The latter would fit most enterprises and the higher-end SaaS providers, whereas the former would probably fit end-user applications better.&lt;/p&gt;&#xD;
&lt;h3&gt;Extensibility &amp;amp; Flexibility&lt;/h3&gt;&#xD;
&lt;p&gt;Extensibility and flexibility are attributes of control, and another factor worth considering when selecting a platform. As it stands today, the various platforms tend to be quite different in the level of choice that they offer in languages, frameworks, and the ability to plug in your service of choice or use only parts of the platform (database, messaging, ...). As noted earlier in the GAE example, flexibility can actually lead to greater productivity.&lt;/p&gt;&#xD;
&lt;h2&gt;What’s next: toward a second-generation PaaS&lt;/h2&gt;&#xD;
&lt;p&gt;If we follow the evolution of aPaaS and CEAP solutions, we can see a consistent shift toward more open platforms than those available in the first-generation PaaS. With the emergence of more open infrastructures in the form of OpenStack, as well as the evolution of enterprise data centers toward hybrid clouds, we can expect an even higher degree of flexibility, not just on Amazon but on any cloud (private or public). The other thing that comes up often in many of the recent DevOps discussions is the demand to extend the level of productivity and automation that Amazon (or GigaSpaces for that matter) provides for its own built-in services to external services of my choice.&lt;/p&gt;&#xD;
&lt;p&gt;Gartner's paper does a pretty good job covering that shift in the PaaS market, but the use of productivity as the differentiating factor can be misleading, as I outlined in this post.&lt;/p&gt;&#xD;
&lt;p&gt;The good news is that we can expect a much better selection of choices among the various tradeoffs around productivity and control.  I will dive into the details of that in a followup post.&lt;/p&gt;&#xD;
&lt;h2&gt;PaaS trend discussion @ Cloud Connect&lt;/h2&gt;&#xD;
&lt;p&gt;&lt;a href="http://www.gigaspaces.com/content/cloud-connect-expo"&gt;&lt;img align="left" alt="" height="119" src="http://t0.gstatic.com/images?q=tbn:ANd9GcRm9Hx-sAEoviPj_uJtS1VH0jbmX-WRolr50G7xZLqEx5TM_Lsd" style="margin: 0px 13px 0px 0px; display: inline; float: left;" width="119"&gt;&lt;/img&gt;&lt;/a&gt;&lt;img alt="" src="http://www.cloudconnectevent.com/images/clear.gif"&gt;&lt;/img&gt;In this post I tried to lay out my own analysis of this matter. I’m very interested in other people's experience with this subject, so if this topic is of interest to you and you happen to be at CloudConnect next week, I’d be happy if you would drop me an email (natis at gigaspaces dot com) or simply meet me at our &lt;a href="http://www.gigaspaces.com/content/cloud-connect-expo"&gt;CloudConnect booth (106).&lt;/a&gt; &lt;/p&gt;&#xD;
&lt;h2&gt;References&lt;/h2&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;a href="http://www.gartner.com/DisplayDocument?id=1526216"&gt;Productivity vs. Control: Cloud Application Platforms Must Split to Win&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://agile101.net/2009/09/01/measuring-programmer-productivity-a-scientific-study/"&gt;Measuring Programmer Productivity – A scientific study&lt;/a&gt; &lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Programming_productivity"&gt;Programming productivity&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://stackoverflow.com/questions/853624/is-developer-productivity-higher-on-ruby-on-rails-or-grails"&gt;Is developer productivity higher on Ruby on Rails or Grails?&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;li&gt;&lt;a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/06/google-app-engine-plus-amazon-aws-best-of-both-worlds.html"&gt;Google App Engine plus Amazon AWS: Best of both worlds&lt;/a&gt;&lt;/li&gt;&#xD;
&lt;/ul&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=LyxMXS_bvYQ:cGxJxIRzAvs:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/NatiShalom?a=LyxMXS_bvYQ:cGxJxIRzAvs:qj6IDK7rITs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/NatiShalom?d=qj6IDK7rITs" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/NatiShalom/~4/LyxMXS_bvYQ" height="1" width="1"/&gt;</content>



    <feedburner:origLink>http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html</feedburner:origLink></entry>
 
</feed><!-- ph=1 -->

