<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>DBMS 2 : DataBase Management System Services</title>
	
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/dbms2/feed" /><feedburner:info uri="dbms2/feed" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>More on NoSQL and HVSP (or OLRP)</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/3xIj3xLQT4A/</link>
		<comments>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 09:10:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Riptano]]></category>
		<category><![CDATA[Schooner]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Tokutek]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2907</guid>
		<description><![CDATA[Since posting last Wednesday morning that I&#8217;m looking into NoSQL and HVSP, I&#8217;ve had a lot of conversations, including with (among others):

Dwight Merriman of 10gen (MongoDB)
Damien Katz of Couchio (CouchDB)
Matt Pfeil of Riptano (Cassandra)
Todd Lipcon of Cloudera (HBase committer)
Tony Falco of Basho (Riak)
John Busch of Schooner
Ori Herrnstadt of Akiban

By no means do I have time [...]]]></description>
			<content:encoded><![CDATA[<p>Since posting last Wednesday morning that <a href="http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/" >I&#8217;m looking into NoSQL and HVSP</a>, I&#8217;ve had a lot of conversations, including with (among others):</p>
<ul>
<li>Dwight Merriman of 10gen (MongoDB)</li>
<li>Damien Katz of Couchio (CouchDB)</li>
<li>Matt Pfeil of <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/" >Riptano</a> (Cassandra)</li>
<li>Todd Lipcon of Cloudera (HBase committer)</li>
<li>Tony Falco of Basho (Riak)</li>
<li>John Busch of Schooner</li>
<li><strong><span style="font-weight: normal;">Ori Herrnstadt</span></strong> of <a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a></li>
</ul>
<p><span id="more-2907"></span>By no means do I have time to do these conversations justice, in terms of giving them the write-ups and/or immediate follow-up that they deserve. Indeed, I&#8217;ll leave for vacation Saturday morning with my 2000-word NoSQL article still unwritten. So I&#8217;ll dump as many observations as I can into one or a few posts now, and play catch-up later as circumstances allow.</p>
<p>In no particular order:</p>
<ul>
<li>A number of NoSQL offerings have had more uptake to date than most of the scale-out SQL offerings have.</li>
<li>&#8220;Document-oriented&#8221; NoSQL projects CouchDB and MongoDB have probably had the most users get into production, but perhaps for pretty small systems.</li>
<li>Cassandra and HBase &#8212; the column-group-architecture guys &#8212; have probably had the most bang-in-lots-of-writes <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP</a> production uptake.*</li>
<li>I didn&#8217;t talk customer count with Schooner, but the decently-stocked <a href="http://www.schoonerinfotech.com/customers" onclick="javascript:pageTracker._trackPageview('/www.schoonerinfotech.com');">Schooner customer page</a> suggests Schooner may be something of an exception to these generalities.</li>
<li>A lot of these companies are in the low-to-mid-teens of employees.</li>
<li>The SQL-oriented companies, despite having fewer or no customers, often seem to have more money. (One reason I get the impression SQL guys have more money is, frankly, that more  of them are talking about engaging <a href="http://www.monash.com/advantage.html" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">my services</a>.)
<ul>
<li>Schooner cites $20 million in VC.</li>
<li><a href="http://www.dbms2.com/2010/05/12/the-clustrix-story/" >Clustrix</a> cites a figure close to that.</li>
<li>Basho cites $10 million, plus <a href="http://www.masshightech.com/stories/2010/08/02/daily35-Basho-rejects-VC-takes-late-friends-and-family-round.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">a new round of $1.5 or $2 or $2.5 million</a>. The new round is at a lowered valuation.</li>
<li>That same site says <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/" >Tokutek</a> finally was able to <a href="http://www.masshightech.com/stories/2010/08/16/daily47-Database-software-firm-Tokutek-lands-28M.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">raise some VC</a>. Congrats!</li>
</ul>
</li>
<li>It&#8217;s only a two-company trend, but I was pleased to hear that both 10gen/MongoDB and Akiban were seeing Drupal as a major use case or potential use case. No word on rescuing WordPress from its MySQL implementation, alas, but it seems that a Drupal site typically has 40-200+ tables, while a WordPress one has 10ish.</li>
<li>Another trend I think I&#8217;m seeing is serious object-oriented apps banging things straight into a simple back end. <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >Workday</a> is a huge example of that. Akiban hopes to do something similar with Hibernate.</li>
<li>Stability and maturity are still issues for many of these products. E.g., HBase isn&#8217;t even in Release 1.0 yet. Ditto Cassandra, and surely many of the others. Unsurprisingly, <a href="http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html" onclick="javascript:pageTracker._trackPageview('/blog.mikiobraun.de');">making Cassandra stable is still a challenge</a>.</li>
</ul>
<p><em>*As is common for terms I suggest, the &#8220;HVSP&#8221; name is not getting any traction. What do you think of Marton Trencseni&#8217;s suggestion of <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/#comment-182138" >OLRP, for OnLine Request Processing</a>?</em></p>
<p>One thing that makes following this area interesting is that so many projects are open source, which puts a lot of information out in the wild. I hardly have time to read the mailing list for each project; but the people I talk with do, and often they sorta kinda remember something somebody else posted one or several months back. As just one example, the mailing lists are said to confirm:</p>
<ul>
<li>Contrary to rumor, <a href="http://twitter.com/eventcloudpro/status/17872687577" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Facebook hasn&#8217;t moved in-box search off of Cassandra</a>.</li>
<li>Apparently, however, it&#8217;s true that <a href="http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/" >Cassandra inventor Facebook</a> has stopped working on Cassandra, and Facebook&#8217;s core Cassandra developers have shifted over to HBase.</li>
</ul>
<p>Also, figuring out usage of open source software can be &#8230; interesting.</p>
<ul>
<li> People who use open source software don&#8217;t have to reveal themselves, as there&#8217;s no purchase transaction to kick things off.</li>
<li>On the other hand, if they&#8217;re serious enough in their use, they often do.
<ul>
<li>There are two main ways to get tech support for open source software &#8212; the community or a company that sells support &#8212; and both ways let the main support-selling company know that one is a user.</li>
<li>Some folks even add themselves to open lists of users, for example these rather long lists for <a href="http://wiki.apache.org/hadoop/Hbase/PoweredBy" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">HBase</a> and <a href="http://wiki.apache.org/couchdb/CouchDB_in_the_wild" onclick="javascript:pageTracker._trackPageview('/wiki.apache.org');">CouchDB</a>.</li>
<li>Or they show up at conferences. For example, <a href="http://twitter.com/spyced/status/21490457839" onclick="javascript:pageTracker._trackPageview('/twitter.com');">two</a> <a href="http://twitter.com/spyced/status/21675203015" onclick="javascript:pageTracker._trackPageview('/twitter.com');">tweets</a> from Riptano founder Jonathan Ellis suggest at least 30 production Cassandra users were represented at a recent event. That&#8217;s more detail than his colleague Matt Pfeil wanted to give me when we talked. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
</li>
</ul>
<p>OK. This post has gotten pretty long, even without me saying anything resembling an overview of any of the seven companies I listed up top, or of their products&#8217; adoption. So I&#8217;ll just publish this now, and edit in links below to follow-on posts if and when they become available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/</feedburner:origLink></item>
		<item>
		<title>Workday comments on its database architecture</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/iRLrN7R0R7w/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2874</guid>
		<description><![CDATA[In my discussion of Workday&#8217;s technology, I gave an estimate that Workday&#8217;s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday&#8217;s database strategy. Workday kindly gave me permission to quote it below.


I would say thousands. The [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in; page-break-before: always;"><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">In my discussion of </span></span></span><span style="font-size: small;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday&#8217;s technology</a>,</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> I gave an estimate that Workday&#8217;s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday&#8217;s database strategy. Workday kindly gave me permission to quote it below.</span></span></span><br />
<span id="more-2874"></span></p>
<blockquote>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">I would say thousands. The object model for our applications consists of over 2000 classes. On average these classes have multiple relationships with other classes so that would have some kind of multiplicative effect when it came to using tables.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">One example of where you’d be proliferating tables (and not getting as satisfactory of a solution relationally) is worktags. Currently we have a class for worktags. Instances of this class can point to various instances of detail lines (expense lines, po lines, invoice lines, etc…). A detail line can have many worktags pointing to it. To model this relationally you’d need either a separate table for each type of detail line in the system to store the tags associated with it or a single worktag for detailed line table that could be foreign keyed for all types of detail lines that would store their worktag. Either way involves more tables and more clunkiness.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">Another example of where our oo designs wouldn’t directly translate is our ability to describe the shared part of a detail line in one class and have all instances of detail lines inherit the fields that are shared. To do this relationally you’d probably replicate the shared fields in each table representing the various kinds of transactional details (again lines, po lines, invoice lines, etc…). You’d lose the ability to maintain and change the shared fields (and the processing logic for those fields) in one place.</span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;"><span style="font-size: small;">Anyway, I’d go with “thousands” as our answer. I do think this is an interesting question and wish we had more time to figure out a more accurate answer.</span></p>
</blockquote>
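<p>To make the worktag and shared-field points concrete, here is a minimal Python sketch of the object-oriented side of that argument. It is an illustration only, not Workday&#8217;s code: the class names, field names and sample values are all hypothetical. One base class holds the fields and logic shared by every kind of detail line, and a single worktag class can point at any kind of line.</p>

```python
from dataclasses import dataclass, field


@dataclass
class DetailLine:
    """Hypothetical base class: fields shared by every kind of detail line."""
    amount: float
    memo: str = ""

    def describe(self) -> str:
        # Shared processing logic lives here once and is inherited.
        return f"{type(self).__name__}: {self.amount:.2f} {self.memo}".strip()


@dataclass
class ExpenseLine(DetailLine):
    merchant: str = ""  # field specific to expense lines (assumed)


@dataclass
class InvoiceLine(DetailLine):
    invoice_number: str = ""  # field specific to invoice lines (assumed)


@dataclass
class Worktag:
    """Hypothetical tag class whose instances can point at any detail line."""
    name: str
    lines: list = field(default_factory=list)

    def tag(self, line: DetailLine) -> None:
        self.lines.append(line)


travel = Worktag("Travel")
q3 = Worktag("Q3 budget")
expense = ExpenseLine(amount=120.5, memo="taxi", merchant="Yellow Cab")
invoice = InvoiceLine(amount=900.0, invoice_number="INV-7")

travel.tag(expense)
q3.tag(expense)  # a detail line can have many worktags pointing to it
q3.tag(invoice)  # the same tag class works for a different line type
```

<p>In a relational design, the two <code>tag</code> calls on different line types would need either one tag table per line type or one shared tag table keyed to all of them, which is the extra clunkiness the email describes.</p>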
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/</feedburner:origLink></item>
		<item>
		<title>The Workday architecture — a new kind of OLTP software stack</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/k0rxJZVA96A/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-technology-stack/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Object]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2865</guid>
		<description><![CDATA[One of my coolest company visits in some time was to  SaaS  (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:

Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.
Workday has highly 	innovative ideas in how it manages data.
Companies founded by 	Dave Duffield tend [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-size: small;">One of my coolest company visits in some time was to </span><span style="font-size: small;"> SaaS  (Software as a Service) vendor</span><span style="font-size: small;"> Workday, Inc., earlier this month. Reasons included:</span></p>
<ul>
<li><span style="font-size: small;">Workday has 	forward-thinking ideas about SaaS enterprise 	applications and the integration of business intelligence into same.</span></li>
<li><span style="font-size: small;">Workday has highly 	innovative ideas in how it manages data.</span></li>
<li><span style="font-size: small;">Companies founded by 	Dave Duffield tend to feature smart, likeable people who talk to one</span><span style="font-size: small;"><span style="font-style: normal;"> pleasantly and forthrightly. Workday is no exception; CTO Stan Swete 	and the other Workday folks present were a delight to talk with.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">I&#8217;d 	invited Merv Adrian to come along with me. He asked great questions, 	and I could gather myself a bit despite how sleep-deprived I was for 	the first part of that trip.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday kindly allowed me to post this </span></span><span style="font-size: small;"><a href="http://www.monash.com/uploads/Workday-August-2010.ppt" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Workday slide deck</a>.</span><span style="font-size: small;"><span style="font-style: normal;"> Otherwise, I&#8217;ve split out a quick </span></span><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" ><span style="font-size: small;">Workday, Inc. company overview</span></a><span style="font-size: small;"><span style="font-style: normal;"> into a separate post.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">The biggie for me was the data and object management part. Specifically:  <span id="more-2865"></span><br />
</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><strong>Workday&#8217;s 	applications run entirely in-memory,</strong></span></span><span style="font-size: small;"><span style="font-style: normal;"> in a highly object-oriented structure. Persistence is mainly for the 	sake of data safety …</span></span></li>
<li>… <span style="font-size: small;"><span style="font-style: normal;">but 	not entirely. In earlier releases, Workday kept absolutely 	everything in RAM. However, certain things are now kept only on disk, 	such as:</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Audit 	files.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Certain 	documents (notably resumes).</span></span></li>
</ul>
</li>
<li><strong><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	whole database</span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> – data and metadata alike – is persisted to disk in </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">&lt;10 	MySQL/InnoDB tables. </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">MySQL 	is basically just being used as a </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">key-value 	store, </span></span></strong><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">albeit 	one with </span></span></span><strong><span style="font-size: small;"><span style="font-style: normal;">ACID 	transactional support. </span></span></strong>
<ul>
<li><span style="font-size: small;">There <span style="font-weight: normal;">are </span><strong>3 main tables: attributes, relationships, instances.</strong></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">When 	I suggested this might be like an entity-attribute-value model, 	Workday said it would be even better to think in terms of</span><span style="font-style: normal;"><strong> instanceID-attribute-value.</strong></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">As 	you might expect for a database that simple, its schema doesn&#8217;t 	change much.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">By 	way of comparison, Workday estimates that if its software were 	written relationally, </span></span></span><span style="font-size: small;"><span style="font-style: normal;">there 	would b</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">e </span></span></span><span style="font-size: small;"><span style="font-weight: normal;"><a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >1000s 	of tables</a>,</span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;"> which</span></span></span><span style="font-size: small;"><span style="font-style: normal;"> would take up 10-100X as much disk space. </span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">All 	write transactions are banged immediately into the MySQL database. 	I.e., RAM and disk are never allowed to get out of sync.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	database is append-only. This is exploited for effective dating 	(pretty heavily, it seems, perhaps because that&#8217;s a useful concept 	in human resources) and snapshotted reporting.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	built-in BI doesn&#8217;t have a lot of choice but to do scans, traversing 	the object model. This turns out to be fast enough.</span></span></li>
</ul>
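<p>For readers who want to picture the append-only, instanceID-attribute-value idea, here is a minimal Python sketch. It makes several assumptions that are mine rather than Workday&#8217;s: it uses SQLite in place of MySQL/InnoDB, it shows only a single attributes table, and every table name, column name and sample value is hypothetical. Writes only ever insert rows, each write is committed at once, and reads ask for the value in effect on a given date.</p>

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB here; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attributes (
        instance_id    INTEGER NOT NULL,
        attribute      TEXT    NOT NULL,
        value          TEXT,
        effective_date TEXT    NOT NULL  -- ISO dates sort correctly as text
    )
    """
)


def set_attribute(instance_id, attribute, value, effective_date):
    """Append a new row; existing rows are never updated or deleted."""
    with conn:  # commit each write immediately, as one transaction
        conn.execute(
            "INSERT INTO attributes VALUES (?, ?, ?, ?)",
            (instance_id, attribute, value, effective_date),
        )


def get_attribute(instance_id, attribute, as_of):
    """Return the value in effect on the given date, or None."""
    row = conn.execute(
        """
        SELECT value FROM attributes
        WHERE instance_id = ? AND attribute = ? AND ? >= effective_date
        ORDER BY effective_date DESC, rowid DESC
        LIMIT 1
        """,
        (instance_id, attribute, as_of),
    ).fetchone()
    return row[0] if row else None


set_attribute(1, "job_title", "Analyst", "2009-01-01")
set_attribute(1, "job_title", "Manager", "2010-07-01")
```

<p>Calling <code>get_attribute(1, "job_title", "2010-01-15")</code> looks up the title as of mid-January 2010, while a later date picks up the newer row. Because nothing is overwritten, a report can be run against any past date, which is roughly what effective dating and snapshotted reporting need.</p>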
<p style="margin-bottom: 0in;"><span style="font-size: small;">Other notes on Workday&#8217;s data and object management strategy include:</span></p>
<ul>
<li><span style="font-size: small;">Workday is 	object-oriented through and through – no object-relational mapping 	&#8211; <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">turtles 	all the way down</a>. On average, a class has about 2 attributes.</span></li>
<li><span style="font-size: small;">94% of requests are 	reads, traversing the object hierarchy.</span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are pretty small.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	biggest database Workday supports uses 17 gigabytes of RAM. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	databases are much smaller on disk than in RAM.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday&#8217;s “dream” 	is to move from disk to solid-state memory. </span></li>
<li><span style="font-size: small;">Workday uses GPLed 	MySQL/InnoDB. So there&#8217;s no software license reason to ever move 	away (e.g., to a pure key-value store).</span></li>
<li><span style="font-size: small;">Disaster recove</span><span style="font-size: small;"><span style="font-style: normal;">ry 	is based on local and remote MySQL slaves. </span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Obviously, serious apps have been built before in object-oriented and/or key-value ways, with the resulting objects then being banged to disk (or in some cases kept in memory). Examples include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Numerous 	applications are built on <a href="../2010/01/15/intersystems-cache-highlights/">object-oriented 	DBMS</a>. Generally they go against disk, although <a href="../2005/11/14/defining-and-surveying-memory-centric-data-management/">memory-centric 	implementations can save a lot of pointer-chasing</a>. Often they&#8217;re 	queried via SQL.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Basho&#8217;s 	website says that its key-value store Riak was originally conceived 	in connection with a planned salesforce automation product, but I 	don&#8217;t think that the application part of that plan ever got built. </span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">SAP 	has <a href="../2005/12/09/36/">longstanding</a> doubts about relational dogma, although not nearly to Workday&#8217;s 	extreme.</span></span></li>
<li><span style="font-size: small;">Obviously, 	some major internet applications just bang data into key-value 	stores.</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Still, perhaps because it is wholly object-oriented yet doesn&#8217;t even bother with anything like a real object-oriented DBMS, Workday&#8217;s approach seems particularly cool. </span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Other highlights of Workday, Inc.&#8217;s technical story include:</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has settled into a schedule of three releases per year, and has 	pretty much lived up to that for &gt;2 years.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Every 	user is always on the latest Workday release.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">You 	can delay turning on significant new Workday software functionality 	if you want to.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Pure 	UI changes to the Workday software are handled much as they are on 	various websites today. Sometimes you have no choice but to live 	with them; sometimes the prior version of the UI remains available 	to you for a while.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday&#8217;s 	navigational approaches look pretty cool.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">The 	core concept is a list of actions you can perform now, rather than 	more standard menus.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Roles/permissions 	are of course central to this.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Reports 	have lots of actionable links in them. (More than just drilldown, 	although specific examples have slipped my memory.)</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Alternatively, 	you can navigate via a search box, searching both on names of 	objects (e.g. users, divisions) and on names of tasks. This is 	somewhat reminiscent of <a href="http://www.texttechnologies.com/2007/02/28/sap%E2%80%99s-%E2%80%9Csearch%E2%80%9D-strategy-isn%E2%80%99t-about-search/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">an 	approach SAP was considering a few years ago</a>.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;">Workday says it has 	four key design premises:</span>
<ul>
<li><span style="font-size: small;"><em>Web-Familiar 	Experience.</em> I&#8217;d say that&#8217;s true to the extent it makes sense. 	In many ways, the web needs to catch up to Workday.</span></li>
<li><span style="font-size: small;"><em>Enterprise 	Reporting.</em> The idea is that you get a report, then take actions 	based on it. Hence the report-centric options for navigation.</span></li>
<li><span style="font-size: small;"><em>Integration 	On-Demand.</em> That&#8217;s a fancy way of saying “Plays nicely with 	others.”</span></li>
<li><span style="font-size: small;"><em>Configurable 	Business Processes.</em><span style="font-style: normal;"> Duh. That&#8217;s 	pretty essential if you want to do serious SaaS applications.</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	maintains a strong separation between application logic and UI 	development. Developers do no screen layouts. Instead, UIs are 	automatically generated for:</span></span>
<ul>
<li><span style="font-size: small;">Flash/FLEX</span></li>
<li><span style="font-size: small;">iPhone</span></li>
<li><span style="font-size: small;">Mobile HTML</span></li>
<li><span style="font-size: small;">PDF export</span></li>
<li><span style="font-size: small;">Excel export</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	only talks to the outside world via web services.</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	is heavily </span></span><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">into 	SOAP (Simple Object Access Protocol). </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">The 	acquisition of OEM partner CapeClear gave Workday an Integration 	Service (i.e., enterprise service bus) that translates SOAP into 	whatever else might be needed for integration, and also does 	reliable delivery. </span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">All 	that said, Stan Swete sees integration among various SaaS offerings 	as an area needing significant future attention.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	business intelligence ideas are interesting, but I think there&#8217;s a 	long way for that technology still to go.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday&#8217;s 	BI seems to be focused on report/drilldown kinds of functionality.</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">You 	can slice by up to 2 dimensions at once.</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">However, you can then keep slicing by more dimensions, as many times as you like.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">While 	you can take actions straight from reports, some of the specific 	BI/app integration ideas we discussed are still futures. (E.g., 	analyzing spend at the time of expense report data entry or 	approval.)</span></span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Of 	course, Workday&#8217;s web services interface lets you export Workday 	data into 3rd-party tools. Indeed, if you want to integrate data 	from Workday and some other source(s), that&#8217;s your only choice.</span></span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday offers a clever metaphor to illustrate that your data may be more secure offsite than on premises: the bank vault. (I have no idea whether that&#8217;s a SaaS industry standard, but I hadn&#8217;t heard it before.) Of course, that metaphor does raise some issues specific to the remote data case, such as:</span></span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">When 	your data is on premises, you know whether the government has 	insisted on looking at it.</span></span></span></li>
<li><span style="font-size: small;">More than cash, data keeps traveling back and forth to 	the remote location, which creates at least a theoretical risk of 	interception.</span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;"><span style="font-weight: normal;">Workday says the toughest part of globalization is the issue of which personal data is or is not maintained. For example, in the US you&#8217;re not allowed to ask a job applicant&#8217;s religion, but in the UK you&#8217;re not only permitted but indeed required to.</span></span></span></li>
</ul>
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/k0rxJZVA96A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-technology-stack/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/22/workday-technology-stack/</feedburner:origLink></item>
		<item>
		<title>Workday, Inc. company overview</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/DHoupvfYuQs/</link>
		<comments>http://www.dbms2.com/2010/08/22/workday-inc-company-overview/#comments</comments>
		<pubDate>Sun, 22 Aug 2010 10:20:02 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Pricing]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Workday]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2878</guid>
		<description><![CDATA[My main post on Workday&#8217;s technology got really long, so I decided to split out a company backgrounder separately. Here goes.
Workday, Inc. was founded by Dave Duffield and Aneel Bhusri, who&#8217;d previously worked together at PeopleSoft. It is generally the case that the companies Dave starts:  


Develop 	application software for large or fairly large enterprise [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">My main post on <a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday&#8217;s technology</a> got really long, so I decided to split out a company backgrounder separately. Here goes.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday, Inc. was founded by Dave Duffield and Aneel Bhusri, who&#8217;d previously worked together at PeopleSoft. It is generally the case that the companies Dave starts:  <span id="more-2878"></span><br />
</span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Develop 	application software for large or fairly large enterprise customers.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Build those applications in/on their own platform technology, which is meant to be cutting-edge in its day. (For example, PeopleSoft was early in building an RDBMS-based client/server application suite, and did so with the help of a clever technology called PeopleTools, which <a href="../../../../../2008/04/13/scaledb-presents-the-revenge-of-the-pointer/">I nonetheless helped talk PeopleSoft out of further commercializing</a>.)</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Focus 	first on human resources software (Dave had another HR company 	before PeopleSoft).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Move 	fairly early into non-profit/higher-education accounting (Dave had a 	company in that area before PeopleSoft, and PeopleSoft was fairly 	active in the area too).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Emphasize a pleasant corporate culture.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">All these things seem true of Workday Inc., although the non-profit/higher-ed move is just underway now. Specifically: </span></span></p>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	was founded in 2005, starting with an asset buy of some platform 	software a key PeopleTools developer had been working on for years.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has had multitenant SaaS offerings from the getgo. (And that&#8217;s all Workday does.)<br />
</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has around 150 customers.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	sells mainly to multinational corporations, generally based in North 	America. Efforts in the UK are beginning to ramp up.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	has six core application modules, among which are:</span></span>
<ul>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Human Capital Management (almost all the customers).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Payroll (a little under 50 customers). Workday is partnered with 	local providers for payroll in 20 countries, and is building its 	second inhouse version (Canadian) now.</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Financial Management (a little under 20 customers, for what is far 	from a complete system).</span></span></li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday 	Benefits Network, providing connectivity to benefits providers 	(that&#8217;s the only Workday module that isn&#8217;t straight software).</span></span></li>
</ul>
</li>
<li><span style="font-size: small;"><span style="font-style: normal;">Workday, 	Inc. has around 500 employees, mainly in Pleasanton, CA. About 20 	are in Dublin, Ireland, courtesy of the acquisition of CapeClear. 	About 1/3 are in development.</span></span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Workday prices its services based on metrics for the overall client business, not per-Workday-user. (Actually, the metric is basically headcount, which makes sense given Workday&#8217;s application focus.)</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-size: small;"><span style="font-style: normal;">Some of these points are covered in more detail in a <a href="http://www.monash.com/uploads/Workday-August-2010.ppt" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">Workday Inc. slide deck</a>.<br />
</span></span></p>
<p><em><strong>This post is part of a three-post series</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2010/08/22/workday-inc-company-overview/" >Workday Inc. company overview</a> (brief)</li>
<li><a href="http://www.dbms2.com/2010/08/22/workday-technology-stack/" >Workday Inc. technology overview</a> (detailed)</li>
<li>Workday Inc. CTO Stan Swete&#8217;s <a href="http://www.dbms2.com/2010/08/22/workday-stan-swete-database-architecture/" >comments on database strategy</a></li>
</ul>
<p><em>Edit: Also, there&#8217;s a <a href="http://blogs.workday.com/Blog.html" onclick="javascript:pageTracker._trackPageview('/blogs.workday.com');">Workday blog</a> with only a few posts, which nonetheless seems to flesh out a few of the ideas in this post series.</em></p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/DHoupvfYuQs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/22/workday-inc-company-overview/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/22/workday-inc-company-overview/</feedburner:origLink></item>
		<item>
		<title>The substance of Pentaho’s Hadoop strategy</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/QcZCGvMUFgY/</link>
		<comments>http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/#comments</comments>
		<pubDate>Sat, 21 Aug 2010 06:40:29 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2848</guid>
		<description><![CDATA[Pentaho has been talking about a Hadoop-related strategy. Unfortunately, in support of its Hadoop efforts, Pentaho has been &#8212; quite insistently &#8212; saying things that don&#8217;t make a lot of sense to people who know anything about Hadoop.
That said, I think I found four sensible points in Pentaho&#8217;s Hadoop strategy, namely:

If you use an ETL [...]]]></description>
			<content:encoded><![CDATA[<p>Pentaho has been talking about a Hadoop-related strategy. Unfortunately, in support of its Hadoop efforts, Pentaho has been &#8212; quite insistently &#8212; saying things that don&#8217;t make a lot of sense to people who know anything about Hadoop.</p>
<p>That said, I think I found four sensible points in Pentaho&#8217;s Hadoop strategy, namely:</p>
<ol>
<li>If you use an ETL tool like Pentaho&#8217;s to move things in and out of HDFS, you may be able to orchestrate two more steps in the ETL process than if you used Hadoop&#8217;s native orchestration tools.</li>
<li>A lot of what you want to do in MapReduce consists of things that can be graphically specified in an ETL tool like Pentaho&#8217;s. (That would include tokenization or regex.)</li>
<li>If you have some really lightweight BI requirements (ad hoc, reporting, or whatever) against HDFS data, you might be content to do it straight against HDFS, rather than moving the data into a real DBMS. If so, BI tools like Pentaho&#8217;s might be useful.</li>
<li>Somebody might want to use a screwy version of MapReduce, where by &#8220;screwy&#8221; I mean anything that isn&#8217;t <a href="http://www.dbms2.com/2010/06/30/cloudera-enterprise-hadoop-evolution/" >Cloudera Enterprise</a>, <a href="http://www.dbms2.com/2009/12/02/mapreduce-for-complex-analytics-webina/" >Aster Data SQL/MapReduce</a>, or some other implementation/distribution with a lot of supporting tools. In that case, they might need all the tools they can get.</li>
</ol>
<p>The first of those points is, in the grand scheme of things, pretty trivial.</p>
<p>The third one makes sense. While Hadoop&#8217;s Hive client means you could roll your own integration with your own favorite BI tool in any case, having a vendor certify that integration for you could be nice. So if Pentaho ships something that works before other vendors do, good on them. (Target date seems to be October.)</p>
<p>The fourth one is kind of sad.</p>
<p>But if there&#8217;s any shovel-meet-pony aspect to all this (or indeed a reason for writing this blog post), it would be the second point. If one understands data management, but is in the &#8220;Oh no! Hadoop wants me to PROGRAM!&#8221; crowd, then being able to specify one&#8217;s MapReduce graphically might be a really nice alternative to having to actually code it.</p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/QcZCGvMUFgY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/21/the-substance-of-pentahos-hadoop-strategy/</feedburner:origLink></item>
		<item>
		<title>I’m collecting data points on NoSQL and HVSP adoption</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/qfXTQ5aPqxk/</link>
		<comments>http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 13:09:08 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Groovy Corporation]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[ScaleDB]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>
		<category><![CDATA[dbShards and CodeFutures]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2840</guid>
		<description><![CDATA[I was asked to do a magazine article on NoSQL, where by &#8220;NoSQL&#8221; is meant &#8220;whatever they talk about at NoSQL conferences.&#8221; By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in [...]]]></description>
			<content:encoded><![CDATA[<p>I was asked to do a magazine article on NoSQL, where by &#8220;NoSQL&#8221; is meant &#8220;whatever they talk about at NoSQL conferences.&#8221; By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP</a> in general, NoSQL and SQL alike.</p>
<p>It also is understood that, realistically, I can&#8217;t be expected to know and mention the very latest news for all the many products in the categories. Even so, I think this would be a fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.</p>
<p>In the NoSQL area:  <span id="more-2840"></span></p>
<ul>
<li>Back in April, the VoltDB guys told me they thought Cassandra and HBase were the two NoSQL systems with the most momentum.</li>
<li>I know distressingly little about HBase adoption, but a source who may or may not wish to remain anonymous was kind enough to alert me that Twitter and StumbleUpon each have ~30 node deployments, for analytics and analytics/HVSP respectively.</li>
<li>I wrote in detail on <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/" >Cassandra adoption</a> last month. News since then includes:
<ul>
<li>Facebook is rumored to have dropped Cassandra completely.</li>
<li><a href="http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html" onclick="javascript:pageTracker._trackPageview('/engineering.twitter.com');">Twitter clarified that it may not be quite as lovestruck by Cassandra as before</a>, but they&#8217;re still very close friends.</li>
<li>It&#8217;s not obvious that the <a href="http://www.riptano.com/blog/cassandra-summit-recap" onclick="javascript:pageTracker._trackPageview('/www.riptano.com');">Cassandra Summit</a> unveiled a lot of new adoption stories.</li>
</ul>
</li>
<li>Northscale&#8217;s <a href="http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/" >Membase</a> is still in its early days.  Zynga is bought in, however, as is something called NHN Korea. <em>(Edit: I subsequently saw NHN Korea on a prominent SEO expert&#8217;s list of the top half dozen or so search engines in the world. Who knew?)</em></li>
<li>Basho has listed a few <a href="http://www.basho.com/customers.html" onclick="javascript:pageTracker._trackPageview('/www.basho.com');">Riak customers</a>. If memory serves (I haven&#8217;t spoken with Basho for a while, and some of my notes are misplaced due to some computer sloppiness), Basho has a few dozen customers in total.</li>
<li>Mozilla has <a href="http://blog.mozilla.com/data/2010/08/16/benchmarking-riak-for-the-mozilla-test-pilot-project/" onclick="javascript:pageTracker._trackPageview('/blog.mozilla.com');">a 4 machine, 64 core Riak cluster</a> in production.</li>
<li><a href="http://highscalability.com/hypertable-new-bigtable-clone-runs-hdfs-or-kfs" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">Hypertable</a> has a few users/project sponsors, Baidu being the biggest name among them.</li>
<li>I don&#8217;t really know how the MongoDB/10gen guys are doing. I think this is at least as much my fault as theirs. Anyhow, they seem to have <a href="http://www.10gen.com/news" onclick="javascript:pageTracker._trackPageview('/www.10gen.com');">links</a> to a couple of folks who have written about MongoDB usage.</li>
<li>NimbusDB is still in stealth mode. I&#8217;d be surprised if they had users  for a while yet, since in January they didn&#8217;t yet sound as if  development was very far underway. (Actually, I forget whether NimbusDB  is supposed to be SQL-based or not.)</li>
</ul>
<p>Among the SQL or SQL-friendly guys:</p>
<ul>
<li><a href="http://www.dbms2.com/2010/05/12/the-clustrix-story/" >Clustrix</a> says it has a few production users, some big-name, but is not disclosing them yet.</li>
<li><a href="http://www.dbms2.com/2010/07/28/dbshards/" >dbShards has around 6 customers</a>, including Facebook. (Facebook may outpace even Twitter and Zynga in using the most products mentioned in this post.)</li>
<li>As of May, <a href="http://www.dbms2.com/2010/05/25/voltdb-finally-launches/" >VoltDB</a> had one paying customer, plus 150 beta customers who weren&#8217;t in production yet.</li>
<li><a href="http://www.dbms2.com/2010/04/03/akiban-highlights/" >Akiban</a> says they&#8217;ll get me up to speed on Thursday. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li><a href="http://www.dbms2.com/2008/04/13/scaledb-presents-the-revenge-of-the-pointer/" >ScaleDB</a> seems to be pedaling along in perennial beta. Whether ScaleDB has any actual beta users is less clear. On the plus side, checking that out uncovered a pretty funny <a href="http://scaledb.blogspot.com/2010/04/scaledb-introduces-clustered-database.html" onclick="javascript:pageTracker._trackPageview('/scaledb.blogspot.com');">April Fool blog post</a>.</li>
<li><a href="http://www.dbms2.com/2009/07/30/groovy-corp-puts-out-a-ridiculous-press-release/" >Groovy Corporation</a> seems to have disappeared, or morphed into something called <a href="http://www.groovycorp.com/home.html" onclick="javascript:pageTracker._trackPageview('/www.groovycorp.com');">uCirrus</a>, or something like that.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/qfXTQ5aPqxk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/18/nosql-hvsp-adoption/</feedburner:origLink></item>
		<item>
		<title>Finally confirmed: Membase has a reasonable product roadmap</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/a3zqDZ8wU5M/</link>
		<comments>http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 09:37:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Northscale]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[memcached]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2830</guid>
		<description><![CDATA[On my recent trip to California, neither I nor my clients at Northscale covered ourselves in meeting-arranging glory. Still, from the rushed 30 minute meeting we did wind up having, I finally came away feeling good about Membase&#8217;s product direction.
To review, Membase is a reasonably elastic persistent data store, sporting the memcached API, making memcached/Membase [...]]]></description>
			<content:encoded><![CDATA[<p>On my recent trip to California, neither I nor my clients at Northscale covered ourselves in meeting-arranging glory. Still, from the rushed 30 minute meeting we did wind up having, I finally came away feeling good about Membase&#8217;s product direction.</p>
<p>To review, Membase is a reasonably elastic persistent data store, sporting the memcached API, making memcached/Membase an attractive alternative to memcached/sharded MySQL. As of now, Membase is a pure key-value store.</p>
<p>Northscale defends pure key-value stores by arguing, in effect:  <span id="more-2830"></span></p>
<ul>
<li>You can do a lot with entity-attribute-value triples.</li>
<li>If your key looks like an entity-attribute concatenation, then  your entity-attribute-value triple can be transformed into a key-value  pair.</li>
</ul>
<p>Northscale has a point. Still, I think that in most use cases you&#8217;ll want a data model and/or data access methods that are at least a little richer than pure entity-attribute-value.</p>
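<p>To make that argument concrete, here is a minimal Python sketch, not anything taken from Membase itself. A plain dict stands in for the key-value store, and the <code>:</code> separator and all function names are illustrative assumptions. It also hints at the limitation: with exact-key access only, fetching every attribute of one entity means scanning all keys.</p>
<pre>
```python
# Hypothetical sketch: a plain dict stands in for a pure key-value store.
SEP = ":"  # assumed separator; must not occur in entity or attribute names


def make_key(entity, attribute):
    """Concatenate entity and attribute into a single key."""
    return entity + SEP + attribute


def put_triple(store, entity, attribute, value):
    """Store one entity-attribute-value triple as one key-value pair."""
    store[make_key(entity, attribute)] = value


def get_value(store, entity, attribute):
    """Look up a single attribute of a single entity by exact key."""
    return store.get(make_key(entity, attribute))


def attributes_of(store, entity):
    """Collect all attributes of one entity.

    With only exact-key get/put available, this has to scan every key,
    which is where a richer access method would help.
    """
    prefix = entity + SEP
    return {k[len(prefix):]: v for k, v in store.items() if k.startswith(prefix)}


store = {}
put_triple(store, "user42", "name", "Ada")
put_triple(store, "user42", "city", "London")
put_triple(store, "user7", "name", "Bob")
```
</pre>
<p>Single-attribute lookups such as <code>get_value(store, "user42", "name")</code> stay cheap, while <code>attributes_of</code> touches the whole store.</p>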
<p>Fortunately, that&#8217;s the direction Northscale is taking Membase. I don&#8217;t get the impression that the details have been worked out yet, but the general idea is:</p>
<ul>
<li>Northscale is putting a publish-subscribe interface into Membase it calls &#8220;tap,&#8221; useful for replication, node rebalancing, etc.</li>
<li>Tap will also serve to connect Membase data to a Membase feature Northscale calls &#8220;Node Code,&#8221; which will be code that runs in a separate process on each Membase node.</li>
<li>Node Code will include things like:
<ul>
<li>Language run-times</li>
<li>Standard libraries for things like 	index-building</li>
</ul>
</li>
</ul>
<p>Will Membase Node Code be a close substitute for relational DBMS functionality, or even the <a href="http://www.dbms2.com/2010/07/06/cassandra-technical-overview/" >Cassandra</a> architecture? I doubt it, especially at first. But at least it will keep Membase developers from getting locked in to a very simple and restrictive data management paradigm.</p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/a3zqDZ8wU5M" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/18/northscale-membase-roadmap/</feedburner:origLink></item>
		<item>
		<title>DB2 workload management</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/l0PZue1otng/</link>
		<comments>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 08:47:09 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Workload management]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2819</guid>
		<description><![CDATA[DB2 has added a lot of workload management features in recent releases. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  

If your goal is to keep a certain 	class of queries from [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><a href="../2009/04/24/some-db2-highlights/">DB2 has added a lot of workload management features in recent releases</a>. So when we talked Tuesday afternoon, Tim Vincent and I didn&#8217;t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including:  <span id="more-2819"></span></p>
<ul>
<li>If your goal is to keep a certain 	class of queries from taking too many resources, Tim thinks a great 	way of doing that is to control how many of them are allowed to run 	concurrently.</li>
<li>By way of contrast, Tim is 	cautious about the common approach of just lowering a query&#8217;s 	priority. His concern is that a long-running query could linger even 	longer, creating a long-lasting bottleneck in, for example, <a href="http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/" >temp 	space</a>.</li>
<li>When running over (I believe) 	Linux and AIX, DB2 workload management is integrated with operating 	system workload management. I.e., the same “service class” or 	“workload class” (at a guess, the former is the official term 	and the latter is the term that makes sense) of queries and 	associated processes gets the same treatment in both DB2 and the OS.</li>
<li>DB2&#8217;s workload management extends 	to buffer pools, to inhibit low-priority queries from evicting a 	higher-priority query&#8217;s data from cache.</li>
<li>Sometimes, workload management doesn&#8217;t throttle a query, but just decides to collect stats for future analysis. (This is on the eminently reasonable theory that the best stats to collect are the ones that are live when performance problems are actually occurring.)</li>
</ul>
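<p style="margin-bottom: 0in;">The concurrency-cap idea in the first bullet can be sketched generically. The Python below is only an illustration of the technique, assuming a semaphore-based admission gate; it is not DB2&#8217;s mechanism or API, and the class and function names are hypothetical.</p>
<pre>
```python
import threading
import time


class ConcurrencyCap:
    """Admit at most `limit` queries of one class at a time (hypothetical)."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest concurrency observed, kept for inspection

    def run(self, query_fn):
        with self._slots:  # blocks until one of the `limit` slots is free
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return query_fn()
            finally:
                with self._lock:
                    self.running -= 1


def fake_query():
    time.sleep(0.01)  # stand-in for real query work
    return "done"


cap = ConcurrencyCap(limit=2)
results = []
results_lock = threading.Lock()


def worker():
    outcome = cap.run(fake_query)
    with results_lock:
        results.append(outcome)


# Eight queries are submitted at once, but only two ever run together.
threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
</pre>
<p style="margin-bottom: 0in;">Queries beyond the cap wait for a slot instead of running slowly at low priority, so an admitted query is not stretched out while it holds resources.</p>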
<p style="margin-bottom: 0in;">Finally, Tim spoke of what I regard as the weirdest workload management requirement, one I also heard about from <a href="http://www.dbms2.com/2009/07/18/netezza-on-concurrency-and-workload-management/" >Netezza</a> <span style="font-style: normal;">(but didn&#8217;t explicitly mention) in</span> June. Sometimes, it seems, you simply don&#8217;t want queries to finish too fast. Why? Because if you give great performance when the machine is lightly loaded, then business users might expect that performance too when the machine is heavily loaded and you can&#8217;t deliver it. Apparently, in some environments it&#8217;s better to never deliver great query performance than it is to do so only inconsistently.</p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/l0PZue1otng" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/18/ibm-db2-workload-management/</feedburner:origLink></item>
		<item>
		<title>More on temp space, compression, and “random” I/O</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/VB9LgAY9i8w/</link>
		<comments>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 05:44:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2805</guid>
		<description><![CDATA[My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as &#8220;random&#8221; that clearly is not. That said, a comment by Shilpa Lawande on our recent Flash/temp space discussion suggests the following way of framing a key point:

You really, really want to have multiple data [...]]]></description>
			<content:encoded><![CDATA[<p>My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as &#8220;random&#8221; that clearly is not. That said, <a href="http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comment-181134" >a comment by Shilpa Lawande</a> on our recent <a href="http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/" >Flash/temp space discussion</a> suggests the following way of framing a key point:</p>
<ul>
<li>You really, really want to have multiple data streams coming out of temp space, as close to simultaneously as possible.</li>
<li>The storage performance characteristics of such a workload are more reminiscent of &#8220;random&#8221; than &#8220;sequential&#8221; I/O.</li>
</ul>
<p>If everybody else is cool with it too, I can live with that. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
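<p>A toy illustration of that framing, under loud assumptions: each spooled run is taken to occupy a contiguous range of blocks in temp space, and the reader is taken to pull one block from each run in strict rotation. Real access order is data-dependent, so this Python sketch only shows why several simultaneous streams stop looking sequential to the storage device.</p>
<pre>
```python
def block_read_order(num_runs, blocks_per_run):
    """Block offsets touched when reading one block from each run in turn.

    Assumes (hypothetically) that the runs sit back to back in temp space,
    each one occupying `blocks_per_run` contiguous blocks.
    """
    order = []
    for step in range(blocks_per_run):
        for run in range(num_runs):
            order.append(run * blocks_per_run + step)
    return order


def count_seeks(order):
    """Count reads whose block is not the one right after the previous read."""
    return sum(1 for prev, cur in zip(order, order[1:]) if cur != prev + 1)


interleaved = block_read_order(num_runs=4, blocks_per_run=3)
one_stream = block_read_order(num_runs=1, blocks_per_run=12)
```
</pre>
<p>Comparing <code>count_seeks(interleaved)</code> with <code>count_seeks(one_stream)</code> shows the difference: the same twelve blocks are read either way, but only the single stream reads them in adjacent order.</p>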
<p>Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don&#8217;t really understand. The idea is:</p>
<ul>
<li>Analytic DBMS processing generally stresses reads over writes.</li>
<li>Temp space is an exception &#8212; read and write use of temp space is pretty balanced. (You spool data out once, you read it back in once, and that&#8217;s the end of that; next time it will be overwritten.)</li>
</ul>
<p>My problem with that is: Flash typically has lower write than read IOPS (I/O per second), so being (relatively) write-intensive would, to a first approximation, seem if anything to disfavor a workload for Flash.</p>
<p>On the plus side, I was reminded of something I should have noted when I wrote about <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >DB2 compression</a> before:</p>
<p>Much like Vertica, <strong>DB2 operates on compressed data all the way through, including in temp space. </strong></p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/VB9LgAY9i8w" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/18/more-on-temp-space-compression-and-random-io/</feedburner:origLink></item>
		<item>
		<title>Vertica’s innovative architecture for Flash, plus more about temp space than you perhaps wanted to know</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/k9LS91AsWzY/</link>
		<comments>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:07:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2788</guid>
		<description><![CDATA[Vertica is announcing:

Technology it has already released*, but has not published any reference architectures for
A Barney partnership**

In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Vertica is announcing:</p>
<ul>
<li>Technology it has already released*, but has not published any reference architectures for</li>
<li>A <a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">Barney</a> partnership**</li>
</ul>
<p style="margin-bottom: 0in;">In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story, about <strong>how the analytic DBMS industry can exploit solid-state memory technology.</strong></p>
<p style="margin-bottom: 0in;"><em>*Upgrades to <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica FlexStore</a> to handle Flash memory, actually released as part of <a href="../2010/02/22/vertica-4/">Vertica 4.0</a></em></p>
<p style="margin-bottom: 0in;"><em>** With Fusion-io</em></p>
<p style="margin-bottom: 0in;">To set the context, let&#8217;s recall a few points I&#8217;ve noted in the past:</p>
<ul>
<li><a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state memory&#8217;s price/throughput tradeoffs obviously make it the future of database storage</a>.</li>
<li><a href="../2010/06/25/flash-is-coming-well/">The Flash future is coming soon</a>, in part because Flash&#8217;s propensity to wear out is overstated. This is especially true in the case of modern analytic DBMS, which tend to write to blocks all at once, and most particularly the case for append-only systems such as Vertica.</li>
<li><a href="../2010/08/12/teradata-future-product-strategy/">Being able to intelligently split databases among various cost tiers of storage – e.g. Flash and disk – makes a whole lot of sense</a>.</li>
</ul>
<p style="margin-bottom: 0in;">Taken together, those points tell us:</p>
<p style="margin-bottom: 0in;"><strong>For optimal price/performance, analytic DBMS should support databases that run part on Flash, part on disk.</strong></p>
<p style="margin-bottom: 0in;">While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What&#8217;s more, three aspects of Vertica&#8217;s architecture make it particularly well-suited for hybrid Flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-Flash for a relatively low actual investment in Flash chips:  <span id="more-2788"></span></p>
<ul>
<li><strong>Vertica lets you split tables by column,</strong> and Vertica FlexStore is versatile enough to let you put only the most-used columns in Flash. (Vertica offers a figure that 85% of usage calls on only 15% of columns, but I don&#8217;t know how rigorously grounded those numbers are.)</li>
<li>To the extent that Vertica data is <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">more compressed</a> than many of Vertica&#8217;s competitors&#8217; (which it probably is, debates over the magnitude of Vertica&#8217;s advantage notwithstanding), the total storage-hardware cost of sticking stuff in Flash is less when you use Vertica than with other systems.</li>
<li>Vertica has relatively less need for <strong>temp space</strong> than some other systems. (Vertica uses figures of &lt;20% of total storage, vs. 30%+ for some other systems.) If you want to use Flash for temp space, so as to accelerate your toughest queries, that can save you some cash …</li>
<li>… and by the way, <strong>temp space is an especially good use of Flash,</strong> because <strong>temp space is accessed in a less sequential manner than data storage is.</strong></li>
</ul>
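<p style="margin-bottom: 0in;">To make the "relatively low actual investment in Flash" point concrete, here is a back-of-the-envelope sketch in Python. It is only an illustration under loud assumptions: the 10 TB installation is hypothetical, all columns are assumed to be the same size, temp space is taken as a share of total storage, and the 15% and 20% inputs are simply the vendor-supplied figures quoted above, which I can&#8217;t vouch for. The function name is made up for this example.</p>

```python
def flash_needed_gb(total_storage_gb, temp_pct, hot_column_pct):
    """Rough Flash requirement if temp space and the hottest columns go on Flash.

    Assumptions (all hypothetical): temp space is temp_pct percent of total
    storage, the rest is column data, and every column is the same size, so
    hot_column_pct percent of the columns is the same share of the data.
    """
    temp_gb = total_storage_gb * temp_pct // 100
    data_gb = total_storage_gb - temp_gb
    hot_gb = data_gb * hot_column_pct // 100
    return temp_gb + hot_gb


# Hypothetical 10 TB installation, 20% temp space, 15% of columns kept hot.
print(flash_needed_gb(10000, 20, 15))
```

<p style="margin-bottom: 0in;">With those inputs, 2,000 GB of temp space plus 1,200 GB of hot columns comes to 3,200 GB of Flash, roughly a third of the installation. Real columns are not equal in size, so treat this as arithmetic on the claim, not a sizing method.</p>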
<p style="margin-bottom: 0in;">The least obvious of those points are about temp space; I only understood the particulars when Vertica development chief Shilpa Lawande explained them to me Thursday.</p>
<p style="margin-bottom: 0in;"><em>* At least in theory; customer adoption may be a different matter.</em></p>
<p style="margin-bottom: 0in;">But before drilling down on temp space, let me first note that there&#8217;s one offsetting factor to all those “We need somewhat less Flash than the other guys” Vertica advantages. Like all serious databases, a Vertica installation keeps two or more copies of all data, so that there&#8217;s no single point of failure in storage. In a flexible system like Vertica, you can put one copy on Flash and one on disk. But if you do that in Vertica, you forgo fully exploiting one possible benefit of Vertica&#8217;s architecture – the ability to store different copies of a column in different orders, which are beneficial for accelerating different groups of queries.*</p>
<p style="margin-bottom: 0in;"><em>*More precisely, you don&#8217;t get the full benefits of Flash acceleration for every query touching those columns.</em></p>
<p style="margin-bottom: 0in;">OK. Back to temp space. There are four kinds of things you can put in storage if you&#8217;re running a database management system:</p>
<ul>
<li>The <strong>software</strong> itself.</li>
<li>Persistent <strong>data.</strong> (I.e., tables, if the DBMS you&#8217;re running is relational.)</li>
<li><strong>Metadata,</strong> especially the kind that lets you find data &#8211; <strong>indexes,</strong> zone maps, catalogs, etc.</li>
<li><strong>Temporary data constructs</strong> built as part of, say, a sort-merge join. These, by definition, are what populate temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Just to be clear, those constructs are NOT temporary tables of the sort created by, say, MicroStrategy; such tables are handled like any other data. Rather, they are ephemeral creations and, so far as I can tell, not tables at all.</p>
<p style="margin-bottom: 0in;">Vertica offered two theories as to why its DBMS requires less temp space than competitors do:</p>
<ul>
<li>To the extent data is decompressed before being operated on in memory by the DBMS, that decompression would of course apply to temp space as well. Vertica prides itself on <strong>keeping data compressed</strong> all the way through, and seems to get away with smaller temp space allocations as a benefit.</li>
<li>Since Vertica can store columns in expedient sort orders, it does less sorting overall, and sorting is a big use of temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Obviously, no matter which DBMS you use, the amount of temp space you need is surely workload-dependent. Even so, Vertica&#8217;s claim to something of an advantage seems legit.</p>
<p style="margin-bottom: 0in;"><em>Truth be told, I&#8217;m not convinced the savings involved are great enough to </em>matter<em> a whole lot – but it&#8217;s a fun subject to think through. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;">And finally: One of my biggest surprises since starting to look at analytic-DBMS-on-Flash has been the centrality of temp space. Talking to Vertica Thursday, I finally uncovered a key reason why: <strong>Temp space tends to be accessed via multiple streams of data at once.</strong> I&#8217;m still struggling with WHY that is true, with two reasons suggested being:</p>
<ul>
<li>Temp space can be accessed by multiple operations at once. (But isn&#8217;t that also true of the rest of storage?)</li>
<li>Merge sorts, a common use of temp space, read multiple streams of data. (Couldn&#8217;t you tweak your software to make that not be true?)</li>
</ul>
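<p style="margin-bottom: 0in;">For what it&#8217;s worth, the merge-sort point can be illustrated with a minimal Python sketch of a generic k-way merge. This is NOT Vertica&#8217;s code or anybody else&#8217;s; the function name and the in-memory lists standing in for sorted runs spilled to temp space are assumptions made purely for illustration. What it shows is the order in which the runs get touched.</p>

```python
import heapq


def merge_sorted_runs(runs):
    """Merge already-sorted runs of integers.

    Returns the merged values plus the id of the run each value was read
    from, so the access pattern across runs is visible. Each run stands in
    for one sorted stream sitting in its own region of temp space.
    """
    iterators = [iter(run) for run in runs]
    heap = []
    # Prime the heap with the first value of every run.
    for run_id, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, run_id))

    merged, access_order = [], []
    while heap:
        value, run_id = heapq.heappop(heap)
        merged.append(value)
        access_order.append(run_id)
        # Refill from the same run the smallest value just came from.
        following = next(iterators[run_id], None)
        if following is not None:
            heapq.heappush(heap, (following, run_id))
    return merged, access_order


merged, access_order = merge_sorted_runs([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
print(merged)
print(access_order)
```

<p style="margin-bottom: 0in;">With these three runs the merge reads from run 0, then 1, then 2, then back to 0, and so on. Each individual run is read front to back, but the device sees requests hopping among three locations. Reading bigger chunks per run would cut down the hopping at the cost of more memory for buffers, which may be part of the answer to my own question about tweaking the software.</p>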
<p style="margin-bottom: 0in;">But if we grant that temp space naturally is accessed in multiple places at once – well, that&#8217;s a lot like random I/O, and <a href="../2005/11/13/breaking-the-disk-speed-barrier/">if you&#8217;re doing a lot of random reads, you&#8217;d love to use something other than spinning disk</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/</feedburner:origLink></item>
	</channel>
</rss>
