<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>DBMS2 -- DataBase Management System Services</title>
	
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Fri, 19 Mar 2010 15:49:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/dbms2/feed" /><feedburner:info uri="dbms2/feed" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Some business trends in the data warehouse market</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/hS_WdeDqqTE/</link>
		<comments>http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:48:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1741</guid>
		<description><![CDATA[In recent conversations with various analytic DBMS vendors, a fairly consistent picture has emerged.

Business is strong. Multiple vendors claim to be going gangbusters, with the happy sounds coming out of Vertica and Infobright being echoed by several competitors. Hearsay suggests some other companies in related businesses are doing well too. Depending on who you talk [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">In recent conversations with various analytic DBMS vendors, a fairly consistent picture has emerged.</p>
<ul>
<li><strong>Business is strong.</strong> Multiple vendors claim to be going gangbusters, with the happy sounds coming out of <a href="../2010/03/19/vertica-update-4/">Vertica</a> and <a href="../2010/03/19/infobright-blog-update/">Infobright</a> being echoed by several competitors. Hearsay suggests 	some other companies in related businesses are doing well too. 	Depending on who you talk to, the business pickup dates back to Q4, give or 	take a quarter.</li>
<li><strong>Oracle Exadata has become a 	formidable competitor,</strong><span style="font-weight: normal;"> on the 	strength of Exadata 2.</span> Exadata 2&#8217;s positioning and perception 	among Oracle users seem to be pretty much in line with <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >what 	Oracle portrayed to me</a>.</li>
<li><strong>Teradata is portrayed as a weak 	competitor.</strong> Competitors don&#8217;t worry about Teradata nearly as 	much as they do about Oracle. That said, I suspect a bit of wishful 	thinking; Teradata is clearly still getting a lot of business the 	other vendors would dearly love to have.</li>
<li><strong>HP Neoview is reeling.</strong> (Almost) nobody sees Neoview competitively. The Walmart Neoview 	installation is said to have stayed small at best. JP Morgan is said 	to have completely thrown Neoview out (and a bunch of HP engineers 	with it).</li>
<li><strong>(Almost) nobody mentions 	competing against DB2</strong> either. This continues to baffle me.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/19/some-business-trends-in-the-data-warehouse-market/</feedburner:origLink></item>
		<item>
		<title>Vertica update</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/WjSsbXMuNBE/</link>
		<comments>http://www.dbms2.com/2010/03/19/vertica-update-4/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:42:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1738</guid>
		<description><![CDATA[I caught up with Jerry Held (Chairman) and Dave Menninger (VP Marketing) of Vertica for a chat yesterday. The immediate reason for the call was that a competitor had tipped me off to the departure of Vertica CEO Ralph Breslauer, which of course raises a host of questions.  Highlights of the call included:

Vertica had [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I caught up with Jerry Held (Chairman) and Dave Menninger (VP Marketing) of Vertica for a chat yesterday. The immediate reason for the call was that a competitor had tipped me off to the departure of Vertica CEO Ralph Breslauer, which of course raises a host of questions.  Highlights of the call included:</p>
<ul>
<li>Vertica had a “killer” Q4 and 	is doing very well in Q1 again.</li>
<li>Vertica burned hardly any cash 	last year; i.e., it was close to cash-flow neutral in 2009.</li>
<li>Vertica is hiring aggressively, 	e.g., in sales.</li>
<li>Vertica is well down the path with 	several CEO candidates who Jerry regards as outstanding. He is 	hopeful there will be a new CEO in April. (But I bet that would be 	late April, given what Jerry mentioned about his own travel plans.)</li>
<li>Absent a full-time CEO, Jerry and 	Andy Palmer are spending a lot more time with Vertica.</li>
<li>One Vertica customer is 	approaching a petabyte of user data. The last time Vertica had 	checked, that customer had been more in the ¼ petabyte range.</li>
<li>Other multi-hundred terabyte 	Vertica databases were mentioned, including one where Vertica claims 	to have beaten Teradata and perhaps other competitors in a 	head-to-head competition (it sounds like that one&#8217;s too recent to be 	deployed yet).</li>
<li>Vertica sees Aster and Greenplum competitively more often than it sees ParAccel.</li>
<li>Vertica sees 	Sybase IQ competitively a lot in financial services (in new-name 	accounts for Sybase as well as where some kind of Sybase DBMS is an 	incumbent), and more occasionally in other sectors.</li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;">NDA parts of the conversation also gave me the impression that Vertica is moving forward just as eagerly as its peers. I.e., I didn&#8217;t uncover any reason to think that Ralph&#8217;s departure is a sign of trouble, of the company being shopped, etc.<span id="more-1738"></span></p>
<p style="margin-bottom: 0in; font-style: normal;">On the other hand, I didn&#8217;t uncover any other clear reason for Ralph&#8217;s departure either. The party line is that Ralph left for “personal reasons”.  It was his decision to leave. He did a great job while at Vertica.  Basically, Vertica is saying that, even though all was going swimmingly, Ralph just up and quit, leaving some very valuable unvested stock options on the table at what had been his first CEO gig ever.</p>
<p style="margin-bottom: 0in; font-style: normal;">Contacting Ralph didn&#8217;t add any immediate insight. He responded quickly via a Twitter DM, but was in the airport for a trip to his home country of South Africa, and deferred discussion until after his return.</p>
<p style="margin-bottom: 0in; font-style: normal;">Incidentally, <a href="http://www.dbms2.com/2010/03/19/infobright-blog-update/" >Infobright</a> and ParAccel have both also had recent CEO turnover. Stated reasons in each case were of the “Right person to lead the next stage of the company&#8217;s growth” variety.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/19/vertica-update-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/19/vertica-update-4/</feedburner:origLink></item>
		<item>
		<title>Infobright blog update</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/O9i1hUViz3k/</link>
		<comments>http://www.dbms2.com/2010/03/19/infobright-blog-update/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:42:01 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Scientific research]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1733</guid>
		<description><![CDATA[I often offer that, if a company puts up a sufficiently good blog post, I&#8217;ll link to it. Well, I just noticed that Infobright CEO Mark Burton (somewhere along the way he seems to have dropped the “interim”) put up an excellent post last month.
Highlights on the market share/sector side include:

Infobright’s customer base grew 500% [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I often offer that, if a company puts up a sufficiently good blog post, I&#8217;ll link to it. Well, I just noticed that Infobright CEO Mark Burton <span style="text-decoration: line-through;">(somewhere along the way he seems to have dropped the “interim”)</span> put up <a href="http://www.infobright.com/Blog/Entry/infobright_strategy_and_plans" onclick="javascript:pageTracker._trackPageview('/www.infobright.com');">an excellent post</a> last month.</p>
<p style="margin-bottom: 0in;">Highlights on the market share/sector side include:<span id="more-1733"></span></p>
<ul>
<li>Infobright’s customer base grew 500% over the past year, to 	120 paying customers.</li>
<li>This included end users (60%), as well as ISVs and SaaS providers (40%) who embed Infobright&#8217;s DBMS in their application. Infobright says it is glad to see both segments taking advantage of its technology.</li>
<li>During the same period, Infobright&#8217;s open source software was 	downloaded 35,000 times.</li>
<li>The end user applications were heavily clustered around web 	and online analytics tracking, with a focus on understanding 	customer behavior on the web.</li>
<li>Infobright also continues to see the growth of 	application-specific data marts.</li>
<li>There is also continued interest and growth in using 	Infobright technology to analyze IT logs and telecom CDR (Call 	Detail Record) data, to identify fraud or security issues, to 	understand and improve network performance, and other purposes.</li>
</ul>
<p>Product highlights include:</p>
<ul>
<li>Infobright will be much more transparent in 2010 about its plans.</li>
<li>Infobright will start posting and commenting on future 	releases and themes in March of this year. (However, they haven&#8217;t 	run much of that by me yet, and we&#8217;re past the middle of March.)</li>
<li>Infobright expects to drop 3-4 interim releases for every 	major release, with at least two major releases in 2010.</li>
<li>Some of Infobright&#8217;s major improvements this year will be:
<ul>
<li>Continued SMP performance improvements “without the need 	for complex hardware configurations or administrative effort”.</li>
<li>Extending the “hit rate” of the Knowledge Grid, which is 	central to Infobright&#8217;s performance story.</li>
<li>Better international support with UTF-8 extensions.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/19/infobright-blog-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/19/infobright-blog-update/</feedburner:origLink></item>
		<item>
		<title>XtremeData update</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/vEjOqa6G0Sw/</link>
		<comments>http://www.dbms2.com/2010/03/18/xtremedata-update/#comments</comments>
		<pubDate>Thu, 18 Mar 2010 05:17:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[XtremeData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1722</guid>
		<description><![CDATA[I talked with Geno Valente of XtremeData tonight. Highlights included:

XtremeData still hasn&#8217;t sold any dbX stuff (they&#8217;ve had a side business in generic FPGA-based boards paying the bills for years). Well, there may have been some paid POCs (proofs of concept) or something, but real sales haven&#8217;t come through yet.
XtremeData does have three prospects who [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked with Geno Valente of XtremeData tonight. Highlights included:</p>
<ul>
<li>XtremeData still hasn&#8217;t sold any 	dbX stuff (they&#8217;ve had a side business in <a href="../2009/06/29/xtreme-data-readies-a-different-kind-of-fpga-based-data-warehouse-appliance/">generic 	FPGA-based boards</a> paying the bills for years). Well, there may 	have been some paid POCs (proofs of concept) or something, but real 	sales haven&#8217;t come through yet.</li>
<li>XtremeData does have three 	prospects who have said “Yes”, and expects one order to come 	through this month.</li>
<li>XtremeData continues to believe it 	shines when:
<ul>
<li>Data models are complex</li>
<li>In particular, there are complex 	joins</li>
<li>In particular, two large tables 	have to be joined with each other, under circumstances where no 	product can avoid doing vast data redistribution</li>
</ul>
</li>
<li>XtremeData insists that none of the nice things Bill Inmon has said about it, including in webinars, were for pay or other similar business compensation. <a href="http://www.monashreport.com/2006/02/13/everybody-gets-paid-or-would-like-to/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">That&#8217;s quite unusual</a>.</li>
<li>XtremeData is coming out with a 	new product, codenamed the Personal Data Warehouse (PDW), which:
<ul>
<li>Is ready to go into beta test</li>
<li>Should be launched in a month and 	a half or so</li>
<li>Will have a different name when it 	is launched</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;">Naming aside,<span id="more-1722"></span></p>
<ul>
<li>The XtremeData PDW consists of 	XtremeData software running on a <a href="http://cray.com/Products/CX/Systems.aspx" onclick="javascript:pageTracker._trackPageview('/cray.com');">Cray 	CX1 box</a>.</li>
<li>Thus, the XtremeData PDW will plug 	into a 20 amp wall power socket. It consumes 1600 watts.</li>
<li>The XtremeData PDW also inherits 	the Cray CX1&#8217;s noise cancellation feature.</li>
<li>Bottom line on the form factor: 	<strong>The XtremeData PDW is meant to be stuck in the corner of a 	business analyst&#8217;s office, not a computer room.</strong></li>
<li>The XtremeData PDW will have 16 1 	TB disks (going up in size later), for 5 TB of uncompressed user 	data.</li>
<li>Pricing isn&#8217;t finalized for the 	XtremeData PDW, but it will be around XtremeData&#8217;s usual figure &#8212; 	$20K/TB of uncompressed user data.</li>
<li>XtremeData hasn&#8217;t “released” 	compression yet, but it&#8217;s “ready to go.”</li>
<li>The XtremeData PDW will not 	include FPGAs, <a href="../2009/07/27/xtremedata-announces-its-dbx-data-warehouse-appliance/">unlike 	other XtremeData dbX appliances</a>. It will just run the XtremeData 	dbX software on 8 Nehalem chips.</li>
<li>XtremeData calls this a “3-node” 	machine. I didn&#8217;t bother asking why it wasn&#8217;t 4-node. (Perhaps 	there&#8217;s a head node of some kind that properly isn&#8217;t counted.)</li>
</ul>
<p style="margin-bottom: 0in;">Some comparative notes:</p>
<ul>
<li>A <strong><a href="http://www.netezza.com/documents/skimmer_ds.pdf" onclick="javascript:pageTracker._trackPageview('/www.netezza.com');">Netezza 	Skimmer</a> has similar size and price</strong> to the XtremeData PDW, seems to draw less 	power, has less uncompressed user data capacity (but already has 	compression), is also in essence a three-node system (I think), and 	of course has a lot of software connectivity. If XtremeData can 	match Netezza&#8217;s compression, the XtremeData PDW will have a 2X or so 	price/TB advantage over Netezza Skimmer – but Netezza&#8217;s 	compression is of course a moving target. I don&#8217;t know how happy Skimmer is outside a computer room.</li>
<li><a href="http://www.kickfire.com/Products/Data-sheet" onclick="javascript:pageTracker._trackPageview('/www.kickfire.com');">Kickfire</a> manages similar amounts of data on a smaller box (5 rack units vs. 7), drawing less power (600 watts vs. 1600), also with a lot of BI and ETL tool connectivity.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/18/xtremedata-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/18/xtremedata-update/</feedburner:origLink></item>
		<item>
		<title>Memcached-based company NorthScale launches</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/g8QfJDGBPo4/</link>
		<comments>http://www.dbms2.com/2010/03/16/memcached-northscale-launc/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 17:52:48 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1717</guid>
		<description><![CDATA[NorthScale, a start-up based around memcached, has just launched, two weeks after Todd Hoff&#8217;s post arguing the MySQL/memcached combo is passé. NorthScale wouldn&#8217;t necessarily disagree with Todd; its position is that what you really should use instead is NorthScale&#8217;s combo of memcached and MemBase, a memcached-like DBMS &#8230;
&#8230; or something like that. I don&#8217;t intend to [...]]]></description>
			<content:encoded><![CDATA[<p>NorthScale, a start-up based around memcached, has just launched, two weeks after Todd Hoff&#8217;s post arguing <a href="http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/" >the MySQL/memcached combo is passé</a>. NorthScale wouldn&#8217;t necessarily disagree with Todd; its position is that what you really should use instead is NorthScale&#8217;s combo of memcached and MemBase, a memcached-like DBMS &#8230;</p>
<p>&#8230; or something like that. I don&#8217;t intend to write seriously about NorthScale until I have a better idea of what MemBase is.</p>
<p>In the meantime,</p>
<ul>
<li>VentureBeat put up a solid post on <a href="http://deals.venturebeat.com/2010/03/16/northscale-zynga-memcached/" onclick="javascript:pageTracker._trackPageview('/deals.venturebeat.com');">NorthScale&#8217;s company history</a> and so on</li>
<li>Om Malik bought into <a href="http://gigaom.com/2010/03/16/northscale/" onclick="javascript:pageTracker._trackPageview('/gigaom.com');">the NorthScale memcached pitch</a></li>
<li>TechCrunch has <a href="http://techcrunch.com/2010/03/16/northscales-data-management-technology-attracts-zynga-and-others/" onclick="javascript:pageTracker._trackPageview('/techcrunch.com');">a low-quality post about NorthScale</a> (although it wasn&#8217;t as error-riddled as the same author&#8217;s post about nStein, which <a href="http://intelligent-enterprise.informationweek.com/blog/archives/2010/02/open_text_buyin.html;jsessionid=T51GQFI1CCPL1QE1GHOSKHWATMY32JVN" onclick="javascript:pageTracker._trackPageview('/intelligent-enterprise.informationweek.com');">Seth Grimes properly blasted</a>)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/16/memcached-northscale-launc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/16/memcached-northscale-launc/</feedburner:origLink></item>
		<item>
		<title>Toward a NoSQL taxonomy</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/2vTp7yIXOgk/</link>
		<comments>http://www.dbms2.com/2010/03/14/nosql-taxonomy/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 23:24:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Structured documents]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1708</guid>
		<description><![CDATA[I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:
NoSQL = HVSP (High Volume Simple Processing) without joins or explicit transactions
Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:</p>
<p style="margin-bottom: 0in;"><strong>NoSQL = <a href="http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/" >HVSP (High Volume Simple Processing)</a> without joins or explicit transactions</strong></p>
<p style="margin-bottom: 0in;">Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model and replication/sharding strategy. I&#8217;d be happier, however, with at least three parts to the taxonomy:</p>
<ul>
<li>How data looks logically on a 	single node</li>
<li>How data is stored physically on a 	single node</li>
<li>How data is distributed, 	replicated, and reconciled across multiple nodes, and whether 	applications have to be aware of how the data is partitioned among 	nodes/shards.<span id="more-1708"></span></li>
</ul>
<p style="margin-bottom: 0in;">After talking with Dwight, and also with Cassandra project chair Jonathan Ellis, I feel I&#8217;m doing decently in understanding the first of those three areas. But there&#8217;s a long way yet to go on the other two.</p>
<p style="margin-bottom: 0in;">In Dwight&#8217;s opinion, as I understand it, NoSQL data models come in four general kinds.</p>
<ul>
<li><em><strong>Key-value stores,</strong></em><em> more or less pure.</em> I.e., they store keys+BLOBs (Binary Large 	OBjects), except that the “Large” part of “BLOB” may not 	come into play.</li>
<li><em><strong>Table-oriented,</strong></em><em> more or less. </em>The major examples here are Google&#8217;s BigTable, and 	Cassandra.</li>
<li><em><strong>Document-oriented,</strong></em><em> where a “document” is more like XML than free text. </em>MongoDB 	and CouchDB are the big examples here.</li>
<li><strong><em>Graph-oriented.</em> </strong><span style="font-weight: normal;">To 	date, this is the smallest area of the four. I&#8217;m reserving judgment 	as to whether I agree it&#8217;s properly included in HVSP and NoSQL.</span></li>
</ul>
<p style="margin-bottom: 0in;">As Dwight sees it, JSON (JavaScript Object Notation) is the emerging markup standard for the document-oriented data models, and to some extent the BLOB part of key-value models as well. Reasons seem to include:</p>
<ul>
<li>JSON is something web developers 	are likely to know anyway.</li>
<li>JSON, unlike XML, is schema-less. 	In the NoSQL world, that&#8217;s perceived as a good thing.</li>
<li>Perhaps for both these reasons, 	JSON is perceived as easier to use than XML.</li>
</ul>
<p style="margin-bottom: 0in;">Except as noted, I&#8217;m not aware of anything that solidly contradicts the above.</p>
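<p style="margin-bottom: 0in;">To make the "schema-less" point concrete, here is a minimal sketch in Python. It is not how MongoDB, CouchDB, or any other product works internally; the <code>collection</code> dict and the <code>put</code>, <code>get</code>, and <code>field_names</code> helpers are hypothetical names invented for illustration. It only shows that two JSON documents with different fields can sit side by side under different keys, with nothing declared up front.</p>

```python
import json

# Hypothetical in-memory "collection": key -> JSON text (the BLOB part of a
# key-value store). Nothing here is a real product's API.
collection = {}


def put(key, document):
    """Store a document as JSON text; no schema is declared or checked."""
    collection[key] = json.dumps(document)


def get(key):
    """Parse the stored JSON text back into a Python dict."""
    return json.loads(collection[key])


def field_names(key):
    """List the top-level fields that this particular document happens to have."""
    return sorted(get(key).keys())


# Two documents with different shapes coexist in the same collection.
put("user:1", {"name": "Ann", "email": "ann@example.com"})
put("user:2", {"name": "Bob", "tags": ["admin", "beta"], "address": {"city": "Boston"}})

print(field_names("user:1"))
print(field_names("user:2"))
```

<p style="margin-bottom: 0in;">The flip side is that the application, not the store, has to cope with whatever fields a given document turns out to contain.</p>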
<p style="margin-bottom: 0in;">Dwight went on to say that there are two main NoSQL replication/sharding models, in line with the seminal papers to which I <a href="http://www.dbms2.com/2010/03/12/some-nosql-links/" >previously linked</a>:</p>
<ul>
<li><em>Based on or resembling </em><em><strong>Dynamo.</strong></em> The core idea here is accepting <strong>eventual consistency</strong> among 	nodes as being good enough, even if that means you sometimes read 	dirty data. The benefit is that <strong>you never are blocked from 	writing.</strong> By way of contrast, systems that enforce true 	inter-node consistency (think of a two-phase commit) can shut you 	down from writing if consistency guarantees aren&#8217;t being confirmed 	in a timely manner. Thus, in a Dynamo-like scheme you write data to 	multiple nodes, via <strong>consistent hashing;</strong> then when the time 	comes you read one or more nodes, and hope that what you&#8217;re getting 	back is a correct result.</li>
<li><em>Based on or resembling </em><em><strong>BigTable.</strong></em> In this model you&#8217;re trying to keep the 	nodes fully consistent in the usual way, e.g. by synchronous 	replication. Indeed, what&#8217;s being kept consistent is both data 	itself, and metadata about the data&#8217;s location. Details surely vary 	a lot from implementation to implementation.</li>
</ul>
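<p style="margin-bottom: 0in;">The Dynamo-like idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the implementation of Dynamo, Cassandra, or any other system: the <code>Ring</code> class, its method names, the node names, and the plain integer version number are all hypothetical, and real systems use more elaborate machinery for versioning and reconciliation. The sketch places nodes on a hash ring, writes each key to the next few nodes on the ring without waiting for unreachable ones, and on read returns the newest version among the nodes it asked.</p>

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a position on the hash ring (a large integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring; every name here is illustrative only."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        self.points = sorted((ring_position(n), n) for n in node_names)
        self.positions = [p for p, _ in self.points]
        self.stores = {n: {} for n in node_names}  # per-node key -> (version, value)

    def nodes_for(self, key):
        """Walk clockwise from the key's position and take `replicas` nodes."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        count = len(self.points)
        return [self.points[(start + i) % count][1] for i in range(self.replicas)]

    def write(self, key, value, version, reachable=None):
        """Write to every reachable replica; the write never blocks on the rest."""
        written = []
        for node in self.nodes_for(key):
            if reachable is None or node in reachable:
                self.stores[node][key] = (version, value)
                written.append(node)
        return written

    def read(self, key, ask=1):
        """Ask the first `ask` replicas and keep the newest version seen."""
        answers = [
            self.stores[n][key]
            for n in self.nodes_for(key)[:ask]
            if key in self.stores[n]
        ]
        if not answers:
            return None
        return max(answers, key=lambda pair: pair[0])[1]


ring = Ring(["node-a", "node-b", "node-c"], replicas=2)
owners = ring.nodes_for("cart:42")
ring.write("cart:42", "1 item", version=1)
# The first replica is unreachable for the second write, so only the other one gets it.
ring.write("cart:42", "2 items", version=2, reachable={owners[1]})
stale = ring.read("cart:42", ask=1)  # asks only the replica that missed the update
fresh = ring.read("cart:42", ask=2)  # asks both replicas and keeps the newer version
print(stale, "|", fresh)
```

<p style="margin-bottom: 0in;">Asking one node returns the out-of-date value, while asking both finds the newer one. That is the trade described above: writes always go through, and reads may or may not be current depending on how many nodes you consult.</p>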
<p style="margin-bottom: 0in;">I&#8217;m fuzzier on this stuff than on the data models, because to date nobody has ever explained to me how an actual live system (MongoDB, Cassandra, whatever) implements its replication strategy. Also, while I think that in both these models applications are allowed to be ignorant of the replication/sharding strategy, I&#8217;m not as sure of that as I&#8217;d like to be.</p>
<p style="margin-bottom: 0in;">If we stop here, we already have something useful. MongoDB has a document data model, and is in the BigTable-like replication camp, at least at first. Cassandra has a table-like data model, and is on the Dynamo-like eventual consistency side. But to say those are the only differences that matter would be like saying that all shared-disk RDBMS (e.g., Oracle and Sybase IQ) are essentially alike. That, of course, would be nonsense.</p>
<p style="margin-bottom: 0in;">So a third dimension needed in this taxonomy is how the systems actually bang data on and off of disk (or silicon, as the case may be). I don&#8217;t yet have an overview of that. I know something of how Cassandra does it, and will write about same in a future post, but that&#8217;s about it. So please stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/14/nosql-taxonomy/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/14/nosql-taxonomy/</feedburner:origLink></item>
		<item>
		<title>The Naming of the Foo</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/CByUwq0gWq4/</link>
		<comments>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 22:47:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1703</guid>
		<description><![CDATA[Let&#8217;s start from some reasonable premises.

No technology category name is 	ever perfect.
It&#8217;s particularly hard to describe 	NoSQL (Not Only SQL) accurately, given the basic confusion as to 	what NoSQL is all about.
That said, it 	seems pretty clear that NoSQL is about making big websites (and 	perhaps other cloud-like installations) run and scale.
Dwight Merriman (founder/CEO of [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s start from some reasonable premises.<span id="more-1703"></span></p>
<ul>
<li><a href="http://www.strategicmessaging.com/monashs-first-law-of-commercial-semantics-explained/2009/01/09/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">No technology category name is 	ever perfect</a>.</li>
<li>It&#8217;s particularly hard to describe 	NoSQL (Not Only SQL) accurately, given <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >the basic confusion as to 	what NoSQL is all about</a>.</li>
<li>That said, it 	seems pretty clear that NoSQL is about making big websites (and 	perhaps other cloud-like installations) run and scale.</li>
<li>Dwight Merriman (founder/CEO of 	MongoDB vendor 10gen) is heading in the right direction when he says 	that the unifying ideas of NoSQL are that you do away with 	transactions and joins. But if he&#8217;s ever said something like “NoSQL 	is Foo without joins and transactions,” I don&#8217;t know what Foo is.</li>
<li><span style="font-style: normal;">Actually, 	I do know what Foo is – Foo is what happens when lots of people 	want to get small amounts each of information in or out of a 	database at the same time. I just don&#8217;t know what Foo is called.</span></li>
<li>Obviously, Foo is a lot like OLTP 	(OnLine Transaction Processing). However, it would be pretty silly 	for Foo to actually be OLTP, given that one of the core points of 	NoSQL is that you don&#8217;t have transactions.</li>
<li>It&#8217;s not just the “T” part of OLTP that&#8217;s fried.  Calling something “OnLine” only makes sense as long as offline is an option, and offline transaction processing has been obsolete for a very long time.*</li>
</ul>
<p style="margin-bottom: 0in;"><em>*Sure, if you strain you can talk yourself into exceptions. But the point stands.</em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">So we need a name for Foo, where Foo is what happens when</span><span style="font-style: normal;"><strong> lots of people want to get small amounts each of information in or out of a database at the same time.</strong></span><span style="font-style: normal;"> Thus, three major subcategories of more-or-less disk-based Foo are:</span></p>
<ul>
<li><span style="font-style: normal;">No-compromises 	ACID-compliant relational OLTP</span></li>
<li><span style="font-style: normal;">Sharded 	MySQL</span></li>
<li>NoSQL</li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">There may be some more purely memory-centric versions too, but let&#8217;s put those aside for the moment. </span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">Absent a better idea, I can squeeze Foo into yet another four-letter acronym:</span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">HVSP (High-Volume Simple Processing)</span></strong></p>
<p style="margin-bottom: 0in; font-style: normal;">That&#8217;s as imperfect as any other category name, and an awkward mouthful to boot. So I&#8217;d love to hear a better one; if you have such, please share it!  In the mean time, I think “HVSP” has merit because:</p>
<ul>
<li><span style="font-style: normal;">The 	“Processing” part should be noncontroversial.</span></li>
<li>“<span style="font-style: normal;">High-Volume” 	is inherent to the challenge. If RDBMS scale well enough for your 	use case, using something less powerful is probably silly.*  	Similarly, while Oracle shines at high-volume OLTP workloads, there 	are many cheaper DBMS that do a fine job of OLTP at lower volumes.</span></li>
<li>“<span style="font-style: normal;">Simple” 	is the core principle of NoSQL systems, which drop joins and 	transactions as being too much foofarah.  That only makes sense at 	all under the assumption that you have bone-simple queries and 	updates, so that programming around the lack of joins and 	transactions isn&#8217;t all that much of a burden.</span></li>
<li><span style="font-style: normal;">Something 	similar is true of sharded MySQL.</span></li>
<li><span style="font-style: normal;">Less 	obviously, “simple” is a core principle of relational OLTP as 	well. The point of the relational model is to cap the complexity of 	data operations, or more precisely to hide that complexity from 	programmers.</span></li>
<li><span style="font-style: normal;">And 	overloading the word “simple” a bit, it&#8217;s fair to say that if 	you&#8217;re reading or writing one record at a time, you&#8217;re doing 	something relatively simple, at least as opposed to what you do in 	analytic processing. The OLTP vs. OLAP distinction is preserved in 	this name change.</span></li>
<li><span style="font-style: normal;">The whole thing matches my definition above, namely &#8220;what happens when lots of people want to get small amounts each of information in or out of a database at the same time.&#8221;</span></li>
</ul>
<p style="margin-bottom: 0in;"><em>*Assuming, of course, that rows-and-tables are a good metaphor for your data structure in the first place.</em></p>
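<p>To make &#8220;programming around the lack of joins&#8221; concrete, here is a minimal Python sketch of an application-side join over bone-simple single-record reads. The plain dicts stand in for a key-value store, and the names <code>users</code>, <code>posts</code>, and <code>posts_with_authors</code> are hypothetical, not taken from any particular NoSQL product.</p>

```python
# Hypothetical key-value "tables": plain dicts stand in for a NoSQL store.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
posts = {
    "p1": {"author_id": "u1", "title": "Hello"},
    "p2": {"author_id": "u2", "title": "World"},
    "p3": {"author_id": "u1", "title": "Again"},
}


def posts_with_authors(post_ids):
    """Application-side join: one key lookup per post, then one per author."""
    result = []
    for post_id in post_ids:
        post = posts[post_id]                # single-record read
        author = users[post["author_id"]]    # second single-record read
        result.append({"title": post["title"], "author": author["name"]})
    return result
```

<p>When every operation is a lookup by key, as here, the hand-written join is a few lines of code. The burden grows quickly once queries stop being this simple.</p>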
<p style="margin-bottom: 0in; font-style: normal;">Systems I&#8217;m leaving out of the HVSP and hence also NoSQL categories include:</p>
<ul>
<li><span style="font-style: normal;"><strong>Hadoop 	and other batch-oriented MapReduce.</strong></span><span style="font-style: normal;"> Hadoop isn&#8217;t part of NoSQL. I&#8217;m pretty sure that </span><a href="http://twitter.com/mikeolson/status/10388695185" onclick="javascript:pageTracker._trackPageview('/twitter.com');">Cloudera 	CEO Mike Olson</a><span style="font-style: normal;"> agrees with me.</span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">More 	generally, </span></span><span style="font-style: normal;"><strong>non-SQL 	data stores that don&#8217;t meet the HVSP criteria.</strong></span><span style="font-style: normal;"> Dave Kellogg stretches things when he claims that <a href="http://www.kellblog.com/2010/03/10/ieee-computer-society-article-on-nosql-an-executive-level-overview/" onclick="javascript:pageTracker._trackPageview('/www.kellblog.com');">MarkLogic 	is a NoSQL system</a>. (But then, that was in a post where he 	seemingly praised </span><a href="http://www.dbms2.com/2009/12/11/nosql-q-and-a/" >a train wreck of an article</a><span style="font-style: normal;">.)</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-style: normal;">But hey – what good is a categorization if it doesn&#8217;t leave some things out?</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/</feedburner:origLink></item>
		<item>
		<title>Some NoSQL links</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/qSr8cvs6VdA/</link>
		<comments>http://www.dbms2.com/2010/03/12/some-nosql-links/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 23:51:42 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Continuent]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Tokutek]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1692</guid>
		<description><![CDATA[I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.

A little over a year ago, Julian Browne put up a great post on Eric Brewer&#8217;s CAP conjecture/theorem, which provides much of the impetus [...]]]></description>
			<content:encoded><![CDATA[<p>I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I&#8217;m poking around a bit reading stuff on the subjects. Here are some links I found.<span id="more-1692"></span></p>
<ul>
<li>A little over a year ago, Julian Browne put up a great post on <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem" onclick="javascript:pageTracker._trackPageview('/www.julianbrowne.com');">Eric Brewer&#8217;s CAP conjecture/theorem</a>, which provides much of the impetus to relax the traditional requirement for atomicity/consistency.</li>
<li>Even more directly inspirational to NoSQL technology development were two seminal papers: Google&#8217;s on <a href="http://labs.google.com/papers/bigtable.html" onclick="javascript:pageTracker._trackPageview('/labs.google.com');">BigTable</a> and Amazon&#8217;s on <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf" onclick="javascript:pageTracker._trackPageview('/s3.amazonaws.com');">Dynamo</a>. (That said, I&#8217;m having trouble getting myself to actually read them from start to finish, especially since they&#8217;ve been superseded by subsequent technology development.)</li>
<li>10gen (the MongoDB guys) hosted a NoSQL conference yesterday. Much blogging has ensued. The best post I&#8217;ve seen so far was by <a href="http://blog.marcua.net/post/442594842/notes-from-nosql-live-boston-2010" onclick="javascript:pageTracker._trackPageview('/blog.marcua.net');">Adam Marcus</a>. I find the graph database notes near the bottom particularly interesting.</li>
<li>Mark Callaghan hit back against the <a href="http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html" onclick="javascript:pageTracker._trackPageview('/mysqlha.blogspot.com');">NoSQL <span style="text-decoration: line-through;">movement</span> hype</a>, and in particular against the &#8220;<a href="http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/" >MySQL/memcached is passe</a>&#8221; meme. On the other hand, he also bemoaned many failings of MySQL. On the third hand, he praised or at least expressed hope for a variety of MySQL-related technologies, including <a href="http://www.dbms2.com/2009/04/16/introduction-to-tokutek/" >Tokutek&#8217;s TokuDB</a> and <a href="http://www.dbms2.com/2009/09/03/continuent-on-clustering/" >Continuent&#8217;s Tungsten</a>.</li>
<li>In connection with that debate, Mark Rendle offered a <a href="http://blog.markrendle.net/2010/03/do-you-need-relational-database.html" onclick="javascript:pageTracker._trackPageview('/blog.markrendle.net');">funny rant</a>, mainly pro-NoSQL, in the style of a Socratic dialogue.</li>
<li>John Quinn of Digg recently described <a href="http://www.stumbleupon.com/su/5099Ti/about.digg.com/node/564" onclick="javascript:pageTracker._trackPageview('/www.stumbleupon.com');">Digg&#8217;s move from MySQL to Cassandra</a>, and outlined a lot of features Digg was adding to Cassandra, all of which it is open-sourcing.</li>
<li>The NoSQL guys maintain their own long <a href="http://nosql-database.org/links.html" onclick="javascript:pageTracker._trackPageview('/nosql-database.org');">list of NoSQL-related links</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/12/some-nosql-links/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/12/some-nosql-links/</feedburner:origLink></item>
		<item>
		<title>Cassandra and the NoSQL scalable OLTP argument</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/U_i2wFZEqd8/</link>
		<comments>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 19:01:13 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1675</guid>
		<description><![CDATA[Todd Hoff put up a provocative post on High Scalability called MySQL and Memcached: End of an Era? The post itself focuses on observations like:

Facebook invented and is adopting Cassandra.
Twitter is adopting Cassandra.
Digg is adopting Cassandra.
LinkedIn invented and is adopting Voldemort.
Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.

But in addition, he [...]]]></description>
			<content:encoded><![CDATA[<p>Todd Hoff put up a provocative post on High Scalability called <a href="http://highscalability.com/blog/2010/2/26/mysql-and-memcached-end-of-an-era.html" onclick="javascript:pageTracker._trackPageview('/highscalability.com');">MySQL and Memcached: End of an Era?</a> The post itself focuses on observations like:</p>
<ul>
<li>Facebook invented and is adopting Cassandra.</li>
<li>Twitter is adopting Cassandra.</li>
<li>Digg is adopting Cassandra.</li>
<li>LinkedIn invented and is adopting Voldemort.</li>
<li>Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.</li>
</ul>
<p>But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. <span id="more-1675"></span>Following those trails gets one to, among other things:</p>
<ul>
<li>A September, 2009 post outlining <a href="http://about.digg.com/blog/looking-future-cassandra" onclick="javascript:pageTracker._trackPageview('/about.digg.com');">Digg&#8217;s reasons for moving to Cassandra</a>. The core idea is that joining two tables is expensive; it&#8217;s cheaper to store the results prejoined on disk. Details are provided.</li>
<li>A February, 2010 post outlining <a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king" onclick="javascript:pageTracker._trackPageview('/nosql.mypopescu.com');">Twitter&#8217;s reasons for moving to Cassandra</a>. They boil down to &#8220;sufficiently scalable, sufficiently simple, sufficiently robust, robustly open source.&#8221;</li>
<li>A <a href="http://www.niallkennedy.com/blog/uploads/flickr_php.pdf" onclick="javascript:pageTracker._trackPageview('/www.niallkennedy.com');">Flickr slide presentation</a> saying &#8220;normalization is for wimps&#8221;. They seemed to be staying with MySQL, but lusting after XPath.</li>
<li>A nice <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/" onclick="javascript:pageTracker._trackPageview('/blog.evanweaver.com');">Cassandra technical overview</a> by Evan Weaver of Twitter.</li>
</ul>
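<p>The &#8220;store the results prejoined&#8221; idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Digg&#8217;s actual schema: the dicts stand in for on-disk tables, and <code>friends_of</code>, <code>friend_votes</code>, <code>record_vote</code>, and <code>friends_who_voted</code> are hypothetical names.</p>

```python
from collections import defaultdict

# Hypothetical stores; dicts stand in for on-disk tables or column families.
friends_of = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": []}

# Prejoined result: for each (user, item), which of that user's friends voted.
friend_votes = defaultdict(list)


def record_vote(voter, item):
    """Write path: fan the vote out to every user who lists `voter` as a friend."""
    for user, friends in friends_of.items():
        if voter in friends:
            friend_votes[(user, item)].append(voter)


def friends_who_voted(user, item):
    """Read path: a single key lookup, with no join at query time."""
    return friend_votes.get((user, item), [])
```

<p>The tradeoff is visible in the code: each write does more work and the data is stored redundantly, in exchange for reads that never touch a second table.</p>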
<p>I also recall seeing something that said &#8220;We have 13X as many queries as updates, so of course we should optimize for reads,&#8221; but I can&#8217;t find that now. The classical OLTP answer to that would probably be &#8220;Yeah, but by the time you&#8217;re two-phase-committing and integrity-checking all the parts of that update, it turns out updates are still what you should optimize for.&#8221; Well, what if the update is so simple that that&#8217;s no longer a valid argument?</p>
<p>There certainly seem to be some non-obvious technical choices being made here, with options being conflated that perhaps shouldn&#8217;t be. In particular, I wonder whether things are being written to cheap disk in a really fast way when it might be better to keep them in more expensive RAM or, perhaps better yet, solid-state memory. Perhaps then the functionality/performance tradeoff wouldn&#8217;t be so painful.</p>
<p>On the other hand, the designers of the world&#8217;s most scalable websites &#8212; e-commerce sites perhaps excepted &#8212; seem pretty unanimous in thinking it&#8217;s best to bake some database/integrity management into the applications, rather than offload it all to an RDBMS. Why? Because the transactions are so simple that hand-coding all that isn&#8217;t prohibitive. And of course because of their extreme performance and scalability needs.</p>
<p>I&#8217;m not sure on what basis one could argue that they&#8217;re wrong.</p>
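<p>For a sense of what &#8220;baking integrity management into the application&#8221; can mean when the transactions are this simple, here is a minimal Python sketch. The dicts stand in for a key-value database, and <code>users</code>, <code>comments</code>, and <code>add_comment</code> are hypothetical names chosen for illustration.</p>

```python
# Hypothetical in-memory stores standing in for a key-value database.
users = {"u1": {"name": "alice"}}
comments = {}


def add_comment(comment_id, user_id, text):
    """Hand-coded integrity checks that an RDBMS would enforce declaratively."""
    if user_id not in users:          # stands in for a foreign-key constraint
        raise ValueError(f"unknown user: {user_id}")
    if comment_id in comments:        # stands in for a primary-key constraint
        raise ValueError(f"duplicate comment id: {comment_id}")
    comments[comment_id] = {"user_id": user_id, "text": text}
```

<p>Nothing in this sketch is atomic or isolated. Whether that matters depends on the application, which is exactly the judgment these sites are making for themselves.</p>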
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/02/cassandra-nosql-scalable-oltp/</feedburner:origLink></item>
		<item>
		<title>Data exploration vs. data visualization</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/2hjeK1MiqFE/</link>
		<comments>http://www.dbms2.com/2010/03/01/data-exploration-visualization/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 09:29:47 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1666</guid>
		<description><![CDATA[I&#8217;ve tended to conflate data exploration and data visualization, and I&#8217;m far from alone in doing so. But a recent Economist article is a useful reminder that they aren&#8217;t exactly the same thing.
The article makes the same conflation, but while reading it I noticed something interesting. The concrete examples cited are of clever consultants who [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve tended to conflate <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >data exploration and data visualization</a>, and I&#8217;m far from alone in doing so. But a recent <a href="http://www.economist.com/specialreports/displaystory.cfm?story_id=15557455" onclick="javascript:pageTracker._trackPageview('/www.economist.com');"><em>Economist</em></a> article is a useful reminder that they aren&#8217;t exactly the same thing.<span id="more-1666"></span></p>
<p>The article makes the same conflation, but while reading it I noticed something interesting. The concrete examples cited are of clever consultants who crafted innovative data visualizations on the fly, to make conclusions patently apparent to even mathematically-challenged decision-makers. That kind of thing is important, and has been going on <a href="http://tokyohanna.blogspot.com/2009/12/nightingale-x-healthcare-x-visualizing.html" onclick="javascript:pageTracker._trackPageview('/tokyohanna.blogspot.com');">for over 140 years</a>.*</p>
<p><em>*Yes, I&#8217;m trotting out the Florence Nightingale example again. I continue to be in awe of her.</em></p>
<p>What worries me is the article&#8217;s suggestion that <strong>the best data visualizations are done by visualization experts, as ways of making information apparent to other people.</strong> For as long as data visualization relies on hotshot visual-design experts doing one-off projects, its impact on enterprises overall will remain extremely limited. In other words, <strong>to the extent it is incorrect to conflate data visualization and data exploration, data visualization will remain a fringe technology</strong>.</p>
<p>To be fair, a primary decision support/business intelligence usage cycle has always been the following, where by &#8220;always&#8221; I mean &#8220;for at least the past 35 years&#8221;:</p>
<ul>
<li><strong>Data exploration</strong>. Power user uses technology to find something interesting.</li>
<li><strong>&#8220;Look what I found!&#8221; </strong>Power user then shows a report, chart, or other summary/representation to colleagues.</li>
</ul>
<p>So to the extent modern interactive data exploration/visualization technology fits that paradigm, great. But to the extent that visualization experts are somehow integral to the technology&#8217;s use, it will remain stuck on the analytic fringe.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/01/data-exploration-visualization/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/03/01/data-exploration-visualization/</feedburner:origLink></item>
	</channel>
</rss>
