<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>DBMS 2 : DataBase Management System Services</title>
	
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Mon, 20 May 2013 10:06:06 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/dbms2/feed" /><feedburner:info uri="dbms2/feed" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Some stuff I’m working on</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/VNMFw2LKdYQ/</link>
		<comments>http://www.dbms2.com/2013/05/20/some-stuff-im-working-on/#comments</comments>
		<pubDate>Mon, 20 May 2013 10:06:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Liberty and privacy]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Structured documents]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=8024</guid>
		<description><![CDATA[1. I have some posts up on Strategic Messaging. The most recent are overviews of messaging, pricing, and positioning. 2. Numerous vendors are blending SQL and JSON management in their short-request DBMS. It will take some more work for me to have a strong opinion about the merits/demerits of various alternatives. The default implementation &#8212; [...]]]></description>
			<content:encoded><![CDATA[<p>1. I have some posts up on <em><a href="http://www.strategicmessaging.com/influencers-long-tail-watts-godin/2008/02/02/">Strategic Messaging</a>.</em> The most recent are overviews of <a href="http://www.strategicmessaging.com/faith-hope-and-clarity/2013/05/10/">messaging</a>, <a href="http://www.strategicmessaging.com/notes-on-pricing/2013/05/07/">pricing</a>, and <a href="http://www.strategicmessaging.com/messaging-and-positioning/2013/04/07/">positioning</a>.</p>
<p>2. Numerous vendors are blending SQL and JSON management in their <a href="http://www.dbms2.com/2011/03/30/short-request-and-analytic-processing/">short-request DBMS</a>. It will take some more work for me to have a strong opinion about the merits/demerits of various alternatives.</p>
<p>The default implementation &#8212; one example would be <a href="http://www.dbms2.com/2012/07/18/clustrix-4/">Clustrix&#8217;s</a> &#8212; is to stick the JSON into something like a BLOB/CLOB field (Binary/Character Large Object), index on individual values, and treat those indexes just like any others for the purpose of SQL statements. Drawbacks include:</p>
<ul>
<li>You have to store or retrieve the JSON in whole documents at a time.</li>
<li>If you are spectacularly careless, you could write JOINs with odd results.</li>
</ul>
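<p>To make the default approach concrete, here is a minimal sketch in Python with SQLite. It is an illustration under stated assumptions, not Clustrix&#8217;s or any other vendor&#8217;s actual implementation: the table <code>docs</code>, the extracted <code>city</code> column and the helper names are all hypothetical. The JSON lives in a single text column, one value is pulled out at write time, and an ordinary index on that value serves ordinary SQL predicates.</p>

```python
import json
import sqlite3


def build_store():
    """Create an in-memory table holding whole JSON documents as text.

    Hypothetical schema: 'body' is the CLOB-like column, 'city' is one
    value extracted from the document so it can be indexed normally.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL, city TEXT)"
    )
    # An ordinary index; the SQL engine treats it like any other.
    conn.execute("CREATE INDEX docs_city ON docs (city)")
    return conn


def put(conn, doc_id, doc):
    """Store the whole document and refresh the extracted value."""
    conn.execute(
        "INSERT OR REPLACE INTO docs (id, body, city) VALUES (?, ?, ?)",
        (doc_id, json.dumps(doc), doc.get("city")),
    )


def find_by_city(conn, city):
    """Look up via the indexed value; each hit comes back as a whole document."""
    rows = conn.execute(
        "SELECT body FROM docs WHERE city = ? ORDER BY id", (city,)
    )
    return [json.loads(body) for (body,) in rows]
```

<p>Both helpers move an entire document per call, which is the first drawback listed above: changing one field means rewriting the whole stored value.</p>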
<p>IBM DB2 is one recent arrival to the JSON party. Unfortunately, I forgot to ask whether IBM&#8217;s JSON implementation was based on IBM <a href="http://www.dbms2.com/2008/10/05/overview-of-ibm-db2-purexml/">DB2 pureXML</a> when I had the chance, and IBM hasn&#8217;t gotten around to answering my followup query.</p>
<p>3. Nor has IBM gotten around to answering my followup queries on the subject of BLU, an interesting-sounding columnar option for DB2.</p>
<p>4. Numerous clients have asked me whether they should be active in DBaaS (DataBase as a Service). After all, Amazon, Google, Microsoft, Rackspace and salesforce.com are all in that business in some form, and other big companies have dipped toes in as well. <span id="more-8024"></span></p>
<p>I&#8217;m skeptical that one can succeed both in that market and in selling database software, for reasons including:</p>
<ul>
<li>Nobody I can think of has done so.</li>
<li>The value propositions are different.
<ul>
<li>DBaaS is about having administration be so easy that you, the customer, don&#8217;t need to worry about it.</li>
<li>Database software is about one or more of:
<ul>
<li>Development ease.</li>
<li>Price/performance/throughput.</li>
<li>Big-enterprise/legacy-vendor considerations.</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>I&#8217;m also skeptical about service-only DBaaS strategies, because users will naturally resist vendor lock-in.</p>
<p>But despite all my skepticism, DBaaS is an area I should probably learn more about.</p>
<p>5. I plan to spend more time looking at machine learning and other advanced analytics. I doubt they&#8217;ll soon match the past few years&#8217; hype about &#8220;big data analytics&#8221;, but even the reality of modern analytics looks like it&#8217;s getting more interesting. Ditto if somebody has an interesting twist on more traditional predictive analytics.</p>
<p>6. <a href="http://www.dbms2.com/2010/04/04/privacy-liberty-continued/">Three years ago</a>,  I wrote:</p>
<blockquote>
<ul>
<li>It is inevitable* that governments and other constituencies will obtain huge amounts of information, which can be used to drastically restrict everybody’s privacy and freedom.</li>
<li>To protect against this grave threat, multiple layers of defense are needed, technical and legal/regulatory/social/political alike.</li>
<li>One particular layer is getting insufficient attention, namely<strong> restrictions upon the use</strong> (as opposed to the acquisition or retention) <strong>of data</strong>.</li>
</ul>
<p><em>*And indeed in many ways even desirable</em></p></blockquote>
<p>It is now frighteningly obvious that the US is becoming a high-surveillance society. The Boston Marathon bombing added three new elements to an already snowballing trend:</p>
<ul>
<li>A revelation that <a href="http://www.fbi.gov/news/pressrel/press-releases/2011-request-for-information-on-tamerlan-tsarnaev-from-foreign-government">the FBI could track Tamerlan Tsarnaev&#8217;s communication content without any known warrant</a>.</li>
<li>A further revelation that the police know how to put on large paramilitary displays of force (and that the public generally approves).</li>
<li>An increased belief that widespread video surveillance of public places is a Good Thing.</li>
</ul>
<p>I need to write more about privacy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/05/20/some-stuff-im-working-on/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/05/20/some-stuff-im-working-on/</feedburner:origLink></item>
		<item>
		<title>It’s time to change around Monash Research’s mailing lists</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/ECcaq3Eqjn4/</link>
		<comments>http://www.dbms2.com/2013/05/03/its-time-to-change-around-monash-researchs-mailing-lists/#comments</comments>
		<pubDate>Fri, 03 May 2013 09:42:15 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[About this blog]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=8012</guid>
		<description><![CDATA[Email delivery of posts has been screwed up; multiple people tell me they haven&#8217;t gotten their email for months. (In the future, please tell me of such difficulties!) So it&#8217;s time for a change, and I&#8217;m asking for your advice as to what you&#8217;d suggest for our mailing list. Yes, I&#8217;m asking via a blog [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Email delivery of posts</strong> has been screwed up; multiple people tell me they haven&#8217;t gotten their email for months. (In the future, please tell me of such difficulties!) So it&#8217;s time for a change, and I&#8217;m asking for your advice as to what you&#8217;d suggest for our mailing list.</p>
<p><em>Yes, I&#8217;m asking via a blog post, even though the core problem is that people who want to see my posts via e-mail aren&#8217;t getting them. Please work with me on this anyway. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p>My two basic questions are:</p>
<ul>
<li><strong>What should be the frequency of delivery? </strong>To date, it&#8217;s been nightly (at least in theory).</li>
<li><strong>What delivery technology should be used? </strong>To date, it&#8217;s been FeedBlitz.</li>
</ul>
<p>1. The nightly scheduling has been an artifact of an RSS-to-email link that no longer seems stable. So I&#8217;m thinking of just manually pasting each post into a list email, in which case:</p>
<ul>
<li><strong>Posts could be sent without delay.</strong></li>
<li>Every post would be delivered by separate mail. (As opposed to having only one post per night be mailed, while others just get linked to.)</li>
</ul>
<p>It&#8217;s a bit more work for me, but probably nothing dire. <strong><em>Does lower latency sound good to everybody? <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
</em></strong></p>
<p>2. The main technical options seem to be: <span id="more-8012"></span></p>
<ul>
<li><strong>Free services oriented to discussion lists,</strong> such as Yahoo Groups, but set to announce-only. These have very basic functionality.</li>
<li><strong>Commercial services oriented to marketing email lists,</strong> such as Aweber or MailChimp. <strong><em>Does anybody have favorable or unfavorable experience with particular services?</em></strong> Most vendors surely use one or another, but it&#8217;s tough to guess which they&#8217;ve selected just based on their <del>spam and pabulum</del> informative communications, given the customizability those services provide.</li>
</ul>
<p>Any thoughts would be most welcome.</p>
<p>3. And while I&#8217;m at it &#8212; what should I do for <strong>social/sharing</strong> buttons? Presumably, if I included buttons that made it easy for you to tweet links to my posts, submit them to Hacker News, etc., more of you would do so. Which specific options would you like to use?</p>
<ul>
<li>Twitter?</li>
<li>LinkedIn?</li>
<li>Google +?</li>
<li>Facebook?</li>
<li>Slashdot?</li>
<li>Hacker News?</li>
<li>dzone?</li>
<li>Digg?</li>
</ul>
<p>Anything else? I&#8217;d like to omit the more dubious possibilities, as offering everything could be a lot of clutter &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/05/03/its-time-to-change-around-monash-researchs-mailing-lists/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/05/03/its-time-to-change-around-monash-researchs-mailing-lists/</feedburner:origLink></item>
		<item>
		<title>More on Actian/ParAccel/VectorWise/Versant/etc.</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/FwHRL0uhEj4/</link>
		<comments>http://www.dbms2.com/2013/04/29/actian-paraccel-vectorwise-versant/#comments</comments>
		<pubDate>Mon, 29 Apr 2013 11:50:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Actian and Ingres]]></category>
		<category><![CDATA[Amazon and its cloud]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[VectorWise]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=8004</guid>
		<description><![CDATA[My quick reaction to the Actian/ParAccel deal was negative. A few challenges to my views then emerged. They didn&#8217;t really change my mind. Amazon Redshift Amazon did a deal with ParAccel that amounted to: Amazon got a very cheap license to a limited subset of ParAccel&#8217;s product &#8230; &#8230; so that it could launch a [...]]]></description>
			<content:encoded><![CDATA[<p>My quick reaction to <a href="http://www.dbms2.com/2013/04/25/goodbye-vectorwise-farewell-paraccel/">the Actian/ParAccel deal</a> was negative. A few challenges to my views then emerged. They didn&#8217;t really change my mind.</p>
<p><strong>Amazon Redshift</strong></p>
<p><a href="http://www.dbms2.com/2012/12/09/amazon-redshift-and-its-implications/">Amazon did a deal with ParAccel</a> that amounted to:</p>
<ul>
<li>Amazon got a very cheap license to a limited subset of ParAccel&#8217;s product &#8230;</li>
<li>&#8230; so that it could launch a service called Amazon Redshift.</li>
<li>Amazon also invested in ParAccel.</li>
</ul>
<p>Some argue that this is great for ParAccel&#8217;s future prospects. I&#8217;m not convinced.</p>
<p>No doubt there are and will be Redshift users, evidently including <a href="http://www.pcworld.idg.com.au/article/459745/infor_partnering_amazon_redshift-powered_cloud_analytics_platform/">Infor</a>. But so far as I can tell, Redshift uses very standard SQL, so it doesn&#8217;t seed a ParAccel market in terms of developer habits. The administration/operation story is similar. So outside of general validation/bragging rights, Redshift is not a big deal for ParAccel.</p>
<p><strong>OEMs and bragging rights</strong></p>
<p>It&#8217;s not just Amazon and Infor; there&#8217;s also a MicroStrategy deal to OEM ParAccel &#8212; I think it&#8217;s the real ParAccel software in that case &#8212; for a particular service, <a href="http://www.dbms2.com/2012/12/09/paraccel-update/">MicroStrategy Wisdom</a>. But unless I&#8217;m terribly mistaken, HP Vertica, Sybase IQ and even Infobright each have a lot more OEMs than ParAccel, just as they have a lot more customers than ParAccel overall.</p>
<p>This OEM success is a great validation for the idea of columnar analytic RDBMS in general, but I don&#8217;t see where it&#8217;s an advantage for ParAccel vs. the columnar leaders. <span id="more-8004"></span></p>
<p><strong>Concurrency</strong></p>
<p>As I admitted in the comment thread to my first Actian/ParAccel post, I&#8217;m confused about what kind of concurrent usage ParAccel can really support. The data I have, e.g. in the link immediately above, is not conclusive. Googling suggests that VectorWise was at <a href="http://www.actian.com/kb/article/420781">one user per core</a> a couple of years ago, supportive of my hypothesis that it doesn&#8217;t have some big concurrency edge on ParAccel. But to repeat &#8212; I don&#8217;t really know.</p>
<p><strong>DBMS acquisitions in the past</strong></p>
<p>My history blog on <a href="http://www.softwarememories.com/2013/04/29/dbms-acquisitions/">DBMS acquisitions</a> yielded more favorable examples than I was expecting. (Of course, I omitted a lot of small and boring failures.) And DBMS conglomerates are the rule more than the exception, with IBM, Sybase, Teradata and Oracle all adopting acquisition-aided multi-DBMS strategies, at least to some extent.</p>
<p>That said, Sybase is the main example of a vendor of a slow-growth DBMS (Adaptive Server Enterprise) doing well with a faster-growing one (Sybase IQ). Perhaps not coincidentally, Actian&#8217;s latest management team draws significantly on Sybase. So yes; ParAccel is now owned by a company run by guys who know something about selling columnar DBMS.</p>
<p>But the whole thing would be more convincing if Ingres had shown more life under Actian&#8217;s ownership, or indeed at any point in the past 20 years. My bottom line is that <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Actian was floundering badly in the DBMS market 1 1/2 years ago</a>, and not a lot of favorable news has emerged in the interim &#8212; except, quite arguably, for the management changes and acquisitions themselves.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/29/actian-paraccel-vectorwise-versant/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/29/actian-paraccel-vectorwise-versant/</feedburner:origLink></item>
		<item>
		<title>Goodbye VectorWise, farewell ParAccel?</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/ozPS5DK-R-U/</link>
		<comments>http://www.dbms2.com/2013/04/25/goodbye-vectorwise-farewell-paraccel/#comments</comments>
		<pubDate>Thu, 25 Apr 2013 23:59:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Actian and Ingres]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[HP and Neoview]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[VectorWise]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7992</guid>
		<description><![CDATA[Actian, which already owns VectorWise, is also buying ParAccel. The argument for why this kills VectorWise is simple. ParAccel does most things VectorWise does, more or less as well. It also does a lot more: ParAccel scales out. ParAccel has added analytic platform capabilities. I don&#8217;t know for sure, but I&#8217;d guess ParAccel has more [...]]]></description>
			<content:encoded><![CDATA[<p>Actian, which already owns <a href="http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/">VectorWise</a>, is also buying <a href="http://www.dbms2.com/2012/12/09/paraccel-update/">ParAccel</a>. The argument for why this kills VectorWise is simple. ParAccel does most things VectorWise does, more or less as well. It also does a lot more:</p>
<ul>
<li>ParAccel scales out.</li>
<li>ParAccel has added analytic platform capabilities.</li>
<li>I don&#8217;t know for sure, but I&#8217;d guess ParAccel has more mature management/plumbing capabilities as well.</li>
</ul>
<p>One might conjecture that ParAccel is bad at highly concurrent, single-node use cases, and VectorWise is better at them &#8212; but at the link above, ParAccel bragged of supporting 5,000 concurrent connections. Besides, if one is just looking for a high-use reporting server, why not get Sybase IQ?? Anyhow, <strong>Actian hasn&#8217;t been investing enough in VectorWise to make it a major market player, </strong>and <strong>they&#8217;re unlikely to start now that they own ParAccel </strong>as well.</p>
<p>But<strong> I expect ParAccel to fail too</strong>. Reasons include:</p>
<ul>
<li>ParAccel&#8217;s small market share and traction.</li>
<li>The disruption of any acquisition like this one.</li>
<li>My general view of Actian as a company.</li>
</ul>
<p><span id="more-7992"></span>2 years after being acquired, Vertica &#8212; which conceptually has always been ParAccel&#8217;s closest competitor &#8212; has finally taken major hits on engineering staffing. Even so, I expect HP Vertica to reopen what was once a large technology and momentum gap vs. ParAccel.</p>
<p>My views on Actian start:</p>
<ul>
<li>Actian is attempting to build a database software conglomerate on the cheap, starting with Ingres, ParAccel, VectorWise, Pervasive (itself a small conglomerate) and Versant.</li>
<li>Actian hasn&#8217;t accomplished much with Ingres, its original acquisition.</li>
<li>Actian hasn&#8217;t accomplished much with VectorWise.</li>
<li><a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Actian&#8217;s brief, embarrassing pivot away from database software</a> was a joke. (The comments at that link also show VectorWise&#8217;s positioning as very different in September, 2011 than it is now.)</li>
<li>I&#8217;ve had some very bad experiences with Actian management, although it seems to have largely turned over since then.</li>
<li>I can&#8217;t identify the folks to make this work at the acquired pieces either (even though I think well of a few of them, e.g. Mike Hoskins and Rick Glick).</li>
</ul>
<p>I.e., <a href="http://www.monashreport.com/2006/02/16/whatever-oracle-is-up-to-it-should-work-moderately-well/">building a database conglomerate is hard</a>, and Actian isn&#8217;t up to the challenge.</p>
<p>Actian has three main paths it can follow for synergy:</p>
<ul>
<li><strong>Acquire a lot of pieces and flip the whole thing for more money to a foolish buyer.</strong> This strategy worked splendidly for Autonomy, and to some extent for Sybase as well. But it&#8217;s a longshot, and not necessarily a win for customers even if investors do well.</li>
<li><strong>Sell a bunch of disparate products through the same sales force.</strong> Tough to execute. And at best it raises sales coverage up to the level of that for the most successful product &#8212; and Actian doesn&#8217;t really have successful new products.</li>
<li><strong>Integrate the technologies.</strong> Blech. You don&#8217;t integrate DBMS with wildly different architectures, as Informix died trying in the 1990s.</li>
</ul>
<p>I don&#8217;t see enough opportunity there for the whole thing to work out, with sales synergy being the best opportunity to prove me wrong.</p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li><a href="http://www.informationweek.com/software/information-management/actian-acquires-paraccel-fuel-behind-ama/240153593">Doug Henschen</a> and <a href="http://gigaom.com/2013/04/25/actian-buys-amazon-web-services-database-partner-paraccel/">Derrick Harris</a> offer quotes and numbers about the deal.</li>
<li>VectorWise&#8217;s academic founders <a href="http://www.cwi.nl/people/800">Peter Boncz</a> and <a href="http://www.linkedin.com/in/marcinzukowski">Marcin Zukowski</a> seem to have left the company.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/25/goodbye-vectorwise-farewell-paraccel/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/25/goodbye-vectorwise-farewell-paraccel/</feedburner:origLink></item>
		<item>
		<title>Analytic application themes</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/G49jgHmbXLA/</link>
		<comments>http://www.dbms2.com/2013/04/25/analytic-application-themes/#comments</comments>
		<pubDate>Thu, 25 Apr 2013 08:41:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aerospike]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[GIS and geospatial]]></category>
		<category><![CDATA[Health care]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MemSQL]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7981</guid>
		<description><![CDATA[I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place. 1. So far, the buzzword of the year is &#8220;real-time analytics&#8221;, generally with &#8220;operational&#8221; or &#8220;big data&#8221; included as well. I hear variants of that [...]]]></description>
			<content:encoded><![CDATA[<p><em>I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place.</em></p>
<p>1. So far, the buzzword of the year is <strong>&#8220;real-time analytics&#8221;</strong>, generally with &#8220;operational&#8221; or &#8220;big data&#8221; included as well. I hear variants of that positioning from NewSQL vendors (e.g. <a href="http://www.dbms2.com/2013/04/23/memsql-scales-out/">MemSQL</a>), NoSQL vendors (e.g. <a href="http://www.dbms2.com/2012/08/27/aerospike-the-former-citrusleaf/">Aerospike</a>), BI stack vendors (e.g. <a href="http://www.dbms2.com/2013/03/26/platfora-at-the-time-of-first-ga/">Platfora</a>), application-stack vendors (e.g. <a href="http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/">WibiData</a>), log analysis vendors (led by <a href="http://www.dbms2.com/2012/01/10/splunk-update/">Splunk</a>), data management vendors (e.g. <a href="http://www.dbms2.com/2012/11/01/more-on-cloudera-impala/">Cloudera</a>), and of course the <a href="http://www.dbms2.com/2011/11/10/cep-streaming-catchup/">CEP</a> industry.</p>
<p><em>Yeah, yeah, I know &#8212; not all the named companies are in exactly the right market category.</em> <em>But that&#8217;s <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">hard to avoid</a>.</em></p>
<p>Why this gold rush? On the demand side, there&#8217;s a real or imagined need for speed. On the supply side, I&#8217;d say:</p>
<ul>
<li>There are vast numbers of companies offering data-management-related technology. They need ways to differentiate.</li>
<li>Doing <a href="http://www.dbms2.com/2011/03/30/short-request-and-analytic-processing/">analytics at short-request speeds</a> is an obvious data-management-related challenge, and not yet comprehensively addressed.</li>
</ul>
<p>2. More generally, most of the applications I hear about are analytic, or have a strong analytic aspect. The three biggest areas &#8212; and these overlap &#8212; are:</p>
<ul>
<li>Customer interaction</li>
<li>Network and sensor monitoring</li>
<li>Game and mobile application back-ends</li>
</ul>
<p>Also arising fairly frequently are:</p>
<ul>
<li>Algorithmic trading</li>
<li>Anti-fraud</li>
<li>Risk measurement</li>
<li>Law enforcement/national security</li>
<li>Healthcare</li>
<li>Stakeholder-facing analytics</li>
</ul>
<p>I&#8217;m hearing less about quality, defect tracking, and equipment maintenance than I used to, but those application areas have anyway been ebbing and flowing for decades.</p>
<p><span id="more-7981"></span>3. Much of <strong>customer interaction</strong> revolves around <strong>recommendation</strong> and <strong>personalization.</strong> In connection with that I&#8217;ll remind you:</p>
<ul>
<li>Multiple sources say that 5 millisecond response is a real need. <a href="http://www.dbms2.com/2013/01/17/ycsb-benchmark-notes/#comment-337489">Srini Srinivasan</a> explained why in a January comment.</li>
<li>The results of the recommendation and personalization can be delivered in many different ways &#8212; product recommendations, ads, special offers, email, snail mail, call center scripts and more. This is the paradigmatic example for my skepticism about <a href="http://www.dbms2.com/2013/02/22/should-you-offer-complete-analytic-applications/">complete analytic applications</a>.</li>
</ul>
<p>4. <strong>Networks </strong>and<strong> sensors</strong> emit the epitome of <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. Data sources include web logs, network logs (in the IT sense), telecommunication networks, other utilities (e.g. electric), vehicle fleets, and more. Application themes include:</p>
<ul>
<li>Human monitoring, via some kind of <a href="http://www.dbms2.com/2012/11/05/real-time-confusion/">real-time</a> business intelligence view. I hear about that a lot.</li>
<li>Various kinds of automated response. (Security is an obvious example.)</li>
<li>Integration with other kinds of application, data source, or use case.</li>
</ul>
<p>As one example of the last point, Oliver Ratzesberger told me years ago that eBay had up-to-the-minute BI cubes integrating customer response and log data, for the purpose of quickly detecting technology problems. Acunu recently told me that similar applications are one of their sales focuses.</p>
<p>5. In another example,<strong> games</strong> and <strong>mobile applications</strong> can be a lot like websites in terms of the analytics that support them (all the more so if we&#8217;re talking about games with in-app purchases). Two special features come up repeatedly, however &#8212; leaderboards for games, and geospatial data sent by mobile devices.</p>
<p>6. <strong>Algorithmic trading</strong> is flashy because of the sums of money involved, and because of what is often hyper-low latency; I&#8217;ve even heard 50 microseconds, and that&#8217;s a slightly out-of-date figure for a sequence of several atomic operations. But otherwise it&#8217;s not one of the more interesting areas to me, for at least two reasons:</p>
<ul>
<li>It depends on a lot of latency-specific stuff, such as hand-crafted hardware.</li>
<li>The participants are secretive &#8212; understandably so as they&#8217;re literally in a race with each other &#8212; and don&#8217;t reveal much.</li>
</ul>
<p>Another reason I don&#8217;t study it much is that high-frequency trading could be devastated at any time by some simple regulatory changes.</p>
<p>7. I finally figured out one of the big drivers for better <strong>risk analysis.</strong> Banks need to keep capital lying around to cover a fraction of the risk they take on. If they can estimate the risk more precisely, and come up with a lower number, then they need to keep less capital. That&#8217;s a lot like finding large bags of money.</p>
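<p>The arithmetic behind that can be sketched in a few lines of Python. Every figure here is made up for illustration: the 8% capital ratio, the exposure estimates and the function name are assumptions, not numbers from any bank or regulator.</p>

```python
def required_capital(risk_estimate, capital_ratio_bps):
    """Capital to hold against an estimated risk exposure.

    The ratio is given in basis points (800 = 8%) so the arithmetic
    stays in exact integers. Both inputs are hypothetical.
    """
    return risk_estimate * capital_ratio_bps // 10_000


# Hypothetical: a coarse model estimates 10 billion of risk exposure.
coarse = required_capital(10_000_000_000, 800)

# Hypothetical: a more precise model justifies a 9 billion estimate.
refined = required_capital(9_000_000_000, 800)

# Capital that no longer has to sit idle.
freed = coarse - refined
print(f"Capital freed by the better estimate: {freed:,}")
```

<p>Under those assumed numbers, a 10% tighter risk estimate releases 80 million in capital, which is the sense in which better risk analysis resembles finding bags of money.</p>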
<p>8. <strong>Anti-fraud</strong> applications arise in many industries, with many different kinds of data and latency requirements. For example:</p>
<ul>
<li>Insurers don&#8217;t want to pay bogus claims. They usually have weeks to think about that problem.</li>
<li>Telcos don&#8217;t want to provision services for customers who will defraud them. They have to decide at call-center speed.</li>
<li>Similarly, retailers don&#8217;t want to accept bogus returns.</li>
<li>Stockbrokers don&#8217;t want rogue traders to defeat their controls. A lot of data and analysis go into that mission, as billions of dollars &#8212; literally &#8212; can be at stake.</li>
</ul>
<p>9. And finally, the recent Boston Marathon bombing has brought <strong>law-enforcement/anti-terrorism </strong>applications to the fore. The Boston Globe criticized <a href="http://www.boston.com/news/local/massachusetts/2013/04/24/government-terrorism-information-sharing-program-rated-high-risk-despite-years-effort-since/ncokk1agSs8JN5NOeFObeI/story.html">difficulties in information sharing</a>, but the money quote is:</p>
<blockquote><p>The FBI followed up by checking government databases and looking for things such as “derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history,” according to FBI Supervisory Agent Jason J. Pack. “The FBI also interviewed Tamerlan Tsarnaev and family members. The FBI did not find any terrorism activity.”</p></blockquote>
<p>Neither the telephone intercept nor the web-surfing tracking is a capability the government routinely admits, unless there was something like a wiretap order that I so far haven&#8217;t seen reported.</p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li>Government surveillance is even more inevitable than when I wrote in 2010 that <a href="http://www.dbms2.com/2010/04/04/privacy-liberty-continued/">freedom can only be preserved by limiting government USES of data</a>.</li>
<li><a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">Stakeholder-facing analytics</a> isn&#8217;t much better understood than when I wrote about it in 2010.</li>
<li>I wrote up a different list of <a href="http://www.monashreport.com/2006/09/08/where-does-data-mining-succeed-and-why/">analytic use cases</a> back in 2006.</li>
<li>The continued drop in high-frequency trading latency strengthens my 2009 contrast between the <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/">speed of a turtle and the speed of light</a>; we&#8217;re now over a 3 * 10^10 difference between the speed of trading and the speed of generic planning, and many turtles walk well faster than 1 cm/sec.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/25/analytic-application-themes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/25/analytic-application-themes/</feedburner:origLink></item>
		<item>
		<title>MemSQL scales out</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/9sk0062_jxg/</link>
		<comments>http://www.dbms2.com/2013/04/23/memsql-scales-out/#comments</comments>
		<pubDate>Tue, 23 Apr 2013 08:56:10 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[MemSQL]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NewSQL]]></category>
		<category><![CDATA[Transparent sharding]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7969</guid>
		<description><![CDATA[The third of my three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today. MemSQL&#8217;s flagship reference is Zynga, across 100s of servers. Beyond that, the company claims (to quote a late [...]]]></description>
			<content:encoded><![CDATA[<p>The third of my three MySQL-oriented clients I alluded to <a href="http://www.dbms2.com/2013/04/22/notes-on-tokudb-and-geniedb/">yesterday</a> is MemSQL. When <a href="http://www.dbms2.com/2012/06/18/introduction-to-memsql/">I wrote about MemSQL last June</a>, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.</p>
<p>MemSQL&#8217;s flagship reference is Zynga, across 100s of servers. Beyond that, the company claims (to quote a late draft of the press release):</p>
<blockquote><p>Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.</p></blockquote>
<p>All four of those use cases fit MemSQL&#8217;s positioning in &#8220;real-time analytics&#8221;. Besides Zynga, MemSQL cites penetration into traditional low-latency markets &#8212; financial services (various subsectors) and ad-tech.</p>
<p>Highlights of MemSQL&#8217;s new distributed architecture start: <span id="more-7969"></span></p>
<ul>
<li>There are two kinds of MemSQL node &#8212; &#8220;aggregator&#8221; and &#8220;leaf&#8221;.
<ul>
<li>Aggregators are a kind of head node. You can have a bunch of them.</li>
<li>Leafs run full single-server MemSQL. You can have a bunch of them too.</li>
</ul>
</li>
<li>MemSQL has two query optimizers. One kind runs on the aggregator nodes, and thinks about the whole cluster. The other runs on the leafs, and only thinks about its own node.</li>
<li>Much of the join and aggregation work is done on the aggregator nodes, but I didn&#8217;t pursue that issue in much detail.</li>
<li>It is good policy &#8212; and supported &#8212; to replicate small dimension/reference tables across the cluster. These are replicated to aggregator and leaf nodes alike. (This tells us that some joins are indeed done on the leafs. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> )</li>
<li>MemSQL replication can be synchronous or asynchronous. It can be used for high availability.</li>
</ul>
<p>Also:</p>
<ul>
<li>MemSQL writes (whether primary or replicated) go to a buffer. The buffer size can be 0 or positive, in a tradeoff of durability vs. the likelihood of a disk I/O bottleneck.</li>
<li>MemSQL has many virtual nodes on each physical (leaf) node. (This is pretty much an industry-standard best practice, as it helps with elasticity, recovery from node failure, and so on.)</li>
<li>Compression is still a future feature.</li>
<li>So is <a href="http://www.dbms2.com/2012/07/28/some-vertica-6-features/">online schema change</a>.</li>
<li>Leaf nodes have cost-based optimizers.</li>
<li>MemSQL&#8217;s aggregator (cluster-wide) optimizer is mainly heuristic, but is supposed to get more cost-based in future releases.</li>
<li>In some releases it will be possible to keep MemSQL running while upgrading the software. But that&#8217;s not a promise for releases that change how replication works.</li>
</ul>
<p>And which not-easily-parallelized aggregate did MemSQL implement first? The same one <a href="http://www.dbms2.com/2013/03/26/platfora-at-the-time-of-first-ga/">Platfora</a> did &#8212; COUNT DISTINCT.</p>
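<p>To see why COUNT DISTINCT resists parallelization, here is a minimal Python sketch. It is not MemSQL&#8217;s or Platfora&#8217;s implementation; the shard contents and function names are invented for illustration. Per-node row counts can simply be added up, but per-node distinct counts cannot, because the same value may live on several nodes.</p>

```python
# Hypothetical data: user IDs as they might be spread across three leaf nodes.
shards = [
    ["alice", "bob", "alice"],
    ["bob", "carol"],
    ["alice", "dave", "dave"],
]


def count_rows(shards):
    # COUNT(*) parallelizes trivially: each node counts, the aggregator adds.
    return sum(len(shard) for shard in shards)


def naive_count_distinct(shards):
    # Wrong: adding per-node distinct counts double-counts shared values.
    return sum(len(set(shard)) for shard in shards)


def count_distinct(shards):
    # Correct: nodes ship their distinct values and the aggregator unions them.
    merged = set()
    for shard in shards:
        merged |= set(shard)
    return len(merged)


print(count_rows(shards))            # 8
print(naive_count_distinct(shards))  # 6
print(count_distinct(shards))        # 4
```

<p>The correct version has to move an amount of data proportional to the number of distinct values, which is where the cost comes from. Approximate techniques such as HyperLogLog are a common way around that, though nothing here says which approach either vendor took.</p>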
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/23/memsql-scales-out/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/23/memsql-scales-out/</feedburner:origLink></item>
		<item>
		<title>Notes on TokuDB and GenieDB</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/bhU3aG7eAJ4/</link>
		<comments>http://www.dbms2.com/2013/04/22/notes-on-tokudb-and-geniedb/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 10:07:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[GenieDB]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NewSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Tokutek and TokuDB]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7960</guid>
		<description><![CDATA[Last week, I edited press releases back-to-back-to-back for three clients, all with announcements at this week&#8217;s Percona Live. The ones with embargoes ending today are Tokutek and GenieDB. Tokutek&#8217;s news is that they&#8217;re open sourcing much of TokuDB, but holding back hot backup for their paid version. I approve of this strategy &#8212; &#8220;doesn&#8217;t lose [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I edited press releases back-to-back-to-back for three clients, all with announcements at this week&#8217;s Percona Live. The ones with embargoes ending today are Tokutek and GenieDB.</p>
<p>Tokutek&#8217;s news is that they&#8217;re open sourcing much of TokuDB, but holding back hot backup for their paid version. I approve of this strategy &#8212; &#8220;doesn&#8217;t lose data&#8221; is an important feature, and well worth paying for.</p>
<p><em>I kid, I kid. Any system has at least a bad way to do backups &#8212; e.g. one that involves slowing performance, or perhaps even requires taking applications offline altogether. So the real points of good backup technology are:</em></p>
<ul>
<li><em>To keep performance steady.</em></li>
<li><em>To make the whole thing as easy to manage as possible.</em></li>
</ul>
<p>GenieDB is announcing a Version 2, which is basically a performance release. So in lieu of pretending to have much article-worthy news, GenieDB is taking the opportunity to remind folks of its core marketing messages, with catchphrases such as &#8220;multi-regional self-healing MySQL&#8221;. Good choice; indeed, I wish more vendors would adopt that marketing tactic.</p>
<p>Along the way, I did learn a bit more about GenieDB. In particular:</p>
<ul>
<li>GenieDB is now just backed by a hacked version of InnoDB (no more Berkeley DB Java Edition).</li>
<li>Why hacked? Because GenieDB appends a Lamport timestamp to every row, which somehow leads to a need to modify how indexes and caching work.</li>
<li>Benefits of the change include performance and simpler (for the vendor) development.</li>
<li>An arguable disadvantage of the switch is that GenieDB no longer can use Berkeley DB&#8217;s key-value interface &#8212; but MySQL now has one of those too.</li>
</ul>
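<p>For readers who haven&#8217;t met Lamport timestamps, here is a minimal Python sketch of the general idea. It is not GenieDB&#8217;s code. The class, the function names, the node names, and the last-writer-wins rule are all assumptions made for illustration; GenieDB&#8217;s actual conflict handling wasn&#8217;t described.</p>

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event (e.g. a row write): advance the counter.
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # On hearing from another node, jump past the sender's timestamp.
        self.time = max(self.time, remote_time) + 1
        return self.time


def write_row(clock, node_id, value):
    # Stamp the row with (counter, node_id); node_id breaks ties.
    return {"value": value, "stamp": (clock.tick(), node_id)}


def resolve(row_a, row_b):
    # Last-writer-wins by timestamp: an assumed policy, for illustration only.
    return row_a if row_a["stamp"] > row_b["stamp"] else row_b


boston, london = LamportClock(), LamportClock()
r1 = write_row(boston, "boston", "v1")   # stamp (1, "boston")
london.receive(r1["stamp"][0])           # london's clock becomes 2
r2 = write_row(london, "london", "v2")   # stamp (3, "london")
print(resolve(r1, r2)["value"])          # v2
```

<p>Because every row carries an extra ordering field like this, it is at least plausible that index and cache code which assumed plain InnoDB rows would need changes.</p>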
<p>I also picked up some GenieDB company stats I didn&#8217;t know before &#8212; 9 employees and 2 paying customers.</p>
<p><strong><em>Related links</em></strong></p>
<ul>
<li>Recent posts about <a href="http://www.dbms2.com/2013/01/15/tokutek-update/">TokuDB</a> and <a href="http://www.dbms2.com/2013/01/07/introduction-to-geniedb/">GenieDB</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/22/notes-on-tokudb-and-geniedb/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/22/notes-on-tokudb-and-geniedb/</feedburner:origLink></item>
		<item>
		<title>Notes on Teradata systems</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/cAwKEj2hoCM/</link>
		<comments>http://www.dbms2.com/2013/04/15/notes-on-teradata-systems/#comments</comments>
		<pubDate>Mon, 15 Apr 2013 06:53:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Pricing]]></category>
		<category><![CDATA[SAS Institute]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7942</guid>
		<description><![CDATA[Teradata is announcing its new high-end systems, the Teradata 6700 series. Notes on that include: Teradata tends to get 35-55% (roughly speaking) annual performance improvements, as measured by its internal blended measure Tperf. A big part of this is exploiting new-generation Intel processors. This year the figure is around 40%. The 6700 is based on [...]]]></description>
			<content:encoded><![CDATA[<p>Teradata is announcing its new high-end systems, the Teradata 6700 series. Notes on that include:</p>
<ul>
<li>Teradata tends to get 35-55% (roughly speaking) annual performance improvements, as measured by its internal blended measure Tperf. A big part of this is exploiting new-generation Intel processors.</li>
<li>This year the figure is around 40%.</li>
<li>The 6700 is based on Intel&#8217;s Sandy Bridge.</li>
<li>Teradata previously told me that Ivy Bridge &#8212; the next one after Sandy Bridge &#8212; could offer <a href="http://www.dbms2.com/2012/03/09/hardware-and-components-lessons-from-teradata/">a performance &#8220;discontinuity&#8221;</a>. So, while this is just a guess, I expect that next year&#8217;s Teradata performance improvement will beat this year&#8217;s.</li>
<li>Teradata has now largely switched over to InfiniBand.</li>
</ul>
<p>Teradata is also talking about data integration and best-of-breed systems, with buzzwords such as:</p>
<ul>
<li>Teradata Unified Data Architecture.</li>
<li>Fabric-based computing, even though this isn&#8217;t really about storage.</li>
<li><a href="http://www.dbms2.com/2013/04/15/teradata-sql-h/">Teradata SQL-H</a>.</li>
</ul>
<p><span id="more-7942"></span>The upshot is that Teradata has at least 6 kinds of rack or cabinet it wants to sell you &#8212; along with software to connect them &#8212; of which it really thinks you should get at least 3:</p>
<ul>
<li>The 4 main Teradata-software appliances:
<ul>
<li>Active Enterprise Data Warehouse (the new 6700). Teradata thinks every sufficiently large enterprise should have one of these.</li>
<li>Extreme Performance Appliance (Teradata 4xxx), based on <a href="http://www.dbms2.com/2012/10/17/notes-on-analytic-hardware/">solid-state drives</a> (which are also used in the 6xxx systems). At least I think so; the 4xxx wasn&#8217;t in the most recent slide deck I saw.</li>
<li>Data Warehouse Appliance (Teradata 2700).</li>
<li>Extreme Data Appliance (Teradata 1650).</li>
</ul>
</li>
<li>The <a href="http://www.dbms2.com/2012/10/17/hadoop-teradata-aster-big-analytics-appliance/">Teradata Aster Big Analytics Appliance</a>, running Aster and Hadoop software. Teradata basically thinks everybody should have one of these too.</li>
<li>A separate cabinet for special-purpose &#8220;Teradata Managed Servers&#8221;. While there&#8217;s some space for Managed Servers in other Teradata appliances, Teradata now offers so many such capabilities that it thinks you will likely need a separate rack for those as well. These include (partial list):
<ul>
<li>Viewpoint system management.</li>
<li>Backup.</li>
<li><a href="http://www.dbms2.com/2011/10/03/teradata-unity-active-replication/">Teradata Unity</a>.</li>
<li>Data movement, which is not the same thing as Teradata Unity.</li>
<li>Data loading, which is yet something else.</li>
<li>Generic compute (notably, to run SAS).</li>
</ul>
</li>
</ul>
<p>Even that doesn&#8217;t exhaust the possibilities:</p>
<ul>
<li>If the 36 InfiniBand ports Teradata can fit into a cabinet aren&#8217;t enough, it suggests, and presumably will sell you, free-standing Mellanox switches as an alternative.</li>
<li>That slide deck split the Big Analytics Appliance back out into Aster and Hadoop options.</li>
<li>There also seems to be a SAS-specific modeling appliance.</li>
</ul>
<p>And you can have &#8212; or in some cases must have &#8212; Teradata Managed Server nodes in other kinds of Teradata appliance.</p>
<p>Finally, Teradata also offers a stand-alone single- or several-node Teradata 670 Data Mart Appliance, notes on which include:</p>
<ul>
<li>The Teradata 670&#8217;s entry price is under $1/2 million, if you want to use it as your first Teradata system (something that evidently is happening, mainly outside the Americas).</li>
<li>Another use for the Teradata 670 is for physical &#8212; as opposed to virtual &#8212; data mart spin-out.</li>
<li>The primary use for the Teradata Data Mart Appliance, however, seems to be test/development for larger Teradata systems.</li>
<li>The Teradata Data Mart Appliance is one of the options for placing in a separate managed-server Teradata rack.</li>
</ul>
<p><strong><em>Related links</em></strong></p>
<ul>
<li>My recent musings on the variety of <a href="http://www.dbms2.com/2013/03/24/appliances-clusters-and-clouds/">clusters and appliances</a> an enterprise could have.</li>
<li>A March, 2012 post on various vendors&#8217; admissions that <a href="http://www.dbms2.com/2012/03/16/juggling-analytic-databases/">multiple analytic database systems are needed</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/15/notes-on-teradata-systems/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/15/notes-on-teradata-systems/</feedburner:origLink></item>
		<item>
		<title>Teradata SQL-H</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/lfmuqXbIwsE/</link>
		<comments>http://www.dbms2.com/2013/04/15/teradata-sql-h/#comments</comments>
		<pubDate>Mon, 15 Apr 2013 06:46:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data integration and middleware]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Emulation, transparency, portability]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7934</guid>
		<description><![CDATA[As vendors so often do, Teradata has caused itself some naming confusion. SQL-H was introduced as a facility of Teradata Aster, to complement SQL-MR.* But while SQL-MR is in essence a set of SQL extensions, SQL-H is not. Rather, SQL-H is a transparency interface that makes Hadoop data responsive to the same code that would [...]]]></description>
			<content:encoded><![CDATA[<p>As vendors so often do, Teradata has caused itself some naming confusion. <a href="http://www.dbms2.com/2012/06/26/teradata-sql-h-using-hcatalog/">SQL-H was introduced as a facility of Teradata Aster</a>, to complement SQL-MR.* But while SQL-MR is in essence a set of SQL extensions, SQL-H is not. Rather, SQL-H is a transparency interface that makes Hadoop data responsive to the same code that would work on Teradata Aster &#8230;</p>
<p><em>*Speaking of confusion &#8212; Teradata Aster seems to use the spellings SQL/MR and SQL-MR interchangeably.</em></p>
<p>&#8230; except that now there&#8217;s also a SQL-H for regular Teradata systems as well. While it has the same general features and benefits as SQL-H for Teradata Aster, the details are different, since the underlying systems are.</p>
<p>I hope that&#8217;s clear. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/15/teradata-sql-h/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/15/teradata-sql-h/</feedburner:origLink></item>
		<item>
		<title>Introduction to Deep Information Sciences and DeepDB</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/ErZSKoy_JTE/</link>
		<comments>http://www.dbms2.com/2013/04/14/deep-information-sciences-deepdb/#comments</comments>
		<pubDate>Sun, 14 Apr 2013 04:33:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Akiban]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[GenieDB]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NewSQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[NuoDB]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ScaleDB]]></category>
		<category><![CDATA[Schooner Information Technology]]></category>
		<category><![CDATA[Tokutek and TokuDB]]></category>
		<category><![CDATA[Transparent sharding]]></category>
		<category><![CDATA[VoltDB and H-Store]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=7907</guid>
		<description><![CDATA[I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB &#8212; albeit with different technical strategies &#8212; DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said: DeepDB&#8217;s indexes can help you with analytic queries; hence, DeepDB is marketed as [...]]]></description>
			<content:encoded><![CDATA[<p>I talked Friday with Deep Information Sciences, makers of DeepDB. Much like <a href="http://www.dbms2.com/2013/01/15/tokutek-update/">TokuDB</a> &#8212; albeit with different technical strategies &#8212; DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:</p>
<ul>
<li>DeepDB&#8217;s indexes can help you with analytic queries; hence, DeepDB is marketed as supporting OLTP (OnLine Transaction Processing) and analytics in the same system.</li>
<li>DeepDB is marketed as &#8220;designed for big data and the cloud&#8221;, with reference to &#8220;Volume, Velocity, and Variety&#8221;. What I could discern in support of that is mainly:
<ul>
<li>DeepDB has been tested at up to 3 terabytes at customer sites and up to 1 billion rows internally.</li>
<li>Like most other <a href="http://www.dbms2.com/2013/01/05/newsql-thoughts/">NewSQL</a> and NoSQL DBMS, DeepDB is append-only, and hence could be said to &#8220;stream&#8221; data to disk.</li>
<li>DeepDB&#8217;s indexes could at some point in the future be made to work well with non-tabular data.*</li>
<li>The Deep guys have plans and designs for scale-out &#8212; <a href="http://www.dbms2.com/2011/02/24/transparent-sharding/">transparent sharding</a> and so on.</li>
</ul>
</li>
</ul>
<p><em>*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports &#8220;unstructured&#8221; data today.</em></p>
<p>Other NewSQL DBMS seem &#8220;designed for big data and the cloud&#8221; to at least the same extent DeepDB is. However, if we&#8217;re interpreting <a href="http://www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/">&#8220;big data&#8221;</a> to include <a href="http://www.dbms2.com/2012/01/08/big-data-terminology-and-positioning/">multi-structured data</a> support &#8212; well, only half or so of the NewSQL products and companies I know of share Deep&#8217;s interest in branching out. In particular:</p>
<ul>
<li><a href="http://www.akiban.com/">Akiban</a> definitely does. <em>(Note: Stay tuned for some next-steps company news about Akiban.)</em></li>
<li><a href="http://www.dbms2.com/2013/04/01/new-era-data-management/">Tokutek has planted a small stake there too</a>.</li>
<li>Key-value-store-backed <a href="http://www.dbms2.com/2013/01/12/introduction-to-nuodb/">NuoDB</a> and <a href="http://www.dbms2.com/2013/01/07/introduction-to-geniedb/">GenieDB</a> probably lean that way. (And <a href="http://www.dbms2.com/2012/06/27/schooner-got-acquired-by-sandisk/">SanDisk</a> evidently shut down Schooner&#8217;s RDBMS while keeping its key-value store.)</li>
<li>VoltDB, Clustrix, ScaleDB and MemSQL seem more strictly tabular, except insofar as text search is a requirement for everybody. <em>(Edit: Oops; I forgot about <a href="http://www.dbms2.com/2012/07/18/clustrix-4/">Clustrix&#8217;s approach to JSON support</a>.)</em></li>
</ul>
<p><em>Edit: MySQL has some sort of an optional NoSQL interface, and hence so presumably do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.</em></p>
<p>Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.</p>
<p><span id="more-7907"></span>Among the 10 people listed as part of <a href="http://deep.is/our-invention/">Deep Information Sciences&#8217; team</a>, I noticed 2 who arguably had DBMS industry experience, in that they worked at virtualization vendor Virtual Iron, and stayed on for a while after Virtual Iron was bought by Oracle. One of them, Chief Scientist &amp; Architect Tom Hazel, also was at Akiban for a few months, where he did actually work on a DBMS. Other Deep Information Sciences notes include:</p>
<ul>
<li>Deep has 25 or so people in all.</li>
<li>Deep had a recent $10 million funding round.</li>
<li>Deep Information Sciences is the former Cloudtree, which as of February, 2011 was pursuing <a href="http://www.boston.com/business/technology/innoeco/2011/02/stealthy_start-up_cloudtree_wa.html">quite a different strategy</a>. (Evidently there was a pivot.) Deep was founded in 2010.</li>
<li>There are 2 paying customers for DeepDB, even though it&#8217;s still in beta, and 8 trials. A similar number of trials and strategic partners are queued up.</li>
<li>DeepDB general availability is expected later this quarter.</li>
</ul>
<p>Although our call was blessedly technical, we didn&#8217;t have a chance to go through the DeepDB architecture in great detail. That said, DeepDB seems to store data in all of 3 ways:</p>
<ul>
<li>An in-memory row store.</li>
<li>An on-disk row store with a very different architecture.</li>
<li>Indexes, which can also serve as a column store.</li>
</ul>
<p>Notes on that include:</p>
<ul>
<li>DeepDB&#8217;s in-memory row store is designed to manage single rows as much as possible, rather than pages. Indeed, there are &#8220;aspects of tries&#8221;, although we didn&#8217;t drill down into what exactly that meant.</li>
<li>Indexes are streamed to disk at least once every 15 seconds, by default, and perhaps with latency as low as 10 milliseconds.</li>
<li>Perhaps the most important point I didn&#8217;t grasp is &#8220;segments&#8221;. The data and indexes on disk are stored in segments, which can be of different sizes, and which may each carry some summary data/metadata/whatever. Somehow, this is central to DeepDB&#8217;s design.</li>
<li>In what is evidently a design focus, DeepDB tries to get the benefit of &#8220;in-memory data&#8221; that isn&#8217;t actually taking up RAM. B-trees can point at rows that aren&#8217;t actually in memory. Segments evicted from cache can leave some metadata or summary data behind.</li>
<li>DeepDB&#8217;s compression story seems to be a work in progress.
<ul>
<li>There&#8217;s prefix compression already, at least in the indexes, which Deep just calls &#8220;compaction&#8221;.</li>
<li>Other compression is working in the lab, but not scheduled for Version 1.0.
<ul>
<li>Block compression seems to be in play.</li>
<li>Delta compression was mentioned once.</li>
<li>Dictionary compression wasn&#8217;t mentioned at all.</li>
</ul>
</li>
<li>DeepDB apparently will keep compressed data in cache, then decompress it to operate on it.</li>
<li>Different segments can be compressed/uncompressed differently.</li>
</ul>
</li>
<li>DeepDB&#8217;s on-disk row store is append-only. Time-travel is being worked on. While I forgot to ask, it seems likely that DeepDB has MVCC (Multi-Version Concurrency Control). <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
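<p>Prefix compression of index keys, mentioned above, is easy to illustrate. The Python sketch below shows the generic technique on a sorted list of keys; it is not DeepDB&#8217;s &#8220;compaction&#8221; code, and the key format and function names are made up for the example.</p>

```python
import os


def prefix_compress(sorted_keys):
    # Store each key as (shared_prefix_length, remaining_suffix),
    # measured against the key immediately before it.
    out, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(entries):
    # Rebuild each key from the previous key's prefix plus the stored suffix.
    keys, prev = [], ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical index keys, already in sorted order.
keys = ["customer:1001", "customer:1002", "customer:1017", "order:5"]
packed = prefix_compress(keys)
print(packed)
print(prefix_decompress(packed) == keys)
```

<p>Sorted index entries tend to share long prefixes with their neighbors, which is why this kind of compression is a natural first step in an index-centric engine.</p>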
<p>And finally: DeepDB in its current form is a &#8220;drop-in&#8221; InnoDB replacement, but not necessarily bug-compatible.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2013/04/14/deep-information-sciences-deepdb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2013/04/14/deep-information-sciences-deepdb/</feedburner:origLink></item>
	</channel>
</rss>
