<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>DBMS2 -- DataBase Management System Services</title>
	
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Sat, 06 Feb 2010 00:08:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/dbms2/feed" /><feedburner:info uri="dbms2/feed" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>The Sybase Aleri RAP</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/z1pzlnCfIwc/</link>
		<comments>http://www.dbms2.com/2010/02/05/sybase-aleri-rap/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 00:05:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aleri and Coral8]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Sybase]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1545</guid>
		<description><![CDATA[Well, I got a quick Sybase/Aleri briefing, along with multiple apologies for not being prebriefed. (Main excuse: News was getting out, which accelerated the announcement.) Nothing badly contradicted my prior post on the Sybase/Aleri deal.
To understand Sybase&#8217;s plans for Aleri and CEP, it helps to understand Sybase&#8217;s current CEP-oriented offering, Sybase RAP. So far as [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Well, I got a quick Sybase/Aleri briefing, along with multiple apologies for not being prebriefed. <em>(Main excuse: News was getting out, which accelerated the announcement.)</em> Nothing badly contradicted my prior post on <a href="http://www.dbms2.com/2010/02/04/sybase-aleri-acquisitio/" >the Sybase/Aleri deal</a>.</p>
<p style="margin-bottom: 0in;">To understand Sybase&#8217;s plans for Aleri and CEP, it helps to understand Sybase&#8217;s current CEP-oriented offering, <strong>Sybase RAP.</strong> So far as I can tell, Sybase RAP has to date been sold only in the form of <strong>Sybase RAP: The Trading Edition.</strong> In that guise, Sybase RAP has been sold to more than 40 outfits since its May 2008 launch, mainly big names in the investment banking and stock exchange sectors. If I understood correctly, the next target market for Sybase RAP is telcos, for real-time network tuning and management.</p>
<p style="margin-bottom: 0in;">In addition to any domain-specific applications, Sybase RAP has three layers:</p>
<ul>
<li><strong>CEP (Complex Event Processing).</strong> Sybase RAP&#8217;s CEP layer is based on a version of the Coral8 engine that Sybase licensed and has since continued to develop.</li>
<li><strong>In-memory DBMS.</strong> Sybase&#8217;s IMDB is part of (but, I presume, separable from) Sybase&#8217;s OLTP DBMS Adaptive Server Enterprise (ASE, aka Sybase Classic), and has the same API.</li>
<li><strong>Sybase IQ.</strong> Actually, Sybase used the phrase “based on Sybase IQ,” but I&#8217;m guessing it&#8217;s just Sybase IQ.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1545"></span>In theory, the persistent tier could be a DBMS other than Sybase IQ, such as Sybase ASE or even Oracle, because Sybase IMDB can talk to a variety of DBMS. I didn&#8217;t get the impression, however, that in practice there were any Sybase RAP installations whose persistent DBMS was anything other than Sybase IQ.</p>
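<p>To make the three-layer structure concrete, here is a minimal Python sketch of how data might flow through it. It is not Sybase&#8217;s API: the class names, the running VWAP (volume-weighted average price) calculation, and the choice to send every tick to the historical tier are all assumptions for illustration.</p>

```python
from collections import defaultdict

class CepLayer:
    """Hypothetical CEP stage: keeps a running volume-weighted average price per symbol."""
    def __init__(self):
        self.notional = defaultdict(float)
        self.volume = defaultdict(int)

    def on_tick(self, symbol, price, qty):
        self.notional[symbol] += price * qty
        self.volume[symbol] += qty
        return symbol, self.notional[symbol] / self.volume[symbol]

class InMemoryStore:
    """Hypothetical stand-in for the IMDB tier: latest derived value per key."""
    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value

class HistoricalStore:
    """Hypothetical stand-in for the persistent analytic tier: append-only tick history."""
    def __init__(self):
        self.ticks = []

    def append(self, tick):
        self.ticks.append(tick)

def ingest(ticks, cep, imdb, history):
    for symbol, price, qty in ticks:
        key, vwap = cep.on_tick(symbol, price, qty)
        imdb.upsert(key, vwap)                   # low-latency current state
        history.append((symbol, price, qty))     # full history for later analysis

cep, imdb, history = CepLayer(), InMemoryStore(), HistoricalStore()
ingest([("IBM", 100.0, 10), ("IBM", 110.0, 30), ("MSFT", 28.0, 5)], cep, imdb, history)
print(imdb.rows)  # {'IBM': 107.5, 'MSFT': 28.0}
```

<p>The point of the split is that each tier answers a different latency question: the CEP layer reacts per event, the in-memory tier serves current state, and the persistent tier keeps everything for later analysis.</p>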
<p style="margin-bottom: 0in;">Aleri had long been working on something called Project Ohio, to merge Coral8 with Aleri Classic. Now that Sybase&#8217;s own CEP engineering team is being added to the mix, schedules are being reconsidered and haven&#8217;t been disclosed yet. <em>(If one woman can produce one baby in nine months, how long does it take nine women to produce a baby?) </em>Apparently Sybase has a dozen programmers in the CEP area, plus ~20 more on Sybase RAP, not counting QA, documentation, etc.; that represents a significant bump to the overall Aleri development team.</p>
<p style="margin-bottom: 0in;">Sybase doesn&#8217;t seem to have decided what to do yet with the various <a href="../2008/10/20/coral8-proposes-cep-as-a-bi-data-platform/">business intelligence</a>/real-time OLAP engine products and technologies it is inheriting from Aleri.</p>
<p style="margin-bottom: 0in;">And finally, some metrics:</p>
<ul>
<li>The Sybase/Aleri guys estimate that 1/3 of Aleri&#8217;s customers, and an even smaller share of its revenue, came from outside the financial services sector. They did say the non-financial-services business was “starting to pick up,” but not very convincingly.</li>
<li>Sybase IQ is now up to more than 1,800 customers, with more than 200 new ones in 2009.</li>
<li>Sybase IQ indeed has users ingesting market feeds of up to 3 terabytes a day, so it probably matches Vertica in having databases of at least several hundred terabytes in the financial sector.</li>
</ul>
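<p>The inference in the last bullet is simple arithmetic. A back-of-the-envelope sketch, assuming about 250 trading days a year and a peak feed rate sustained all year (both assumptions, not figures from the briefing):</p>

```python
def raw_volume_tb(tb_per_day, trading_days):
    """Uncompressed data ingested over a number of trading days, in terabytes."""
    return tb_per_day * trading_days

# Assumption: roughly 250 trading days per year at the 3 TB/day peak rate.
one_year = raw_volume_tb(3, 250)
print(one_year)      # 750

# Even if compression shrank that by a hypothetical factor of 3,
# a single year of data would still be in the several-hundred-terabyte range.
print(one_year / 3)  # 250.0
```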
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/05/sybase-aleri-rap/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/02/05/sybase-aleri-rap/</feedburner:origLink></item>
		<item>
		<title>Quick thoughts on Sybase/Aleri</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/COMMAV29uuA/</link>
		<comments>http://www.dbms2.com/2010/02/04/sybase-aleri-acquisitio/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 16:15:19 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aleri and Coral8]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Sybase]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1530</guid>
		<description><![CDATA[Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) vendor Aleri. Perhaps not coincidentally, Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8, for financial services uses (notwithstanding that, of Aleri Classic and Coral8, Aleri Classic was the more focused on financial services). Quick [...]]]></description>
			<content:encoded><![CDATA[<p>Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) vendor <a href="http://www.dbms2.com/2009/03/25/aleri-update/" >Aleri</a>. Perhaps not coincidentally, <a href="http://magmasystems.blogspot.com/2009/03/sybase-and-coral8.html" onclick="javascript:pageTracker._trackPageview('/magmasystems.blogspot.com');">Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8</a>, for financial services uses (notwithstanding that, of Aleri Classic and Coral8, Aleri Classic was the more focused on financial services). Quick reactions include:</p>
<ul>
<li>The folks at Sybase still haven&#8217;t figured out when to prebrief me. <em>(Edit: I&#8217;ve been <a href="http://www.dbms2.com/2010/02/05/sybase-aleri-rap/" >briefed</a> subsequently.)</em></li>
<li>Sybase/Aleri is a potentially powerful combination, if they can effectively address the point I just made about <a href="http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/" >integrating disparate latencies</a>. That said, I&#8217;m not expecting a lot, because <a href="http://www.dbms2.com/2009/03/09/independent-cep-vendors-continue-to-flounder/" >the CEP industry always disappoints me</a>.</li>
<li><a href="http://www.dbms2.com/2009/05/13/microsoft-announced-cep-this-week-too/" >Microsoft</a>, <a href="http://www.dbms2.com/2009/05/13/ibm-system-s-infosphere-streams-processing/" >IBM</a>, and (somewhat less clearly) <a href="http://www.dbms2.com/2008/01/16/oracle-bea/" >Oracle</a> are all trying to do CEP in-house. Sybase is making a good choice in having serious CEP in-house itself.</li>
<li>Surely the main focus and financial justification for the Sybase/Aleri acquisition is the financial services market.</li>
<li>Specifically, I expect the focus of technical integration between Aleri and Sybase&#8217;s DBMS products to start with Sybase IQ.</li>
<li>Coral8 had <a href="http://www.dbms2.com/2008/10/20/coral8-proposes-cep-as-a-bi-data-platform/" >some interesting ideas about how to integrate CEP with OLTP/operational BI</a>, but I&#8217;m not aware that they got much traction.</li>
<li>I bet there are use cases where Sybase tries and fails to sell <span style="text-decoration: line-through;">Adaptive Server</span> SQL Anywhere for which CEP would be a better technical fit, but I don&#8217;t immediately see much practical business significance to that observation.</li>
<li>While this deal could easily strengthen the Vertica/StreamBase partnership, I don&#8217;t see any reason why it would lead those two companies to actually merge.</li>
</ul>
<p><em><strong>Related link</strong></em></p>
<ul>
<li><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Thinking about analytic latency</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/04/sybase-aleri-acquisitio/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/02/04/sybase-aleri-acquisitio/</feedburner:origLink></item>
		<item>
		<title>Open issues in database and analytic technology</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/NpNRwplOxH8/</link>
		<comments>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 22:04:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[RDF and graphs]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1507</guid>
		<description><![CDATA[The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The last part of my <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> talk was on open issues in database and analytic technology. This was closely intertwined with the <a href="http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/" >previous section</a>, and also relied on a lot that I&#8217;ve posted here. So I&#8217;ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points.<span id="more-1507"></span></p>
<ul>
<li>The most important issue in database and analytic technology, in my opinion, isn&#8217;t technological at all – rather, it&#8217;s the legal and political steps needed to <a href="http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/" >preserve liberty</a> in the face of advancing, intrusive technology.</li>
<li>Another important issue for society – and this one does involve a lot of technology – is scientific number crunching. In particular, <a href="http://www.dbms2.com/2009/10/03/issues-in-scientific-data-management/" >database technology for scientific computing</a> needs to be developed much further. I&#8217;ll have more to say on all this soon.</li>
<li>More generally, technology needs to keep advancing for parallel analytics. Fortunately, it is. Watch this space over the next few weeks.</li>
<li>Oracle has said, in effect, that <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >its most important technological challenge of the decade</a> is getting <a href="http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/" >solid-state memory</a> right. I agree.</li>
<li>Data volumes will keep going up, up, up. Technology needs to keep evolving accordingly. Much of what I write is on that subject.</li>
<li>Data needs to be processed and analyzed at <a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >very different latencies</a>. And there&#8217;s much further to go in integrating disparate latencies.</li>
<li>Analytic database management in the cloud hasn&#8217;t been solved yet, especially for Big Data. Among the reasons are the difficulty of moving data into the cloud (unless it originated there), the slowness of moving it from node to node in shared-nothing architectures (which reduces the elasticity benefit), and above all the long and unpredictable latencies of interprocessor communication while queries are running (a key subject of discussion at the <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a>).</li>
<li>Better business intelligence user interfaces are increasingly available. I&#8217;m thinking particularly of approaches with buzzwords like <a href="http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/" >visualization/interactive exploration</a> or <a href="http://www.texttechnologies.com/2007/08/03/the-case-for-inxight-awareness-server/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">faceted</a>. But they aren&#8217;t well integrated into the overall analytic stack, as big BI vendors are trailing the smaller ones in this regard. (Part of the problem relates to my previous point.)</li>
<li>Application development over text search isn&#8217;t in the same league as application development over relational DBMS. The choices are mainly XML (e.g., <a href="http://www.texttechnologies.com/2008/04/29/mark-logic-viewed-as-a-different-kind-of-text-search-technology-vendor/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">MarkLogic</a>), SQL for text integrated into RDBMS (limited by the weakness of those integrations), and something like <a href="http://www.texttechnologies.com/2008/09/20/attivio-update/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">Attivio&#8217;s Java SDK</a>. There&#8217;s a major conceptual barrier to building those apps, namely the unpredictability of query results. Still, it should be possible to do better.</li>
<li>Similarly, text analytics and conventional analytics coexist well side by side. They can even be in the same database and/or dashboard, although in practice that is limited by the strong <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">SaaS focus of text mining vendors and users</a>. But integrating them analytically is really hard. Linguistic imprecision is, in my opinion, only the #2 reason for this difficulty. The #1 reason is that trends detected by text analytics are much less precise than trends on tabular data – e.g., a 50% increase in a certain kind of complaint may be no more significant than a 5% change in a revenue variable.</li>
<li>I&#8217;m increasingly persuaded that <a href="http://www.dbms2.com/2009/08/21/social-network-analysis-aka-relationship-analytics/" >graph analytics</a> can be handled without a graph-centric data model. But right now, it isn&#8217;t being handled well at all. Lots more needs to be done – although when it is, it will just exacerbate the privacy/liberty dangers that so concern me.</li>
</ul>
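<p>The point about text-derived trends being imprecise is partly a matter of sample sizes. A minimal sketch, assuming complaint counts behave like Poisson counts and assuming a made-up noise level for revenue, shows how a 50% jump in a small text-derived count can be statistically weaker than a 5% move in a well-measured tabular metric. All numbers are hypothetical; classification errors in the text analytics would only widen the gap.</p>

```python
import math

def poisson_change_z(before, after):
    """z-score for a change between two counts, treating each as Poisson
    (variance equal to the count), so the difference has variance before + after."""
    return (after - before) / math.sqrt(before + after)

def metric_change_z(before, after, sd):
    """z-score for a change in a continuous metric whose period-to-period noise
    has standard deviation sd in each period (an assumed, not measured, value)."""
    return (after - before) / (sd * math.sqrt(2))

# Hypothetical: a complaint category goes from 20 to 30 mentions (a 50% increase).
z_complaints = poisson_change_z(20, 30)
# Hypothetical: revenue goes from 1,000,000 to 1,050,000 (a 5% increase),
# with assumed noise of 10,000 (1%) per period.
z_revenue = metric_change_z(1_000_000, 1_050_000, 10_000)

print(round(z_complaints, 2), round(z_revenue, 2))  # 1.41 3.54
```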
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/02/01/open-issues-in-database-and-analytic-technology/</feedburner:origLink></item>
		<item>
		<title>Interesting trends in database and analytic technology</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/jaeghU_J2-4/</link>
		<comments>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 02:11:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1492</guid>
		<description><![CDATA[My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the [...]]]></description>
			<content:encoded><![CDATA[<p>My project for the day is blogging based on my “<a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >Database and analytic technology: State of the union</a>” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the <em>union</em> of database and analytic technologies – the <em>intersection</em> of those two sectors is an area of particular focus, but is far from the whole of my coverage.)</p>
<p>One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including:<span id="more-1492"></span></p>
<p><strong>Simpler database technology,</strong> by which I mean DBMS that are:</p>
<ul>
<li>Easier to administer than market-leading systems &#8230;</li>
<li>… even if at the cost of being special-purpose</li>
<li>E.g.,
<ul>
<li>MySQL and older mid-tier RDBMS such as Progress</li>
<li>Many analytic DBMS and appliances, most notably Netezza&#8217;s</li>
</ul>
</li>
</ul>
<p>For general purpose or OLTP uses, I&#8217;m not a big fan of MySQL (not enough progress in making it industrial-strength), PostgreSQL (no good company behind it – I&#8217;m a non-fan of EnterpriseDB), or Ingres (open source or not, it&#8217;s an antiquated system that hasn&#8217;t been invested in as much as Oracle, DB2 or SQL Server).</p>
<p>But I get the impression there are a lot of contenders among small startups, featuring very new architectures for OLTP or general-purpose database management. VoltDB comes to mind. NimbusDB is finally within range of getting funded. Dan Weinreb told me Friday he knows of a bunch of others as well. And that&#8217;s all before we even get into the <a href="http://www.dbms2.com/2009/12/12/legit-nosql-key-value-store/" >NoSQL</a> kind of alternative.</p>
<p><strong>Flexible storage architectures.</strong> The trend is starting out with an emphasis on hybrid columnar storage, as in the examples of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >Vertica</a> and <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/" >Greenplum</a>. Oracle (to whom I&#8217;m under no NDA obligation) and other vendors (to whom I am) are going that way as well.</p>
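<p>Why hybrid columnar layouts help can be seen by counting bytes. A minimal sketch, assuming fixed-width columns and a query that reads only <code>symbol</code> and <code>price</code>; the table, widths, and column grouping are hypothetical, not any vendor&#8217;s actual layout.</p>

```python
# Hypothetical table: column name -> fixed width in bytes.
COLUMNS = {"trade_id": 8, "symbol": 8, "price": 8, "qty": 4, "trader_notes": 200}

def bytes_scanned(num_rows, column_groups, needed):
    """Bytes read when each column group is stored contiguously and a query
    must read every group that contains at least one needed column."""
    total = 0
    for group in column_groups:
        if any(col in needed for col in group):
            total += num_rows * sum(COLUMNS[col] for col in group)
    return total

rows = 1_000_000
needed = {"symbol", "price"}
row_store = [list(COLUMNS)]                      # one group holding everything: a row layout
column_store = [[c] for c in COLUMNS]            # one group per column: a pure column layout
hybrid = [["symbol", "price", "qty"], ["trade_id"], ["trader_notes"]]

for name, layout in [("row", row_store), ("column", column_store), ("hybrid", hybrid)]:
    print(name, bytes_scanned(rows, layout, needed))
# row 228000000
# column 16000000
# hybrid 20000000
```

<p>The hybrid layout gives up a little scan efficiency versus pure columns but keeps frequently co-accessed columns together, which is cheaper when whole rows are inserted or fetched.</p>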
<p><strong>Multi-tier database architectures,</strong> by which I mean at least two things:</p>
<ul>
<li>The database server tier/storage tier split of Exadata</li>
<li>Hybrid RAM/disk architectures, examples of which include
<ul>
<li>Vertica&#8217;s RAM-based write-optimized store</li>
<li><a href="http://www.dbms2.com/2009/10/18/introduction-to-sensage/" >Sensage&#8217;s CEP-in-the-DBMS</a></li>
<li>This in-memory analytics stuff we keep hearing about from the BI vendors</li>
<li>Any true in-memory/disk hybrid, such as the regrettably sidelined <a href="http://www.dbms2.com/2007/12/21/ibm-acquires-soliddb/" >solidDB</a></li>
<li>Smart thinking by numerous DBMS vendors about optimizing the use of RAM and/or Level 2 cache</li>
</ul>
</li>
</ul>
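<p>The RAM-based write-optimized store idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Vertica&#8217;s implementation: the class and method names (with <code>moveout</code> used only as a label), the buffer size, and the use of in-memory lists to stand in for disk segments are all hypothetical.</p>

```python
import bisect

class HybridStore:
    """Hypothetical RAM/disk hybrid. New rows are appended to an in-memory write
    buffer; when it fills, a 'moveout' step sorts the rows and writes them as a
    read-optimized segment of parallel key/value columns. Plain Python lists stand
    in for on-disk segments here. Lookups must consult both places."""

    def __init__(self, buffer_limit=3):
        self.buffer_limit = buffer_limit
        self.write_buffer = []   # unsorted (key, value) rows: cheap to append
        self.segments = []       # each segment: (sorted keys, matching values)

    def insert(self, key, value):
        self.write_buffer.append((key, value))
        if len(self.write_buffer) >= self.buffer_limit:
            self.moveout()

    def moveout(self):
        if not self.write_buffer:
            return
        rows = sorted(self.write_buffer)
        self.segments.append(([k for k, _ in rows], [v for _, v in rows]))
        self.write_buffer = []

    def lookup(self, key):
        found = []
        for keys, values in self.segments:
            # Sorted keys allow binary search instead of a full scan.
            lo = bisect.bisect_left(keys, key)
            hi = bisect.bisect_right(keys, key)
            found.extend(values[lo:hi])
        # Recent, not-yet-moved rows are still in RAM and must be scanned.
        found.extend(v for k, v in self.write_buffer if k == key)
        return found

store = HybridStore(buffer_limit=3)
for k, v in [(5, "e"), (1, "a"), (3, "c"), (1, "a2")]:
    store.insert(k, v)
print(store.lookup(1))  # ['a', 'a2']: one from the sorted segment, one from RAM
```

<p>The design trade is that writes stay cheap because nothing is sorted or compressed until moveout, while reads pay a small extra cost for checking the unsorted buffer.</p>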
<p>Netezza is particularly interesting to watch in this regard because it:</p>
<ul>
<li>Had a pretty strict storage/other processing split in prior product generations and &#8230;</li>
<li>… <a href="http://www.dbms2.com/2009/07/30/netezza-new-product-family/" >ditched that in its latest generation</a> …</li>
<li>… which however is focused on optimizing the use of RAM cache</li>
</ul>
<p>Also noteworthy is Petascan, the stealth-mode (and therefore harder to watch right now <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />) company I keep teasing about, which makes a strong case for carrying the database/storage tier split into the flash/solid-state memory technology generation. <a href="../2009/04/20/calpont-update-you-read-it-here-first/">Calpont</a> also has a server/storage tier split, but that&#8217;s of mainly theoretical interest unless and until Calpont actually ships an MPP version of <a href="../2009/11/07/calponts-infinidb/">InfiniDB</a>.</p>
<p><strong>Cheaper parts,</strong> which have of course been a huge trend for decades. <a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state memory</a> will soon conquer the world. Meanwhile, cheaper sensors drive the growth of the <a href="../2010/01/17/three-broad-categories-of-data/">machine-generated data</a> I keep talking about.</p>
<p>An ever-better understanding of <strong>scale-out technology,</strong> in several respects, including:</p>
<ul>
<li>Query, notably data movement for MPP DBMS</li>
<li>Update, especially minimalistic DBMS approaches, be they sharded MySQL or more NoSQLish</li>
<li>Number-crunching, especially via MapReduce and/or parallel analytic libraries integrated into DBMS</li>
</ul>
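<p>The query and update points above both come down to hash partitioning. A minimal Python illustration, assuming hash sharding on a single key (real systems also use range or directory-based sharding) and using hypothetical function names: updates are routed to the one shard that owns the key, while a query joining on some other column first has to move rows between nodes.</p>

```python
import zlib

def shard_for(key, num_shards):
    """Pick a shard by hashing the key. crc32 gives the same answer in every
    process, unlike Python's built-in hash() on strings."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards

def route_updates(updates, num_shards):
    """Group (key, row) updates by owning shard, as an application layer over
    sharded MySQL might do before sending each batch to its server."""
    batches = {s: [] for s in range(num_shards)}
    for key, row in updates:
        batches[shard_for(key, num_shards)].append((key, row))
    return batches

def redistribute(node_rows, join_col, num_nodes):
    """MPP-style data movement: re-hash every row on join_col so that rows
    with equal join keys land on the same node, where they can be joined locally."""
    moved = {n: [] for n in range(num_nodes)}
    for rows in node_rows.values():
        for row in rows:
            moved[shard_for(row[join_col], num_nodes)].append(row)
    return moved

# Hypothetical orders, currently spread across two nodes without regard to customer.
orders = {0: [{"cust": "c1", "amt": 5}], 1: [{"cust": "c2", "amt": 7}, {"cust": "c1", "amt": 3}]}
moved = redistribute(orders, "cust", 3)
nodes_for_c1 = {n for n, rows in moved.items() for r in rows if r["cust"] == "c1"}
print(len(nodes_for_c1))  # 1: every c1 row now sits on a single node
```

<p>The redistribution step is exactly the network traffic that MPP query optimizers try to minimize, for example by co-locating tables that are frequently joined on the same key.</p>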
<p>Cool trends I touched on more briefly include:</p>
<ul>
<li>More data being available for analysis. This was a core theme of my <a href="http://www.dbms2.com/2009/07/30/netezza-enzee-universe/" >Enzee Universe keynote speeches</a>; there are also some notes on it in my post based on my <a href="http://www.dbms2.com/2009/11/23/boston-big-data-summit-keynote-outline/" >Boston Big Data Summit</a> talk.</li>
<li>More users being served by analytics. Ditto.</li>
<li>Data exploration/visualization, à la QlikView, Spotfire, or Tableau, and also the faceted stuff.</li>
<li>The democratization of data mining. But I&#8217;m not as sure of that one as of the others&#8230;</li>
</ul>
<p>One area I flat-out forgot to mention is <a href="http://www.dbms2.com/2009/06/08/the-future-of-data-marts/" >easy data mart spin-out</a>.</p>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/31/trends-database-aanalytic-technology/</feedburner:origLink></item>
		<item>
		<title>Flash, other solid-state memory, and disk</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/P5eACHuEw30/</link>
		<comments>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 22:12:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Michael Stonebraker]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1469</guid>
		<description><![CDATA[If there&#8217;s one subject on which the New England Database Summit changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:

Solid-state memory will soon be the right storage technology for a large fraction of databases, OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in database size [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">If there&#8217;s one subject on which the <a href="http://www.dbms2.com/2009/11/25/new-england-database-summit-january-28-2010/" >New England Database Summit</a> changed or at least clarified my thinking,* it&#8217;s future storage technologies. Here&#8217;s what I now think:</p>
<ul>
<li><strong>Solid-state memory will soon be the right storage technology for a large fraction of databases,</strong> OLTP and analytic alike. I&#8217;m not sure whether the initial cutoff in database size is best thought of as terabytes or 10s of terabytes, but it&#8217;s in that range. And it will increase over time, for the usual cheaper-parts reasons.</li>
<li><strong>That doesn&#8217;t necessarily mean flash.</strong> <a href="http://en.wikipedia.org/wiki/Phase-change_memory" onclick="javascript:pageTracker._trackPageview('/en.wikipedia.org');">PCM</a> (Phase-Change Memory) is coming down the pike, with perhaps 100X the durability of flash, in terms of the total number of writes it can tolerate. On the other hand, PCM has issues in the face of heat. More futuristically, IBM is also high on <a href="http://www.almaden.ibm.com/spinaps/research/sd/?racetrack" onclick="javascript:pageTracker._trackPageview('/www.almaden.ibm.com');">magnetic racetrack memory</a>. IBM likes the term <em>storage-class memory</em> to cover all this &#8212; which I find regrettable, since the acronym SCM is way overloaded already. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li><strong>Putting a disk controller in front of solid-state memory is really wasteful.</strong> It wreaks havoc on I/O rates.</li>
<li><strong>Generic PCIe interfaces don&#8217;t suffice either,</strong> in many analytic use cases. Their I/O is better, but still not good enough. (Doing better yet is where Petascan – the stealth-mode company I keep teasing about – comes in.)</li>
<li><strong>Disk will long be useful for very large databases.</strong> Kryder&#8217;s Law, about disk <strong>capacity,</strong> shows an annual improvement rate at least as high as Moore&#8217;s Law does for chip capacity, the <a href="http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/" >disk rotation speed bottleneck</a> notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more <a href="http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/" >machine-generated data</a> that fills up a lot of disks.</li>
<li><strong>Disk will long be useful for archiving.</strong> Disk is the new tape.</li>
</ul>
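<p>The durability point is about write endurance, and a rough lifetime estimate shows why a 100X difference matters for write-heavy databases. A minimal sketch: the capacity, daily write volume, and flash cycle count below are hypothetical placeholders, and only the 100X ratio comes from the text above.</p>

```python
def lifetime_days(capacity_gb, write_cycles, writes_gb_per_day):
    """Days until every cell has used up its write-cycle budget, assuming perfect
    wear leveling and no write amplification (both optimistic assumptions)."""
    return capacity_gb * write_cycles / writes_gb_per_day

# Hypothetical device: 100 GB, absorbing 2,000 GB of writes per day.
flash = lifetime_days(100, 10_000, 2_000)        # hypothetical flash endurance
pcm = lifetime_days(100, 10_000 * 100, 2_000)    # "perhaps 100X" that of flash

print(flash, pcm)  # 500.0 50000.0
```

<p>Under these made-up numbers, the same write load that wears out the flash device in well under two years would take over a century to exhaust the PCM device.</p>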
<p style="margin-bottom: 0in;"><em>*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.</em></p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A <a href="http://drona.csa.iisc.ernet.in/%7Egopi/west10/HPCA-WEST-SCMandSoftware.pdf" onclick="javascript:pageTracker._trackPageview('/drona.csa.iisc.ernet.in');">slide deck by C. Mohan of IBM</a>, similar to the one he presented at the NEDB Summit, about storage-class memories.</span></span></li>
<li><span style="font-style: normal;"><span style="font-weight: normal;">A much more detailed <a href="http://www.usenix.org/events/fast/tutorials/T3.pdf" onclick="javascript:pageTracker._trackPageview('/www.usenix.org');">IBM presentation</a> on storage-class memories.</span></span></li>
</ul>
<p><em><strong>Other posts based on my January, 2010 New England Database Summit keynote address</strong></em></p>
<ul>
<li><a title="Data-based snooping — a huge threat to liberty that we’re all helping make worse" href="../2010/01/31/data-based-snooping-threat-libert/">Data-based snooping — a huge threat to liberty that we’re all helping make worse</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/P5eACHuEw30" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/31/flash-pcmsolid-state-memory-disk/</feedburner:origLink></item>
		<item>
		<title>The disk rotation speed bottleneck</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/Gmpln0D9VyE/</link>
		<comments>http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 22:01:58 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1472</guid>
		<description><![CDATA[I&#8217;ve been referring to the disk (rotation) speed bottleneck for years, but I don&#8217;t really have a clean link for it. Let me fix that right now.
The first hard disks ever were introduced by IBM in 1956. They rotated 1,200 times per minute. Today&#8217;s state-of-the-art disk drives rotate 15,000 times per minute. That&#8217;s a 12.5-fold [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been referring to the <a href="http://www.dbms2.com/2007/03/26/index-light-mpp-data-warehouse-appliances/" >disk (rotation) speed bottleneck</a> for years, but I don&#8217;t really have a clean link for it. Let me fix that right now.</p>
<p>The first hard disks ever were introduced by IBM in 1956. They rotated 1,200 times per minute. Today&#8217;s state-of-the-art disk drives rotate 15,000 times per minute. That&#8217;s <strong>a 12.5-fold improvement since the first term of the Eisenhower Administration. </strong>(I understand that the reason for this slow improvement is aerodynamic &#8212; a disk that spins too fast literally flies off the spindle.)</p>
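<p>Unfortunately, average random access time is bounded below by half of a disk&#8217;s rotation time, because on average the data you want is half a rotation away from the read head. At 15,000 rotations per minute, one rotation takes 4 milliseconds. Hence random disk access times can never get below 2 milliseconds.</p>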
<p>From that, much about modern analytic DBMS design follows.</p>
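<p>To make the arithmetic concrete, here is a small Python sketch. It assumes, as the post does, that average rotational latency is half of one full rotation, and it uses only the two rotation speeds quoted above; the function and constant names are illustrative.</p>

```python
# Back-of-envelope arithmetic for the disk rotation speed bottleneck.
# Assumption (as in the post): average rotational latency is half of one rotation.

MS_PER_MINUTE = 60_000.0


def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return MS_PER_MINUTE / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average, the wanted sector is half a rotation away from the head."""
    return rotation_time_ms(rpm) / 2


FIRST_IBM_DISK_RPM = 1_200  # IBM, 1956
MODERN_DISK_RPM = 15_000    # state of the art when the post was written

print(f"Rotation speedup since 1956: {MODERN_DISK_RPM / FIRST_IBM_DISK_RPM}x")  # 12.5x
for rpm in (FIRST_IBM_DISK_RPM, MODERN_DISK_RPM):
    print(f"{rpm} RPM: {rotation_time_ms(rpm)} ms per rotation, "
          f"{avg_rotational_latency_ms(rpm)} ms average rotational latency")
```

<p>The 2-millisecond figure is paid on every random access before any data is transferred, no matter how fast the rest of the system gets.</p>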
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/Gmpln0D9VyE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/31/the-disk-rotation-speed-bottleneck/</feedburner:origLink></item>
		<item>
		<title>Data-based snooping — a huge threat to liberty that we’re all helping make worse</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/UpEgdFghnT8/</link>
		<comments>http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 18:41:07 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1460</guid>
		<description><![CDATA[Every year or two, I get back on my soapbox to say:

Database 	and analytic technology, as they evolve, will pose tremendous danger 	to individual liberties.
We 	in the industry who are creating this problem also have a duty to 	help fix it.
Technological 	solutions alone won&#8217;t suffice. Legal changes are needed.
The 	core of the needed legal changes [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Every year or two, I get back on my soapbox to say:</p>
<ul>
<li>Database and analytic technology, as they evolve, will pose tremendous danger to individual liberties.</li>
<li>We in the industry who are creating this problem also have a duty to help fix it.</li>
<li>Technological solutions alone won&#8217;t suffice. Legal changes are needed.</li>
<li>The core of the needed legal changes is tight restrictions on governmental use of data, because relying on restrictions about data acquisition and retention clearly won&#8217;t suffice.</li>
</ul>
<p style="margin-bottom: 0in;">But this time I don&#8217;t plan to be so quick to shut up.</p>
<p style="margin-bottom: 0in;">My best writing about the subject of liberty to date is probably in a <a href="http://www.networkworld.com/community/node/35626" onclick="javascript:pageTracker._trackPageview('/www.networkworld.com');">November, 2008 blog post</a>. My best public speaking about the subject was undoubtedly last Thursday, early in my <a href="../2009/11/25/new-england-database-summit-january-28-2010/">New England Database Summit</a> keynote address; I got a lot of favorable feedback on that part from the academics and technologists in attendance.</p>
<p style="margin-bottom: 0in;">My emphasis is on data-based snooping rather than <a href="http://www.texttechnologies.com/category/categorization-filtering/censorship/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">censorship</a>, for several reasons:</p>
<ul>
<li>My work and audience are mainly in the database and analytics sectors. Censorship is more a concern for security, networking, and internet-technology folks.</li>
<li>After censorship, I think data-based snooping is the second-worst technological threat to liberty.</li>
<li>In the US and other fairly free countries, data-based snooping may well be the #1 threat.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-1460"></span>Below are the actual notes I spoke from. (The notes don&#8217;t include the part about the many precedents I think US law offers for my suggestions, specifically in how the Fourth and Fifth Amendments are handled, but you can find that part in the blog post linked above.) My call to action is still pretty vague, along the lines of “Think about this stuff, and talk about it too.” (And please don&#8217;t be as pig-headed as the Slashdotters mentioned <a href="http://www.monashreport.com/2009/01/19/when-law-meets-technology-and-you-can-help/" onclick="javascript:pageTracker._trackPageview('/www.monashreport.com');">here</a>. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ) Beyond that, simpler and more immediate – well, this is one blog post I&#8217;m REALLY hoping will get a rich comment thread. Please pitch in!</p>
<p style="margin-bottom: 0in;">My notes, with minor edits for blog-posting clarity:</p>
<ul>
<li>Tremendous amounts of information are being or can be electronically recorded about everybody
<ul>
<li>All our credit card purchases</li>
<li>All our web access activity
<ul>
<li>For at least two different reasons – child porn and internet file sharing – governments around the world are finding reasons to mandate that this be stored …</li>
<li>… even as governments also sometimes try to mandate that it be deleted</li>
</ul>
</li>
<li>All our email and other internet communication information, either in full content or at least from a traffic analysis standpoint</li>
<li>Many of our movements
<ul>
<li>Electronic tollbooths</li>
<li>Location-aware mobile devices</li>
<li>Police cameras</li>
<li>I doubt much of this is retained today except the camera part, but it could be</li>
</ul>
</li>
<li>And by the way, it is not illegal in the US for the government to put cameras or other electronic sensors outside your house to peer in, which offers all sorts of other intrusive possibilities as that technology improves.</li>
</ul>
</li>
<li>Analysis tools are improving in steps
<ul>
<li>Data mining</li>
<li>Social graph analysis</li>
</ul>
</li>
<li>If misused, this technology is an enormous threat to liberty …
<ul>
<li>… and we&#8217;re all complicit in it (because we help advance the technology).</li>
</ul>
</li>
<li>So it&#8217;s our duty to also work to forestall the threat.</li>
<li>Unfortunately, most technological solutions have no hope of working
<ul>
<li>The data WILL be retained.</li>
<li>The data WILL be possible to correlate and de-anonymize.</li>
<li>Any technology that purports otherwise is merely a stopgap at best
<ul>
<li>But that doesn&#8217;t mean such stopgap technology isn&#8217;t worth developing and deploying</li>
</ul>
</li>
</ul>
</li>
<li>The only realistic long-term solution is legal, which is to say political
<ul>
<li>If regulating information gathering and retention is doomed to failure …
<ul>
<li>… regulate information use instead
<ul>
<li>Make it illegal to admit this kind of information in court</li>
<li>Make it illegal to use this kind of information to even track all but the most terrorist of criminal suspects</li>
<li>Certainly make it illegal to reveal much of this information to the private sector, or in many cases for the private sector to use it</li>
</ul>
</li>
</ul>
</li>
<li>Obviously, there have to be huge carve-outs for the ordinary business of marketing, government, medicine, whatever</li>
<li>But that&#8217;s the direction we need to go</li>
<li>And I&#8217;m convinced that we in the technology community need to show the way</li>
</ul>
</li>
</ul>
<p><em><strong>More technical posts based on the same talk</strong></em></p>
<ul>
<li><a title="Flash, other solid-state memory, and disk" href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Flash, other solid-state memory, and disk</a></li>
<li><a title="Interesting trends in database and analytic technology" href="../2010/01/31/trends-database-aanalytic-technology/">Interesting trends in database and analytic technology</a></li>
<li><a title="Open issues in database and analytic technology" href="../2010/02/01/open-issues-in-database-and-analytic-technology/">Open issues in database and analytic technology</a></li>
</ul>
<p><em><strong>Related links (updates)</strong></em></p>
<ul>
<li>CNet reports that <a href="http://news.cnet.com/8301-13578_3-10448060-38.html" onclick="javascript:pageTracker._trackPageview('/news.cnet.com');">the FBI is asking internet service providers in the US to keep website access data for 2 years</a></li>
<li>CNN reports that <a href="http://edition.cnn.com/2010/HEALTH/02/04/baby.dna.government/?hpt=C1" onclick="javascript:pageTracker._trackPageview('/edition.cnn.com');">US states are storing DNA for all babies born</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/UpEgdFghnT8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/31/data-based-snooping-threat-libert/</feedburner:origLink></item>
		<item>
		<title>Netezza Skimmer</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/XRTeymar6uw/</link>
		<comments>http://www.dbms2.com/2010/01/25/netezza-skimmer/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 14:39:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1441</guid>
		<description><![CDATA[As I previously complained, last week wasn&#8217;t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, I asked for and got a Friday afternoon briefing, but kept it quick and basic.
That said, highlights of my Netezza Skimmer briefing [...]]]></description>
			<content:encoded><![CDATA[<p>As I previously <a href="http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/" >complained</a>, last week wasn&#8217;t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, I asked for and got a Friday afternoon briefing, but kept it quick and basic.</p>
<p>That said, highlights of my Netezza Skimmer briefing included:</p>
<ul>
<li>In essence, Netezza Skimmer is 1/3 of Netezza&#8217;s previously smallest appliance, for 1/3 the price.</li>
<li>I.e., Netezza Skimmer has 1 S-blade and 9 disks, vs. 3 S-blades and 24 disks on the Netezza TwinFin 3.</li>
<li>With 1 disk reserved as a hot spare, that boils down to a 1:1:1 ratio among CPU cores, FPGA cores, and 1-terabyte disks on Netezza Skimmer. The same could pretty much be said of Netezza TwinFin, the occasional hot-spare disk notwithstanding.</li>
<li>Netezza Skimmer costs $125K.</li>
<li>With 2.8 or so TB of space for user data before compression (roughly 6.3 TB of user data after compression), Skimmer&#8217;s $125K price is right in line with the <a href="http://www.dbms2.com/2009/07/30/the-netezza-price-point/" >Netezza price point</a> of slightly &lt;$20K/terabyte of user data.</li>
<li>That assumes Netezza&#8217;s usual 2.25X compression. I forgot to ask when 4X compression would actually ship.</li>
<li>I forgot to ask, but it seems obvious that Netezza Skimmer uses identical or substantially similar components to Netezza TwinFin&#8217;s.</li>
<li>Netezza Skimmer is 7 rack units high.</li>
<li>In place of the SMP hosts on TwinFin systems, Netezza Skimmer has a host blade.</li>
<li>Netezza (specifically Phil Francisco) mentioned that when Kalido uses Netezza Skimmer for its appliance, there will be an additional host computer, but when it uses TwinFin for the same software, the built-in host will suffice. (Even so, I suspect it might be too strong to say that Skimmer&#8217;s built-in host computer is underpowered.)</li>
<li>Netezza also suggested that more appliance OEMs are coming down the pike specifically focused on the affordable Skimmer.</li>
</ul>
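<p>The price-point arithmetic above can be checked with a few lines of Python. The figures ($125K, 2.8 TB, 2.25X) come from the bullets; the helper function is hypothetical, and the 4X line is only a what-if, since the post doesn&#8217;t say when 4X compression will ship.</p>

```python
# Figures quoted in the post; the helper below is a hypothetical illustration.
LIST_PRICE_USD = 125_000   # Netezza Skimmer price
USER_SPACE_TB = 2.8        # "2.8 or so TB" of space for user data, before compression
USUAL_COMPRESSION = 2.25   # Netezza's usual compression ratio


def price_per_user_tb(price_usd: float, space_tb: float, compression: float) -> float:
    """Dollars per terabyte of user data, measured before compression."""
    user_data_tb = space_tb * compression  # how much raw user data fits once compressed
    return price_usd / user_data_tb


print(f"User data capacity: {USER_SPACE_TB * USUAL_COMPRESSION:.1f} TB")
print(f"At 2.25X: ${price_per_user_tb(LIST_PRICE_USD, USER_SPACE_TB, USUAL_COMPRESSION):,.0f}/TB")
# What-if only: the same box if 4X compression were available.
print(f"At 4X:    ${price_per_user_tb(LIST_PRICE_USD, USER_SPACE_TB, 4.0):,.0f}/TB")
```

<p>At 2.25X the result lands just under $20K per terabyte of user data, which is where the &#8220;slightly &lt;$20K&#8221; figure comes from.</p>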
<p><span id="more-1441"></span>Obviously, Netezza Skimmer isn&#8217;t breaking any new technical ground. If Netezza had just called Skimmer &#8220;TwinFin 1,&#8221; nobody should have objected. So the main news here is that you can buy a Netezza box for $125K, plug it in, load a few terabytes of data, and be good to go with a pretty solid data warehouse.  For enterprises and data mart outsourcers with databases of the appropriate size, that could be a pretty attractive deal.</p>
<p>Is Netezza Skimmer as cheap as buying your own hardware and putting (free) <a href="http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/" >Greenplum Single-Node Edition</a> software on it? Not even close, although Greenplum&#8217;s free option does limit you to lower overall compute power. Does Netezza Skimmer have as high availability as more expensive alternatives? In some cases, surely not. Skimmer is neither the cheapest thing around nor an utterly high-end product.</p>
<p>But Netezza Skimmer belongs on a lot of short lists even so.</p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/XRTeymar6uw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/25/netezza-skimmer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/25/netezza-skimmer/</feedburner:origLink></item>
		<item>
		<title>Two cornerstones of Oracle’s database hardware strategy</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/UFA2s-U-29E/</link>
		<comments>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 08:59:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cache]]></category>
		<category><![CDATA[DBMS product categories]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1429</guid>
		<description><![CDATA[After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:

Oracle      thinks flash memory is the most important hardware technology of the [...]]]></description>
			<content:encoded><![CDATA[<p>After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:</p>
<ul>
<li>Oracle      thinks <strong>flash memory is the most important hardware technology of the      decade,</strong> one that could lead to Oracle being “bumped off” if they don’t      get it right.</li>
<li>Juan      believes <strong>the “bulk” of Oracle’s business will move over to Exadata-like      technology over the next 5-10 years. </strong>Numbers-wise, this seems to be based more      on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database      management tasks.</li>
</ul>
<p>And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes.  At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means <strong>Oracle’s best database offering only runs on specific Oracle-sold hardware platforms.<span id="more-1429"></span></strong></p>
<p><em>*E.g., I was sitting upstairs in my parents’ apartment in Columbus, OH, having the call while their doctor, whom I’ve never met, was visiting downstairs. He offered to make a special trip back Saturday afternoon because he missed me Wednesday, but he’s notorious for not coming when he says he will.</em> <em>Update: He didn&#8217;t come Saturday. On Saturday he said he&#8217;d come Sunday. He didn&#8217;t do that either. </em></p>
<p>Other high- and lowlights of our conversation included:</p>
<ul>
<li>Flash      is the main new hardware element in Exadata Version 2. Otherwise, Exadata      2 is just an annual refresh of Exadata Version 1 to include updated      components (Nehalem chips, bigger disk drives, etc.)</li>
<li>Juan      thinks it’s suboptimal to use flash memory through the bottleneck of disk      controllers, favoring PCIe cards instead. (I emphatically agree.)</li>
<li>Juan      resolutely ducked questions about <a href="../../../../../2009/09/25/the-hunt-for-oracle-exadata-production-references/">actual      Exadata production deployment</a>. Literally the only fact he shared in      that regard is that there are at least 2 Exadata production systems      running that each have 2 or more racks cabled together.</li>
<li>Juan      stressed that Exadata runs apps written over Oracle DBMS unchanged.</li>
<li>When      making mixed-workload claims for Exadata 2, Juan stressed consolidation of      multiple databases, some OLTP and some analytic. He didn’t really argue      with my skepticism about <a href="../../../../../2009/09/29/integration-oltp-data-warehousing-exadata-2/">integrating      OLTP and analytics in the same database</a>, with one exception:</li>
<li>Juan      pointed out that in major OLTP apps such as ERP systems, there often is      actually more processing going on in reporting and other batch stuff than      there is in true OLTP.</li>
<li>Exadata      2’s flash memory is designed as a disk cache, smarter than LRU (Least      Recently Used). The two examples Juan gave of “smarter than LRU” are that      backups and table scans don’t flush the cache.</li>
<li>I      forget whether this is new in Exadata 2 (I think it is), but anyhow –      Exadata has a “Storage Index” that’s a lot like a <a href="../../../../../2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/">Netezza      zone map</a>. I.e., for each megabyte or so of data it stores the min and      max value of every column; if a query predicate rules out those ranges,      that megabyte is never retrieved.</li>
<li>Oracle      has long offered what sounds like flexible workload management capability,      and this has now been extended to specifically include I/O resources on      the storage tier.</li>
<li>This isn’t Exadata-specific, but Oracle has built a file system on top of its DBMS, optimized for speed, which helps with, e.g., ELT (Extract/Load/Transform). Evidently, it’s not at all the same thing as Marc Benioff’s 1990s Microsoft-annoying IFS (Internet File System) project, which seems to have morphed into a content management SDK.</li>
</ul>
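<p>For readers who haven&#8217;t seen a zone map, here is a toy Python sketch of the min/max pruning idea described above. It is an illustration under simplifying assumptions (rows as dicts, zones of a fixed row count rather than roughly a megabyte), not Oracle&#8217;s or Netezza&#8217;s actual implementation; <code>build_zones</code> and <code>scan_between</code> are hypothetical names.</p>

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Summary of one storage region (the post says roughly 1 MB in Exadata)."""
    rows: list   # the rows themselves, as dicts
    mins: dict   # column name -> minimum value in this region
    maxs: dict   # column name -> maximum value in this region


def build_zones(rows, rows_per_zone):
    """Split rows into fixed-size zones and record per-column min/max for each."""
    zones = []
    for start in range(0, len(rows), rows_per_zone):
        chunk = rows[start:start + rows_per_zone]
        cols = chunk[0].keys()
        zones.append(Zone(chunk,
                          {c: min(r[c] for r in chunk) for c in cols},
                          {c: max(r[c] for r in chunk) for c in cols}))
    return zones


def scan_between(zones, col, lo, hi):
    """Return rows with lo <= row[col] <= hi, plus how many zones were actually read."""
    hits, zones_read = [], 0
    for z in zones:
        if z.maxs[col] < lo or z.mins[col] > hi:
            continue  # min/max prove nothing here can match: never retrieve this zone
        zones_read += 1
        hits.extend(r for r in z.rows if lo <= r[col] <= hi)
    return hits, zones_read


rows = [{"day": d, "amount": d * 10} for d in range(1000)]
zones = build_zones(rows, 100)
hits, zones_read = scan_between(zones, "day", 250, 260)
print(len(hits), "matching rows from", zones_read, "of", len(zones), "zones")
```

<p>Pruning works best when data arrives roughly sorted on the filtered column (dates, for instance); on randomly ordered data, every zone&#8217;s min/max range tends to span the predicate, and little gets skipped.</p>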
<p>Highlights specifically in the area of parallelization included:</p>
<ul>
<li>Juan      stressed that all databases consolidated onto an Exadata machine      are/should be striped across all storage units.</li>
<li>On the      other hand, Juan said that different databases should be confined to      specific cores or CPUs on the database tier.</li>
<li>But on      the third hand, Juan also stressed – in what could be called a “private      cloud” pitch – that there’s great elasticity as to which databases are      matched to which server CPUs.</li>
<li>Contrary      to what <a href="../../../../../2008/09/28/exadata-oracle-database-machine-parallelization/">I      thought he and/or his colleagues told me a year ago</a>, Juan said RAC      (Real Application Clusters) is a big part of Oracle’s data warehouse      processing.</li>
<li>However,      Juan says that what I regard(ed) as a major objection to Oracle’s      database-tier parallelization &#8212; the need to manually specify “degrees of      parallelism” &#8212; has now been obviated by automation. Juan thinks that few      data warehouse DBAs will now need to manually tune parallelism, with minor      exceptions. One exception he cites is that if a nightly report really is      non-urgent, it can just be forced to run on a single core with no chance      to grab more resources. (However, Juan thinks manual tuning of parallelism      will continue to play a greater role in OLTP.)</li>
</ul>
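<p>Returning to the flash cache: Juan&#8217;s two examples (backups and table scans don&#8217;t flush it) don&#8217;t say how Exadata 2 achieves that. One simple way to get that &#8220;smarter than LRU&#8221; behavior is to keep an ordinary LRU cache and let bulk reads bypass it. The Python sketch below is a hypothetical illustration of that idea only, not Oracle&#8217;s algorithm; <code>ScanResistantCache</code> and its <code>bulk</code> flag are invented for the example.</p>

```python
from collections import OrderedDict


class ScanResistantCache:
    """Toy LRU block cache that lets bulk reads bypass it.

    Plain LRU would let a big table scan or backup push every hot block out.
    Here, reads tagged as bulk are served but never inserted, so they cannot
    flush the cache. (Hypothetical sketch, not Oracle's algorithm.)
    """

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.store = backing_store   # block_id -> data (stands in for disk)
        self.cache = OrderedDict()   # block_id -> data, least recently used first

    def read(self, block_id, bulk=False):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        data = self.store[block_id]           # cache miss: go to "disk"
        if not bulk:                          # scans and backups don't populate the cache
            self.cache[block_id] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used block
        return data


disk = {i: f"block-{i}" for i in range(10_000)}
cache = ScanResistantCache(capacity=3, backing_store=disk)
for hot in (1, 2, 3):
    cache.read(hot)             # OLTP-style hot blocks
for b in range(10_000):
    cache.read(b, bulk=True)    # full table scan
print(list(cache.cache))        # the hot blocks survive the scan
```

<p>Under plain LRU, the same scan would have left only the last three scanned blocks in the cache.</p>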
<p>OK. That’s all I can get done tonight (see above re: inconvenience of timing). Follow-on subjects I’d like to and indeed plan to post about include:</p>
<ul>
<li>What      Juan said about hybrid columnar compression</li>
<li>Oracle’s      delightfully non-confidential slide deck, and a few comments about same</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/UFA2s-U-29E" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/</feedburner:origLink></item>
		<item>
		<title>Three broad categories of data</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/Xnh_Ptwkb_g/</link>
		<comments>http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 15:31:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Database diversity]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1421</guid>
		<description><![CDATA[People often try to draw a distinction between:

Traditional data of the sort 	that&#8217;s stored in relational databases, aka “structured.”
Everything else, aka 	“unstructured” or “semi-structured” or “complex.”

There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">People often try to draw a distinction between:</p>
<ul>
<li>Traditional data of the sort 	that&#8217;s stored in relational databases, aka “structured.”</li>
<li>Everything else, aka 	“unstructured” or “semi-structured” or “complex.”</li>
</ul>
<p style="margin-bottom: 0in;">There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don&#8217;t tend to work very well, I think the most important one is that:</p>
<p><strong>Databases shouldn&#8217;t be divided into just two categories. </strong><span style="font-weight: normal;"> Even as a rough-cut approximation, </span><strong>they should be divided into three,</strong><span style="font-weight: normal;"> namely:</span></p>
<ul>
<li><strong>Human/Tabular</strong> data &#8211; i.e., human-generated data that fits well into relational tables or arrays</li>
<li><strong>Human/Nontabular</strong> data &#8211; i.e., all other data generated by humans</li>
<li><strong>Machine-Generated</strong> data</li>
</ul>
<p style="margin-bottom: 0in;">Even that trichotomy is grossly oversimplified, for reasons such as:</p>
<ul>
<li>These categories overlap.</li>
<li>There are kinds of data that get 	into fuzzy border zones.</li>
<li>Not all data in each category has 	all the same properties.</li>
</ul>
<p style="margin-bottom: 0in;">But at least as a starting point, I think this basic categorization has some value.<span id="more-1421"></span></p>
<p style="margin-bottom: 0in;">By <strong>human-generated data that fits well into relational tables or arrays,</strong> what I really mean is: <strong>the input from most conventional kinds of transactions</strong> – purchase/sale, inventory/manufacturing, employment status change, etc. This is the core data managed by OLTP relational DBMS everywhere. It is also the core data in analytic relational or MOLAP databases. The vast majority of what we think or know about “database management” applies primarily to data of this kind, in large part because of two fundamental properties of this information:</p>
<ul>
<li>It is meaningful to contemplate 	this data as being 100% accurate and complete (even if that goal is 	difficult to achieve in the real world).</li>
<li>This data is precise – i.e., one 	can check predicates against it and (give or take regrettable data 	imperfections) get inarguable yes/no answers.</li>
</ul>
<p style="margin-bottom: 0in;">For most enterprises, this is the most important data they have. It was created as a result of expensive business activities. It deals directly with money, employees, physical goods, and the rest of the things that make an enterprise go. It can be fruitfully analyzed in ever more ways, which is why it should never be thrown out or even entirely relegated to tape, now that data warehouse software, hardware, and storage have become so cheap. (“Disk is the new tape.”) And because of the importance of both preserving and accessing it, it should often be stored in multiple copies – OLTP, data warehouse, data mart, in-memory analytics, near-line quasi-archive, MOLAP cubes (if you must) and so on, plus of course replicas for high throughput and availability.</p>
<p style="margin-bottom: 0in;">But <strong>humans generate many other kinds of data as well,</strong> especially in a form directly suitable for <strong>communication</strong> – text (in many formats), documents (text or otherwise), pictures, videos, etc. <a href="../2005/12/09/relational-dbms-versus-text-data/">Traditional relational databases are a poor home for this kind of data</a> because:</p>
<ul>
<li>This data often deals with 	opinions or aesthetic judgments – there is little concept of 	perfect accuracy.</li>
<li>Similarly, there is little concept 	of perfect completeness.</li>
<li>There&#8217;s also little concept of perfectly, unarguably accurate query results – different people will have different opinions as to what constitutes good results for a search.</li>
<li>Queries don&#8217;t lend themselves to 	binary answers; rather, documents can have differing degrees of 	relevancy.</li>
</ul>
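<p>The contrast between yes/no predicates and graded relevance can be sketched in a few lines of Python. The tabular half is an ordinary filter; the nontabular half uses a deliberately naive, hypothetical scoring rule (the fraction of query words that appear in a document), chosen only to show that the answer is a ranking whose quality is a matter of judgment. It is not what any real search engine does, and the sample data is invented.</p>

```python
# Tabular data: a predicate gives an inarguable yes/no for every row.
orders = [
    {"order_id": 1, "amount": 250.0},
    {"order_id": 2, "amount": 75.0},
    {"order_id": 3, "amount": 410.0},
]
big_orders = [o["order_id"] for o in orders if o["amount"] >= 200.0]


# Nontabular data: a query yields graded scores, and the scoring rule is itself a judgment call.
# Toy scorer (hypothetical): fraction of query terms that appear in the document.
def relevance(query: str, text: str) -> float:
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms.intersection(words)) / len(terms)


docs = {
    "memo": "quarterly sales were strong in europe",
    "review": "the product is strong but sales support was weak",
    "poem": "the sea is strong tonight",
}
query = "strong sales europe"
ranked = sorted(docs, key=lambda name: relevance(query, docs[name]), reverse=True)

print("Orders over $200:", big_orders)
for name in ranked:
    print(f"{name}: {relevance(query, docs[name]):.2f}")
```

<p>Every order either matches the predicate or doesn&#8217;t; every document gets a score, and a different, equally defensible scoring rule could produce a different ranking.</p>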
<p style="margin-bottom: 0in;">Systems for managing this kind of data are much less advanced than relational database managers. Nobody knows how to get all the information out of a text document, or query all of it if they could, and the story is even worse for non-text examples. The systems that give the best query results aren&#8217;t necessarily the same ones that have the best database administration features. Basically, this area is still a mess, and it&#8217;s a mess that consumes a huge fraction of all the data storage products sold today.</p>
<p style="margin-bottom: 0in;">But give or take questions of storage efficiency and deduplication, if humans created that kind of data, they put a lot of effort into it, so it&#8217;s worth keeping. Besides, compliance regulations commonly mandate that we do so – except, perhaps, when they mandate that we throw it away.</p>
<p style="margin-bottom: 0in;"><strong>Machine-generated data</strong> is a whole other can of worms. Paradigmatic examples of what I mean by “machine-generated data” include:</p>
<ul>
<li>Computer, network, and other 	equipment logs</li>
<li>Satellite and similar telemetry 	(whether for espionage or science)</li>
<li>Location data such as RFID chip 	readings, GPS system output, etc.</li>
<li>Temperature and other 	environmental sensor readings</li>
<li>Sensor readings from factories, 	pipelines, etc.</li>
<li>Output from many kinds of medical devices, in hospitals and (increasingly) homes alike</li>
</ul>
<p style="margin-bottom: 0in;">Unlike human-generated data, whose growth is constrained by macro factors such as population and total level of economic activity, <strong>machine-generated data will continue to grow as fast as Moore&#8217;s Law lets it. </strong><span style="font-weight: normal;">That fact has two profound consequences:</span></p>
<ul>
<li><strong>It is unrealistic to hope ever 	to keep most or all machine-generated data,</strong><span style="font-weight: normal;"> whereas I think that&#8217;s exactly what should and will happen with human-generated data</span></li>
<li><span style="font-weight: normal;">Before 	long, </span><strong>most data (by volume) will be machine-generated</strong></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">And so it is not really an exaggeration to say that <strong>machine-generated data is the future of data management.</strong></span></p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">I&#8217;d like to close this long post by immediately pointing out some of the flows in this simple trichotomy. One obvious gray area lies in<strong> hybrid human/machine-generated data,</strong> three big examples of which are:</span></p>
<ul>
<li><span style="font-weight: normal;">Web 	clickstreams</span></li>
<li><span style="font-weight: normal;">Call 	detail records (CDR)</span></li>
<li><span style="font-weight: normal;">Stock 	trades</span></li>
</ul>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">In all three cases, we are quickly getting to the point where this data is preserved in its entirety (even if the network event data associated with the web logs is reduced before storage). And in each case it fits pretty well into RDBMS, although Hadoop has a role to play as well. So pretending it&#8217;s purely human-generated probably isn&#8217;t all that misleading.<br />
</span>
</p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">Another gray area lies in text that gets linguistically processed – i.e. via <a href="http://www.texttechnologies.com/2007/12/23/text-mining-myths-realities/" onclick="javascript:pageTracker._trackPageview('/www.texttechnologies.com');">text-mining</a> tools – with the output placed into a relational database. I don&#8217;t immediately see a workaround for that flaw in my labeling scheme.  So let&#8217;s just say no taxonomy is perfect.*</span></p>
<p style="margin-bottom: 0in;"><em><span style="font-weight: normal;">*Come to think of it, that&#8217;s one of the problems holding back text-mining technology. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></em></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span style="font-weight: normal;">And of course some of the <a href="../2009/12/12/legit-nosql-key-value-store/">NoSQL</a> folks would note that I was oversimplifying when I tied my first category specifically to relational DBMS. So would the folks at <a href="../2010/01/15/intersystems-cache-highlights/">Intersystems</a>.</span></span></p>
<p style="margin-bottom: 0in; font-style: normal; font-weight: normal;">But the biggest oversimplification stems from this:</p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">As Mike Stonebraker* and I argued a couple of years ago, I really <a href="../2008/04/10/my-own-data-management-software-taxonomy/">think that database management technologies should be divided into 10+ categories.</a> </span></p>
<p style="margin-bottom: 0in;"><em><span style="font-weight: normal;">*Note: The links to Stonebraker&#8217;s own posts will be broken until Vertica&#8217;s webmaster gets his/her act together. But you can find them under other URLs via web search.)</span></em></p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;">
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/Xnh_Ptwkb_g" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2010/01/17/three-broad-categories-of-data/</feedburner:origLink></item>
	</channel>
</rss>
