<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>DBMS 2 : DataBase Management System Services</title>
	
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Wed, 08 Feb 2012 22:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/dbms2/feed" /><feedburner:info uri="dbms2/feed" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Comments on SAS</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/yGDdJUSTQTY/</link>
		<comments>http://www.dbms2.com/2012/02/08/comments-on-sas/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 22:51:11 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[KXEN]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[SAS Institute]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5939</guid>
		<description><![CDATA[A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows: SAS faces a number of challenges, not unlike those faced by other high-priced legacy technology vendors. It [...]]]></description>
			<content:encoded><![CDATA[<p>A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows:</p>
<ul>
<li>SAS faces a number of challenges, not unlike those faced by other high-priced legacy technology vendors.
<ul>
<li>It is used by organizations that have large budgets to pay for the product and to pay people to be expert in the product&#8217;s intricacies.</li>
<li>SAS has not integrated with scale-out analytic DBMS technologies as well or as quickly as had been hoped, or as earlier marketing suggested was likely.</li>
<li>SAS has not been strong in helping its users do <a href="http://www.dbms2.com/2011/11/28/agile-predictive-analytics-the-easy-parts/">agile predictive analytics</a>.</li>
</ul>
</li>
<li>SAS&#8217; strengths are concentrated in product breadth:
<ul>
<li>Lots of statistical algorithms.</li>
<li>Various vertical products that make the modeling techniques more accessible in specific application domains.</li>
<li><a href="http://www.dbms2.com/2011/04/21/sas-hpa-does-make-sense-after-all/">Various approaches to engineering for scalability</a> &#8212; no one of those has been a table-thumping success to date, but SAS has the resources to keep trying.</li>
<li>Some level of integration with its own business intelligence and text analytics products.</li>
</ul>
</li>
<li>For any particular use case, the burden of proof is on SAS alternatives to show that they have enough pieces in the toolkit to meet the needs.
<ul>
<li>SPSS (now owned by IBM) also has legacy issues.</li>
<li>KXEN is focused on marketing use cases.</li>
<li>Mahout has been one of the less successful Hadoop-related open source projects.</li>
<li>R-based technology is still maturing.</li>
<li>The modeling capabilities (as opposed to just scoring) that are bundled into RDBMS and well parallelized tend to be pretty limited. Apparent exceptions tend to just be R repackaged.</li>
</ul>
</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/yGDdJUSTQTY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/comments-on-sas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/08/comments-on-sas/</feedburner:origLink></item>
		<item>
		<title>Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/RGu7HMGkLRY/</link>
		<comments>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 17:17:32 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehouse appliances]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Exasol]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Kognitio]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[illuminate Solutions]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5926</guid>
		<description><![CDATA[This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying: In general, I regard Gartner Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the <a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">2010</a>, <a href="http://www.dbms2.com/2010/02/10/gartner-magic-quadrant-data-warehouse-2009-2010/">2009</a>, <a href="http://www.dbms2.com/2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">2008</a>, <a href="http://www.dbms2.com/2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">2007</a>, and <a href="http://www.dbms2.com/2006/10/03/vendor-segmentation-for-data-warehouse-dbms/">2006</a> Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:</p>
<ul>
<li>In general, I regard Gartner Magic Quadrants as a bad use of good research.</li>
<li>Illustrating the uselessness of &#8212; or at least poor execution on &#8212; the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant&#8217;s dimensions.</li>
<li>I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years&#8217; versions. Two factors jump to mind as possible reasons:
<ul>
<li>This year&#8217;s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn&#8217;t add as much discussion of overall trends. So there&#8217;s less to (potentially) disagree with.</li>
<li><a href="http://www.dbms2.com/2010/12/28/evolving-definitions-and-technology-categories-for-2011/">Merv Adrian is now at Gartner</a>.</li>
</ul>
</li>
<li>Whatever the problems may be with Gartner&#8217;s approach, the whole thing comes out better than do <a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester&#8217;s failed imitations</a>.</li>
</ul>
<p><em>*At the time of this posting, I don&#8217;t yet have a link. However, I expect that to change quickly, and I plan to edit this paragraph accordingly. If nothing else, I hope people will drop links into the comment thread. </em></p>
<p>Specific company comments, roughly in line with Gartner&#8217;s rough single-dimensional rank ordering, include: <span id="more-5926"></span></p>
<ul>
<li>The Gartner Magic Quadrant&#8217;s comments on Teradata seem pretty fair. I don&#8217;t think I&#8217;m much in disagreement when I say:
<ul>
<li>Teradata has the richest, most mature analytic DBMS offering.</li>
<li>Teradata has an outstanding track record both for <a href="http://www.dbms2.com/2011/09/24/confusion-about-teradatas-big-customers/">managing large data volumes</a> and for high-concurrency mixed workloads.</li>
<li>Aster Data was a cool Teradata acquisition, even if Teradata/Aster synergies or integration have been nominal to date.</li>
<li>Teradata still needs to get out of its own way in marketing, positioning, packaging, and/or defining its premium-priced system vs. its more moderately priced alternatives. Indeed, as necessary as this approach may have been to fending off encroachments by Netezza and others, what Teradata really needs to do is evolve to a more pick-your-own-node-combination mix-and-match kind of offering.</li>
</ul>
</li>
<li>Gartner has talked with a lot of Oracle Exadata users who say that the product works; Gartner has also stopped beating Oracle up for <a href="http://www.dbms2.com/2010/06/14/best-practices-analytic-database-poc/">its previous policy of almost never doing onsite POCs (Proofs of Concept)</a>; both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in cost and cumbersomeness. Overall, while I agree there are organizations for which Oracle should indeed be a top-ranked choice, there are many others who shouldn&#8217;t put Oracle on their short list.</li>
<li>Third in the Gartner MQ rankings is IBM.
<ul>
<li>Gartner gets so caught up in reciting the names of various IBM product offerings that it neglects to say much good about DB2 itself. (I tend to have a similar problem.)</li>
<li>But Gartner does mention concurrency as a strength. I agree, especially if we presume that that was a reference to DB2 rather than Netezza.</li>
<li>Gartner cites Netezza&#8217;s post-acquisition annual growth rate as 30%. Gartner seems to think this is a good number. I disagree, but in Netezza&#8217;s defense, it has had to endure IBM&#8217;s post-acquisition on-boarding process.</li>
</ul>
</li>
<li>Arguably fourth in the Gartner Data Warehouse Magic Quadrant rankings is EMC/Greenplum.
<ul>
<li>In general, Gartner likes the taste of Greenplum Kool-Aid.</li>
<li>Gartner neglects to ding Greenplum for concurrency challenges, which I view as an oversight given Gartner&#8217;s general stress on that area.</li>
<li>Gartner does ding Greenplum for support challenges.</li>
<li>Gartner neglects to praise Greenplum for true <a href="http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/">hybrid row/columnar data management</a>, a feature shared by <a href="http://www.dbms2.com/2011/09/22/teradata-columnar-compression/">Teradata</a> and <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/">Vertica</a>, among others, but not by <a href="http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/">Oracle</a>, DB2, or Netezza.</li>
<li>Gartner located a half-petabyte Greenplum database. This doesn&#8217;t surprise me, even though Greenplum has frequently made exaggerated claims about large-size database successes in the past.</li>
<li>Gartner reports a &gt;400 figure for Greenplum customers, which is plausible.</li>
</ul>
</li>
<li>In its first deviation from strict one-dimensional rank ordering, the Gartner Magic Quadrant ranks Sybase ahead of Greenplum in completeness of vision but behind in &#8220;ability to execute&#8221;.
<ul>
<li>If that were the other way around, it might make more sense. Greenplum promises anything and everything you might ever want for analytic data management or the associated analysis; but Sybase has vastly more analytic DBMS users than Greenplum does, running a variety of demanding workloads.</li>
<li>Gartner appears to think that Sybase IQ requires less database administration than I think it does.</li>
<li>Gartner seems concerned that SAP will position HANA and Sybase ASE as, between them, the only DBMS you&#8217;ll ever need, casting doubt on Sybase IQ&#8217;s future. I wouldn&#8217;t worry about that if you have a problem you want to solve today.</li>
</ul>
</li>
<li>The Gartner Magic Quadrant for Data Warehouse Database Management Systems ranks Microsoft sixth overall, despite noting that there isn&#8217;t a single production reference for Microsoft&#8217;s Parallel Data Warehouse. In support of this ranking, it cites, for example, the compression feature, which distinguishes Microsoft SQL Server from no other product on the list except Kognitio. If you have such an undemanding data warehousing problem that many different analytic DBMS could meet your needs, there&#8217;s a good chance Microsoft SQL Server can also do the job; and if you&#8217;ve bought into the Microsoft technology stack, you might as well keep going down that path. Otherwise, I don&#8217;t know why somebody should adopt Microsoft&#8217;s offering at this time.</li>
<li>Seventh along the main diagonal path in the Gartner Magic Quadrant is HP Vertica. I&#8217;d rank Vertica higher than that, but in fairness I note two execution concerns. First, HP has a lousy track record, both in acquisitions and in data warehousing/analytics. Second, Vertica is bad about answering my email. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Anyhow, Gartner doesn&#8217;t seem to have given Vertica credit either for <a href="http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/">its full customer count or for the multiple petabyte-scale databases Vertica runs</a>.</li>
<li>1010data is an outlier, with Gartner noting that it only partly fits in with other &#8220;Data Warehousing Database Management&#8221; companies, and hence kind of confessing that 1010data&#8217;s placement on the Magic Quadrant is somewhat arbitrary. Stuff like that is bound to happen, given <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">the inherent difficulties of defining market categories</a>. Anyhow, my thoughts on 1010data include:
<ul>
<li>I&#8217;m nervous about the fact that 1010data doesn&#8217;t actually control its own DBMS technology, but rather relies on old code from the small private company KX Systems.</li>
</ul>
<ul>
<li> There are three main reasons to consider 1010data:
<ul>
<li>You want to enter the data mart outsourcing business in a casual way, and you like its SaaS offering.</li>
<li>You want to engage in <a href="http://www.dbms2.com/2010/05/15/stakeholder-facing-analytics/">stakeholder-facing analytics</a> in a casual way, and you like its SaaS offering.</li>
<li>You love 1010data&#8217;s particular set of interactive analytic features and performance.</li>
</ul>
</li>
</ul>
</li>
<li>Back to the main path winding along the Gartner Magic Quadrant main diagonal &#8212; next up is ParAccel. While I question some of the peripheral comments, I agree with Gartner&#8217;s core messages that:
<ul>
<li>ParAccel, the product, is blazingly fast in certain use cases.</li>
<li>ParAccel, the company, is dangerously small.</li>
</ul>
</li>
<li>Eighth on the Gartner MQ&#8217;s main path is Kognitio. This is too high. Kognitio positions itself as offering in-memory DBMS, yet stubbornly refuses to do any kind of data compression. That&#8217;s an awful combination of choices. As for using Kognitio&#8217;s data warehousing SaaS offering &#8212; why would you do that, when more modern products are available on a SaaS/cloud basis as well?</li>
<li>Ninth in the Gartner Magic Quadrant main rankings is SAND.
<ul>
<li>The SAND section is not a triumph of Gartner accuracy. For example:
<ul>
<li><a href="http://www.dbms2.com/2011/11/12/clarifying-sands-customer-metrics-positioning-and-technical-story/">Gartner completely missed the errors in SAND&#8217;s reported customer counts</a>.</li>
<li>Gartner refers to SAND as being &#8220;in existence for approximately nine years&#8221;, which is too low by at least a factor of 2.</li>
<li>Gartner says &#8220;SAND is a privately held company&#8221;, even though <a href="http://itmarketstrategy.com/2009/06/07/sand-technology-a-risky-bet/">Merv knows better than that</a>.</li>
</ul>
</li>
<li>Otherwise, Gartner&#8217;s opinion on SAND seems to boil down to &#8220;Interesting technology and ideas, but dangerously small company.&#8221; I agree.</li>
</ul>
</li>
<li>Tenth and too low in the Gartner MQ main rankings is Infobright.
<ul>
<li>At least by some metrics (e.g. customer count), Infobright isn&#8217;t as dangerously small as ParAccel, SAND, Kognitio, et al.</li>
<li>That said, Infobright is small and focused on <a href="http://www.dbms2.com/2010/12/30/examples-and-definition-of-machine-generated-data/">machine-generated data</a>. So I wouldn&#8217;t be confident in Infobright&#8217;s future technology path for human-generated data use cases.</li>
<li>Infobright&#8217;s performance is uneven &#8212; blazing in cases where the Knowledge Grid helps, but not necessarily stellar by analytic DBMS standards when full table scans are called for.</li>
<li>I agree with Gartner that the possibility of future Oracle/MySQL shenanigans is a concern. But while the energy behind MySQL forking efforts doesn&#8217;t seem too great right now, I&#8217;d expect them to revive and offer a successful escape path if it seemed Oracle was indeed going to play hardball.</li>
<li>Also, given that it&#8217;s already an open source vendor, there are various kinds of assurances Infobright could give that would also help alleviate customer concerns.</li>
</ul>
</li>
<li>Actian, formerly Ingres, took a big tumble in Gartner&#8217;s rankings versus last year, when I simply wrote &#8220;<a href="http://www.dbms2.com/2011/02/05/gartner-magic-quadrant-data-warehouse-database-management-2010/">What Gartner said in connection with <strong>Ingres</strong> is too inaccurate to deserve detailed attention</a>.&#8221; I&#8217;m even a little harsher about <a href="http://www.dbms2.com/2011/09/25/ingres-actian/">Ingres/Actian&#8217;s DBMS products and prospects</a> than Gartner is, but at least now we&#8217;re in the same ballpark.</li>
<li>Along with Infobright, ParAccel, and SAND, <a href="http://www.dbms2.com/2011/11/12/exasol-update/">Exasol</a> appears to be another of the &#8220;good columnar technology/small company&#8221; crowd. As with other such products, one should be careful about fit-and-finish features that are missing today, as there is no assurance they&#8217;ll be added in a timely manner going forward.</li>
<li>illuminate Solutions, which was on last year&#8217;s Gartner list, <a href="http://www.dbms2.com/2012/01/16/has-illuminate-solutions-joined-the-choir-invisible/">now appears to be an ex-company</a>.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/RGu7HMGkLRY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/08/gartner-magic-quadrant-data-warehouse-2011-2012/</feedburner:origLink></item>
		<item>
		<title>Hadoop-related market categorization</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/UonOkqo65dE/</link>
		<comments>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 06:49:30 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5914</guid>
		<description><![CDATA[I wasn&#8217;t the only one to be dubious about Forrester Research&#8217;s Hadoop taxonomy (or lack thereof). GigaOm&#8217;s Derrick Harris was as well, and offered a much superior approach of his own. In Derrick&#8217;s view, there&#8217;s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that no market categorization is [...]]]></description>
			<content:encoded><![CDATA[<p>I wasn&#8217;t the only one to be <a href="http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/">dubious about Forrester Research&#8217;s Hadoop taxonomy</a> (or lack thereof). GigaOm&#8217;s Derrick Harris was as well, and offered <a href="http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/">a much superior approach of his own</a>. In Derrick&#8217;s view, there&#8217;s Hadoop, Hadoop distributions, Hadoop management, and Hadoop applications. Taking those out of order, and recalling that <a href="http://www.strategicmessaging.com/no-market-categorization-is-ever-precise/2011/03/01/">no market categorization is ever precise</a>:</p>
<ul>
<li>&#8220;Hadoop applications&#8221; is a catch-all category. Since Derrick offered suitable caveats around the label, I&#8217;m fine with what he said.</li>
<li>Hadoop management software commonly comes in the form of suites. Derrick&#8217;s discussion was solid.</li>
<li>Derrick seems to want to define &#8220;Hadoop&#8221; as being whatever is in the relevant Apache projects. Cool. He does seem to wind up on both sides of the &#8220;MapR and DataStax put Hadoop MapReduce on top of something that isn&#8217;t HDFS &#8212; so is that Hadoop or isn&#8217;t it?&#8221; question, but that&#8217;s a tough ambiguity to avoid.</li>
<li>Derrick could have been a little clearer on the subject of Hadoop distributions.</li>
</ul>
<p>Let&#8217;s drill down into that last one. Derrick refers to Hadoop distributions as &#8220;products&#8221; that:</p>
<blockquote><p>package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a  way that in theory makes them integrate more naturally, and to run both  smoothly and securely.</p></blockquote>
<p>While that&#8217;s a reasonable recitation of the idea&#8217;s benefits, I&#8217;d rather say that a &#8220;distribution&#8221; of open source software comprises:<span id="more-5914"></span></p>
<ul>
<li>Open source software, in selected versions.</li>
<li>(Possibly) additional code.</li>
<li>(Likely) documentation.</li>
<li>(Possibly) legal assurances such as intellectual property indemnification.</li>
</ul>
<p>In the case of Hadoop:</p>
<ul>
<li>The version selection is a relatively big deal. There are a lot of Hadoop sub-projects. There&#8217;s been some splitting and forking and recombination. Testing a specific set of point releases for integration and bugs is a non-trivial user benefit.</li>
<li>The additional code is generally focused on installation or whatever, because the rest is bundled into separately identified management software. Even so, because of the large number of moving parts, this is a good thing to have.</li>
<li>What&#8217;s more, in the case of Cloudera, using a particular distribution (theirs) is a prerequisite to getting the most widely adopted Hadoop management software (also theirs), which in turn is required if you want the industry&#8217;s most widely adopted Hadoop support (ditto). Similar things are apt to be true of rival distributions.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/UonOkqo65dE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/07/hadoop-related-market-categorization/</feedburner:origLink></item>
		<item>
		<title>WibiData, derived data, and analytic schema flexibility</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/3pO1CtRpP-E/</link>
		<comments>http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 03:18:25 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data models and architecture]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Odiago and WibiData]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5907</guid>
		<description><![CDATA[My clients at Odiago, vendors of WibiData, have changed their company name simply to WibiData. Even better, they blogged with more detail as to how WibiData works, in what is essentially a follow-on to my original WibiData post last October. Among other virtues, WibiData turns out to be a poster child for my views on [...]]]></description>
			<content:encoded><![CDATA[<p>My clients at Odiago, vendors of WibiData, have changed their company name simply to WibiData. Even better, they blogged with more detail as to <a href="http://www.wibidata.com/2012/02/07/how-wibidata-works/">how WibiData works</a>, in what is essentially a follow-on to <a href="http://www.dbms2.com/2011/11/02/5576/">my original WibiData post</a> last October. Among other virtues, WibiData turns out to be a poster child for my views on <a href="http://www.dbms2.com/2011/09/06/derived-data-progressive-enhancement-and-schema-evolution/">derived data and the corresponding schema evolution</a>.</p>
<p>Interesting quotes include:</p>
<blockquote><p>WibiData is designed to store &#8230; transactional data side-by-side with profile and other derived data attributes.</p></blockquote>
<blockquote><p>&#8230; the ability to add new ad-hoc columns to a table enables more flexible analysis: output data that is the result of one analytic pipeline is stored adjacent to its input data, meaning that you can easily use this as input to second- or third-order derived data as well.</p></blockquote>
<blockquote><p>schemas can vary over time; you can easily add a field to a record, or delete a field. &#8230; But even though you start collecting that new data, your existing analysis pipelines can treat records like they always did; programs that don’t yet know about the new cookie are still compatible with both the old records already collected, and the new records with the additional field. New programs fill in default values for old data recorded before a field was added, applying the new schema at read time.</p></blockquote>
<blockquote><p>schemas for every column are stored in a data dictionary that matches column names with their schemas, as well as human-readable descriptions of the data.</p></blockquote>
<p>Interesting aspects of the post that don&#8217;t lend themselves as well to being excerpted include:</p>
<ul>
<li>How the Produce-Gather &#8220;analysis calculus&#8221; &#8212; i.e. framework &#8212; works.</li>
<li>How this all ties into Apache projects (and sub-projects) such as Hadoop, HBase, and Avro.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/3pO1CtRpP-E" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/06/wibidata-derived-data-and-analytic-schema-flexibility/</feedburner:origLink></item>
		<item>
		<title>Sumo Logic and UIs for text-oriented data</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/GLqo9wlowlY/</link>
		<comments>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 13:27:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5897</guid>
		<description><![CDATA[I talked with the Sumo Logic folks for an hour Thursday. Highlights included: Sumo Logic does SaaS (Software as a Service) log management. Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as &#8220;Splunk-like&#8221;. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to [...]]]></description>
			<content:encoded><![CDATA[<p>I talked with the Sumo Logic folks for an hour Thursday. Highlights included:</p>
<ul>
<li>Sumo Logic does SaaS (Software as a Service) log management.</li>
<li>Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as &#8220;Splunk-like&#8221;. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to <a href="http://www.dbms2.com/2012/01/10/splunk-update/">branch out</a>.)</li>
<li>Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical.</li>
<li>Sumo Logic&#8217;s main differentiation is <strong>automated classification of events</strong>.</li>
<li>There&#8217;s some kind of streaming engine in the mix, to update counters and drive alerts.</li>
<li>Sumo Logic has around 30 &#8220;customers,&#8221; free (mainly) or paying (around 5) as the case may be.</li>
<li>A truly typical Sumo Logic customer has single to low double digits of gigabytes of log data per day. However, Sumo Logic seems highly confident in its ability to handle a terabyte per customer per day, give or take a factor of 2.</li>
<li>When I asked about the implications of shipping that much data to a remote data center, Sumo Logic observed that log data compresses really well.</li>
<li>Sumo Logic recently raised a bunch of venture capital.</li>
<li>Sumo Logic&#8217;s founders are out of ArcSight, a log management company HP paid a bunch of money for.</li>
<li>Sumo Logic coined a marketing term &#8220;LogReduce&#8221;, but it has nothing to do with &#8220;MapReduce&#8221;. Sumo Logic seems to find this amusing.</li>
</ul>
<p>What interests me about Sumo Logic is that automated classification story. I thought I heard Sumo Logic say:<span id="more-5897"></span></p>
<ul>
<li>It&#8217;s largely unsupervised machine learning.</li>
<li>It&#8217;s specific to a particular user/data set.</li>
<li>It can be up and running and classifying things effectively almost instantly (i.e., on seconds&#8217; or minutes&#8217; worth of data).</li>
<li>It&#8217;s informed by what different users tag as false positives. (Or maybe that is planned for future versions.)</li>
</ul>
<p><em>I have a little trouble seeing how all those points fit exactly together, so perhaps I got some details wrong.</em></p>
<p>The payoff is that <strong>machine learning directly informs the Sumo Logic user interface</strong>. In particular, large numbers of events are bundled into a small number of categories, hopefully making it much easier for network operations types to scan the UI and pick out what&#8217;s important.</p>
<p>In general, the idea of machine-learning informing analytic UIs via some sort of classification is common in text-oriented technologies, notably in:</p>
<ul>
<li>Good ol&#8217; text search.</li>
<li>Text mining vendors&#8217; approaches to clustering hits on words or phrases that say substantially the same thing.</li>
</ul>
<p>But otherwise it seems kind of rare, if we stipulate that ad-serving/general internet personalization isn&#8217;t really an analytic UI &#8212; but I&#8217;d love to hear of any interesting examples I&#8217;ve overlooked.</p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/GLqo9wlowlY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/06/sumo-logic-and-uis-for-text-oriented-data/</feedburner:origLink></item>
		<item>
		<title>Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/ngGyObPAoMM/</link>
		<comments>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 05:16:20 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[MapR]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5886</guid>
		<description><![CDATA[Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn&#8217;t prove stable, here also is a registration-required link from IBM&#8217;s Conor O&#8217;Mahony.) My comments include: The Forrester Wave&#8217;s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. [...]]]></description>
			<content:encoded><![CDATA[<p>Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a <a href="http://www.forrester.com/rb/go?docid=60755&amp;oid=1-K07LCA&amp;action=5">direct link</a>, but in case that doesn&#8217;t prove stable, here also is <a href="http://database-diary.com/2012/02/02/get-a-free-copy-of-the-forrester-wave-for-enterprise-hadoop-solutions/">a registration-required link from IBM&#8217;s Conor O&#8217;Mahony</a>.) My comments include:</p>
<ul>
<li>The Forrester Wave&#8217;s <strong>relative vendor rankings are meaningless,</strong> in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that incorporates a distribution of Apache Hadoop MapReduce into something it offers, and that offered at least two (not necessarily full production) references for same.</li>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; contradicts itself on the subject of Hortonworks.
<ul>
<li>The Forrester Wave for &#8220;enterprise Hadoop&#8221; is correct when it says <strong>&#8220;Hortonworks &#8230; has Hadoop training and professional services offerings that are still embryonic.&#8221;</strong></li>
</ul>
<ul>
<li>Peculiarly, the Forrester Wave for &#8220;enterprise Hadoop&#8221; also says &#8220;Hortonworks offers an impressive Hadoop professional services portfolio&#8221;. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are &#8230; well, a good word might be &#8220;embryonic&#8221;.</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2011/02/11/comments-on-the-2011-forrester-wave-for-enterprise-data-warehouse-platforms/">Forrester Waves always seem to have weird implicit definitions of &#8220;data warehousing&#8221;</a>. This one is no exception.</li>
<li>Forrester gave top marks in &#8220;Functionality&#8221; to 11 of 13 &#8220;enterprise Hadoop&#8221; vendors. This seems odd.</li>
<li>I don&#8217;t know why MapR, which doesn&#8217;t like HDFS (Hadoop Distributed File System), got top marks in &#8220;Subproject integration&#8221;.</li>
<li>Forrester gave top marks in &#8220;Storage&#8221; to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum&#8217;s technology is a superset of MapR&#8217;s. Very strange. <em>(Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)</em></li>
<li>Forrester gave higher marks in &#8220;Acceleration and optimization&#8221; to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.</li>
<li>I&#8217;m not sure what Forrester is calling a &#8220;Distributed EDW file store connector&#8221;, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.</li>
<li>Forrester&#8217;s &#8220;Strategy&#8221; rankings seem to correlate to a metric of &#8220;We&#8217;re a large enough vendor to go in N directions at once&#8221;, for various values of N.</li>
<li>Forrester is correct to rank Cloudera&#8217;s &#8220;Adoption&#8221; as being stronger than EMC/Greenplum&#8217;s or MapR&#8217;s. But Hortonworks&#8217; strong mark for &#8220;Adoption&#8221; baffles me.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/ngGyObPAoMM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions/</feedburner:origLink></item>
		<item>
		<title>Couchbase update</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/uRXjJFh1Jqc/</link>
		<comments>http://www.dbms2.com/2012/02/01/couchbase-update/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 04:00:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Basho and Riak]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Market share and customer counts]]></category>
		<category><![CDATA[MongoDB and 10gen]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Parallelization]]></category>
		<category><![CDATA[Web analytics]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5877</guid>
		<description><![CDATA[I checked in with James Phillips for a Couchbase update, and I understand better what&#8217;s going on. In particular: Give or take minor tweaks, what I wrote in my August, 2011 Couchbase updates still applies. Couchbase now and for the foreseeable future has one product line, called Couchbase. Couchbase 2.0, the first version of Couchbase [...]]]></description>
			<content:encoded><![CDATA[<p>I checked in with James Phillips for a Couchbase update, and I understand better what&#8217;s going on. In particular:</p>
<ul>
<li>Give or take minor tweaks, what I wrote in my <a href="../../../../../2011/08/13/couchbase-business-update/">August, 2011 Couchbase updates</a> still applies.</li>
<li>Couchbase now and for the foreseeable future has one product line, called Couchbase.</li>
<li>Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped &#8230;</li>
<li>&#8230; because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.</li>
<li>Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.</li>
<li>In connection with the need to rewrite parts of CouchDB, Couchbase has:
<ul>
<li><a href="../../../../../2012/01/18/notes-from-the-couch-blogs/">Gotten out of the single-server CouchDB business</a>.</li>
<li>Donated its proprietary single-server CouchDB intellectual property to the Apache Foundation.</li>
</ul>
</li>
<li>The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.</li>
<li>Couchbase has 60ish people, headed to &gt;100 over the next few months.</li>
</ul>
<p><span id="more-5877"></span><em>If you previously heard the brand names Couchbase Single or Couchbase Mobile, pay no further attention to them. Couchbase Single was CouchDB; Couchbase Mobile is part of Couchbase&#8217;s feature set.</em></p>
<p>The current product is Couchbase 1.8, which is a whole lot like what previously was called Membase. New features in Couchbase 1.8 (versus prior versions of Membase) were concentrated in client libraries/SDK (Software Development Kit). Not coincidentally, Couchbase has hired developer evangelists who are in charge of making Couchbase play nicely with various specific languages (e.g. C/C++).</p>
<p>Drilling down further into the CouchDB part of the story:</p>
<ul>
<li>Couchbase 2.0 will replace Couchbase 1.8/Membase&#8217;s SQLite back-end with CouchDB.</li>
<li>Parts of CouchDB that do things like read, write, or compact data have been rewritten from Erlang to C.</li>
<li>Couchbase still uses other Erlang parts of Apache CouchDB, and would be delighted if the community were to usefully enhance them.</li>
<li>Couchbase&#8217;s heavy contributions to development of open source CouchDB will, for the most part, continue.</li>
<li>CouchDB stuff donated to the Apache Foundation includes:
<ul>
<li>Documentation</li>
<li>Packaging</li>
<li>Performance enhancements</li>
</ul>
</li>
</ul>
<p>There&#8217;s at least one Couchbase user with &gt;1000 nodes (at a guess, <a href="../../../../../2011/09/05/zynga-linkedin-data-warehous/">Zynga</a>). More typical might be 20 nodes or fewer. This led me to wonder how much data one puts on a Couchbase node anyway. The answer turns out to vary widely, in that you want your working set to be in RAM, and whether that&#8217;s your entire database or just a slice of it depends on the nature of the application.</p>
<p>James echoed a trend I&#8217;ve heard elsewhere as well, in which products one thinks of as being internet-specific are also sold in a few cases to conventional enterprises for &#8212; you guessed it! &#8212; their internet operations. I also asked him about competition, and he asserted:</p>
<ul>
<li>MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.</li>
<li>DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that&#8217;s one of the benefits of swapping in CouchDB at the back end.)</li>
<li>Redis has &#8220;dropped off the radar&#8221;, presumably because there&#8217;s no particular persistence strategy for it.</li>
<li>Riak doesn&#8217;t show up much.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/uRXjJFh1Jqc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/02/01/couchbase-update/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/02/01/couchbase-update/</feedburner:origLink></item>
		<item>
		<title>Departmental analytics — best practices</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/uLI_DYkJFa4/</link>
		<comments>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 16:47:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[EAI, EII, ETL, ELT, ETLT]]></category>
		<category><![CDATA[Predictive modeling and advanced analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5867</guid>
		<description><![CDATA[I believe IT departments should support and encourage departmental analytics efforts, where &#8220;support&#8221; and &#8220;encourage&#8221; are not synonyms for &#8220;control&#8221;, &#8220;dominate&#8221;, &#8220;overwhelm&#8221;, or even &#8220;tame&#8221;. A big part of that is: Let, and indeed help, departments have the data they want, when they want it, served with blazing performance. Three things that absolutely should NOT [...]]]></description>
			<content:encoded><![CDATA[<p><a href="../../../../../2012/01/23/departmental-analytics-general-observations/">I believe IT departments should support and encourage departmental analytics efforts</a>, where &#8220;support&#8221; and &#8220;encourage&#8221; are not synonyms for &#8220;control&#8221;, &#8220;dominate&#8221;, &#8220;overwhelm&#8221;, or even &#8220;tame&#8221;. A big part of that is:<br />
<strong>Let, and indeed help, departments have the data they want, when they want it, served with blazing performance.</strong></p>
<p>Three things that absolutely should NOT be obstacles to these ends are:</p>
<ul>
<li>Corporate DBMS standards.</li>
<li>Corporate data governance processes.</li>
<li>The difficulties of ETL.</li>
</ul>
<p><span id="more-5867"></span>Reasons they shouldn&#8217;t or don&#8217;t need to be obstacles include:</p>
<ul>
<li>Analytic DBMS are often vastly more cost-effective than general-purpose ones.</li>
<li>In particular, analytic DBMS are often much easier to install and manage than general-purpose ones.</li>
<li>Heavy data governance bureaucracy is often unnecessary because:
<ul>
<li>The department should know what the limitations on the data&#8217;s accuracy are.</li>
<li>The department should know how much data accuracy is required.</li>
<li>The side-effects on other departments of any data inaccuracy would be minimal.</li>
</ul>
</li>
<li>There are multiple good schemes for populating data marts, managed by cost-effective analytic DBMS, with data from integrated data warehouses.
<ul>
<li>ELT (Extract/Load/Transform) almost always works, because data cleaning/data quality was handled at or before the IDW level, and because the analytic DBMS has the processing power to pull it off.</li>
<li>ETL (Extract/Transform/Load) should be easy as well. (If it isn&#8217;t, something may be lacking in your ETL set-up.)</li>
<li>Analytic DBMS are increasingly adding capabilities for easy spin-out of real or virtual data marts. Other kinds of technology (e.g. virtualization) are having their database spin-out capabilities upgraded as well.</li>
</ul>
</li>
</ul>
<p>One point to remember in support of departmental autonomy <strong>is that departments&#8217; views of what data to use may be more expansive than central IT&#8217;s.</strong> One reason is that important data may be external to the company, outside IT&#8217;s natural realm of concern. Examples of this include but are hardly limited to:</p>
<ul>
<li>Anything like &#8220;market data&#8221;.</li>
<li>Anything like &#8220;sentiment analysis&#8221;.</li>
<li>Data owned by supply chain partners.</li>
</ul>
<p>Further, even the more innovative internal data sources are commonly departmental, for example various kinds of multi-structured data (text verbatims from customers, log file data, and so on).</p>
<p>Whatever is true of data management (and ETL) is true for metadata management, even if it&#8217;s done by some kind of business intelligence tool. What I mean by that is:</p>
<ul>
<li><strong>Whoever manages data is also responsible for ingesting and emitting it &#8230;</strong></li>
<li>&#8230; and specifically for emitting it in<strong> understandable, well-organized, well-named formats, &#8230;</strong></li>
<li><strong>&#8230; </strong>so that <strong>departments can take responsibility for</strong> what amounts to <strong>lightweight analytic application development.</strong></li>
</ul>
<p>As for the &#8220;application development&#8221; itself, I&#8217;m envisioning at least three things:</p>
<ul>
<li>Math.</li>
<li>Sophisticated relational query.</li>
<li>Data visualization.</li>
</ul>
<p>I.e., I&#8217;m talking about what &#8220;analysts&#8221; and &#8220;quants&#8221; do. So to put the point even more simply:</p>
<ul>
<li><strong>Analysts and quants should be able to consume data that&#8217;s organized in a friendly manner.</strong></li>
<li><strong>Central IT should be friendly in how it serves data.</strong></li>
</ul>
<p>One corollary of this approach is that departments should try to adhere to corporate BI standards, at least for routine dashboard and reporting. Indeed, if a department brings in a business intelligence tool different from the corporate standard, there are three main possibilities:</p>
<ul>
<li>The tool is integrated with something else it makes sense to bring in, such as a third-party data supply or application.</li>
<li>The tool has an important capability the corporate standard doesn&#8217;t have, such as more flexible visualization and drilldown.</li>
<li>Central IT screwed up, making things much more difficult than they needed to be.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/uLI_DYkJFa4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/01/25/departmental-analytics-best-practices/</feedburner:origLink></item>
		<item>
		<title>Microsoft SQL Server 2012 and enterprise database choices in general</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/KdoQCkoVN7s/</link>
		<comments>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 14:42:34 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Microsoft and SQL*Server]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5859</guid>
		<description><![CDATA[Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this. Reporter: [Care to comment]? CAM: SQL Server is an adequate product if you don&#8217;t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can&#8217;t be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sqlserverlaunch.com/ww/Home">Microsoft is launching SQL Server 2012 on March 7</a>. An IM chat with a reporter resulted, and went something like this.</p>
<p><strong>Reporter: [Care to comment]?</strong><br />
<strong>CAM:</strong> SQL Server is an adequate product if you don&#8217;t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that <a href="http://msdn.microsoft.com/en-us/library/gg492088%28v=sql.110%29.aspx#Update">it can&#8217;t be updated</a>; but Oracle doesn&#8217;t have columnar storage at all.</p>
<p><strong>Reporter: Is the lock-in overall worse than IBM DB2, Oracle?</strong><br />
<strong>CAM:</strong> Microsoft locks you into an operating system, so yes.</p>
<p><strong>Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative in a co-habitation scenario, in the event they&#8217;re mulling whether to buy more Oracle or IBM licenses?</strong><br />
<strong>CAM:</strong> If they have a strong Microsoft-stack investment already, sure. Otherwise, why?</p>
<p><strong>Reporter: [How about] just cost?</strong><br />
<strong>CAM:</strong> DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.</p>
<p>Best is to have one major vendor of OLTP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.</p>
<p>By &#8220;web DBMS&#8221; I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.</p>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/KdoQCkoVN7s" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/01/24/microsoft-sql-server-2012/</feedburner:origLink></item>
		<item>
		<title>Departmental analytics — general observations</title>
		<link>http://feedproxy.google.com/~r/dbms2/feed/~3/QK1A2errlA0/</link>
		<comments>http://www.dbms2.com/2012/01/23/departmental-analytics-general-observations/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 14:29:06 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Data warehousing]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=5843</guid>
		<description><![CDATA[Department-level adoption of analytic technology isn&#8217;t the exception; it&#8217;s the norm. Reasons include: Many analytic challenges are inherently departmental. In many cases, central IT control of analytics isn&#8217;t needed. Departments move ahead without central approval or involvement because they can. That said, arguments for centralizing analytic technology include: A lot of data is used by [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.softwarememories.com/2012/01/17/historical-notes-on-the-departmental-adoption-of-analytics/">Department-level adoption of analytic technology isn&#8217;t the exception; it&#8217;s the norm</a>. Reasons include:</p>
<ul>
<li><strong>Many analytic challenges are inherently departmental.</strong></li>
<li>In many cases,<strong> central IT control of analytics isn&#8217;t needed.</strong></li>
<li>Departments move ahead without central approval or involvement because they can.</li>
</ul>
<p>That said, arguments for centralizing analytic technology include:</p>
<ul>
<li><strong>A lot of data is used by more than one department</strong>, for example:
<ul>
<li>Financial transactions (one or more affected departments and also the central accounting group).</li>
<li>Web logs (marketing and IT/web operations).</li>
</ul>
</li>
<li><strong>Departments may not have the requisite technical expertise </strong>(and it may be redundant/cost-ineffective for them to acquire it).</li>
</ul>
<p>What&#8217;s more, there are IT best practices to support department-level analytics. Some of the key ones boil down to:</p>
<ul>
<li>Be <strong>flexible</strong> in your <strong>analytic DBMS support.</strong></li>
<li>Be <strong>responsive</strong> to requests for <strong>ETL.</strong></li>
</ul>
<p>My conclusion is that <strong>central IT should encourage (and aid) departmental analytics. </strong>Let&#8217;s look at some details.</p>
<p><span id="more-5843"></span>I think two huge categories of analytic problem are inherently departmental:</p>
<ul>
<li><a href="../../../../../2011/03/03/investigative-analytics/">Investigative analytics</a> (pretty much all of it).</li>
<li>Routine monitoring/dashboarding if the data is tracked just by one department.</li>
</ul>
<p>Investigative analytics is a kind of research activity &#8212; you&#8217;re looking to discover previously unrecognized patterns. There are two approaches to this &#8212; you can do it in the department that has the relevant business knowledge, or you can outsource it to a special group of &#8220;discoverers&#8221; (commonly statisticians).* Either way, this is a small team/departmental kind of activity.</p>
<p><em>*Combining the two approaches is common &#8212; a department can have its own analytically adept discoverers, whether they&#8217;re called &#8220;quants&#8221; or just &#8220;business analysts&#8221;.</em></p>
<p>Reporting/monitoring BI at least has the potential to be enterprise-wide &#8212; but commonly it isn&#8217;t, as each department has its own operational data sources and metrics. Marketing departments may watch external data that the rest of the company doesn&#8217;t worry about. But it can be true across the board. Factory operations folks may track machine tool data the rest of us barely understand.</p>
<p>Even if a business need is strictly departmental, there can be at least two reasons to centralize technology implementation:</p>
<ul>
<li>The department doesn&#8217;t have the critical mass of IT expertise.</li>
<li>Departmental IT has side effects on the rest of the company.</li>
</ul>
<p>Whether those reasons hold up depends a lot on what kind of analytic scenario we&#8217;re talking about.</p>
<p>Let&#8217;s organize that part of this discussion in line with the taxonomy from my <a href="../../../../../2011/07/05/eight-kinds-of-analytic-database-part-1/">eight kinds of analytic database</a> posts last July.</p>
<ul>
<li><strong>Enterprise data warehouses</strong> fall under the purview of major IT organizations. That remains true even if we pivot to the more realistic concept of <a href="../../../../../2011/11/28/terminology-data-mustering/">integrated data warehouse</a>. However, less stuff needs to be protected in an EDW/IDW than some data authoritarians like to think.</li>
<li>I wrote that the stresses on <strong>traditional data marts</strong> were &#8220;performance, concurrency, TCO.&#8221; This is a clue that the more demanding examples are right in IT&#8217;s wheelhouse. As for the less demanding cases &#8212; IT should be able to meet those needs without breaking a sweat.</li>
<li><strong>Agile investigative data marts</strong> are inherently departmental. If you have the talent to use one, you also have the talent to, for example, train into being a part-time Netezza DBA. Who cares if you don&#8217;t have the expertise to do sophisticated tuning? Analytic DBMS are fast enough &#8212; and hardware is cheap enough &#8212; that you don&#8217;t need that skill set anyway.</li>
<li><strong>Big investigative data marts</strong> can go either way. They&#8217;re technically challenging, so IT certainly has a claim on them. But in cases where the data, while big, is fairly homogeneous, it&#8217;s also not unrealistic for departments to handle the mart themselves.</li>
<li><strong>Bit buckets</strong> are often departmental today, with the department in question happening to be central IT. And central IT is where they&#8217;re likely to flourish, as the data they hold becomes ever more diverse.</li>
<li><strong>Archival data stores</strong> are a central IT matter. Nobody else is likely to care enough to do it right.</li>
<li><strong>Outsourced data marts,</strong> by definition, don&#8217;t live inside conventional enterprises. But they are often a way for business units to get access to data and analytics without relying on central IT.</li>
<li><strong>Operational analytics servers</strong> are likely to be sufficiently mission-critical that you want them handled by IT.</li>
</ul>
<p>So in most cases I&#8217;d say: <strong>Departments can manage their own investigative data marts</strong>, and so of course can SaaS vendors and third-party data providers;<strong> other analytic databases should be run by central IT.</strong> (And of course, large departments with serious local IT can fuzz those distinctions up.) Beyond that, it would seem that whoever administers the database should administer the rest of the analytic stack as well.</p>
<p>That still leaves us with some practical questions, such as:</p>
<ul>
<li>Exactly what products should IT departments buy for which purposes? I hope a lot of posts in this blog are helpful in that consideration.</li>
<li>How should development tasks be split between departments and central IT? It may take me a while to get a post together on that subject, since in general the analytics-development picture is pretty complicated to lay out.</li>
<li>How should departments and central IT work together to manage departmental investigative data marts? I hope to post on that subject soon.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/dbms2/feed/~4/QK1A2errlA0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2012/01/23/departmental-analytics-general-observations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.dbms2.com/2012/01/23/departmental-analytics-general-observations/</feedburner:origLink></item>
	</channel>
</rss>

