<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:thr="http://purl.org/syndication/thread/1.0">
    <title>Clear Information Management</title>
    
    
    <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/" />
    <id>tag:typepad.com,2003:weblog-1746585</id>
    <updated>2009-03-06T15:45:46-06:00</updated>
    
    <generator uri="http://www.typepad.com/">TypePad</generator>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/ClearInformationManagement" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="clearinformationmanagement" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://hubbub.api.typepad.com/" /><entry>
        <title>IM This Week: Time for Business Intelligence</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/03/im-this-week-time-for-business-intelligence.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/03/im-this-week-time-for-business-intelligence.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-63742877</id>
        <published>2009-03-06T15:45:46-06:00</published>
        <updated>2009-03-06T15:45:46-06:00</updated>
        <summary>Business Week published a Special Report entitled "The Ceo Guide to Business Intelligence". The report discusses how BI is helping to reduce costs at Brinker International and Welch's and reduce fraud and wasted time at IMV Projects and Best Buy....</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Business Intelligence" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="IBM" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Information Management" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Netezza" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Business Intelligence" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Economy" />
        <category scheme="http://sixapart.com/ns/types#tag" term="IBM" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Information Management" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Netezza" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Business Week published a Special Report entitled "<a href="http://www.businessweek.com/technology/content/mar2009/tc2009032_101762.htm?chan=technology_ceo+guide+to+tech_ceo+guide+to+tech+--+consumer+tech+at+work">The Ceo Guide to Business Intelligence</a>". The report discusses how BI is helping to reduce costs at Brinker International and Welch's and reduce fraud and wasted time at IMV Projects and Best Buy.  We also get a nice little commentary about how data is transformed into useful information citing health care as an example.  Overall, a high level piece directed to those less familiar with BI, but it's good to see that BI is being looked to in troubling economic times.  </p><p>Another example of how BI is being seen as a way out of the recession can be found at <a href="http://www.technewsworld.com/story/must-read/66385.html?wlc=1236359788">TechNewsWorld</a> (republished from <a href="http://www.crmbuyer.com/">CRM Buyer</a>).  Some good points are made here about increasing volumes of data and how companies are struggling to turn it into useful insights.  Yet, the answer is better business intelligence, getting better information to decision makers more quickly.  </p><p>Following along the same line, here's an article on <a href="http://www.sdcexec.com/web/online/Decision-Support-Trends/Analyst-Comment--Abstinence--When-It-Comes-to-Business-Intelligence-Projects--Is-Not-a-Good-Idea/16$11101">Supply and Demand Chain Executive</a> discussing Performance Management applications and how CPM/BI is more important than ever.</p><p><a href="http://www.intelligententerprise.com/blog/archives/2009/03/resetting_prior.html#more">Doug Henschen</a>, editor of <a href="http://www.intelligententerprise.com">Intelligent Enterprise</a>, talks about executive priorities and refers to a white paper recently published by IE entitled "<a href="http://business-optimization.techweb.com/util/download.jhtml;jsessionid=FLANZPXJGAS0UQSNDLPSKH0CJUNN2JVN?id=174900001&amp;cat=whitepaper">Attitudes And Imperatives: Intelligent Enterprise Reader Priorities Survey 2009</a>". This report presents a number of opportunities for information management but is also a warning that we need to continue to improve our methods and technologies.  I have some additional thoughts around this that I will cover in a separate entry.</p><p>Among other news: Netezza <a href="http://www.netezza.com/releases/2009/release030309.htm%27">announced</a> strong results for Fiscal 2008 and 1Q2009. Vertica <a href="http://news.thomasnet.com/fullstory/556512">unveiled</a> a virtualized appliance for VMWare. And from the WTF file...IBM adds data warehouse features to <a href="http://www.eweek.com/c/a/Database/IBM-Adds-Informix-Data-Warhouse-Feature-to-Bolster-Database-Software/">Informix</a>?</p><p /></div>
</content>



    </entry>
    <entry>
        <title>History in the Data Warehouse</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/03/historicity-of-the-data-warehouse.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/03/historicity-of-the-data-warehouse.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-63654063</id>
        <published>2009-03-05T14:50:30-06:00</published>
        <updated>2009-03-05T14:50:31-06:00</updated>
        <summary>This is the next post in a series of discussions about data warehousing best practices based on this blog entry from February 11. Today, we look at capturing history in a data warehouse. "The data warehouse is able to present...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Modeling" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Modeling" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Warehousing" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://clearim.typepad.com/.a/6a010535118625970b011168c34cee970c-pi" style="float: left;"><img alt="History" border="0" class="at-xid-6a010535118625970b011168c34cee970c " src="http://clearim.typepad.com/.a/6a010535118625970b011168c34cee970c-800wi" style="margin: 5px;" title="History" /></a>
 This is the next post in a series of discussions about data warehousing best practices based on <a href="http://clearim.typepad.com/clearim/2009/02/data-warehouse-basics.html">this</a> blog entry from February 11. Today, we look at capturing history in a data warehouse.  "The data warehouse is able to present a view of the business at a
particular point in time and track Key Performance Indicators (KPI's)
over time." There's not too much to argue here.  </p><p>One of the primary functions of a data warehouse is to capture history and perform point-in-time reporting and analysis.  There are a couple of ways to do this depending on the architecture and requirements.  The most common method in dimensional modeling is the Type 2 Slowly Changing Dimension (SCD).  Type 2 SCD's track tie a transactional or fact record to it's dimension attributes, as it was at the time it occurred.  Variations on the type 2 SCD will allow us to query the transaction as it was at the time it occurred or as it appears now.  If we don't care how it was in the past and only care how it looks now, we'll use a Type 1 SCD.  </p><p>Let's look at an example using a hotel transaction.  Suppose we are looking at a transaction that occurred on 12/20/08.  The transaction includes the room rate plus taxes that the guest paid, the type of payment he used, the guest's name and rewards number, and the hotel brand name and location.  </p><p>Let's say on 1/1/09, the hotel company changed the brand name of that property to Fairview Inn from Village Suites.  Hotel companies, such as Hilton and Marriott, own a number of brands, and there are occassions that they may convert one brand into another.  </p><p>On 2/1/09, we run a report showing sales by brand for each month in 2008.  Do we include that hotel's 2008 transactions under Fairview or Village?  That probably depends on the use.  If we are tracking revenue in order to calculate bonuses for the Village Suites VP, then he would certainly want to us to include those transactions in his bucket (as-was).  However, if we are using last year sales to project Fairview sales for 2009, then we want to include the changed properties sales with Fairview (as-is).  </p><p>In this case, we need to be able to query it both "as-is" and "as-was".  A flexibile architecture will allow us to do either one.  Much of the time, the "as-was" serves no purpose.  For example, if we have status codes, and we change the description of a particular code.  Most of the time, we'll want to see all records associated with the new description.  </p><p>Determing which method we use to track history is one of the most important decisions we make in the data warehouse design.  It affects the flexibility of the warehouse, as well as performance.  We have to balance each.  However, it is also one way to derive the greatest value from the solution, since this historical information is rarely available in any consummable form anywhere else.</p></div>
</content>



    </entry>
    <entry>
        <title>Granularity of a Data Warehouse</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/granularity-of-a-data-warehouse.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/granularity-of-a-data-warehouse.html" thr:count="1" thr:updated="2010-09-19T13:32:48-05:00" />
        <id>tag:typepad.com,2003:post-63440191</id>
        <published>2009-02-27T15:29:14-06:00</published>
        <updated>2009-02-27T15:37:34-06:00</updated>
        <summary>We are continuing a series of discussions about data warehousing best practices based on my blog entry here. This time we take on choosing the right granularity in a data warehouse. The rule of thumb, and the point I made...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Best Practices" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Modeling" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Modeling" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Warehousing" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Granularity" />
        
<content type="html" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;We are continuing a series of discussions about data warehousing best practices based on my blog entry &lt;a href='http://clearim.typepad.com/clearim/2009/02/data-warehouse-basics.html'&gt;here&lt;/a&gt;.&amp;nbsp; This time we take on choosing the right granularity in a data warehouse.&amp;nbsp; The rule of thumb, and the point I made in the original blog entry was that you should always choose the lowest level of granularity.&amp;nbsp; That makes sense in that you can always aggregate to a higher level, but you can't go back down to detail if the data is not stored.&amp;nbsp; &lt;/p&gt;&lt;p&gt;I can give an example of one data warehouse that I am familiar with that has won awards and accolades for best practices.&amp;nbsp; However, a closer look reveals several trade-offs that were made during design that have had far and long lasting effects.&amp;nbsp; During the design process, the decision was made to drop a lot of detail data in order to save space.&amp;nbsp; This data, which many would consider essential to any data warehouse, contained transactional details and customer information.&amp;nbsp; The data warehouse was built for financial use, and the finance organization was unconcerned about these details.&amp;nbsp; They were looking at metrics by products and the chart of accounts.&amp;nbsp; &lt;/p&gt;&lt;p&gt;Was this necessary?&amp;nbsp; Arguably, it was.&amp;nbsp; Since IT projects were funded by the business organization, the finance organization had no incentive to pay for storage or CPU required to process more data than was absolutely required.&amp;nbsp; They simply made the decision to design for their own users' requirements.&amp;nbsp; &lt;/p&gt;&lt;p&gt;However, this decision also had several long term effects that may have not been taken into consideration.&amp;nbsp; Since the warehouse did not contain details that were important to other departments, such as marketing, sales, and operation, the warehouse was rendered useless for a large segment of the company.&amp;nbsp; These organizations, in turn, were then forced to build their own data warehouses from the same source data in order to satisfy their users' needs.&amp;nbsp; The end result being, as a company, they spent more building and maintaining multiple data warehousing systems than by investing more in the first one.&amp;nbsp; &lt;/p&gt;&lt;p&gt;Another effect that had considerable cost was the other organizations' difficulty reconciling their numbers to the finance organization.&amp;nbsp; Since the detail source data was not maintained by finance, and their business rules were buried in thousands of lines of code, users in the other departments spent countless hours trying to resolve differences between the systems.&amp;nbsp; &lt;/p&gt;&lt;p&gt;So, was the right choice to build one true enterprise data warehouse to serve the entire organization?&amp;nbsp; &lt;/p&gt;&lt;p&gt;In theory, the enterprise data warehouse is the best solution.&amp;nbsp; However, we recognize that the larger the organization, the more difficult this is to accomplish.&amp;nbsp; Enterprise level designs in 50 billion dollar + companies become almost a fantasy.&amp;nbsp; There have been a few documented successes, but the kinds of investment involved in these projects take years and millions of dollars.&amp;nbsp; &lt;/p&gt;&lt;p&gt;It is important that there be a chief architect overseeing the development of all information systems.&amp;nbsp; That person must have a handle on the current environment and a vision for integration and reduced costs.&amp;nbsp; I have met a few of these guys who had the title and responsibility but lacked the vision.&amp;nbsp; I have also met a few who were convinced that silos were simply more cost effective and maintainable.&amp;nbsp; Every organization is different and what works for one might not work for another.&lt;/p&gt;&lt;p&gt;So back to the original consideration for this post.&amp;nbsp; Should a data warehouse reflect the lowest level of granularity?&amp;nbsp; In general, yes, it should, but there are a number of other factors that must&amp;nbsp; be considered when making the decision.&amp;nbsp; Ultimately, the answer is that the granularity of the data should reflect the lowest level that has the greatest use to the greatest number of users.&lt;/p&gt;&lt;/div&gt;
</content>



    </entry>
    <entry>
        <title>Conformity of a Data Warehouse</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/conformity-of-a-data-warehouse.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/conformity-of-a-data-warehouse.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-62698969</id>
        <published>2009-02-19T16:32:05-06:00</published>
        <updated>2009-02-19T16:32:05-06:00</updated>
        <summary>In an earlier post, Data Warehouse Basics, I reprinted something that I written a while back about what makes a data warehouse. In a series of entries, I am exploring each point in further detail and see if the meaning...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Modeling" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Conformed Dimensions" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Modeling" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Warehousing" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Dimensional Modeling" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>In an earlier post, <a href="http://clearim.typepad.com/clearim/2009/02/data-warehouse-basics.html">Data Warehouse Basics</a>, I reprinted something that
I written a while back about what makes a data warehouse. In a series of entries, I am
exploring each point in further detail and see if the meaning still holds
up.  In this post, we take on "Data is conformed". </p><p>Conformity in a data warehouse occurs means essentially two things, conformed dimensions and conformed facts.  Conformity allows data warehouses to be distributed across the enterprise and unanticipated new sources to be added.  Conformity assures that definitions and uses are consistent throughout the warehouse.</p><p>Conformed dimensions assure that users will be able to drill across data sources and arrive at conclusions that were not possible without a data warehouse.  This is simply assuring that customer id 1234 for John L Smith in one fact table is the same customer id 1234/John L Smith in another fact table.  This example certainly seems obvious.  However, determining conformed dimensions is not always easy.  </p><p>Two departments may view the information very differently.  This is often the case between finance and sales or marketing.  Finance typically views things as they are tracked in the accounting system.  While marketing or sales may view them entirely different.  For example, when tracking sales by location hierarchy, finance may have a hierarchy that is based on legal entities.  However, marketing may not make any distinction between these legal entities and may be more concerned with brand or geography.  These cases may present challenges to modelers and to arriving at conformed dimensions.</p><p>Conformed facts ensure that terminology for measures in one fact table mean the same in another.  Revenue always means revenue, right?  Well, again finance's definition of revenue may be different than marketing's.  Once again, we have to ensure that our terminology is consistent.  Doing so, will allow us to derive meaningful results across fact tables.  </p><p>The concepts of conformed dimensions and facts are the cornerstone concepts to dimensional methodology. Without them, the data warehouse becomes a database of siloed data marts.  </p><p>I extend the concept of conformity to include data naming standards.  Consistent naming schemes make the database understandable and usable .  So perhaps, we should call this one conformity and consiststency of a data warehouse.  </p></div>
</content>



    </entry>
    <entry>
        <title>DataValidator for Informatica Introduced</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/datavalidator-for-informatica-introduced.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/datavalidator-for-informatica-introduced.html" thr:count="3" thr:updated="2011-01-11T15:22:15-06:00" />
        <id>tag:typepad.com,2003:post-62764199</id>
        <published>2009-02-12T16:44:54-06:00</published>
        <updated>2009-02-12T16:44:55-06:00</updated>
        <summary>Last week, DVOSoftware announced the first release of DataValidator. The company's founder, Val Rayzman, is a former executive at Informatica.DataValidator that seems so obviously useful that it is curious that no one else has done of it before. It compares...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Quality" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="DataValidator" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Informatica" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Quality" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Validation" />
        <category scheme="http://sixapart.com/ns/types#tag" term="DataValidator" />
        <category scheme="http://sixapart.com/ns/types#tag" term="DVOSoftware" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Informatica" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><span style="text-decoration: underline;"><a href="http://www.dvosoftware.com" onclick="window.open(this.href,'_blank','scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false" style="float: left;"><img alt="DVOSoft" border="0" class="at-xid-6a010535118625970b0111685e56df970c " src="http://clearim.typepad.com/.a/6a010535118625970b0111685e56df970c-800wi" title="DVOSoft" /></a>
 </span> Last week, <a href="http://www.dvosoftware.com">DVOSoftware</a> announced the first release of <a href="http://www.dvosoft.com/PR_DVrelease.html">DataValidator</a>.  The company's founder, Val Rayzman, is a former executive at <a href="http://www.informatica.com">Informatica</a>.DataValidator that seems so obviously useful that it is curious that no one else has done of it before. It compares two data sources and finds differences based on the evaluation criteria you select.  </p><p>For example, consider an ETL process that loads 5000 records from an operational system into a data warehouse.  We want to know that all of the records successfully loaded and that none of the values were unexpectedly truncated or altered.  In a few clicks, we can create and execute a test in DataValidator that will show any variances between the two tables.  Of course, DataValidator can also handle more complex tests as well, including set tests (i.e. show all values that occur in one set but not the other), multiple tables, incremental values, lookups, etc. </p><div style="text-align: left;"><div style="text-align: center;"><a href="http://clearim.typepad.com/files/informatica.png"><span class="at-xid-6a010535118625970b010537239f30970b" /></a><a href="http://www.informatica.com" onclick="window.open(this.href,'_blank','scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img alt="Informatica" border="0" class="at-xid-6a010535118625970b010537239f52970b " src="http://clearim.typepad.com/.a/6a010535118625970b010537239f52970b-800wi" title="Informatica" /></a>
 <br />
</div>DataValidator uses the Informatica engine to build and execute Mappings and Workflows.  Technically, it compares any set of tables.  The tables themselves do not have to have been loaded by an Informatica process.  </div><p>The real value of data validator is in the reduced amount of time it takes to perform data validation.  Data validation is a highly manual process in most organizations.  Some have developed scripts which are then automated and run following each load cycle.  However, many organizations, short on time and resources, never get this far.  Instead, they end up in a reactionary mode, correcting data errors after they have been discovered by users.  The irony here is that organizations purchase Informatica to speed up and automate load processes, but then are still required to manually write code for validation.  DataValidator fills that void.  </p><p>There are a few features that would make DataValidator a more complete tool, such as scheduling.  Val assures that these additional features are in the plan for future releases.  The idea is to get the product into the market and learn from its users which features are needed most.  That said, I believe that DataValidator offers a lot of value as it exists now.  I don't think it is difficult to make an ROI case for something like this.  But the challenges are the same as the data quality and profiling face, convincing organizations that data quality has a financial impact on the bottom line.</p><p>I am hoping to work with DataValidator on a live project soon and evaluate it more completely.  I will be certain to share my experiences here. </p></div>
</content>



    </entry>
    <entry>
        <title>Comprehensiveness of a Data Warehouse</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/data-warehousing-basics-continued.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/data-warehousing-basics-continued.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-62696637</id>
        <published>2009-02-11T11:08:18-06:00</published>
        <updated>2009-02-11T11:11:26-06:00</updated>
        <summary>In my previous post, Data Warehouse Basics, I reprinted something that I written a while back about what makes a data warehouse. I want to explore each point in further detail and see if the meaning still holds up. In...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Best Practices" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Warehousing" />
        
<content type="html" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;In my previous post, &lt;a href="http://clearim.typepad.com/clearim/2009/02/data-warehouse-basics.html"&gt;Data Warehouse Basics&lt;/a&gt;, I reprinted something that I written a while back about what makes a data warehouse. I want to explore each point in further detail and see if the meaning still holds up.&amp;nbsp; In this post, we take on "Data is comprehensive".&lt;/p&gt;&lt;p&gt;The description says that data must consolidated from multiple systems.&amp;nbsp; True, but does that in itself make a data warehouse comprehensive?&amp;nbsp; &lt;/p&gt;&lt;p&gt;Not all data warehouses should include all subjects.&amp;nbsp; It is
only realistic that some data warehouses are directed towards finance,
while others are focused on marketing.&lt;/p&gt;&lt;p&gt;Perhaps, it would be more accurate to say that the data is comprehensive around a particular subject. Our data warehouse should reflect all known details about a particular entity so that analysis produced from it is accurate.&amp;nbsp; &lt;/p&gt;&lt;p&gt;For example, we might be performing customer valuation.&amp;nbsp; However, if we have consolidated data from our only 3 of our 4 customer facing systems, how can be sure that our results are correct?&amp;nbsp; &lt;/p&gt;&lt;p&gt;This happens often in the hospitality industry where the parent company operates several brands and many offer reward programs, in which the customer earns points for stays at any of the chains' brands.&amp;nbsp; In order to evaluate the true value of a customer, we must be able to consolidate information from each of the company's brands.&amp;nbsp; Often, these brands operate on entirely different point-of-sale or property-management-systems.&amp;nbsp; The data warehouse must capture customer transactions from every system in order to place a true valuation on customers.&lt;/p&gt;&lt;p&gt;So from that respect, data must be comprehensive.&amp;nbsp; However, there is another point of comprehensiveness that we should consider.&amp;nbsp; Yes, we should capture every transaction for a customer from every system, but should we store every attribute from the transaction if we are not required to do so?&lt;/p&gt;&lt;p&gt;Experienced data warehouse designers know this can be a fine line to walk.&amp;nbsp; On the one hand, we want to maintain the usability of the warehouse by not over complicating it with clutter that has no business relevance or use.&amp;nbsp; However, when we omit certain data elements, it usually isn't long before someone comes along and requests that the data be added to the warehouse.&amp;nbsp; &lt;/p&gt;&lt;p&gt;The rule of thumb is to include all business information available during the initial design.&amp;nbsp; The cost of additional storage usually pales in comparison to the cost to modify, load, and test additions at a later time.&amp;nbsp; &lt;/p&gt;&lt;p&gt;So to wrap up, yes a data warehouse should be comprehensive around a subject.&amp;nbsp; However, that does not mean that it must include every subject relevant to the company.&amp;nbsp; &lt;/p&gt;&lt;/div&gt;
</content>



    </entry>
    <entry>
        <title>Data Warehouse Basics</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/data-warehouse-basics.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/02/data-warehouse-basics.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-62696423</id>
        <published>2009-02-11T10:17:39-06:00</published>
        <updated>2009-02-11T10:17:39-06:00</updated>
        <summary>A year and a half ago, I posted ITtoolbox about the basics of data warehouse. That post got a few interesting comments even one as recent as last week. I thought I would re-post it here then further explore the...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Modeling" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Warehousing " />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://clearim.typepad.com/.a/6a010535118625970b010537203582970b-pi" style="display: inline;" /><span style="text-decoration: underline;"><a href="http://clearim.typepad.com/.a/6a010535118625970b0111685aed3b970c-pi" style="float: left;"><img alt="StarSchema" border="0" class="at-xid-6a010535118625970b0111685aed3b970c " src="http://clearim.typepad.com/.a/6a010535118625970b0111685aed3b970c-800wi" style="margin: 5px;" title="StarSchema" /></a>
 </span>A year and a half ago, I posted <a href="http://it.toolbox.com/blogs/dw-cents/data-warehouse-data-model-design-18699">ITtoolbox</a> about the basics of data warehouse.  That post got a few interesting comments even one as recent as last week.  I thought I would re-post it here then further explore the topic in subsequent posts. <br />...</p><p>A friend of mine recently asked me to help her develop some ideas for
explaining data warehousing to a group of business users. It seems
there was some confusion in her company regarding why their so-called
data warehouse was really just a data dumping ground and not a true
best practices data warehouse. The list below details what I think
separates a data warehouse from an archive database, data mart, ODS,
etc.</p><p /><p> The data warehouse data model should be designed for the following:</p><p>1) Data is comprehensive - Data is captured and consolidated from multiple systems.</p><p>2)
Data is conformed - This is the famous line "single version of the
truth". Data elements are conformed so that the definitions of
"customer" or "revenue" mean the same thing no matter which system it
originated. Tables are conformed when they can be queried across
dimensions and facts without changing the meaning of the results. This
is what is needed to truly integrate data in the warehouse.</p><p>3)
Data is granular - Ideally, we capture and store data at its lowest
level of granularity. You can always aggregate up, but you can't drill
down if the data isn't stored that way.</p><p>4) Data is historical -
The data warehouse is able to present a view of the business at a
particular point in time and track Key Performance Indicators (KPI's)
over time. </p><p>5) Data is shared - A data warehouse that cannot be queried or otherwise accessed has little value. </p> </div>
</content>



    </entry>
    <entry>
        <title>Vendor Stability and the Challenge for Start Ups</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/01/vendor-stability-and-the-challenge-for-start-ups.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/01/vendor-stability-and-the-challenge-for-start-ups.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-62159898</id>
        <published>2009-01-30T13:29:17-06:00</published>
        <updated>2009-01-30T14:45:51-06:00</updated>
        <summary>Seth Grimes has a great post on Intelligent Enterprise about vendor stability. He makes the point that while technology is important, the stability of the vendor is too and should be considered. Seth presents two companies that have compelling technology...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="IBM" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Information Management" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Oracle" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="IBM" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Information Management" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Oracle" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Vendors" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://clearim.typepad.com/.a/6a010535118625970b010536fb51dd970b-pi" style="display: inline;"><img alt="Lights" border="0" class="at-xid-6a010535118625970b010536fb51dd970b image-full " src="http://clearim.typepad.com/.a/6a010535118625970b010536fb51dd970b-800wi" title="Lights" /></a>
 Seth Grimes has a great post on Intelligent Enterprise about <a href="http://www.intelligententerprise.com/blog/archives/2009/01/vendors_stabili.html"> vendor stability</a>. 
He makes the point that while technology is important, the stability of
the vendor is too and should be considered.  Seth presents two
companies that have compelling technology but could face uncertain
futures for other reasons.  </p><p>I agree with him that vendor stability does matter and have made similar points before when talking about <a href="http://clearim.typepad.com/clearim/2009/01/the-buzz-around-open-source-bi.html">open source technology</a>. 
Newer entrants into the market face steep uphill battles to get their
products into the mainstream and become profitable.  Every now and
then, one breaks through and becomes a star, but most will either be
acquired (best case) or die a slow death (worst).   Either way,
customers face risks if the technology is critical to their
business.  </p><p>A number of analysts in the industry are quick to
get fired up by new
technology and vendors as they enter the market.  Typically, these
companies are very accessible and offer compelling stories.  You will
see press releases, blog entries, white papers, etc. touting the
benefits of the new technology.  All the publicity is great, but it
doesn't necessarily translate into genuine growth.  </p><p>I
would not suggest that these vendors be ignored by the industry, nor
should they be denied consideration by businesses looking for
solutions.  In fact, I would urge companies to look at all the options
and measure them according to their requirements.  Many offer technical breakthroughs and exceptional value.  </p><p>The
challenge for start ups is that the large majority of IT decisions
are made for reasons that go beyond simply cost and performance. 
Personal relationships, experience, and existing skill sets factor into
decisions as well.  While names of established vendors like Oracle and
IBM aren't as sexy and may not be as fun to talk about in the
blogosphere, the fact is that these companies will continue to dominate
the mass market for information management software.  My point is not to dismiss the smaller guys but to point out that stability is a reasonable factor for choosing an established vendor. </p></div>
</content>



    </entry>
    <entry>
        <title>Latest Gartner Reports</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/01/latest-gartner-reports.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/01/latest-gartner-reports.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-61781982</id>
        <published>2009-01-22T17:27:10-06:00</published>
        <updated>2009-01-22T17:27:10-06:00</updated>
        <summary>The latest Gartner Magic Quadrants for Data Warehouse Database Management Systems, Business Intelligence Platforms, CPM Suites have been released. You can find them respectively, here, here, and here. I don't have too much to say about these. Curt Monash has...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Business Intelligence" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Gartner" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="HP" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Sybase" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Business Intelligence" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Data Warehousing" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Gartner Magic Quadrant" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Neoview" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Sybase" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.gartner.com" onclick="window.open(this.href,'_blank','scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false" style="float: left;"><img alt="Gartnerlogo" border="0" class="at-xid-6a010535118625970b010536e5a410970b " src="http://clearim.typepad.com/.a/6a010535118625970b010536e5a410970b-800wi" style="margin: 4px;" title="Gartnerlogo" /></a>
 The latest Gartner Magic Quadrants for Data Warehouse Database Management Systems, Business Intelligence Platforms, CPM Suites have been released.  You can find them respectively, <a href="http://mediaproducts.gartner.com/reprints/sas/vol5/article8/article8.html">here</a>, <a href="http://mediaproducts.gartner.com/reprints/oracle/article51/article51.html">here</a>, and <a href="http://mediaproducts.gartner.com/reprints/microsoft/vol3/article7/article7.html">here</a>.  I don't have too much to say about these.  Curt Monash has some interesting comments <a href="http://www.dbms2.com/2009/01/22/gartners-2009-magic-quadrant-for-business-intelligence/">here</a> and <a href="http://www.dbms2.com/2009/01/12/gartners-2008-data-warehouse-database-management-system-magic-quadrant-is-out/">here</a>.  For the most part, I agree with Curt.  These reports are interesting but not always reflective of what is happening in real world implementations.  </p><p>For example, Sybase is ranked only slighly above HP in ability to execute, yet Sybase has about 1500 Sybase IQ customers and HP might have as many as 30 Neoview customers.  And how many of those are in production?  Maybe, none of them are.  </p><p>Yes, Sybase IQ and HP Neoview are two different products with different strategies and target markets.  That is exacltly the point.  If you only looked at the MQ, then you would think these two products are close, but there would be nothing further from the truth.  </p></div>
</content>



    </entry>
    <entry>
        <title>A Brief Introduction to Columnar Databases Part 2</title>
        <link rel="alternate" type="text/html" href="http://clearim.typepad.com/clearim/2009/01/a-brief-introduction-to-columnar-databases-part-2.html" />
        <link rel="replies" type="text/html" href="http://clearim.typepad.com/clearim/2009/01/a-brief-introduction-to-columnar-databases-part-2.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-61725598</id>
        <published>2009-01-22T16:11:15-06:00</published>
        <updated>2009-01-30T14:56:38-06:00</updated>
        <summary>In the last post, I discussed the basics of column-store databases, and how they differ from traditional row oriented databases. Here, we'll discuss the benefits that column orientation provides. Column stores provide two primary advantages for analytical queries over row...</summary>
        <author>
            <name>Stuart Mullins</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Column Stores" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Warehousing" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Analytic Database" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Column Stores" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Columnar Database" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Database" />
        <category scheme="http://sixapart.com/ns/types#tag" term="DBMS" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://clearim.typepad.com/clearim/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://clearim.typepad.com/.a/6a010535118625970b010536eb7a6f970c-pi" style="float: left;"><img alt="Parthenon" border="0" class="at-xid-6a010535118625970b010536eb7a6f970c " src="http://clearim.typepad.com/.a/6a010535118625970b010536eb7a6f970c-800wi" style="margin: 5px; width: 289px; height: 213px;" title="Parthenon" /></a>
 In the last post, I discussed the basics of column-store databases, and how they differ from traditional row oriented databases.  Here, we'll discuss the benefits that column orientation provides.</p><p>Column stores provide two primary advantages for analytical queries over row stores, compression ability and query performance.</p><p><strong>Compression</strong></p><p>Column orientation allows compression algorithms that are not possible or have limited value in a row store.  For example, run-length encoding replaces a sequence of repeated values with a number and a count.  The sequence, (10, 10, 12, 12, 12, 12) could be replaced with (10*2, 12*4).  This not only saves space but allows operations directly against the compressed values, saving CPU time that would normally be spent on decompression.  In addition, query operations are being performed on multiple values at once saving even more time.  Other compression schemes, such as null suppression, bit-vector, and dictionary encoding may also be used to some extent, depending on the package.</p><p>Column store compression may vary between 3 times and 20 times the size of the raw data.  It all depends on the profile of the data and the type of compression used.  In one case study, Par Accel customer Merkle, claims 6 times compression from the raw data and 100X query performance over Oracle (12 minutes --&gt; 7 seconds).  </p><p><strong>Query Performance</strong></p><p>I touched on the query performance advantages in the last post.  Because analytical queries tend to select a few columns over a number of records, a column store can retrieve records more quickly because it doesn't have to scan as much of the disk as a row store.  Most row stores process a row at a time, meaning the query executor reads the entire row and then extracts the needed attributes.  The column store can treat the entire column as an array, iterating through the column very quickly.  </p><p><strong>What's the Catch?</strong></p><p>You might be thinking, "that sounds great.  Why haven't column stores taken over data warehousing?"  Well, the trade-off for column stores is that single row selects and updates are not very quick.  For that matter, neither are trickle feeds found in near real-time data warehouses.  Column stores operate best in batch loads using their own optimized load utilities.  For many organizations that tend to standardize on one of the big three database vendors, the need for flexibility outweighs the need for performance.</p><p>Most data warehouses serve multiple purposes.  Yes, they are used for analytic queries, but they are also used for research, deconstructing history, fraud detection, and many other applications that may not be optimal for a column store.  </p><p>Recently, we have begun seeing a shift in the market toward creating separate analytic data marts for intense, ad-hoc queries.  Data warehouse appliances have seemingly opened the door, as companies are looking for low cost alternatives to beefing up their existing data warehousing hardware.  Several column store vendors have come to market in the past 2 or 3 years.  It is still early in their evolution, but it will be quite interesting to see if these products begin to reach mainstream status. </p><p /></div>
</content>



    </entry>
 
</feed><!-- ph=1 -->

