<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" gd:etag="W/&quot;DUANRnk4cCp7ImA9WxBWEk0.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672</id><updated>2010-02-03T09:36:37.738-05:00</updated><title>Star Schema Central</title><subtitle type="html">Information and Resources for Data Warehousing Professionals</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://blog.oaktonsoftware.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/" /><link rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;orderby=published&amp;v=2" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>34</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/StarSchemaCentral" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="starschemacentral" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry gd:etag="W/&quot;DUAASXk_fyp7ImA9WxBWEk0.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-7622150904118707925</id><published>2010-02-03T09:32:00.002-05:00</published><updated>2010-02-03T09:35:48.747-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-03T09:35:48.747-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Basics" /><title>Dimensional and Relational: Not Opposites</title><content type="html">A common misconception holds that the terms &lt;i&gt;dimensional &lt;/i&gt;and &lt;i&gt;relational &lt;/i&gt;are opposites. They are not.&lt;br /&gt;
&lt;br /&gt;
The word "dimensional" describes a design method. The word "relational" describes a data storage technology.&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;A &lt;b&gt;dimensional model &lt;/b&gt;is a design approach that describes&amp;nbsp; a process in terms of measurements (known as &lt;i&gt;facts&lt;/i&gt;) and their context (&lt;i&gt;dimensions&lt;/i&gt;)&lt;/li&gt;
&lt;li&gt;A &lt;b&gt;star schema &lt;/b&gt;is a dimensional model implemented using relational storage technology—that is, in a relational database (RDBMS)&lt;/li&gt;
&lt;li&gt;A &lt;b&gt;cube &lt;/b&gt;is a dimensional model implemented using multi-dimensional storage technology—that is, in a multidimensional database (MDB)&lt;/li&gt;
&lt;/ul&gt;This simple diagram illustrates these concepts:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_oaFlRokTed0/S2mFQLuMauI/AAAAAAAAAHU/u2KMj9l58yU/s1600-h/Clipboard01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="272" src="http://4.bp.blogspot.com/_oaFlRokTed0/S2mFQLuMauI/AAAAAAAAAHU/u2KMj9l58yU/s320/Clipboard01.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;As you can see from the diagram, a star schema is &lt;i&gt;both &lt;/i&gt;relational &lt;i&gt;and &lt;/i&gt;dimensional. So is a snowflake schema.&lt;br /&gt;
&lt;br /&gt;
By the way, don't let this confuse you:&amp;nbsp; most modern day DBMS products accommodate both kinds of storage.&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-7622150904118707925?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=7622150904118707925&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7622150904118707925?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7622150904118707925?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2010/02/dimensional-and-relational-not.html" title="Dimensional and Relational: Not Opposites" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_oaFlRokTed0/S2mFQLuMauI/AAAAAAAAAHU/u2KMj9l58yU/s72-c/Clipboard01.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CEIHRXc-eip7ImA9WxBQFEQ.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-4925853330755481035</id><published>2010-01-14T13:05:00.001-05:00</published><updated>2010-01-14T13:08:54.952-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-14T13:08:54.952-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Accumulating Snapshots: Are They Necessary? [Q&amp;A]</title><content type="html">A reader asks the following question:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;Q: We have an accumulating snapshot that tracks trouble tickets from submission to closure.  The fact table has one row for each trouble ticket.  Why do we need it? Can't we just place all this information in the trouble ticket dimension?&lt;br /&gt;
&lt;/blockquote&gt;A: You probably do need the fact table, but it might be worth putting some of the information in the dimension table as well.&lt;br /&gt;
&lt;br /&gt;
Before I explain, here is a quick refresher on the accumulating snapshot.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Accumulating Snapshot&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
An accumulating snapshot is a kind of fact table used to correlate activities surrounding a specified item.  In this case, that item is a trouble ticket.  &lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_oaFlRokTed0/S09Yfcj9pQI/AAAAAAAAAFY/dJ6Ci367z7g/s1600-h/3520691955_6d3cbdf630_m.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_oaFlRokTed0/S09Yfcj9pQI/AAAAAAAAAFY/dJ6Ci367z7g/s200/3520691955_6d3cbdf630_m.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;Accumulating snapshots are useful because they simplify the queries necessary to study elapsed time between events.  In the reader's case, that might be the time between submission and closure of a ticket (there are probably some interim steps as well).  &lt;br /&gt;
&lt;br /&gt;
On the operational side, these events are probably recorded in different rows, and maybe even in different systems.  Figuring out the days between these events can be messy, especially if averaging a results for a large number of tickets.  Correlated subqueries will be reqiured, and performance may be poor.&lt;br /&gt;
&lt;br /&gt;
The accumulating snapshot does the correlation of these activities ahead of time, as part of the ETL process.  In this case, it has one row for each ticket.  Presumably, there are foreign keys that identify the ticket, the date submitted, and the date closed.  An optional fact can record the elapsed time between these dates.  This fact is very easy to aggregate and group by any dimension of interest.  No need for for correlated subqueries.&lt;br /&gt;
&lt;br /&gt;
You can read more about accumulating snapshots in a &lt;a href="http://blog.oaktonsoftware.com/2007/03/accumulating-snapshot-use-accumulating.html"&gt;prior post&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;The Fact Table is Necessary&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The reader has noted that the fact table has one row per trouble ticket.  (That's a good &lt;a href="http://blog.oaktonsoftware.com/2009/12/rule-1-state-your-grain.html"&gt;statement of grain&lt;/a&gt;, by the way.)  Since there is only one row for each ticket, it might seem that the content of the fact table could be simply be placed in the ticket dimension.&lt;br /&gt;
&lt;br /&gt;
This is probably not a good idea.  Though your fact table has one row per ticket, your ticket dimension may have more than one, due to &lt;a href="http://blog.oaktonsoftware.com/2007/10/for-slowly-changing-dimensions-change.html"&gt;type 2 changes&lt;/a&gt; that track history.&amp;nbsp; For a particular trouble ticket, when a type 2 change occurs, a new row is added for the new version of the ticket. A given ticket may therefore have more than one row in the dimension.&lt;br /&gt;
&lt;br /&gt;
If you were to place a fact in this dimension table, like the days between submission and closure, it would be repeated for each version of the same ticket.  Aggregating it would result in double counting.  &lt;br /&gt;
&lt;br /&gt;
The fact is better placed in the fact table, where they can safely be summed, averaged, or whatever is necessary.&lt;br /&gt;
&lt;br /&gt;
(By the way, the same would occur if you placed the two dates in the dimension table.  They would repeat for each version of the ticket.  If you try to aggregate the elapsed time between the dates, the same ticket would be counted multiple times -- once for each version.)&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;i&gt;&lt;/i&gt;Place Information in the Dimension &lt;i&gt;Also &lt;/i&gt;(Not Instead)&lt;br /&gt;
&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Information tracked by the accumulating snapshot may &lt;i&gt;also &lt;/i&gt;be useful in the dimension table. But here, its purpose is to serve as dimensional data.&amp;nbsp; Not as a replacement for the fact table.&lt;br /&gt;
&lt;br /&gt;
For example, the dates the ticket was opened and closed could make helpful type 1 attributes in the ticket dimension.&amp;nbsp; This is especially useful if the dimension is connected to other fact tables, such as one that tracks phone calls to the support center.  The dates can be used to group call counts in interesting ways.  For example, "How many support calls were generated by tickets opened on January 1 vs. January 2."&lt;br /&gt;
&lt;br /&gt;
If you want to, you can also place the elapsed days fact in the dimension table.  Here, it should be used strictly as a &lt;a href="http://blog.oaktonsoftware.com/2009/01/q-customers-with-no-sales-in-fact-table.html"&gt;behavioral dimension&lt;/a&gt; -- not as a fact.&amp;nbsp;  Use it to group results.  Don't try to use it as a fact; that would cause double counting.  You might even want to convert it into bands to avoid confusion (1-5 days, 6-10 days, over 10 days.)&lt;br /&gt;
&lt;br /&gt;
- Chris&lt;br /&gt;
&lt;div style="text-align: right;"&gt;Image Credit:&amp;nbsp; &lt;a href="http://www.flickr.com/photos/36894712@N04/3520691955/" target="_blank"&gt;Creativity103&lt;/a&gt; via &lt;a href="http://creativecommons.org/licenses/by/2.0/" target="_blank"&gt;Creative Commons&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;&lt;br /&gt;
Do you have a question of your own?  Send it in.  I'd be happy to answer.&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-4925853330755481035?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=4925853330755481035&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4925853330755481035?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4925853330755481035?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2010/01/accumulating-snapshots-are-they.html" title="Accumulating Snapshots: Are They Necessary? [Q&amp;A]" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_oaFlRokTed0/S09Yfcj9pQI/AAAAAAAAAFY/dJ6Ci367z7g/s72-c/3520691955_6d3cbdf630_m.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;DUANRnk_fip7ImA9WxBWEk0.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-6846365549434298003</id><published>2010-01-14T12:00:00.003-05:00</published><updated>2010-02-03T09:36:37.746-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-03T09:36:37.746-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>Mobile Safari Glitch Corrected [Announcements]</title><content type="html">I received a report that &lt;a href="http://www.starschemacentral.com/"&gt;www.StarSchemaCentral.com&lt;/a&gt; was not redirecting properly for some readers using the mobile version of the Safari browser.&lt;br /&gt;
&lt;br /&gt;
I believe the issue has been corrected.&lt;br /&gt;
&lt;br /&gt;
If you continue to have problems, you can always use &lt;a href="http://blog.oaktonsoftware.com/"&gt;blog.oaktonsoftware.com&lt;/a&gt; to get here instead.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
-Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-6846365549434298003?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=6846365549434298003&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6846365549434298003?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6846365549434298003?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2010/01/mobile-safari-glitch-corrected.html" title="Mobile Safari Glitch Corrected [Announcements]" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CEACQHY_eyp7ImA9WxBQEk4.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-3108198484393523822</id><published>2010-01-11T12:51:00.004-05:00</published><updated>2010-01-11T12:59:21.843-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-11T12:59:21.843-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>TDWI Members: Read Chris's "Ten Mistakes To Avoid"</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.tdwi.org/Publications/TenMistake/display.aspx?id=9692" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" target="_blank"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_oaFlRokTed0/S0tkY_szjbI/AAAAAAAAAFQ/KqRXYX9GXSU/s200/Top10_Q409.gif" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;If you are a member of The Data Warehousing Institute, be sure to check out your Q4 2009 benefits.&lt;br /&gt;
&lt;br /&gt;
I wrote an installment of the "Ten Mistakes" series entitled &lt;a href="http://www.tdwi.org/Publications/TenMistake/display.aspx?id=9692" target="_blank"&gt;Ten Mistakes to Avoid in Dimensional Design&lt;/a&gt;.&amp;nbsp; It covers common blunders in the use of dimensional design, both technical and non-technical.&lt;br /&gt;
&lt;br /&gt;
This publication is only available to TDWI members.&lt;br /&gt;
&lt;br /&gt;
- Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-3108198484393523822?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=3108198484393523822&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/3108198484393523822?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/3108198484393523822?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2010/01/tdwi-members-read-chriss-ten-mistakes.html" title="TDWI Members: Read Chris's &quot;Ten Mistakes To Avoid&quot;" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_oaFlRokTed0/S0tkY_szjbI/AAAAAAAAAFQ/KqRXYX9GXSU/s72-c/Top10_Q409.gif" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;D0MNQ3Y5eyp7ImA9WxBREUU.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-685806220469979556</id><published>2009-12-16T11:10:00.010-05:00</published><updated>2009-12-30T10:04:52.823-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-30T10:04:52.823-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Dimensional Designs are not Denormalized</title><content type="html">I don’t usually bother to take a stand on issues surrounding terminology or theory.&amp;nbsp; For this post I am going to make an exception.&lt;br /&gt;
&lt;br /&gt;
I often hear people say that a star schema is a &lt;i&gt;denormalized &lt;/i&gt;data model.&amp;nbsp; There is an element of truth hiding in those words, but I think it describes how some people design stars, and not what a star is.&amp;nbsp; A better description is &lt;i&gt;unnormalized &lt;/i&gt;or &lt;i&gt;not normalized&lt;/i&gt;. But even these words do not describe all stars.&lt;br /&gt;
&lt;br /&gt;
Regardless of the words you choose, using the vocabulary of normalization is generally a disservice to the novice designer.&amp;nbsp; And &lt;i&gt;none &lt;/i&gt;of these words should ever by used to &lt;i&gt;define&lt;/i&gt; the star schema.&lt;br /&gt;
&lt;br /&gt;
Continue reading to learn why.&lt;br /&gt;
&lt;b&gt;&lt;/b&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;b&gt;Normal Forms&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The word “normalized” is used in connection with various properties that &lt;i&gt;may&lt;/i&gt; be exhibited by a relational set, or table.&amp;nbsp; (I’m going to gloss over the difference here, and just hit the highlights.)&amp;nbsp; These properties describe specific characteristics of the structure of the table or of the relationships between its attributes.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
It’s important to understand that we are talking about &lt;i&gt;properties&lt;/i&gt;, or &lt;i&gt;characteristics &lt;/i&gt;here.&amp;nbsp; In fact, these properties have more specific names: &lt;i&gt;first normal form&lt;/i&gt;, &lt;i&gt;second normal form&lt;/i&gt; and so forth.&amp;nbsp; For operational systems, designers generally strive to achieve &lt;i&gt;third normal form&lt;/i&gt;.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
If you want to learn about normal forms, you can start with &lt;a href="http://en.wikipedia.org/wiki/Database_normalization" target="_blank"&gt;Wikipedia&lt;/a&gt;.&amp;nbsp; Better yet, consult &lt;a href="http://www.amazon.com/gp/product/0321197844?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0321197844" target="_blank"&gt;An Introduction to Database Systems&lt;/a&gt;&lt;img alt="" border="0"  height="1" src="http://www.assoc-amazon.com/e/ir?t=oaktonsoftwar-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0321197844" style="border: medium none ! important; margin: 0px ! important;" width="1" /&gt;, an essential volume by Chris Date.&amp;nbsp; But you don’t need to understand normal forms to finish reading this article.&lt;br /&gt;
&lt;br /&gt;
What you do need to understand is that a table design that satisfies third normal form is devoid of several characteristics that are typically found in the dimension tables of a star schema.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
For this reason, people familiar with the principles of normalization, or who have backgrounds in E/R modeling, often refer to dimension tables as “denormalized.”&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Denormalized?&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
To say that a something is &lt;i&gt;denormalized &lt;/i&gt;implies that it was once &lt;i&gt;normalized&lt;/i&gt;, and that someone acted to reverse that characteristic.&amp;nbsp; Some people arrive at their dimensional designs in this way, but not all of us do.&lt;br /&gt;
&lt;br /&gt;
Defining a star schema as denormalized suggests that one of the following happened:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;The designer first produced a normalized “logical design” to fit the business requirements, then denormalized it to create the “physical design”(presumably for performance reasons)&lt;/li&gt;
&lt;li&gt;The designer took the normalized design of a source system, and denormalized it for the target system.&lt;/li&gt;
&lt;/ul&gt;Certainly, both of these techniques can be used to produce star schema designs.&amp;nbsp; However, I’ve never put a design together in either of these ways.&amp;nbsp; (Well, almost never.)&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
You &lt;i&gt;can &lt;/i&gt;(and usually &lt;i&gt;should&lt;/i&gt;) design a star schema without ever denormalizing anything.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
Start with a business process and observe how it is measured.&amp;nbsp; Identify the measurements (facts) and their context (dimensions).&amp;nbsp; Group the dimensions together, add some keys.&amp;nbsp; The result is a star schema.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
I do this all the time, and have no qualms with signing my name to the following statement:&lt;br /&gt;
&lt;blockquote&gt;No tables were denormalized during the design of this schema.&lt;br /&gt;
&lt;/blockquote&gt;The result is not normalized, to be sure.&amp;nbsp; But it has not been actively denormalized, either.&amp;nbsp; In fact, steps are usually taken &lt;i&gt;toward &lt;/i&gt;one of the normal forms, not away.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
A better term is &lt;i&gt;unnormalized&lt;/i&gt;, or simply &lt;i&gt;not normalized&lt;/i&gt;. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;A Non-essential Characteristic&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Whatever term you use—denormalized, not normalized, unnormalized—it is important to recognize that it describes &lt;i&gt;some &lt;/i&gt;stars, but not &lt;i&gt;all&lt;/i&gt;.&amp;nbsp; The non-normal format of a star is &lt;i&gt;not an essential characteristic&lt;/i&gt; of its design.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
It is possible for a star schema to be in normal form.&amp;nbsp; Most stars violate first normal form, but there are some that do not.&amp;nbsp; Of those, some may also be considered to satisfy second normal form, though this will depend on what you are willing to accept as a candidate key.&amp;nbsp; And I have seen stars that fit the definition of third normal form as well.&lt;br /&gt;
&lt;br /&gt;
It is clearly incorrect to define a star schema as a group of tables where the dimensions are not normalized.&amp;nbsp; In New York, most taxis are yellow, but New Yorkers do not define taxis as “yellow cars.”&amp;nbsp; Nor should a star schema be defined as a denormalized table design.&lt;br /&gt;
&lt;br /&gt;
So what is a star, then?&amp;nbsp; It is the implementation of a dimensional model in relational tables.&amp;nbsp; A dimensional model describes a business process in terms of measurements (facts) and their context (dimensions.)&amp;nbsp; It can also be implemented in a multidimensinal database, where the result is a &lt;i&gt;cube&lt;/i&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;An Alternate Vocabulary&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
We get wrapped around the axle in conversations like this because we are really trying to apply principles from once discipline to activities in another.&amp;nbsp; To someone with a hammer, everything looks like a nail. But sometimes it’s not the right tool for the job. &lt;br /&gt;
&lt;br /&gt;
The vocabulary of normalization includes terms like &lt;i&gt;candidate key&lt;/i&gt;, &lt;i&gt;superkey&lt;/i&gt;, &lt;i&gt;non-prime attribute&lt;/i&gt;.&amp;nbsp; These terms are used to define principles of normalization that help us develop &lt;i&gt;operational systems &lt;/i&gt;which maintain integrity in the face of insert, update and delete activities.&lt;br /&gt;
&lt;br /&gt;
Dimensional design has its own vocabulary, which is suited to the design of tables for &lt;i&gt;analytic systems&lt;/i&gt;.&amp;nbsp; This vocabulary includes terms like &lt;i&gt;natural key&lt;/i&gt;, &lt;i&gt;surrogate key&lt;/i&gt;, &lt;i&gt;grain&lt;/i&gt;, &lt;i&gt;type 1 attribute&lt;/i&gt;, and &lt;i&gt;type 2 attribute&lt;/i&gt;.&amp;nbsp; They are used to form a set of principles that guide the development systems where the primary interaction with is query, and where insert update and delete anomalies can be controlled programmatically. The best thing you can do when designing dimensional data structures is to forget about the vocabulary of normalization. &lt;br /&gt;
&lt;br /&gt;
These two vocabularies are not opposites, they are simply different.&amp;nbsp; Each is useful in designing a particular kind of database system.&amp;nbsp; Neither is “right” or “wrong.”&amp;nbsp; And most importantly, neither is more or less “relational.”&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Why This Long Rant?&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
As I said at the top of this post, I usually don't like to get involved in questions of theory or terminology.&amp;nbsp; My focus is usually on getting things done.&lt;br /&gt;
&lt;br /&gt;
But I teach dimensional design, and I find that students who cannot separate these two disciplines have the most trouble.&lt;br /&gt;
&lt;br /&gt;
Do everyone a service, and stop using the word "denormalized" to define a star schema.&lt;br /&gt;
&lt;br /&gt;
(End of rant.)&lt;br /&gt;
&lt;br /&gt;
-Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-685806220469979556?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=685806220469979556&amp;isPopup=true" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/685806220469979556?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/685806220469979556?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/12/dimensional-designs-are-not.html" title="Dimensional Designs are not Denormalized" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></entry><entry gd:etag="W/&quot;DEAHSHc6cSp7ImA9WxBTF0Q.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-7901019044133469894</id><published>2009-12-09T12:09:00.003-05:00</published><updated>2009-12-14T08:18:59.919-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-14T08:18:59.919-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Basics" /><title>Rule 1: State Your Grain</title><content type="html">&lt;i&gt;Make sure you have a statement of grain for each fact table or cube in your dimensional design.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://2.bp.blogspot.com/_oaFlRokTed0/Sx_Y2kAyp2I/AAAAAAAAAFI/7Rkg1VE56kc/s1600-h/271893974_d7a19e1d02_m.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_oaFlRokTed0/Sx_Y2kAyp2I/AAAAAAAAAFI/7Rkg1VE56kc/s200/271893974_d7a19e1d02_m.jpg" /&gt;&lt;/a&gt;I receive lots of questions from people who are working through an issue with a star or cube.   Most of the time, I must counter with a question of my own:  “What is the grain?”   Without this basic information, it is usually impossible to comment on whatever design issue the person is facing.&lt;br /&gt;
&lt;br /&gt;
Being able to state grain is important, and not just for its value as a conversation starter.  Here’s a brief look at what grain is, how to define it, and what happens if you don’t.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;What is Grain&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
A statement of grain identifies what is represented by a single, granular row in a fact table (or an un-aggregated measure in a cube.)   Each and every row in the fact table should meet this definition, with no wiggle-room.&lt;br /&gt;
&lt;br /&gt;
When grain is not explicitly defined, or is defined in an ambiguous way, all manner of problems may arise.  In addition to hampering your ability to talk about the design with someone else, ill-defined grain can cause to severe technical challenges in the reporting process.&amp;nbsp; It may even lead to reports that are just plain &lt;i&gt;wrong&lt;/i&gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Defining Grain&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
There are two ways to state the grain of a fact table.  The first way is to use business language, referencing a specific, well-understood artifact of the activity described by the star.  For example, a star that describes orders may have the following grain:  “Order measurements at the order-line level of detail.”  That sums things up pretty well.  Each row of the fact table corresponds to a single order line.&lt;br /&gt;
&lt;br /&gt;
Sometimes it is not easy to state grain in this manner.  When that’s the case, you use the dimensions in a star to indicate what each unique row in the fact table represents.  For example, a star that tracks the processing milestones of mortgage applications might have the following grain: “Processing measurements by application and status.”  This fact table will have a new row each time an application (one dimension) undergoes a change in status (another dimension.)&lt;br /&gt;
&lt;br /&gt;
When using dimensional terms to define grain, do not simply rattle off all the dimensions present in the star.  Instead, list only those that are necessary to define a unique row.  In the mortgage status change example, it is not necessary to mention that the star will also contain dimensions for date, customer, mortgage officer, mortgage product, and so forth.&lt;br /&gt;
&lt;br /&gt;
Business definitions of grain are usually used for transaction fact tables.&amp;nbsp; Dimensional grain definitions are commonly used for snapshots, accumulating snapshots, derived schemas and aggregates.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Fuzzy Grain&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Poorly defined grain can lead to trouble.  Ill-defined grain can mask a situation where a fact table is actually being used to track two or more processes.  It can also lead to situations in which the fact table contains two or more levels of aggregation.  In the former case, single-process reporting will be hampered. BI developers will be bending over backwards to focus on the relevant subset of data.&amp;nbsp;  In the latter case, double-counting, triple-counting or worse is possible.&lt;br /&gt;
&lt;br /&gt;
This is why “sate your grain” is rule #1 in any dimensional design.&lt;br /&gt;
&lt;br /&gt;
-Chris&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div style="text-align: right;"&gt;Image by  &lt;a href="http://www.flickr.com/photos/gx9/" rel="cc:attributionURL" target="_blank"&gt; GravityX9&lt;/a&gt; licensed under &lt;a href="http://creativecommons.org/licenses/by/2.0/" rel="license" target="_blank"&gt;Creative Commons 2.0&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-7901019044133469894?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=7901019044133469894&amp;isPopup=true" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7901019044133469894?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7901019044133469894?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/12/rule-1-state-your-grain.html" title="Rule 1: State Your Grain" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_oaFlRokTed0/Sx_Y2kAyp2I/AAAAAAAAAFI/7Rkg1VE56kc/s72-c/271893974_d7a19e1d02_m.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry gd:etag="W/&quot;DEMDQHszeSp7ImA9WxNbFkk.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-3990369708756735192</id><published>2009-11-19T10:47:00.006-05:00</published><updated>2009-11-19T11:01:11.581-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-19T11:01:11.581-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="People" /><category scheme="http://www.blogger.com/atom/ns#" term="Strategy" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>The Tolkien Effect</title><content type="html">&lt;i&gt;Schema designers must be on the lookout for data elements that are known by more than one name.&amp;nbsp; Equally common is the use of a single name to signify very different things.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
It may surprise you to learn that there is an important connection between data warehouse design and &lt;a href="http://www.amazon.com/gp/product/0345340426?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0345340426" target="_blank"&gt;The Lord of the Rings.&lt;/a&gt;&lt;img alt="" border="0" class=" trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss trjzsgtjeevpvwqrkgss" height="1" src="http://www.assoc-amazon.com/e/ir?t=oaktonsoftwar-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0345340426" style="border: medium none ! important; margin: 0px ! important;" width="1" /&gt; In the books, author JRR Tolkien challenges readers by using many different names, without explanation, for the same character.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://3.bp.blogspot.com/_oaFlRokTed0/SwVk06zoDAI/AAAAAAAAAFA/RynxD9wWqeI/s1600/3217184131_afdb40069e_m.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_oaFlRokTed0/SwVk06zoDAI/AAAAAAAAAFA/RynxD9wWqeI/s320/3217184131_afdb40069e_m.jpg" /&gt;&lt;/a&gt;The character Aragorn, for example, is also known as Strider, Dunadan, the Heir of Isildur, and several other names and titles.&amp;nbsp; Each name, it turns out, is associated with a different culture or point of view.&lt;br /&gt;
&lt;br /&gt;
At first, all this can be deeply confusing.&amp;nbsp; With a little effort and patience, however, things begin to make sense, and the end result can be deeply rewarding.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;The Tolkien Effect in Business&lt;/b&gt; &lt;br /&gt;
&lt;br /&gt;
The same kind of thing happens when gathering requirements for a dimensional model.&amp;nbsp; Within a business, it is commonplace to find several names for the same thing.&amp;nbsp;&amp;nbsp; Different departments, for example, may refer to products in different ways.&amp;nbsp; Even within a department, there may be multiple names for the same thing.&lt;br /&gt;
&lt;br /&gt;
Depending on your previous experience with the area in question, it may take you some time to realize this is going on. I will never forget the day&amp;nbsp; I realized that a finance group meant the same thing by Ten-Digit-Department, Level 3 Code and Budget Line.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
It’s crucial to identify these situations, or the capabilities of your model will be crippled.&amp;nbsp; Data elements of interest in multiple contexts should be given a single, shared definition in your model.&amp;nbsp; For dimensions in particular, this will be crucial in supporting analysis that crosses subject areas.&lt;br /&gt;
&lt;br /&gt;
These shared dimensions are called conformed dimensions, and they are the key to avoiding stove-pipe subject areas. Even within a subject area, this can be crucial.&amp;nbsp; The Ten-Digit-Department realization was essential in permitting comparison of budgets to actuals.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;The Reverse-Tolkien&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The converse is also a commonplace: a single name used to signify very different things.&amp;nbsp; The best example of this is “Sales.”&amp;nbsp; A salesperson will often use this word to refer to an &lt;i&gt;order &lt;/i&gt;or &lt;i&gt;contract&lt;/i&gt;.&amp;nbsp; In finance, however, the word is reserved for the event that allows the recognition or revenue, which is often &lt;i&gt;fulfillment &lt;/i&gt;or &lt;i&gt;shipment&lt;/i&gt; of the order.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
Once again, it is crucial that analyst keep an eye out for these situations; failure to produce consistent and well defined definitions for each fact or measurement is also a failure of conformance.&amp;nbsp; The result will be challenges to the accuracy of the data, distrust of the solution, and a lack of user adoption.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;What You Can Do&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
How then to avoid these problems?&amp;nbsp; &lt;i&gt;Listen&lt;/i&gt;.&amp;nbsp; Don’t assume you know what people mean. Repeat things back in your own words.&amp;nbsp; Be sure to write down and review definitions of each data element.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
Look out for The Tolkien Effect.&amp;nbsp; Pay close attention to people who live and work on the cusp of two subject areas or departments, as they will be keenly aware of these kind of linguistic challenges.&amp;nbsp; So will the data administrator, if your organization has one.&lt;br /&gt;
&lt;br /&gt;
-Chris&lt;br /&gt;
&lt;br /&gt;
&lt;div style="text-align: right;"&gt;Image Attribution:&amp;nbsp; &lt;a href="http://www.flickr.com/photos/lrargerich/" rel="cc:attributionURL" target="_blank"&gt;lrargerich&lt;/a&gt; &lt;br /&gt;
&lt;a href="http://creativecommons.org/licenses/by/2.0/" rel="license" target="_blank"&gt;CC BY 2.0&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-3990369708756735192?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=3990369708756735192&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/3990369708756735192?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/3990369708756735192?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/11/tolkien-effect.html" title="The Tolkien Effect" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_oaFlRokTed0/SwVk06zoDAI/AAAAAAAAAFA/RynxD9wWqeI/s72-c/3217184131_afdb40069e_m.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;DEIMQnY4eyp7ImA9WxNWFEs.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-2056986628644950511</id><published>2009-10-13T09:58:00.008-04:00</published><updated>2009-10-13T16:43:03.833-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-13T16:43:03.833-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Aggregates" /><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><title>Highly Summarized Dimensions</title><content type="html">From the inbox today, a question about aggregate design:&lt;br /&gt;
&lt;blockquote&gt;Q:&amp;nbsp; "If you roll up a dimension that is hierarchical, but you are now left with the highest level of the hierarchy (no further roll-up would be possible in that dimension) - would you maintain a separate dimension for this in the aggregated star?&lt;br /&gt;
&lt;br /&gt;
"Or would you join a number of these unrelated top hierarchy levels in a junk dimension?"&lt;br /&gt;
&lt;br /&gt;
- Dublin, Ireland&lt;br /&gt;
&lt;/blockquote&gt;The reader has a potential aggregate that will summarize more than one dimension at a very high level.&amp;nbsp; For example, an aggregate of sales that summarizes the customer dimension by country and summarizes the product dimension by category. &lt;br /&gt;
&lt;br /&gt;
As a general rule of thumb, try to set up aggregate dimensions that preserve the original separation of the base dimension tables.&amp;nbsp; So rather than have country and category combined as a sort of "junk" dimension, create a rollup of customer called "country" and a rollup of product called "category."&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://2.bp.blogspot.com/_oaFlRokTed0/StSGZ_isHvI/AAAAAAAAAE4/obaok7of7Do/s1600-h/3634884155_43002641b4_m.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_oaFlRokTed0/StSGZ_isHvI/AAAAAAAAAE4/obaok7of7Do/s320/3634884155_43002641b4_m.jpg" /&gt;&lt;/a&gt;This gives the aggregate star a kind of "symmetry" with the original star.&amp;nbsp; It tends to be the most workable solution with aggregate-aware query tools, databases, etc.&amp;nbsp; It also is the easiest to manage when no tools are aggregate-aware, as it minimizes the effort required to rewrite SQL.&lt;br /&gt;
&lt;br /&gt;
If the summary is very small, it may be workable to create a "big wide table" that stores the dimension values directly with the summary facts.&amp;nbsp; This works best if you are using a tool to implement the aggregate that is able to exploit the query rewrite function of a database, like a materialized view or materialized query table.&amp;nbsp; In this case, applications can still query the "base" star, but receive the benefit of the aggregate.&lt;br /&gt;
&lt;br /&gt;
But do you really need this aggregate?&amp;nbsp; The more highly summarized an aggregate is, the fewer queries or reports it is able to accelerate.&amp;nbsp;&amp;nbsp; It is usually possible to construct aggregates that are not quite so highly summarized, giving a performance boost to a wider range of queries.&amp;nbsp; For example, summarizing a customer dimension by region, rather than country, will allow the aggregate to work with more reports--including those that focus on country.&lt;br /&gt;
&lt;br /&gt;
- Chris&lt;br /&gt;
&lt;br /&gt;
&lt;div style="text-align: right;"&gt;Image credit:&amp;nbsp; &lt;a href="http://www.flickr.com/photos/zapthedingbat/3634884155/" target="_blank"&gt;ZapTheDingbat&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-2056986628644950511?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=2056986628644950511&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/2056986628644950511?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/2056986628644950511?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/10/highly-summarized-dimensions.html" title="Highly Summarized Dimensions" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_oaFlRokTed0/StSGZ_isHvI/AAAAAAAAAE4/obaok7of7Do/s72-c/3634884155_43002641b4_m.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CkMMR3o9fSp7ImA9WxNQF04.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-4808446127441953238</id><published>2009-09-08T09:20:00.022-04:00</published><updated>2009-09-23T14:28:06.465-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-23T14:28:06.465-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>More on Surrogate Keys</title><content type="html">This post addresses a common follow-up question regarding surrogate keys:&lt;br /&gt;
&lt;blockquote&gt;"My source system already tracks changes.  In this case, do dimension tables really need their own unique identifiers?"&lt;br /&gt;
&lt;/blockquote&gt;Yes.  There may be exceptions, but they are far less common than you may think, and they limit your future options.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;In a previous post, I laid out &lt;a href="http://blog.oaktonsoftware.com/2009/05/do-i-really-need-surrogate-keys.html"&gt;the case for surrogate keys&lt;/a&gt;.&amp;nbsp; In short:&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;By not reusing unique identifiers from a source system, dimensions are free to respond to changed data in whatever manner makes most sense from an analytic perspective.  &lt;/li&gt;
&lt;li&gt;Surrogate keys are the best option, though a multi-part key is also possible.&lt;/li&gt;
&lt;/ol&gt;It is tempting to assume that if the source system logs all changes, there is no need to maintain a separate unique identifier for dimensions. I would seem that they can maintain all history, while reusing the natural keys from the source system. Usually, this is not the case.&lt;span style="font-weight: bold;"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-weight: bold;"&gt;A Dimension is Not Equivalent to an Entity&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Unfortunately, a dimension in a data warehouse &lt;span style="font-style: italic;"&gt;rarely &lt;/span&gt;corresponds to a single entity in an operational system.  Operational systems are more highly normalized; most dimensions correspond to more than one table in a source system.  A customer dimension, for example, may draw information from multiple source tables: customer, household, address, phone_number, demographics and so forth.&lt;br /&gt;
&lt;br /&gt;
A change log for the central entity corresponding to a dimension may not reflect changes to some of the related tables, especially when they represent dependent entities or are linked through intersect tables.  If this is the case, it will be impossible for the dimension to undergo a &lt;a href="http://blog.oaktonsoftware.com/2007/10/for-slowly-changing-dimensions-change.html"&gt;type 2 response&lt;/a&gt; when ancillary information changes.&lt;br /&gt;
&lt;br /&gt;
If &lt;span style="font-style: italic;"&gt;every &lt;/span&gt;source table that feeds a dimension is logged, it may be possible to design a multi-part unique identifier that is completely recycled.  (As observed previously, compound keys in dimensions are &lt;a href="http://blog.oaktonsoftware.com/2009/05/do-i-really-need-surrogate-keys.html"&gt;feasible but not optimal&lt;/a&gt;.)   This approach can lock you into type 2 processing for defining elements of the related tables, even when a type 1 response is desired.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-weight: bold;"&gt;Surrogate Keys Will Limit Future Rework&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Even in the rare case when a source system handles all changes exactly the way we want, reuse of the natural keys limits future options.&lt;br /&gt;
&lt;br /&gt;
If the source system is modified, the desired behavior may not be maintained.  It will either be necessary to engage in significant rework of the dimensional model (and all dependent reports), or simply live with the unsatisfactory change handling of the source system.&lt;br /&gt;
&lt;br /&gt;
Mergers and acquisitions may also bring additional source systems that correspond to an existing dimension.  These systems arrive complete with entirely different identifiers.  Incorporating them as sources for an existing dimension will always necessitate some rework of the dimension.  But if surrogate keys are in place, fact tables will be unaffected and existing reports will still work.  If, on the other hand, natural keys are the basis of unique identifiers, rework will cascade to related fact tables and also to existing queries and reports.&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-4808446127441953238?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=4808446127441953238&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4808446127441953238?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4808446127441953238?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/09/more-on-surrogate-keys.html" title="More on Surrogate Keys" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;DEcFRns4eCp7ImA9WxNQFkg.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-6755874805203079392</id><published>2009-07-27T11:48:00.016-04:00</published><updated>2009-09-22T17:46:57.530-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-22T17:46:57.530-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Strategy" /><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>Recommended Books on the Data Warehouse Lifecycle</title><content type="html">&lt;span style="font-style: italic;"&gt;Recommended Reading:  A new book by Laura Reeves, and a revised edition of the classic Lifecycle Toolkit.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
If you've been to any of my classes, you already know that I am a fan of Laura Reeves.  She has a pragmatic, get-things-done approach to data warehousing.&lt;br /&gt;
&lt;br /&gt;
You may also know her as co-author of the original edition of &lt;span style="font-style: italic;"&gt;The Data Warehouse Lifecycle Toolkit&lt;/span&gt;, a book she wrote with Ralph Kimball, Margy Ross and Warren Thornthwaite.  (For more on that book, see below.)&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.amazon.com/gp/product/0470176385?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0470176385" target="blank"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5363177520301629154" src="http://2.bp.blogspot.com/_oaFlRokTed0/Sm3VihTwXuI/AAAAAAAAAEA/-uN_YR4i0H0/s200/0470176385.jpg" style="cursor: pointer; float: left; height: 200px; margin: 0pt 10px 10px 0pt; width: 159px;" /&gt;&lt;/a&gt;Laura has a new book out, which I highly recommend: &lt;a href="http://www.amazon.com/gp/product/0470176385?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0470176385" target="blank"&gt;A Manager's Guide to Data Warehousing&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
In this book, she provides a practical guide to planning and executing data warehouse projects. It is written for managers (I.T. and business) who do not necessarily have a technical background in data warehousing.&lt;br /&gt;
&lt;br /&gt;
Laura touches on each phase of the data warehouse lifecycle, providing useful advice without over-burdensome methodology, detailed task lists or the like.  This makes it easy to fit her advice into your own organization's development style.&lt;br /&gt;
&lt;br /&gt;
Even if you already have a strong background in dimensional design, you will find this book to be quite useful.  You can get it at &lt;a href="http://www.amazon.com/gp/product/0470176385?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0470176385" target="blank"&gt;Amazon.com&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-weight: bold;"&gt;Also Recommended&lt;/span&gt;&lt;br /&gt;
If you have a dimensional data warehouse, I also urge you to check out &lt;a href="http://www.amazon.com/gp/product/0470149779?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470149779" target="blank"&gt;The Data Warehouse Lifecycle Toolkit, Second Edition&lt;/a&gt; by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.amazon.com/gp/product/0470149779?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470149779" target="blank"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5363177695194996658" src="http://3.bp.blogspot.com/_oaFlRokTed0/Sm3Vss1lX7I/AAAAAAAAAEI/_a2gadbuVm4/s200/0470176385.jpg" style="cursor: pointer; float: left; height: 200px; margin: 0pt 10px 10px 0pt; width: 159px;" /&gt;&lt;/a&gt;This fully revised version of the classic book contains detailed tasks and deliverables to help you manage all phases of the data warehouse lifecycle.&lt;br /&gt;
&lt;br /&gt;
It is an excellent reference for data warehousing professionals. Read more about it at &lt;a href="http://www.amazon.com/gp/product/0470149779?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470149779" target="blank"&gt;Amazon.com&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
The original edition has been a long time recommendation on this blog, and the new edition carries on the standard.  (Apologies to Warren Thornthwaite, whose name was previously misspelled here.)&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-6755874805203079392?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=6755874805203079392&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6755874805203079392?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6755874805203079392?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/07/recommended-books-on-data-warehouse.html" title="Recommended Books on the Data Warehouse Lifecycle" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_oaFlRokTed0/Sm3VihTwXuI/AAAAAAAAAEA/-uN_YR4i0H0/s72-c/0470176385.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CUMEQHg4fip7ImA9WxJRGU8.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-2786463412159998632</id><published>2009-05-20T12:34:00.007-04:00</published><updated>2009-05-21T12:30:01.636-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-21T12:30:01.636-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Slow Changes" /><category scheme="http://www.blogger.com/atom/ns#" term="Basics" /><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><title>Do I really need Surrogate Keys?</title><content type="html">Here is the #1 most frequently question that people ask of me:&lt;br /&gt;&lt;blockquote&gt;Q: Do I really need surrogate keys?&lt;/blockquote&gt;&lt;span style="font-style: italic;"&gt;A: You &lt;span style="font-weight: bold;"&gt;absolutely must &lt;/span&gt;have a unique identifier for each dimension table, one that does &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt; come from a source system.  A surrogate key is the best way to handle this, but there are other possibilities.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The case for the surrogate key is entirely pragmatic.  Read on for a full explanation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Dimensions Need their Own Unique Identifier&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is crucial that the dimensional schema be able to handle changes to source data in whatever manner makes most sense from an analytic perspective.  This may be different from how the change is handled in the source.&lt;br /&gt;&lt;br /&gt;For this reason, every dimension table needs its own unique identifier -- not one that comes from a source system.  (For more on handling changes, start with &lt;a href="http://blog.oaktonsoftware.com/2007/10/for-slowly-changing-dimensions-change.html"&gt;this post&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;A surrogate key makes the best unique identifier. It is simply an integer value, holding no meaning for end users.   A single, compact column, it keeps fact table rows small and makes SQL easy to read and write.  It is simple to manage during the ETL process.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Compound Keys Work, But Why Take that Route?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;An alternative is to supplement a unique identifier from the source system (also known as a natural key) with a sequence number.  This results in a compound  key, or multi-part key.&lt;br /&gt;&lt;br /&gt;This kind of compound key also allows the dimension to handle changes differently than the source does.  But there are several disadvantages:&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;Fact table rows become larger, as they must include the multi-part key for each dimension&lt;/li&gt;&lt;li&gt;The ETL process must manage a sequence for each natural key value, rather than a single sequence for a surrogate key&lt;/li&gt;&lt;li&gt;SQL becomes more difficult to read, write or debug&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Multi-part dimension keys can disrupt star join optimizers&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;So:  a compound key takes more space, is not any easier, and may disrupt performance.   Why bother?&lt;br /&gt;&lt;br /&gt;(By the way, many source system identifiers are &lt;span style="font-style: italic;"&gt;already &lt;/span&gt;compound keys.  Adding a sequence number will make them even larger!)&lt;br /&gt;&lt;br /&gt;Sometimes it is suggested that the natrual key be supplemented with a date, rather than a sequence.  This may simplify ETL slightly, but the rest of the drawbacks remain.  Plus, dates are even bigger than a sequence number.  Worse,  the date will appear in the fact table.  That is sure to lead to trouble!&lt;br /&gt;&lt;br /&gt;(This is not to say that datestamps on dimension rows are a bad thing.  To the contrary, they &lt;a href="http://blog.oaktonsoftware.com/2008/03/responding-to-star-schema-detractors.html"&gt;can be quite useful&lt;/a&gt;.  They just don't work well as part of a compound key.)&lt;br /&gt;&lt;br /&gt;That's the basic answer.  I'll respond to some common follow up questions over the coming weeks.&lt;br /&gt;&lt;br /&gt;- Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-2786463412159998632?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=2786463412159998632&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/2786463412159998632?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/2786463412159998632?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/05/do-i-really-need-surrogate-keys.html" title="Do I really need Surrogate Keys?" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C08HQn8yfSp7ImA9WxJTGEg.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-2703032116659126686</id><published>2009-04-27T12:22:00.005-04:00</published><updated>2009-04-27T13:03:53.195-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-04-27T13:03:53.195-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>More on Distinct Counts</title><content type="html">Shortly before my &lt;a href="http://blog.oaktonsoftware.com/2009/03/q-is-unique-count-fact-or-dimension.html"&gt;recent post&lt;/a&gt; on unique counts, a reader sent in a related question that takes a slightly different perspective:&lt;br /&gt;&lt;blockquote&gt;As fact tables could easily have a hierarchy of keys, the counts using the lowest grain can be counted via a count(*), and anything above that needs a count(distinct &lt;span style="font-style: italic;"&gt;column_a&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;Do you see many applications accommodating count(*)'s…by that I mean, making separate higher grain fact tables so ad-hoc users do not have to use a count distinct?&lt;br /&gt;&lt;br /&gt;P. Petrini&lt;/blockquote&gt;As the reader points out, unique counts (or distinct counts) can be problematic for end users to put together.  This may be particularly if you are using a SQL-generating Business Intelligence (BI)  tool.&lt;br /&gt;&lt;br /&gt;For example, suppose an orders fact table has one row per order line.  Its major dimensions are customer, product, salesperson and day.  To count the number of distinct products ordered on a day, you would need to place &lt;span style="font-family:courier new;"&gt;count(distinct product_name&lt;/span&gt;)  in your SQL.&lt;br /&gt;&lt;br /&gt;That is something a developer can do in her sleep, but may be a bit much for a power-user.  And it may be difficult to get a BI tool to generate this kind of SQL.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;The reader correctly points out that this would not be necessary if the grain of the fact table precisely matches whatever it is that needs to be counted.  All that would be necessary would be &lt;span style="font-family:courier new;"&gt;count(*) &lt;/span&gt;or something along those lines.  So a summary table &lt;span style="font-style: italic;"&gt;might &lt;/span&gt;help....&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The Issue&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The problem with this is that the summary table will simplify &lt;span&gt;one &lt;/span&gt;&lt;span style="font-style: italic; color: rgb(51, 51, 255);"&gt;and only one &lt;/span&gt;kind of count.&lt;br /&gt;&lt;br /&gt;There will probably be a &lt;span style="font-style: italic;"&gt;variety &lt;/span&gt;of &lt;span style="font-style: italic;"&gt;different&lt;/span&gt;, levels or time periods across which you want distinct counts,  and each would require its own aggregate.&lt;br /&gt;&lt;br /&gt;This could quickly become a lot of work -- simply to make it easier to formulate certain kinds of queries.  And while it may make a particular query easier to express, it introduces new complexity for the user, who must now choose precisely the right star for each query.&lt;br /&gt;&lt;br /&gt;For example, a summary fact table that has one row for each product for each day makes it easier to count the number of products that sold on a day.  But this aggregate must omit other dimensions like salesperson and customer to be useful in this regard.  If you also want to count other distinct things by day, each will require its own aggregate.  Also want distinct counts by month?  More aggregates.&lt;br /&gt;&lt;br /&gt;Not very efficient, unfortunately.  The SQL for counting distinct things is simplified, to be sure.  But now the ETL process is doing a lot more work, and users are faced with choosing the right aggregate for each query they build.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Alternatives&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My suggestion is that, rather than build fact tables that eliminate the need for count(distinct), build and cache reports that do the work.  People interested in these counts can access the reports, instead of writing SQL.  If these counts are common, you might also find it easier to use an OLAP tool.&lt;br /&gt;&lt;br /&gt;Not perfect, I know, but nothing ever is.&lt;br /&gt;&lt;br /&gt;This is not to say that there is anything wrong with creating a series of summary tables for this purpose, or perhaps storing some pre-computed counts &lt;a href="http://blog.oaktonsoftware.com/2009/03/q-is-unique-count-fact-or-dimension.html"&gt;as discussed earlier&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As people who attend my classes know, my motto is "be pragmatic, not dogmatic."  As long as everyone is aware of the pros and cons, and understands how it affects each aspect of the data warehouse implementation, then if there is a consensus that adding summary tables is the best way to make life easier, go ahead and do it.&lt;br /&gt;&lt;br /&gt;Chris&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Many thanks to P. Petrini, for consenting to have his question appear here. &lt;br /&gt;&lt;br /&gt;If you have a question, you can email it to me. Click my profile picture at the top of this page for the address. I try to answer all questions, and might even print yours here&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-2703032116659126686?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=2703032116659126686&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/2703032116659126686?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/2703032116659126686?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/04/more-on-distinct-counts.html" title="More on Distinct Counts" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;A0QMSHw4fCp7ImA9WxJTEEU.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-3270530745697022944</id><published>2009-04-18T14:42:00.008-04:00</published><updated>2009-04-18T17:16:29.234-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-04-18T17:16:29.234-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Aggregates" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Dealing with Period-to-Date Measurements (MTD, YTD and so forth)</title><content type="html">A few weeks ago, I &lt;a href="http://blog.oaktonsoftware.com/2009/01/q-customers-with-no-sales-in-fact-table.html"&gt;touched on&lt;/a&gt; some problems you might encounter  when trying to store period-to-date measurements in a fact table.  Today, I want to address the issue more directly.&lt;br /&gt;&lt;br /&gt;Period-to-date measurements can summarize just about any fact over a period of time.  A measurement of month-to-date sales , for example, aggregates sales from the beginning of the month to the date in question.  Month-to-date, quarter-to-date and year-to-date facts are often used to do things like compute commissions, volume discounts, etc.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Period-to-date Facts and Additivity&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Most facts in a star schema can be summed up across any and all dimensions.  Sales dollars, for example, can be summed across days, customers, stores, or whatever dimension makes sense to the person doing the analysis.  We call this kind of fact &lt;span style="font-style: italic;"&gt;fully additive&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The basic issue with storing period-to-date facts is this:  &lt;span style="font-style: italic;"&gt;they are never fully additive&lt;/span&gt;.   It does not make any sense to sum up period-to-date measurements that were taken at different times.  Month-to-date sales for a customer on January 14, 15 and 16, for example, cannot be meaningfully added together.&lt;br /&gt;&lt;br /&gt;Period-to-date measurements are &lt;span style="font-style: italic;"&gt;semi-additive&lt;/span&gt;; the can be summed up across some dimensions, but not time.&lt;br /&gt;&lt;br /&gt;It &lt;span style="font-style: italic;"&gt;might &lt;/span&gt;make sense to aggregate together several period-to-date measurements if they were taken &lt;span style="font-style: italic;"&gt;at the same time&lt;/span&gt;.  The year-to-date sales of all customers who made a purchase on a particular day, for example, might factor into some useful analysis.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Storing a period-to-date fact with transactions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Since it makes no sense to sum up period-to-date measurements across different points in time, they have very limited use in a transaction-grained fact table.&lt;br /&gt;&lt;br /&gt;For example,  a transaction-grained fact table that stores a row for each order-line of each order might contain dimensions that capture the order_line number, date and time of the order, product sold, and so forth.  You might store a period-to-date measurement in such a fact table, but there there will be little or no opportunity to sum this fact across different order lines.  Only orders with the same date/time can be meaningfully summed.  And if there are multiple order-lines for a given order, even this may not make sense, since the period-to-date measurement would be stored redundantly.&lt;br /&gt;&lt;br /&gt;Stored in this kind of fact table, the period-to-date measurement has very limited use.  It can really only be used to study the most granular data in the fact table.  For example, it might be used to compute an applicable discount, or a commission percentage that escalates with sales volume.&lt;br /&gt;&lt;br /&gt;But is this needed in the fact table?  Such metrics are usually highly operational.  They may &lt;span style="font-style: italic;"&gt;influence &lt;/span&gt;other metrics, which have more analytic value and happen to be additive.  The commission paid or discount given, for example, may be computed using period-to-date information, but itself is fully additive. This will be useful to store in a fact table.&lt;br /&gt;&lt;br /&gt;This does not mean that period-to-date measurements have no place in a transaction-grained fact table.  If they are central to the business, or to analysis of the process, then place them there.  They may, however, cause excessive growth in the row size, since it is possible to enumerate multiple period-to-date versions of any given fact.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Storing a period-to-date fact with period aggregates or snapshots&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;The natural home for a period-to-date measurement is a fact table with a periodic-snapshot grain, or a fact table that aggregates across the time dimension.&lt;br /&gt;&lt;br /&gt;An aggregate that summarizes the time dimension sums up data from a transaction-based fact table over a particular period.  For each period summarized, the resulting rows all summarize &lt;span style="font-style: italic;"&gt;the same &lt;/span&gt;period.  This means that each can contain period-to-date measurements that &lt;span style="font-style: italic;"&gt;might &lt;/span&gt;be meaningfully summed, at least within the period.  (Note I said "might."  There is a caveat, which I will get to in a second.)&lt;br /&gt;&lt;br /&gt;An aggregate that records monthly sales totals, for example, is potentially a good place to include a quarter-to-date or year-to-date measurement.  Each row in the table will summarize the same period, so the period-to-date measurements may be meaningfully summarized.&lt;br /&gt;&lt;br /&gt;The same goes for a periodic snapshot fact table.  Like a periodic aggregate, each row in a snapshot summarizes a particular period.  The main difference is that a snapshot will contain some form of unique status measurement that cannot be found in the transactions, such as a balance or level.  Once again, a semi-additive period-to-date fact can be stored here.&lt;br /&gt;Snapshots record the status of things at fixed time intervals.  A monthly snapshot of bank accounts, for example, records the day-end balance of each account.  This is a logical place to store month-to-date or other such&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Caution:  Exactly what is being measured period-to-date?&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Even within an aggregate or snapshot period, period-to-date facts may not be additive.  It is important to ask yourself: &lt;span style="font-style: italic;"&gt;exactly what is being measured period-to-date?&lt;/span&gt;  If it does not coincide with the grain of the table, it may be non-additive.&lt;br /&gt;&lt;br /&gt;For example, a monthly aggregate stores sales by product and customer.  If we want to store year-to-date customer sales in this table, the same fact will be repeated multiple times if a single customer purchased multiple products.  It is not additive across products; it is repeated if the customer bought more than one product.&lt;br /&gt;&lt;br /&gt;On the other hand, if the period-to-date measurement captures sales by customer and product, it coincides with the grain of the table.  It can be meaningfully aggregated across any dimension but time.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Reporting&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;The old stand-by is to compute a period-to-date measurement in a query or report.  It is calculated from transactions at run-time.  This allows access to any conceivable period-to-date metric, and may also be used as a &lt;a href="http://blog.oaktonsoftware.com/2009/01/q-customers-with-no-sales-in-fact-table.html"&gt;behavioral qualification&lt;/a&gt; rather than a fact.&lt;br /&gt;&lt;br /&gt;Such queries may be particularly nasty or time consuming, but they can be run during batch windows and cached to compensate.&lt;br /&gt;&lt;br /&gt;My next post will be another Q&amp;amp;A.  Please send in your questions!&lt;br /&gt;&lt;br /&gt;Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-3270530745697022944?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=3270530745697022944&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/3270530745697022944?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/3270530745697022944?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/04/dealing-with-period-to-date.html" title="Dealing with Period-to-Date Measurements (MTD, YTD and so forth)" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0IAQ3o9cCp7ImA9WxVbEUU.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-8786082441416344638</id><published>2009-03-27T13:34:00.001-04:00</published><updated>2009-03-27T15:19:02.468-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-27T15:19:02.468-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Aggregates" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Is a Snapshot an Aggregate?</title><content type="html">&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;Students in my classes often ask me if a snapshot fact table is an aggregate.  My short answer is that it doesn't matter.  But here's the long answer.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is really a matter of semantics more than anything else.  What it boils down to is how you define an aggregate table.  Is it simply a table that summarizes data from other tables?  Then yes, most of the time a snapshot will fit this definition -- but not always.   However, I like to reserve the word "aggregate" for a very specific kind of summary.  If you do as well, then the answer is no. Never.&lt;br /&gt;&lt;br /&gt;Before getting into that, though, here is some background.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Snapshot Fact Tables&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A snapshot, for the uninitiated, is a fact table that periodically samples a measurement.  A snapshot always includes at least one measurement of  status, such as a balance or level.  A transaction fact table, in contrast, captures measurements as they occur.&lt;br /&gt;&lt;br /&gt;The snapshot is useful for describing the effect of transactions, rather than transactions themselves.  The best example is your bank account.  Its all well and good if you have a record of each deposit, check, interest payment, fee, direct debit, and so forth.  But what is your balance?  Aggregating all your transactions is not the most expedient way to figure that out.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Sometimes Snapshots Summarize Transactions, but not Always&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;People sometimes think of snapshots as aggregates because they can be constructed from a table that contains transactions.  That is certainly true in the case of account balances.  If you have the transactions, you can compute the snapshots.&lt;br /&gt;&lt;br /&gt;In this sense, you might be justified in thinking of  a snapshot as an aggregate.  But I don't.  I'll explain why in just a second.&lt;br /&gt;&lt;br /&gt;First, though, it is important to understand that not all snapshots are constructed from  transactions.  Sometimes, the number of transactions is too numerous to keep around.  But in other cases,  &lt;span style="font-style: italic;"&gt;there simply aren't any&lt;/span&gt;.   For example, we can't make use of transactions to describe the ambient temperature on the factory floor, or the level of pressure inside an industrial device.  In these cases, the snapshot is clearly not a summary.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Snapshots Contain "new" Facts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So how about those other situations -- the ones where there &lt;span style="font-style: italic;"&gt;are &lt;/span&gt;transactions that can be used to create a snapshot?  Isn't a snapshot an aggregate in this case?&lt;br /&gt;&lt;br /&gt;It really depends on how you define an aggregate.&lt;br /&gt;&lt;br /&gt;The word aggregate is normally reserved for a summary table that does not transform the structure of its corresponding base table.  For example, suppose account transactions are recorded in a fact table, with a fact called &lt;span style="font-family:courier new;"&gt;transaction_amount.&lt;/span&gt; A monthly aggregate would have exactly the same fact.  Still called &lt;span style="font-family:courier new;"&gt;transaction_amount&lt;/span&gt;, is the same as the fact in the base table.  The only difference is in level of detail:  it represents a month's worth of transactions, rather than individual transactions.  It is the same measurement.&lt;br /&gt;&lt;br /&gt;Account balance (or other status measurements like levels) is a &lt;span style="font-style: italic;"&gt;not the same &lt;/span&gt;as what the base fact table records.  It is &lt;span style="font-style: italic;"&gt;not &lt;/span&gt;an aggregation of transaction that occurred during the snapshot period; it is something more.  It describes the &lt;span style="font-style: italic;"&gt;net effect &lt;/span&gt;of the account transactions.   It is a different measurement.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I use the term &lt;span style="font-weight: bold;"&gt;derived table &lt;/span&gt;to describe any table in a dimensional design uses another as its source.  I save the word &lt;span style="font-weight: bold;"&gt;aggregate &lt;/span&gt;for non-transformative summaries.&lt;br /&gt;&lt;br /&gt;To sum things up, then:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Yes, a snapshot sometimes summarizes a transaction fact table.  &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;But if you accept these definitions for aggregate tables and derived tables, the answer is different:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;A snapshot is is sometimes a derived table, but never an aggregate.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is just a matter of semantics, and not very important to me.  Call it whatever you want, and I will be fine with it.  The important thing is that you have a dimensional design that enables useful analytics.&lt;br /&gt;&lt;br /&gt;- Chris&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-8786082441416344638?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=8786082441416344638&amp;isPopup=true" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/8786082441416344638?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/8786082441416344638?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/03/is-snapshot-aggregate.html" title="Is a Snapshot an Aggregate?" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry gd:etag="W/&quot;C0MMQXk8eCp7ImA9WxVUGU0.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-6327202582694668553</id><published>2009-03-24T09:22:00.003-04:00</published><updated>2009-03-24T09:31:20.770-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-24T09:31:20.770-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>Feed Updated, Now Contains Full Posts</title><content type="html">If you follow this blog in a newsreader, you will be happy to know that the &lt;a href="http://feeds.feedburner.com/StarSchemaCentral"&gt;RSS feed&lt;/a&gt; now contains full posts,  rather than the first couple of paragraphs.  That should make your reading experience a lot easier.&lt;br /&gt;&lt;br /&gt;The change seems to have been glitch-free, but if there are any issues please let me know.&lt;br /&gt;&lt;br /&gt;Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-6327202582694668553?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=6327202582694668553&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6327202582694668553?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6327202582694668553?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/03/feed-updated-now-contains-full-posts.html" title="Feed Updated, Now Contains Full Posts" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CkIHRn44eSp7ImA9WxVUFEQ.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-7247367203714729383</id><published>2009-03-19T13:11:00.004-04:00</published><updated>2009-03-19T15:22:17.031-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-19T15:22:17.031-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Q&amp;A: Is a Unique Count a Fact or Dimension?</title><content type="html">&lt;span style="font-style: italic;"&gt; A reader recently sent in a question about unique counts.  Are they facts, or dimensions?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is a paraphrased version of their questions.  The business example has been changed.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-weight: bold;"&gt;Q:&lt;/span&gt; We report on the number of unique customers that visit a location. We do this at daily, monthly and annual levels, and also look at the same thing across all locations. We originally modeled the customer counts as facts. The problem is that if we compute unique customer counts by day, they do not correctly roll up to month. Does it make more sense to treat this as a dimension?&lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;A:&lt;/span&gt; First, let me say that customer counts are certainly facts. They are important metrics that are colored by the dimensions we use to define them.&lt;br /&gt;&lt;br /&gt;The difficulty with counts of distinct things, as you have discovered, is that they are&lt;span style="font-style: italic;"&gt; &lt;/span&gt;&lt;span&gt;&lt;span style="font-weight: bold;"&gt;non-additive&lt;/span&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt; &lt;/span&gt;facts. As soon as you use a set of dimensions to create a distinct count -- such as day, location and customer -- you have an aggregation that cannot be further summed. If you need to go to the monthly level, or across all locations, you need to go back to the granular data.&lt;br /&gt;&lt;br /&gt;You have a few options.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Option 1&lt;/span&gt;:&lt;span style="font-weight: bold;"&gt;  Compute in reports&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first is to compute unique counts in reports. This can be time consuming, and a major hassle if there are several counts that must be computed frequently. You might be able to schedule the reports to run in a batch window. This deals with the processing issues, but not the report or query complexity.&lt;br /&gt;&lt;br /&gt;In my view, however, it is the best option.  There are other possibilities, but each has significant drawbacks.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Option 2: Create snapshots or aggregates&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A second option is to create some snapshot fact tables that compute the various distinct counts at various periodic intervals. These will contain non-additive facts, and your developers will have to understand that they should not be aggregated. Depending on their design, these tables may more closely resemble aggregate tables.&lt;br /&gt;&lt;br /&gt;This approach may require quite a few fact tables, since each of your unique counts will have a different grain -- daily by location, monthly by location, annually by location, daily across all locations, monthly across all locations, etc.&lt;br /&gt;&lt;br /&gt;To make life a little easier for report developers, designers sometimes replicate the monthly and annual totals in the daily level snapshot. Developers find this appealing, because all the information to produce subtotals over time can be fetched from a single row, rather than three queries. The flipside is that the monthly and annual data is stored redundantly in a lot of places, and must be maintained consistently and used properly.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Option 3: Behavioral Dimensions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If the current period is the primary focus, a third option is to create behavioral dimensions. For example, in the store dimension, you can add attributes that represent counts of unique customers for the current month and year. I mentioned behavioral dimensions in a &lt;a href="http://blog.oaktonsoftware.com/2009/01/q-customers-with-no-sales-in-fact-table.html"&gt;previous Q&amp;amp;A post&lt;/a&gt;, and the concept is the same here.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Option 4: Creating Special Facts that can be Summed&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lastly, some designers like to create special facts in the fact table that can be used to sum unique counts. If you have a fact table that gets a row each time a customer visits a location, you may be able to tweak it a bit. Add an additional fact called &lt;span style="font-family:courier new;"&gt;first_visit_of_month &lt;/span&gt;that only contains a 1 the first time the customer visits &lt;span style="font-style: italic;"&gt;any &lt;/span&gt;location during the month.  &lt;span style="font-family:courier new;"&gt;First_visit_of_year &lt;/span&gt;will only contain a 1 the first time the customer visits during the year. These facts can be summed to produce unique counts for various time frames, but you need to be careful not to aggregate them across the periods they represent.&lt;br /&gt;&lt;br /&gt;Another drawback is that, once again, numerous special facts will be required. To compute unique counts by location, you will need facts like first_visit_this_location_this_month. This fact should not be aggregated across months or locations. That's a lot of rules, and they are likely to be broken.&lt;br /&gt;&lt;br /&gt;As you can see, no option looks perfect. Don’t let that lead you to push back on requirements, however. Distinct counts are often very important metrics, particularly when studying customer service or processing efficiency.&lt;br /&gt;&lt;br /&gt;Thanks for the question!&lt;br /&gt;&lt;br /&gt;- Chris&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;If you have a question, you can email it to me. Click my profile picture at the top of this page for the address. I try to answer all questions, and might even print yours here.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-7247367203714729383?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=7247367203714729383&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7247367203714729383?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7247367203714729383?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/03/q-is-unique-count-fact-or-dimension.html" title="Q&amp;A: Is a Unique Count a Fact or Dimension?" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0MNQnk_cSp7ImA9WxVUGU0.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-7190557097448818542</id><published>2009-03-19T09:00:00.005-04:00</published><updated>2009-03-24T09:31:33.749-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-24T09:31:33.749-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>Privacy Policy</title><content type="html">Recent changes to the Google AdSense program prompted me to define a privacy policy for this blog.  Here it is:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-weight: bold;"&gt;Oakton Software does not gather any information about your visits to this blog, or the pages you view.&lt;/span&gt;&lt;/blockquote&gt;&lt;br /&gt;I wish the policy could be that simple, but you should also know the following:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Blogger:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The blog is hosted by Google's Blogger.com.&lt;br /&gt;&lt;br /&gt;If you have a Blogger account, Google may log information about your visits to this blog.  The Blogger privacy policy can be found by visiting the &lt;a href="http://www.google.com/privacy.html"&gt;Google Privacy Center&lt;/a&gt; and clicking on the link for Blogger.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Newsletter:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you subscribe to the newsletter, the information you provide is used to send occasional email updates.  These always include an opt-out link.&lt;br /&gt;&lt;br /&gt;The mail list is maintained and managed by an external service provider.  They follow a confirmed opt-in policy to ensure compliance with anti-spam legislation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Ads By Google:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This blog displays advertising that is served by the Google AdSense program.&lt;br /&gt;&lt;br /&gt;You have probably seen these ads all over the internet.  Google uses a cookie to serve the ads.  The cookie allows Google to show you ads based on this and other sites you have visited that feature ads by Google.  It is not used to gather information like your name or email address.&lt;br /&gt;&lt;br /&gt;Google provides a way to opt out of this interest-based advertising program.  Visit their  &lt;a href="http://www.google.com/privacy_ads.html"&gt;Advertising and Privacy Page&lt;/a&gt; for an opt-out link,  or simply click on the Ads By Google header that appears over their ads.  You can also see what Google has decided are your interests by visiting their &lt;a href="http://www.google.com/ads/preferences/"&gt;Ads Preference Manager&lt;/a&gt; page.&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-7190557097448818542?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=7190557097448818542&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7190557097448818542?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7190557097448818542?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/03/privacy-policy.html" title="Privacy Policy" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CkIHRn44eSp7ImA9WxVUFEQ.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-6815046542630841705</id><published>2009-02-16T16:21:00.003-05:00</published><updated>2009-03-19T15:22:17.031-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-19T15:22:17.031-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Accounting Transactions Have Multiple Dates</title><content type="html">&lt;span style="font-style: italic;"&gt;Building a financial data mart? Don't let the multiple dates associated with accounting transactions confuse you.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A standard feature of financial systems often befuddles star schema designers.  When someone inputs a transaction into a financial system, the transaction has at least two associated dates.  One represents the date that the event happened, and the second represents the date the transaction was entered into the system.  The first is usually called the transaction date, while the second is called the date posted.  Sometimes, there are more dates as well.&lt;br /&gt;&lt;br /&gt;For example, if a check is written, the record in the accounting system will have two dates.  The &lt;span style="font-style: italic;"&gt;transaction date&lt;/span&gt; will be the date on the check.   The &lt;span style="font-style: italic;"&gt;date posted &lt;/span&gt;will be the day that someone in the accounting department actually entered the transaction into the system.  (Some financial systems may have other dates as well, such as an audit date or an edited date that applies to comments.)&lt;br /&gt;&lt;br /&gt;Don’t let these dates confuse you.  If you are designing a financial data mart, you do not need to choose between them; both dates will be useful, and both should appear in your model.  A fact table that logs general ledger activity, for example, will have one row per transaction.  That row can contain two references to a day dimension; one represents the transaction date and one represents the posting date.  Name the keys appropriately.  For example, you can call them &lt;span style="font-family:courier new;"&gt;day_key_transaction&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;day_key_posted&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-style: italic;"&gt;transaction date &lt;/span&gt;will be the one that is used to drive most reporting.  This is the date that properly associates the transaction with an accounting period.   You would use it to summarize transactions by day, month or fiscal period, for example.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-style: italic;"&gt;date posted&lt;/span&gt; will be of lesser use, but may also occasionally serve analytic purposes--especially when compared to the transaction date. For example, auditors may want to look for transactions posted one or more months after they actually took place. Managers might want to improve their efficiency by studying lags between transaction date and date posted.&lt;br /&gt;&lt;br /&gt;If you are also going to build a monthly snapshot by account, then the star for account transactions may serve as its source.  In that case, the date posted will help the ETL developers maintaining the snapshot.  During each processing window, they can simply look for any transactions that were posted prior to the last time the snapshot was updated, and process only those rows.&lt;br /&gt;&lt;br /&gt;- Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-6815046542630841705?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=6815046542630841705&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6815046542630841705?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/6815046542630841705?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/02/accounting-transactions-have-multiple.html" title="Accounting Transactions Have Multiple Dates" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;Ak8HSX45eip7ImA9WxVXGEw.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-373440023359332376</id><published>2009-02-16T15:50:00.006-05:00</published><updated>2009-02-16T16:13:58.022-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-02-16T16:13:58.022-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Q&amp;A: Chosing the right time dimension for a Snapshot</title><content type="html">&lt;span style="font-style: italic;"&gt;In this Q&amp;amp;A, a reader asks how to designate a time dimension for a fact table that is a snapshot. &lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;Q:  We are designing a new star.  The fact table is a monthly snapshot.  It will capture several metrics each month for each open account.  My question is how to handle the time dimension.  We cannot agree whether it makes more sense to have a month dimension, or to use a month end date from a day dimension. What makes the most sense?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;- Name withheld&lt;/span&gt; &lt;span style="font-style: italic;"&gt;by request&lt;/span&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;First, I want to commend you for stating the grain of your fact table, and stating it clearly.  Grain confusion can cause many problems, and makes a question like yours even more difficult to answer.  Snapshots always have a time component to their grain; in your case it is monthly.  Your fact table will have one row per account per month.&lt;br /&gt;&lt;br /&gt;Now to your question:  I strongly prefer the use of a month dimension for a monthly snapshot.  This approach is non-ambiguous.  It is self-evident what each row represents; the chance for any confusion is minimal.&lt;br /&gt;&lt;br /&gt;When a monthly snapshot is designed with a day dimension, the designers are generally clear on what each row represents.  However, report developers or other users may misinterpret the data now or in the future.&lt;br /&gt;&lt;br /&gt;Designers who use a day dimension with a monthly snapshot generally choose the last day of the period represented, as you suggested.  The problem with this is that there are  many elements of a typical day dimension that simply do not apply to a monthly summary.  Tor example, a snapshot representing March 2009 would have the day 3/31/09 associated with it.  The day dimension has a lot of information about March 31, 2009 that does not apply to the snapshot, such as what day of the week it was, whether it was a holiday, and so forth.  It would be all too easy to misinterpret this information as being applicable to the facts.&lt;br /&gt;&lt;br /&gt;Even if everyone is clear on this at the start, a year or two down the road there are likely to be new members of your development team who may misinterpret the data.  The presence of a date can also lead to confusion about what a row represents.  If your monthly summary&lt;br /&gt;contains an average daily balance, for example, a user may confuse this for a balance as of the day in question.&lt;br /&gt;&lt;br /&gt;Confusion will also arise over what date to use for the current period.  During March 2009, you may be updating the snapshot rows for each account on a daily basis.  Using a day dimension, ETL developers may assign the particular date in March, which now serves double duty -- designating the snapshot period and the date loaded.&lt;br /&gt;&lt;br /&gt;Sometimes, there is a strong desire among technical team members to use a day dimension with a monthly snapshot.  If you cannot convince them otherwise, then you can avoid some confusion by creating a view for use with the monthly snapshot that hides the day-level attributes.  You may be able to do the same kind of “hiding” of non-applicable dimension attributes within your BI tool.  Use the month-end date for each fact table row.  If the ETL team wants to track the day each row was loaded, they can do this with a second relationship to the day dimension.  (This trick works nicely even if you do go with a month dimension.)&lt;br /&gt;&lt;br /&gt;Regardless of your approach, I also advise you to name the facts carefully.  If you want to include a balance as of the end of the period, name it &lt;span style="font-family:courier new;"&gt;period_end_balance&lt;/span&gt;.  An average daily balance for the period in question should be called &lt;span style="font-family:courier new;"&gt;average_daily_balance&lt;/span&gt;.  A column name like &lt;span style="font-family:courier new;"&gt;balance &lt;/span&gt;is simply too vague, and bound to be misused.&lt;br /&gt;&lt;br /&gt;Thanks for your question!&lt;br /&gt;&lt;br /&gt;- Chris&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;If you have a question, click my profile picture at the top of the page for my address.  I try to respond to all messages, and may even post an answer here.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-373440023359332376?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=373440023359332376&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/373440023359332376?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/373440023359332376?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/02/q-chosing-right-time-dimension-for.html" title="Q&amp;A: Chosing the right time dimension for a Snapshot" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;DE4BSH0-cSp7ImA9WxVRF0k.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-4037197849732621716</id><published>2009-01-23T15:34:00.006-05:00</published><updated>2009-01-23T16:42:39.359-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-01-23T16:42:39.359-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Q and A" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Q&amp;A: Customers with No Sales in Fact Table</title><content type="html">&lt;span style="font-style: italic;"&gt;In this post, Chris answer a reader's question about reporting on customers with no sales in the fact table.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;New Q&amp;amp;A Feature&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I get a lot of questions about dimensional modeling from readers of the blog, my books, and people who attend my classes.  I try to answer all of them.  I thought it would be interesting to post some of them here, for all to read.&lt;br /&gt;&lt;br /&gt;This week, I'm answering a question about reporting on customers with no sales.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;In general what is the recommended approach for reporting customers that have no sales in the fact table for MTD or YTD? Or, for that matter, any dimension member where no fact exists for some time period?&lt;br /&gt;&lt;br /&gt;- J. Dolan&lt;/blockquote&gt;&lt;br /&gt;Usually, something like “customers with no sales for a period” is computed as part of the reporting process.  Developers often bristle at this kind of report, because it requires using a correlated subquery.  This makes it difficult to do with a SQL-generating BI tool, and can be slow.  And if you want to use the list of customers with no sales as a filter for another query, things can get really hairy.&lt;br /&gt;&lt;br /&gt;An alternative is to pre-compute this information, storing the results in a table that summarizes reporting periods.   This is likely to be a factless fact table, with foreign keys to dimension tables representing the period summarized and the dimension in question--customers in your case.   If you require monthly and yearly slices,  you will want to multiple summaries.  This approach may simplify reporting, but adds complexity to the ETL process.&lt;br /&gt;&lt;br /&gt;If current period is the only period of concern, as in your question, then it is also possible to pre-compute the information and store the result &lt;span style="font-weight: bold;"&gt;directly &lt;/span&gt;to the customer dimension table, as a flag.  The flag would specify whether the customer has placed an order during the current period.   In your case, it would be two flags -- one for month to date and one for year to date.&lt;br /&gt;&lt;br /&gt;This kind of attribute is called a &lt;span style="font-weight: bold;"&gt;behavioral dimension &lt;/span&gt;because it takes behavioral activity recorded in the fact table and transforms it into a dimension.   Behavioral dimensions are very powerful, because they can be used as simple filters for other queries.&lt;br /&gt;&lt;br /&gt;Again, this concept trades ETL complexity for reporting flexibility and performance.  But isn't that what data warehousing is all about?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Thanks to J. Dolan for agreeing to have his question posted.  If you have a question, send it in.  I try to answer all my email, though it sometimes takes a while!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-4037197849732621716?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=4037197849732621716&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4037197849732621716?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4037197849732621716?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2009/01/q-customers-with-no-sales-in-fact-table.html" title="Q&amp;A: Customers with No Sales in Fact Table" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;AkcGRHY5eCp7ImA9WxRRGUg.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-5004912775186655834</id><published>2008-10-02T09:23:00.011-04:00</published><updated>2008-10-02T10:33:45.820-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-10-02T10:33:45.820-04:00</app:edited><title>Coming to TDWI New Orleans?  Vote First!</title><content type="html">&lt;span style="font-style: italic;"&gt;If you'll be joining me at the TDWI conference in New Orleans during election week, make sure you vote ahead of time.  Here's how. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This election day, I'll be teaching &lt;span style="font-weight: bold;"&gt;Advanced Dimensional Modeling&lt;/span&gt; at &lt;a href="http://www.tdwi.org/neworleans/" target="_blank"&gt;The Data Warehouse Institute World Conference&lt;/a&gt; in New Orleans.  I was a bit reluctant to accept the invitation at first, out of fear that it would prevent my vote from being counted.  But after doing a little bit of research, I accepted with confidence.&lt;br /&gt;&lt;br /&gt;You can attend with confidence as well. All you need is an absentee ballot.&lt;br /&gt;&lt;a href="http://govoteabsentee.org" target="_blank"&gt; &lt;img style="float: left; padding: 20px 20px 10px 0px; border-color: #ffffff" src="http://govoteabsentee.org/images/widgetalt.gif"  alt="GoVoteAbsentee.org / learn how to vote absentee!"/&gt;&lt;/a&gt;Absentee voting procedures vary by locality.  To find out how it works where you live, you can go to &lt;a href="http://GoVoteAbsentee.org" target="_blank"&gt;GoVoteAbsentee.org&lt;/a&gt;.  &lt;br /&gt;&lt;br /&gt;Plug in your zip code, and they'll provide you with information on how to apply for an absentee ballot.  They'll even give you a link to download the ballot application. If you prefer, you can contact your county or city government for information.&lt;br /&gt;&lt;br /&gt;The process is simple, and you can be sure your vote will be heard.  Some localities even permit residents to cast absentee ballots in person.  In Fairfax county Virgina, where I live,  you can personally feed your ballot into the optical scanner that tallies the votes.  &lt;br /&gt;&lt;br /&gt;And as an added bonus, you won't have to stand in line on election day!&lt;br /&gt;&lt;br /&gt;I have already cast my absentee vote.  If you will be traveling to The Data Warehouse Institute Conference in New Orleans, I urge you to do the same.  Go to &lt;a href="http://GoVoteAbsentee.org" target="_blank"&gt;GoVoteAbsentee.org&lt;/a&gt; now, and find out how.&lt;br /&gt;&lt;br /&gt;- Chris&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-5004912775186655834?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=5004912775186655834&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/5004912775186655834?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/5004912775186655834?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2008/10/coming-to-tdwi-new-orleans-vote-first.html" title="Coming to TDWI New Orleans?  Vote First!" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;C0UMRXk5cCp7ImA9WxVaEU4.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-283177250852723935</id><published>2008-06-10T11:39:00.007-04:00</published><updated>2009-04-07T15:08:04.728-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-04-07T15:08:04.728-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Aggregates" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Available Now:  Designing a Performance Layer</title><content type="html">My article &lt;span style="font-style: italic;"&gt;"Designing A Performance Layer:  Aggregates and Derived Tables" &lt;/span&gt;appears in the current issue of &lt;span style="font-weight: bold;"&gt;The Business Intelligence Journal&lt;/span&gt; (Vol 13, No. 1),  available to members of The Data Warehouse Institute.&lt;br /&gt;&lt;br /&gt;This article shows you how to create a &lt;span style="font-style: italic;"&gt;performance layer&lt;/span&gt; for your dimensional data warehouse -- a set of cubes and tables designed specifically for performance.&lt;br /&gt;&lt;br /&gt;The article covers two types of performance design:  &lt;span style="font-style: italic;"&gt;aggregate tables&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;derived tables&lt;/span&gt;.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Aggregate Tables &lt;/span&gt;store partially summarized copies of existing data.  These tables (or cubes) improve performance for queries that do not require granular detail.  The article fully explains aggregates, including summarization of metrics, hierarchies, and implementation with or without aggregate navigators.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Derived Tables&lt;/span&gt; store restructured copies of existing data.  They do not necessarily summarize information, but they may.  Derived tables deliver performance benefits by streamlining the query and reporting processes.  In the article, I show how to supplement an existing schema with merge, pivot, and set operation fact tables (or cubes).&lt;/li&gt;&lt;/ul&gt;The full article appears in &lt;span style="font-weight: bold;"&gt;The Business Intelligence Journal&lt;/span&gt;, which is available in print and &lt;a href="http://www.tdwi.org/Publications/BIJournal/index.aspx" target="_blank"&gt;on line&lt;/a&gt; for TDWI members.&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-283177250852723935?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=283177250852723935&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/283177250852723935?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/283177250852723935?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2008/06/available-now-designing-performance.html" title="Available Now:  Designing a Performance Layer" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;D0YBRH07fyp7ImA9WxZWE04.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-7251522539905068687</id><published>2008-03-12T10:53:00.004-04:00</published><updated>2008-03-12T11:19:15.307-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-03-12T11:19:15.307-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Slow Changes" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Responding to Star Schema Detractors with Timestamps</title><content type="html">&lt;span style="font-style: italic;"&gt;Time-stamped dimensions can pack additional analytic punch, compared with standard slow change techniques.  There is additional cost in the load process, but it may be worth it, especially if you plan to load historic data.&lt;/span&gt;    &lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;This is the fourth in a series of posts on Slowly Changing Dimensions.&lt;span style=""&gt;  &lt;/span&gt;If you are just joining in, you might want to review &lt;a href="http://blog.oaktonsoftware.com/2007/10/for-slowly-changing-dimensions-change.html"&gt;Part I&lt;/a&gt;, &lt;a href="http://blog.oaktonsoftware.com/2007/11/how-to-document-type-3-scd-pair-of-type.html"&gt;Part II&lt;/a&gt; and &lt;a href="http://blog.oaktonsoftware.com/2008/01/have-it-both-ways-with-hybrid-slow.html"&gt;Part III&lt;/a&gt;.&lt;/p&gt;        &lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;Star schema detractors often suggest that that a dimensional model throws away valuable detail.&lt;span style=""&gt;  &lt;/span&gt;Proponents of dimensional design counter that one of their basic design tenets is to set fact table grain at the lowest level of detail possible.&lt;span style=""&gt;  &lt;/span&gt;Fair enough.&lt;span style=""&gt; &lt;/span&gt;&lt;o:p&gt; &lt;/o:p&gt;But this response does not address the concern fully.&lt;span style=""&gt;  &lt;/span&gt;What about the dimensions?&lt;span style=""&gt;  &lt;/span&gt;&lt;br /&gt;&lt;/p&gt;    &lt;p class="MsoNormal"&gt;Designers who eschew the dimensional model for their enterprise data warehouse point out that they can tell you what a particular customer looked like on a given day, regardless of whether there were any sales.&lt;span style=""&gt;  &lt;/span&gt;A dimensional design using standard Type 2 techniques does not offer this capability.&lt;span style=""&gt;  &lt;/span&gt;It writes a new row to the dimension table each time a characteristic changes.&lt;span style=""&gt;  &lt;/span&gt;&lt;span style="font-style: italic;"&gt;But &lt;/span&gt;we can only link these versions of the customer to a point in time &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; there is a transaction recorded in a fact table.&lt;/p&gt;    &lt;p class="MsoNormal"&gt;In the over-hyped "&lt;a href="http://blog.oaktonsoftware.com/2007/03/three-data-warehouse-architectures-that.html"&gt;religious wars&lt;/a&gt;" of data warehousing, this is one criticism of the star schema that I have always felt had some validity.&lt;span style=""&gt;  &lt;/span&gt;However, it is an easy one to address.&lt;span style=""&gt;  &lt;/span&gt;&lt;/p&gt;    &lt;p style="font-weight: bold;" class="MsoNormal"&gt;Time Stamped Dimensions&lt;/p&gt;    &lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;The solution is so simple that I have encountered it being employed numerous times by groups with no formal training in dimensional design.&lt;span style=""&gt;  &lt;/span&gt;The solution is to add a pair of dates to the dimension table:&lt;span style=""&gt;  &lt;/span&gt;effective_date and expiration_date.&lt;/p&gt;    &lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;When a row is added to the dimension table, it is given an effective_date that indicates when the change became effective.&lt;span style=""&gt;  &lt;/span&gt;Later, when a slow change occurs, two things occur.&lt;span style=""&gt;  &lt;/span&gt;The new record is inserted with the new effective_date, similar to a standard Type 2 response.&lt;span style=""&gt;  &lt;/span&gt;In addition, the previous record is updated: its expiration_date is set to the date before the change occurred.&lt;/p&gt;    &lt;p style="font-weight: bold;" class="MsoNormal"&gt;More ETL Work, but More Analytic Possibilities&lt;/p&gt;    &lt;p class="MsoNormal"&gt;This extra step may be a bit of additional work for the ETL developers, but it provides some additional analytic capability.&lt;span style=""&gt;  &lt;/span&gt;Using these dates, it is possible to:&lt;/p&gt;    &lt;ol&gt;&lt;li&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;span style="font-style: italic;"&gt;List a series of changes to something in sequence&lt;/span&gt;: order by the effective_date.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Determine the status of something at a point in time&lt;/span&gt;:&lt;span style=""&gt;  &lt;/span&gt;qualify the date in question as greater than or equal to the effective_date and less than or equal to the expiration_date&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Quickly find the current value&lt;/span&gt; for a given dimension, by qualifying on expiration_date&lt;/li&gt;&lt;/ol&gt;For all this to work, it is useful to initialize expiration dates of current records to an actual date.&lt;span style=""&gt;  &lt;/span&gt;This avoids the tortured query predicates that are brought on by the use of NULL values.&lt;span style=""&gt;  &lt;/span&gt;I prefer to use the highest date value supported by the RDBMS, but that’s a topic for another post.&lt;br /&gt;&lt;br /&gt;&lt;p class="MsoNormal"&gt;As described, this is more of a "Date Stamped" dimension.  But if the granularity of significant change is finer than the day, it can easily be extended to include a time component.&lt;br /&gt;&lt;/p&gt;    &lt;p class="MsoNormal"&gt;Because there is a bit of extra ETL processing, it is worth considering whether the additional analytic value is worth it.  But there is also another consideration to take into account, and here the ETL developers may actually save some work.&lt;br /&gt;&lt;/p&gt;&lt;p style="font-weight: bold;" class="MsoNormal"&gt;Another Benefit&lt;/p&gt;    &lt;p class="MsoNormal"&gt;Having the timestamps on our conformed dimensions will make it easier to load past history when we build new fact tables in the future. This is especially handy if we are building the warehouse one subject area at a time.&lt;/p&gt;    &lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;For example, our Customer dimension may be implemented first as part of an orders schema.&lt;span style=""&gt;  &lt;/span&gt;Later, it may come into play again as part of a receivables schema. When it comes time to load the receivables history into a fact table, each transaction will need to be associated with the correct historic version of the customer.&lt;span style=""&gt;  &lt;/span&gt;Without timestamps, ETL developers will need to use the operational data to identify exactly what the customer looked like at the time of the transaction, then search for a match in the customer dimension.&lt;/p&gt;    &lt;p class="MsoNormal" style=""&gt;&lt;o:p&gt;&lt;/o:p&gt;With timestamps, all the ETL developers will need is the customer_id.&lt;span style=""&gt;  &lt;/span&gt;We can easily find the correct version of the customer in the dimension table by referring to the timestamps.&lt;/p&gt;  &lt;p class="MsoNormal" style=""&gt;&lt;o:p&gt;For this simple reason alone, I encourage designers to use time-stamps on the dimensions that are important across the enterprise--the &lt;span style="font-style: italic;"&gt;conformed dimensions&lt;/span&gt; that form the backbone of the data warehouse. &lt;/o:p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-7251522539905068687?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=7251522539905068687&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7251522539905068687?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/7251522539905068687?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2008/03/responding-to-star-schema-detractors.html" title="Responding to Star Schema Detractors with Timestamps" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CkcNQn86fCp7ImA9WB9aGEw.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-5681324636833723036</id><published>2008-01-08T11:05:00.000-05:00</published><updated>2008-01-08T11:34:53.114-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-01-08T11:34:53.114-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Slow Changes" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>Have it Both Ways With Hybrid Slow Changes</title><content type="html">&lt;span style="font-style: italic;"&gt;This is the third in a series on Slowly Changing Dimensions.  It looks at an often misunderstood approach: the hybrid SCD.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When the source data for a dimension table changes, the Slowly Changing Dimension characteristics of a dimensional model dictate how the warehouse schema should respond.  In my previous two posts, I examined the three most common techniques.  (See &lt;a href="http://blog.oaktonsoftware.com/2007/10/for-slowly-changing-dimensions-change.html"&gt;Part I&lt;/a&gt; &amp;amp; &lt;a href="http://blog.oaktonsoftware.com/2007/11/how-to-document-type-3-scd-pair-of-type.html"&gt;Part II&lt;/a&gt;.) When the source value for a dimension attribute value changes:&lt;ul&gt;&lt;li&gt;Type 1 response:  Update the existing value in the dimension table &lt;/li&gt;&lt;li&gt;Type 2 response:  Add a new row to the dimension table, with a new surrogate key&lt;/li&gt;&lt;li&gt;Type 3 response:  Maintain a single row in the dimension table, with separate columns for current and previous values of the dimension attribute in question.&lt;/li&gt;&lt;/ul&gt;Sometimes, one single technique does not fit the bill.  Requirements will come along that seem to call for both Type 1 and Type 2 treatment.  What to do?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;“We always want to do it this way.  But sometimes we want to do it that way.”  &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Consider a company building a typical orders star schema for business-to-business sales.  Customer companies are represented by a customer dimension table.  You are trying to determine what the correct slow change response should be when the name of a customer company changes.  In an interview, the first thing you are told is:&lt;br /&gt;&lt;blockquote&gt;“We always want to use the most current name for the company in our analysis…”&lt;/blockquote&gt;This sounds to you like a type 1 change, which looks reasonable.  After all, you might reason, one wouldn’t want this year vs. last year comparisons to be befuddled by different company names.  But no sooner have you noted this, the speaker continues:&lt;blockquote&gt;“...except for when we want to use the name that was in effect at the time of the transaction.”&lt;/blockquote&gt;Now things look more complicated.  To provide the company name as it existed at the time of an order would require a Type 2 response.  But that would conflict with the first requirement.&lt;br /&gt;&lt;br /&gt;What to do?  Don’t jump for Type 3.  That’s not the solution either.  A type 3 response will not tell you which value was in effect at the time of a given transaction.  (It also usually does not capture more than one previous value of the attribute – see the previous post.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Having it Both Ways&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The hybrid approach allows you to meet both requirements.  It requires that you model two dimension attributes that will capture the value in question – company name in this case.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-family:courier new;"&gt;company_name_current&lt;/span&gt; &lt;/span&gt;will capture the current name of the company.  This attribute will exhibit Type 1 behavior.  When a company name changes, it will be updated.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;font-family:courier new;" &gt;company_name_historic&lt;/span&gt; will capture the detailed history of company names.  This attribute will exhibit Type 2 behavior.  When a company name changes, a new row will be created to capture the new value, and to associate with any subsequent transactions&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_oaFlRokTed0/R4Oii5S6dfI/AAAAAAAAABw/GTTzpxm8qnU/s1600-h/hybrid_scd_1.jpg"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_oaFlRokTed0/R4Oii5S6dfI/AAAAAAAAABw/GTTzpxm8qnU/s400/hybrid_scd_1.jpg" alt="" id="BLOGGER_PHOTO_ID_5153141119021381106" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;This means that when a hybrid change occurs, the ETL process must do at least two things:&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;ol&gt;&lt;li&gt;Update the &lt;span style="font-family:courier new;"&gt;_current&lt;/span&gt; value for all records that share the same natural key&lt;/li&gt;&lt;li&gt;Insert a new row, initializing both &lt;span style="font-family:courier new;"&gt;_current&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;_historic&lt;/span&gt; values to the new value&lt;/li&gt;&lt;/ol&gt;Years ago, it was popular to refer to this approach as a “Type 4 Change.” With the second edition of &lt;a target="_blank" href="http://www.amazon.com/gp/product/0471200247?ie=UTF8&amp;amp;tag=oaktonsoftwar-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0471200247"&gt;&lt;span style="font-style: italic;"&gt;The Data Warehouse Toolkit&lt;/span&gt;&lt;/a&gt;, Kimball and Ross suggest we call this a hybrid change.  And for good reason:  it is really a bit of Type 1 and a bit of Type 2.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Hybrid Change in Action&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Developers often have trouble with this type of design.  To fully understand it, look what happens when the name of a company changes.&lt;br /&gt;&lt;br /&gt;Suppose that one of the customers was Apple Computer.  You may recall that in 2007, the company changed its name from Apple Computer Inc. to Apple Inc.&lt;br /&gt;&lt;br /&gt;Initially, the dimension table contains a row that looks like this:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_oaFlRokTed0/R4OijJS6dgI/AAAAAAAAAB4/rgbjjZbfckA/s1600-h/hybrid_scd_2.jpg"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_oaFlRokTed0/R4OijJS6dgI/AAAAAAAAAB4/rgbjjZbfckA/s400/hybrid_scd_2.jpg" alt="" id="BLOGGER_PHOTO_ID_5153141123316348418" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;In 2007, Apple changed its name from “Apple Computer Inc.” to “Apple Inc.”  The next illustration shows the schema’s hybrid response in yellow:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_oaFlRokTed0/R4OijJS6dhI/AAAAAAAAACA/PLB2UB8OX9w/s1600-h/hybrid_scd_3.jpg"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_oaFlRokTed0/R4OijJS6dhI/AAAAAAAAACA/PLB2UB8OX9w/s400/hybrid_scd_3.jpg" alt="" id="BLOGGER_PHOTO_ID_5153141123316348434" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;The first row shows the existing record.  The &lt;span style="font-family:courier new;"&gt;_current&lt;/span&gt; value is updated with the new name, “Apple Inc.”  Its &lt;span style="font-family:courier new;"&gt;_historic&lt;/span&gt; value is not touched.  The next row shows the new record that has been inserted.  It shows the new name for both &lt;span style="font-family:courier new;"&gt;_current&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;_historic&lt;/span&gt; values.  Any facts subsequently recorded will be connected to this new row.&lt;br /&gt;&lt;br /&gt;To analyze all facts (before and after the name change) using the new name, Apple Inc., you use &lt;span style="font-family:courier new;"&gt;company_name_current &lt;/span&gt;to group query results.  To study all facts in the context of the historically accurate name, group query results by &lt;span style="font-family:courier new;"&gt;company_name_historic&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Repeatable&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This process is repeatable.  If Apple once again changed the corporate name, say to “Apple Electronics, Inc.”, the ETL process for the customer table would:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Change &lt;span style="font-family:courier new;"&gt;company_name_current&lt;/span&gt; to “Apple Electronics Inc.” for &lt;span style="font-style: italic;"&gt;both &lt;/span&gt;existing rows.&lt;/li&gt;&lt;li&gt;Add a third row (with a new surrogate key)  with &lt;span style="font-family:courier new;"&gt;_current&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;_historic&lt;/span&gt; values recorded as “Apple Electronics, Inc.”  Subsequent facts would be connected to this row.&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;"&gt;The Importance of Good Documentation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is important to document the hybrid change carefully.  ETL developers must understand how it is to be loaded, and report/query developers must understand how to use it.  Otherwise, the hybrid approach can lead to confusion.&lt;br /&gt;&lt;br /&gt;And be sure that it is worth the trouble.  You might ask why people want the historic name, if 99% of the time they only need the current name. If it is because they want to be able to reproduce an invoice, perhaps that would be better served by going back to the source system.  Then the schema can sport a more simple Type 1 approach.&lt;br /&gt;&lt;br /&gt;But if a real analytic need exists, the hybrid approach is the only way to go.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;Copyright (c) 2007 Chris Adamson&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-5681324636833723036?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=5681324636833723036&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/5681324636833723036?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/5681324636833723036?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2008/01/have-it-both-ways-with-hybrid-slow.html" title="Have it Both Ways With Hybrid Slow Changes" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_oaFlRokTed0/R4Oii5S6dfI/AAAAAAAAABw/GTTzpxm8qnU/s72-c/hybrid_scd_1.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry gd:etag="W/&quot;CU4DRnkyeSp7ImA9WB9WEUk.&quot;"><id>tag:blogger.com,1999:blog-2707613555623047672.post-4249188506123785641</id><published>2007-11-15T09:20:00.000-05:00</published><updated>2007-11-15T11:26:17.791-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2007-11-15T11:26:17.791-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Slow Changes" /><category scheme="http://www.blogger.com/atom/ns#" term="Design Techniques" /><title>How To Document a Type 3 SCD:  A pair of Type 1 Responses</title><content type="html">&lt;span style="font-style: italic;"&gt;Many dimensionally-aware database modeling tools allow you to document the Slow Change characteristics of each dimension attribute.  But often, your only choices are Type 1 or Type 2.  This post describes what to do when your design calls for the less common Type 3 change. As it turns out, a Type 3 SCD response can be understood as a pair of Type 1 responses.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In a previous post, I pointed out that we &lt;span style="font-style: italic;"&gt;think &lt;/span&gt;about slow changes with respect to a source system, we usually &lt;span style="font-style: italic;"&gt;document &lt;/span&gt;them as part of the dimensional schema.  For example, if the customer's name or date of birth changes on the &lt;span style="font-style: italic;"&gt;source, &lt;/span&gt;what do we do in the star schema? A Type 1 response would be to overwrite a corresponding value in the star schema; a Type 2 response would be to insert a new row for the revised version of the customer.  We usually document this behavior by flagging each attribute in the star as "Type 1" or "Type 2", even though the "change" happens on the source.&lt;br /&gt;&lt;br /&gt;(Not sure Slowly Changing Dimensions are?  See &lt;a href="http://blog.oaktonsoftware.com/2007/10/for-slowly-changing-dimensions-change.html"&gt;For Slowly Changing Dimensions, Change Is Relative&lt;/a&gt; for an definition of the concept and explanations of the Type 1 and Type 2 techniques.)&lt;br /&gt;&lt;br /&gt;A Type 3 change calls for yet another kind of response to a change in source data.  This approach response maintains before and after versions of the value in separate dimension attributes.  Because the before and after value are stored in the same row, either version can be associated with activity (read: facts) that occurred before or after the change.&lt;br /&gt;&lt;br /&gt;Schema designers often throw up their arms in disgust when they discover that their modeling tool does not support the concept of a Type 3 change.  But this despair may not be warranted; it is possible to document a Type 3 change as a pair of Type 1 changes.&lt;br /&gt;&lt;br /&gt;Before we look at how to do this, here's a quick example of a Type 3 slow change.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Type 3 SCD: An Example&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Suppose that the dimensional model contains a dimension table called &lt;span style="font-family:courier new;"&gt;product&lt;/span&gt;, and that one of the dimension attributes is &lt;span style="font-family:courier new;"&gt;category&lt;/span&gt;. The &lt;span style="font-family:courier new;"&gt;category &lt;/span&gt;attribute carries a standard classification mechanism used to categorize products, and appears in a large number of reports.  An example of a category is "stationery."&lt;br /&gt;&lt;br /&gt;A wide variety of products have this category designation, including several types of envelope, note pad, and paper sheets. All these products have rows in the &lt;span style="font-family:courier new;"&gt;product &lt;/span&gt;dimension table with a &lt;span style="font-family:courier new;"&gt;category &lt;/span&gt;of "stationery."  Fact tables linked to the &lt;span style="font-family:courier new;"&gt;product &lt;/span&gt;dimension can be analyzed by &lt;span style="font-family:courier new;"&gt;category&lt;/span&gt;, grouping all these products under the value "stationery."&lt;br /&gt;&lt;br /&gt;Now lets assume that the business has expanded its product lines, and the old "stationery" category is a bit too broad.  Management plans to redesignate existing stationery products as "Personal Stationery" or "Business Stationery".  Going forward, they want to use the new values for all analysis, even when studying facts that were recorded prior to the changeover.  So far, sounds like a Type 1 change.&lt;br /&gt;&lt;br /&gt;The catch: some planning numbers are tied to the old designations.  For at least a while, management will need the ability to use the old value to study facts, whether they were recorded before or after the change.  (A type 2 response does not help here; it associates all facts recorded before the change with the old value, and all facts recorded after the change with the new value.)  And, now that they've thought about it some more, it might be nice to be able to do this every year.&lt;br /&gt;&lt;br /&gt;The solution is to maintain two dimensional attributes for the category.  The first is meant to carry the current category designation, and might be called &lt;span style="font-family:courier new;"&gt;category_current&lt;/span&gt;.  The second is meant to carry the old category designation, and might be called &lt;span style="font-family:courier new;"&gt;category_previous&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;When the category for a product changes, say from "stationery" to "business stationery", we update the &lt;span style="font-family:courier new;"&gt;category_current&lt;/span&gt; to contain the new value "business stationery" and place the old value, "stationery" in &lt;span style="font-family:courier new;"&gt;category_previous&lt;/span&gt;.  When management wants to study facts using the new designations, their reports should group facts using the &lt;span style="font-family:courier new;"&gt;category_current&lt;/span&gt; dimension attribute.  When they want to study facts using the old categorization scheme, they can use the &lt;span style="font-family:courier new;"&gt;category_previous&lt;/span&gt; attribute.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Documenting Type 3 Change&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As discussed in the previous post, we often document slow change response on an attribute by attribute basis in the dimensional model.  For our product example, this might be done as shown in the following illustration.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_oaFlRokTed0/RzxnQxMf8AI/AAAAAAAAABI/OlLt0ZyuohE/s1600-h/scd3-1.jpg"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_oaFlRokTed0/RzxnQxMf8AI/AAAAAAAAABI/OlLt0ZyuohE/s320/scd3-1.jpg" alt="" id="BLOGGER_PHOTO_ID_5133091213076852738" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;Many modeling tools allow us to document the SCD characteristic of a column, offering a pick list that allows us to choose what type of response is expected when the source of an attribute changes.  But what if your tool does not support Type 3?   Or what if you have a tool that automates the construction of an ETL routine, aggregate table or OLAP cube, and the tool does not offer the Type 3 option?&lt;br /&gt;&lt;br /&gt;There may be a solution.  Notice that when a type 3 change occurs, we update both versions of the attribute.  In our example, &lt;span style="font-family: courier new;"&gt;category_current&lt;/span&gt; is updated withe the new name of the product, and category_previous is updated with the prior value (which is the old value of category_current.)  We effectively have a pair of Type 1 attributes, and can document them as shown below.  I've added a highlight to illustrate the pair of attributes in question.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_oaFlRokTed0/RzxpCBMf8BI/AAAAAAAAABQ/Ba4j46LYX2s/s1600-h/scd3-2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_oaFlRokTed0/RzxpCBMf8BI/AAAAAAAAABQ/Ba4j46LYX2s/s320/scd3-2.jpg" alt="" id="BLOGGER_PHOTO_ID_5133093158697037842" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Of course, if you have tools that use this information to do something, such as generate a cube or ETL script, you need to be sure this method will not be disruptive.  You will probably need to provide sources for each of the attributes, and it may be challenging to define a source for &lt;span style="font-family:courier new;"&gt;category_previous&lt;/span&gt;, which will probably not be present in the source after the change has occurred.   The solution may be able to define its source as the the &lt;span style="font-family: courier new;"&gt;category_current&lt;/span&gt; value of the product dimension table itself.  This, in turn, may require that you join to a copy of the dimension when extracting source data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Repeatable Process&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To round out this discussion of the Type 3 change, I should point out that this slow change process is repeatable.  Next year, if the designations change again, we can move &lt;span style="font-family: courier new;"&gt;category_current&lt;/span&gt; values into &lt;span style="font-family: courier new;"&gt;category_previous&lt;/span&gt;, and place the new values in &lt;span style="font-family: courier new;"&gt;category_current&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Note that we always have access to two values:  the current value and the prior value.  Not enough?  Its also possible to maintain more than two columns for a type 3 response.  We might choose to maintain columns called &lt;span style="font-family: courier new;"&gt;category_current, category_last_year, category_2yrs_ago,&lt;/span&gt; etc.  Now, any or all of the values can be used to study all facts, regardless of when they were recorded.   (A scheme like &lt;span style="font-family: courier new;"&gt;category_2007, category_2006, category_2005&lt;/span&gt; might work as well, but might require you to redesign reports each year to use the correct attributes.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;More Changes on the Way&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;That's not quite the end of the story of SCD's.  In future posts, I will look at the popular "timestamped" and "hybrid" approaches, as well as run down a few advanced concepts and caveats.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;Copyright (c) 2007 Chris Adamson&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;Copyright (c) 2007-2010 Chris Adamson, except where noted.

www.oaktonsoftware.com | www.chrisadamson.net | www.starschemacentral.com&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2707613555623047672-4249188506123785641?l=blog.oaktonsoftware.com' alt='' /&gt;&lt;/div&gt;</content><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=2707613555623047672&amp;postID=4249188506123785641&amp;isPopup=true" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4249188506123785641?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/2707613555623047672/posts/default/4249188506123785641?v=2" /><link rel="alternate" type="text/html" href="http://blog.oaktonsoftware.com/2007/11/how-to-document-type-3-scd-pair-of-type.html" title="How To Document a Type 3 SCD:  A pair of Type 1 Responses" /><author><name>Chris Adamson</name><uri>http://www.blogger.com/profile/14386002073556773411</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="00542310086779199689" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_oaFlRokTed0/RzxnQxMf8AI/AAAAAAAAABI/OlLt0ZyuohE/s72-c/scd3-1.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry></feed>
