<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
  <channel>
    <title>Miky Schreiber's Blog</title>
    <link>http://www.miky-schreiber.com/Blog/</link>
    <description>Business Intelligence, Analysis Services, MDX, DataWarehousing and more...</description>
    <language>en-us</language>
    <copyright>Miky Schreiber</copyright>
    <lastBuildDate>Wed, 14 Sep 2011 14:23:04 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 1.9.6264.0</generator>
    <managingEditor>miky@zahav.net.il</managingEditor>
    <webMaster>miky@zahav.net.il</webMaster>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MikySchreiberBlog" /><feedburner:info uri="mikyschreiberblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=78c3349c-514d-476e-8fa9-36b004b40f23</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,78c3349c-514d-476e-8fa9-36b004b40f23.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,78c3349c-514d-476e-8fa9-36b004b40f23.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=78c3349c-514d-476e-8fa9-36b004b40f23</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
You must be wondering why this blog wasn't updated for a long time. The reason is
that I started a new job. :-) Along with other stuff, I'm playing with QlikView and
I'd like to share some interesting ideas.
</p>
        <p>
In QlikView, The list boxes which filters the entire sheet (or document)
is a very common thing. I thought that if we have a box list with a small amount of
values, it might be more beautiful to replace each value with an image. For exmaple,
let's say that we have three divisions in the company. Instead of putting a list-box
with their names, we can put three images with their logos and make that a click on
the image will filter on these values (very simple with the Actions tab). The problem
is that the user won't know on which division the sheet is filtered on. That's why
we need to make the other images more transparent. Using macro, that's very easy and
the result is awsome:
</p>
        <p>
sub SetTransparentImage<br />
  set image = ActiveDocument.GetSheetObject("ImageIDHere")<br />
  set imageProps = image.GetProperties<br />
  imageProps.Layout.BkgAlpha = 50<br />
  image.setProperties imageProps<br />
end sub 
</p>
        <p>
Now, put that in the macro script inside your document. In order to add images to
your sheet, add new Text boxes and set their background to the image you want. In
the Actions tab, set the desired value on click and also run this subroutine. 
</p>
Enjoy.<img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=78c3349c-514d-476e-8fa9-36b004b40f23" /><xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/n1rUJd4PF-A" height="1" width="1" /></body>
      <title>Playing with image transparency in QlikView</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,78c3349c-514d-476e-8fa9-36b004b40f23.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/n1rUJd4PF-A/PermaLink,guid,78c3349c-514d-476e-8fa9-36b004b40f23.aspx</link>
      <pubDate>Wed, 14 Sep 2011 14:23:04 GMT</pubDate>
      <description>&lt;p&gt;
You must be wondering why this blog wasn't updated for a long time. The reason is
that I started a new job. :-)&amp;nbsp;Along with other stuff, I'm playing with QlikView&amp;nbsp;and
I'd like to share some interesting ideas.
&lt;/p&gt;
&lt;p&gt;
In QlikView, The&amp;nbsp;list&amp;nbsp;boxes&amp;nbsp;which filters the entire sheet (or document)
is a very common thing. I thought that if we have a box list with a small amount of
values, it might be more beautiful to replace each value with an image. For exmaple,
let's say that we have three divisions in the company. Instead of putting a list-box
with their names, we can put three images with their logos and make that a click on
the image will filter on these values (very simple with the Actions tab). The problem
is that the user won't know on which division the sheet is filtered on. That's why
we need to make the other images more transparent. Using macro, that's very easy and
the result is awsome:
&lt;/p&gt;
&lt;p&gt;
sub SetTransparentImage&lt;br&gt;
&amp;nbsp; set image = ActiveDocument.GetSheetObject("ImageIDHere")&lt;br&gt;
&amp;nbsp; set imageProps = image.GetProperties&lt;br&gt;
&amp;nbsp; imageProps.Layout.BkgAlpha = 50&lt;br&gt;
&amp;nbsp; image.setProperties imageProps&lt;br&gt;
end sub 
&lt;/p&gt;
&lt;p&gt;
Now, put that in the macro script inside your document. In order to add images to
your sheet, add new Text boxes and set their background to the image you want. In
the Actions tab, set the desired value on click and also run this subroutine. 
&lt;/p&gt;
Enjoy.&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=78c3349c-514d-476e-8fa9-36b004b40f23" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,78c3349c-514d-476e-8fa9-36b004b40f23.aspx</comments>
      <category>BI;BI/QlikView</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,78c3349c-514d-476e-8fa9-36b004b40f23.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=52bedb05-38a4-4f77-8bc7-1fe7b9f78a75</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,52bedb05-38a4-4f77-8bc7-1fe7b9f78a75.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,52bedb05-38a4-4f77-8bc7-1fe7b9f78a75.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=52bedb05-38a4-4f77-8bc7-1fe7b9f78a75</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
The problems: A. Storage costs money, much money. B. Today's storage is very slow
and we want fast retrieve of data.
</p>
        <p>
The Solution: Take the commonly-used data and put it in very fast and expensive storage.
Take the less used data and put it on slower and cheaper storage. The overall budget
should be much lower.
</p>
        <p>
          <a href="http://hosteddocs.ittoolbox.com/os_us_en_eb_tsdummies.pdf">Link</a> (pdf)
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=52bedb05-38a4-4f77-8bc7-1fe7b9f78a75" />
      <xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/gNaNz9m5qxo" height="1" width="1" /></body>
      <title>Great Storage Tiering E-Book</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,52bedb05-38a4-4f77-8bc7-1fe7b9f78a75.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/gNaNz9m5qxo/PermaLink,guid,52bedb05-38a4-4f77-8bc7-1fe7b9f78a75.aspx</link>
      <pubDate>Thu, 23 Jun 2011 17:42:33 GMT</pubDate>
      <description>&lt;p&gt;
The problems: A. Storage costs money, much money. B. Today's storage is very slow
and we want fast retrieve of data.
&lt;/p&gt;
&lt;p&gt;
The Solution: Take the commonly-used data and put it in very fast and expensive&amp;nbsp;storage.
Take the less used data and put it on slower and cheaper storage. The overall budget
should be much lower.
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://hosteddocs.ittoolbox.com/os_us_en_eb_tsdummies.pdf"&gt;Link&lt;/a&gt;&amp;nbsp;(pdf)
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=52bedb05-38a4-4f77-8bc7-1fe7b9f78a75" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,52bedb05-38a4-4f77-8bc7-1fe7b9f78a75.aspx</comments>
      <category>BI;BI/DataWarehousing</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,52bedb05-38a4-4f77-8bc7-1fe7b9f78a75.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=ef80cf5a-b024-4626-b7b3-e67ed82cd2f3</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,ef80cf5a-b024-4626-b7b3-e67ed82cd2f3.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,ef80cf5a-b024-4626-b7b3-e67ed82cd2f3.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=ef80cf5a-b024-4626-b7b3-e67ed82cd2f3</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
For the google search: Informatica service start and stops
</p>
        <p>
Yeah, that's exactly what happened to me. Everything looked just fine until the last
step of the installation, where the installer pings the new integration service, but
there is no answer.<br />
Looking at the services manager, I see that the InformaticaService is stopped. When
I restart it, it stops after 5-10 seconds. I thought it's because of the service's
account settings, but it's not. After some time of search and re-attempts to make
it work, this is what I can say if you're stuck as I was:<br /></p>
        <ul>
          <li>
While you're in the installation wizard, on the node configuration step, press
the "Advances Settings" button. There, you'll see the min. and max. port numbers.
Go back to the node configuration and change the port number so it will be in that
range (why it wasn't in the first place? ...)</li>
          <li>
The Windows PATH environment variable <strong>must</strong> contain the informatica
server binary folder: &lt;INFORMATICA_INSTALL_DIR&gt;\Server\bin 
</li>
          <li>
If you're still stuck, look after the service.log file in the informatica directory.
Look at the error and google it. 
</li>
        </ul>
Hope I saved somebody a few hours. 
<p /><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=ef80cf5a-b024-4626-b7b3-e67ed82cd2f3" /><xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/0pnNhof9Fjs" height="1" width="1" /></body>
      <title>Installing Informatica 8.6 on Windows 7</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,ef80cf5a-b024-4626-b7b3-e67ed82cd2f3.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/0pnNhof9Fjs/PermaLink,guid,ef80cf5a-b024-4626-b7b3-e67ed82cd2f3.aspx</link>
      <pubDate>Wed, 15 Jun 2011 20:57:43 GMT</pubDate>
      <description>&lt;p&gt;
For the google search: Informatica service start and stops
&lt;/p&gt;
&lt;p&gt;
Yeah, that's exactly what happened to me. Everything looked just fine until the last
step of the installation, where the installer pings the new integration service, but
there is no answer.&lt;br&gt;
Looking at the services manager, I see that the InformaticaService is stopped. When
I restart it, it stops after 5-10 seconds. I thought it's because of the service's
account settings, but it's not. After some time of search and re-attempts to make
it work, this is what I can say if you're stuck as I was:&lt;br&gt;
&lt;ul&gt;
&lt;li&gt;
While you're in the installation wizard,&amp;nbsp;on the node configuration step, press
the "Advances Settings" button. There, you'll see the min. and max. port numbers.
Go back to the node configuration and change the port number so it will be in that
range (why it wasn't in the first place? ...)&lt;/li&gt;
&lt;li&gt;
The Windows PATH environment variable &lt;strong&gt;must&lt;/strong&gt; contain the informatica
server binary folder: &amp;lt;INFORMATICA_INSTALL_DIR&amp;gt;\Server\bin 
&lt;/li&gt;
&lt;li&gt;
If you're still stuck, look after the service.log file in the informatica directory.
Look at the error and google it. 
&lt;/li&gt;
&lt;/ul&gt;
Hope I saved somebody a few hours. 
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=ef80cf5a-b024-4626-b7b3-e67ed82cd2f3" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,ef80cf5a-b024-4626-b7b3-e67ed82cd2f3.aspx</comments>
      <category>BI;BI/Informatica</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,ef80cf5a-b024-4626-b7b3-e67ed82cd2f3.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=f55fb614-9c1f-4b9d-9767-b64d688dbdc0</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,f55fb614-9c1f-4b9d-9767-b64d688dbdc0.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,f55fb614-9c1f-4b9d-9767-b64d688dbdc0.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=f55fb614-9c1f-4b9d-9767-b64d688dbdc0</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Yes, and it's great!
</p>
        <p>
Finally, the semantic web is here. This means that from now on, the search engines
won't have to guess what is the content of the (our?) web pages. The answer will be
there, behind the scenes. 
<br />
Technically, we (the web developers) will add new properties to our HTML tags that
will describe the content of the text. For example, you can add a property to your
DIV or SPAN that will tell the search engine that this is an article, creative idea
or a critique.
</p>
        <p>
What is the meaning of this for us, the web developers or independent bloggers? That
depends. If you want to make your blog (or site) to appear in more relevant search
results, you should add these properties. Yes, you'll have to manually edit your HTML
or to make a template. Of course you don't have to do this.<br />
For those who want to have a deeper understanding of this subject, go to the common
standard site - <a href="http://schema.org">Schema.org </a></p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=f55fb614-9c1f-4b9d-9767-b64d688dbdc0" />
      <xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/JQJgC-TGpUA" height="1" width="1" /></body>
      <title>Cooperation of Google, Microsoft and Yahoo - Can it be?</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,f55fb614-9c1f-4b9d-9767-b64d688dbdc0.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/JQJgC-TGpUA/PermaLink,guid,f55fb614-9c1f-4b9d-9767-b64d688dbdc0.aspx</link>
      <pubDate>Thu, 02 Jun 2011 20:08:33 GMT</pubDate>
      <description>&lt;p&gt;
Yes, and it's great!
&lt;/p&gt;
&lt;p&gt;
Finally, the semantic web is here. This means that from now on, the search engines
won't have to guess what is the content of the (our?) web pages. The answer will be
there, behind the scenes. 
&lt;br&gt;
Technically, we (the web developers) will add new properties to our HTML tags that
will describe the content of the text. For example, you can add a property to your
DIV or SPAN that will tell the search engine that this is an article, creative idea
or a critique.
&lt;/p&gt;
&lt;p&gt;
What is the meaning of this for us, the web developers or independent bloggers? That
depends. If you want to make your blog (or site) to appear in more relevant search
results, you should add these properties. Yes, you'll have to manually edit your HTML
or to make a template. Of course you don't have to do this.&lt;br&gt;
For those who want to have a deeper understanding of this subject, go to the common
standard site - &lt;a href="http://schema.org"&gt;Schema.org &lt;/a&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=f55fb614-9c1f-4b9d-9767-b64d688dbdc0" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,f55fb614-9c1f-4b9d-9767-b64d688dbdc0.aspx</comments>
      <category>Programming;Programming/Web</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,f55fb614-9c1f-4b9d-9767-b64d688dbdc0.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=76280280-4f12-41f1-ad29-698315adca9f</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,76280280-4f12-41f1-ad29-698315adca9f.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,76280280-4f12-41f1-ad29-698315adca9f.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=76280280-4f12-41f1-ad29-698315adca9f</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">After six years of work with SSAS, I decided
to start re-reading the best book on the subject - Microsoft SQL Server 2005 Analysis
Services. Why read about 2005 when there are more recent versions? That's because
this is the most comprehensive book on the subject and the only one that was written
by the SSAS's development team. The book gives you a better understanding of the technlogy
and shows you the motivation for every architectual decision of the creators. While
reading, I come across subjects that I wanted to post about and I believe that along
with reading the book I'll post more. 
<br /><center><img src="http://www.miky-schreiber.com/blog/content/binary/ssas2005bookcover.jpg" height="50%" /><br />
The book cover
</center><p>
At the university, I was taught that half crazy is more dangerous than crazy. The
reason is simple: You know that you should always be careful from the crazy guy, but
regarding the half crazy person you don't know when to be careful and when to learn
from him. Information systems are the same and especially DWH: There is no half truth
or not precise truth. If you know that there's a data problem - bubble it up and don't
show you user half a truth (because it's also half a lie). 
</p><p>
Every time we bind a DB column to a dimension attibute, a DataItem object is created.
The DataItem defines the connection between the conceptual and the physical model.
For example, it defines from which table and column the data is taken. Along with
other properties, there is one important property called <b>NullProcessing</b> which
defines how the server behaves with null data. The default is <b>ZeroOrBlank</b>,
meaning that the null data will be shown to the end-user as zero (if the DataType
is not string) or empty string (if the DataType is string). The problem is in case
where the attribute is a parent in an hierarchy. This is what happened:<br />
We defined "Sales Amount" measure under the hierarchy Category -&gt; Sub-Category
-&gt; Product. By mistake, one of the members in the categoty level got null value
and it was shown as zero in the cube. The user was used to see the Shoes sub-category
under the Clothes category. One day, he seen that the shoes sales amount was decreased
dramatically - He seen products under the Shoes sub-category which is under the Clothes
category but the sub-total was decreased. In the bottom of the table other products
were "hidden" with the Shoes sub-category but with the zero category. The reason is
that some of the shoes products were given null categories and in the report the shoes
products were split between the Clothes category and the zero category. The user got
half truth and he interpreted it as wrong data and that should never happen in a BI
system. 
</p><p>
What can we do? In my opinion, in most cases we need to set NullProcessing as Error,
which will throw an error and prevent us from showing wrong data. This is how we (the
IT people) will know about the problem before the end-users. 
<br />
There are more options to NullProcessing: 
</p><ul><li>
Preserve - keeps the null value with no replacement. As I shown, this is not recommended.
Furthermore, preserve uses more system resources.</li><li>
Unknown Member - replaces the null with the unkown member you defined in the dimension.
I recommend using this option only if the users know that the unknown member exists
and what it means.</li><li>
Automatic - The server will select the most appropriate behaviour according to the
context. I don't really understand what it means and the writers don't explain it,
but generally it is recommended to define deterministic behavior to your system.</li></ul><p>
Does it mean that now we'll go over all our data bindings and change their
NullProcessing proprety? Of course not ! We need to think where this behavior has
a bad effect and where it doesn't matter. For example, in a key attribute of a dimension
which is also a leaf level of an hierarchy, the ZeroOrBlank option is quite good.
</p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=76280280-4f12-41f1-ad29-698315adca9f" /><xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/3xLhB_QQ9Oc" height="1" width="1" /></body>
      <title>Analysis Services NullProcessing Property</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,76280280-4f12-41f1-ad29-698315adca9f.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/3xLhB_QQ9Oc/PermaLink,guid,76280280-4f12-41f1-ad29-698315adca9f.aspx</link>
      <pubDate>Mon, 23 May 2011 10:56:43 GMT</pubDate>
      <description>After six years of work with SSAS, I decided to start re-reading the best book on the subject - Microsoft SQL Server 2005 Analysis Services. Why read about 2005 when there are more recent versions? That's because this is the most comprehensive book on the subject and the only one that was written by the SSAS's development team. The book gives you a better understanding of the technlogy and shows you the motivation for every architectual decision of the creators. While reading, I come across subjects that I wanted to post about and I believe that along with reading the book I'll post more. &lt;br&gt;
&lt;center&gt;&lt;img src="http://www.miky-schreiber.com/blog/content/binary/ssas2005bookcover.jpg" height="50%"&gt;
&lt;br&gt;
The book cover
&lt;/center&gt;
&lt;p&gt;
At the university, I was taught that half crazy is more dangerous than crazy. The
reason is simple: You know that you should always be careful from the crazy guy, but
regarding the half crazy person you don't know when to be careful and when to learn
from him. Information systems are the same and especially DWH: There is no half truth
or not precise truth. If you know that there's a data problem - bubble it up and don't
show you user half a truth (because it's also half a lie). 
&lt;p&gt;
Every time we bind a DB column to a dimension attibute, a DataItem object is created.
The DataItem defines the connection between the conceptual and the physical model.
For example, it defines from which table and column the data is taken. Along with
other properties, there is one important property called &lt;b&gt;NullProcessing&lt;/b&gt; which
defines how the server behaves with null data. The default is &lt;b&gt;ZeroOrBlank&lt;/b&gt;,
meaning that the null data will be shown to the end-user as zero (if the DataType
is not string) or empty string (if the DataType is string). The problem is in case
where the attribute is a parent in an hierarchy. This is what happened:&lt;br&gt;
We defined "Sales Amount" measure under the hierarchy Category -&amp;gt; Sub-Category
-&amp;gt; Product. By mistake, one of the members in the categoty level got null value
and it was shown as zero in the cube. The user was used to see the Shoes sub-category
under the Clothes category. One day, he seen that the shoes sales amount was decreased
dramatically - He seen products under the Shoes sub-category which is under the Clothes
category but the sub-total was decreased. In the bottom of the table other products
were "hidden" with the Shoes sub-category but with the zero category. The reason is
that some of the shoes products were given null categories and in the report the shoes
products were split between the Clothes category and the zero category. The user got
half truth and he interpreted it as wrong data and that should never happen in a BI
system. 
&lt;p&gt;
What can we do? In my opinion, in most cases we need to set NullProcessing as Error,
which will throw an error and prevent us from showing wrong data. This is how we (the
IT people) will know about the problem before the end-users. 
&lt;br&gt;
There are more options to NullProcessing: 
&lt;ul&gt;
&lt;li&gt;
Preserve - keeps the null value with no replacement. As I shown, this is not recommended.
Furthermore, preserve uses more system resources.&lt;/li&gt;
&lt;li&gt;
Unknown Member - replaces the null with the unkown member you defined in the dimension.
I recommend using this option only if the users know that the unknown member exists
and what it means.&lt;/li&gt;
&lt;li&gt;
Automatic - The server will select the most appropriate behaviour according to the
context. I don't really understand what it means and the writers don't explain it,
but generally it is recommended to define deterministic behavior to your system.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Does it mean that now we'll go over all&amp;nbsp;our data bindings&amp;nbsp;and change their
NullProcessing proprety? Of course not ! We need to think where this behavior has
a bad effect and where it doesn't matter. For example, in a key attribute of a dimension
which is also a leaf level of an hierarchy, the ZeroOrBlank option is quite good.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=76280280-4f12-41f1-ad29-698315adca9f" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,76280280-4f12-41f1-ad29-698315adca9f.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Analysis Services;BI/SQL Server 2008;BI/SQL Server 2008/Analysis Services</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,76280280-4f12-41f1-ad29-698315adca9f.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=b85543f6-d372-4590-9322-0fe867f1872e</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,b85543f6-d372-4590-9322-0fe867f1872e.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,b85543f6-d372-4590-9322-0fe867f1872e.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=b85543f6-d372-4590-9322-0fe867f1872e</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
We're doing something very useful at our workplace and because I believe in collaboration
I would like to share it with you, dear readers. I'm sure it will help a lot of system
analysts and developers.
</p>
        <p>
When we write down the design of the ETL process, especially regarding complex fact
tables, we find it necessary to write it down as a flowchart. This creates a simple
and common language between the designer/analyst and the developer and even with the
customer. It's much easier for everyone to read a flowchart rather than reading a
long and complex document. Therefore, we create diagrams such as these using Visio
(this diagram was created using Google Docs, but it's preety much the same):<br /><img align="middle" src="http://www.miky-schreiber.com/Blog/content/binary/SampleFlowchart.jpg" width="500" /></p>
        <p>
I only brought here the beginning of the diagram. The real one is 3 times bigger. 
<br />
Note that every leaf in the tree has its own number. These numbers indicates the record's
"case number". Now for the trick: In the target table (fact, dimension or anything
else) we add a new column that contains the number of the case, according to the numbers
shown in the diagram. This field is not exposed to the end-user. The pros of this
trick are very significant: 
</p>
        <ul>
          <li>
Already in the early stages of the implementation we can check the statistics, meaning
how many times each case happens and see if we got what we expected. This can help
us find new things that we (and sometimes even the users) weren't aware of and
find bugs in early stage of the development. 
</li>
          <li>
Each record can "tell its story" and tell us "why it was created". This saves us a
lot of effort when we want to question the data and find the source of the problems. 
</li>
          <li>
During the tests we check two different things: Whether each record was "identified"
correctly by the ETL (by checking the case number) and whether the values entered
into the other fields are correct. While once we used to check only the values correctness
and then find the problems, now we can detect the problems much faster.</li>
        </ul>
After we came up with this idea, I thought of another one: Why won't we expose this
new field to the users/analysts and let them "enjoy" this knowledge? Maybe these statistics
can also help them with the data analysis? This requires us to build "cases" dimension
and add a description for every case number. I don't think that this is a good idea
in any circumstances, but you should know that it's an option. 
<p /><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=b85543f6-d372-4590-9322-0fe867f1872e" /><xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/62nbFywGzTk" height="1" width="1" /></body>
      <title>Putting the Design Into the Implementation</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,b85543f6-d372-4590-9322-0fe867f1872e.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/62nbFywGzTk/PermaLink,guid,b85543f6-d372-4590-9322-0fe867f1872e.aspx</link>
      <pubDate>Tue, 29 Mar 2011 13:20:39 GMT</pubDate>
      <description>&lt;p&gt;
We're doing something very useful at our workplace and because I believe in collaboration
I would like to share it with you, dear readers. I'm sure it will help a lot of system
analysts and developers.
&lt;/p&gt;
&lt;p&gt;
When we write down the design of the ETL process, especially regarding complex fact
tables, we find it necessary to write it down as a flowchart. This creates a simple
and common language between the designer/analyst and the developer and even with the
customer. It's much easier for everyone to read a flowchart rather than reading&amp;nbsp;a
long and complex document. Therefore, we create diagrams such as these using Visio
(this diagram was created using Google Docs, but it's preety much the same):&lt;br&gt;
&lt;img align=middle src="http://www.miky-schreiber.com/Blog/content/binary/SampleFlowchart.jpg" width=500&gt;
&lt;/p&gt;
&lt;p&gt;
I only brought here the beginning of the diagram. The real one is 3 times bigger. 
&lt;br&gt;
Note that every leaf in the tree has its own number. These numbers indicates the record's
"case number". Now for the trick: In the target table (fact, dimension or anything
else) we add a new column that contains the number of the case, according to the numbers
shown in the diagram. This field is not exposed to the end-user. The pros of this
trick are very significant: 
&lt;ul&gt;
&lt;li&gt;
Already in the early stages of the implementation we can check the statistics, meaning
how many times each case happens and see if we got what we expected. This can help
us find new things that&amp;nbsp;we (and sometimes even the users) weren't aware of and
find bugs in early stage of the development. 
&lt;li&gt;
Each record can "tell its story" and tell us "why it was created". This saves us a
lot of effort when we want to question the data and find the source of the problems. 
&lt;li&gt;
During the tests we check two different things: Whether each record was "identified"
correctly by the ETL (by checking the case number) and whether the values entered
into the other fields are correct. While once we used to check only the values correctness
and then find the problems, now we can detect the problems much faster.&lt;/li&gt;
&lt;/ul&gt;
After we came up with this idea, I thought of another one: Why won't we expose this
new field to the users/analysts and let them "enjoy" this knowledge? Maybe these statistics
can also help them with the data analysis? This requires us to build "cases" dimension
and add a description for every case number. I don't think that this is a good idea
in any circumstances, but you should know that it's an option. 
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=b85543f6-d372-4590-9322-0fe867f1872e" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,b85543f6-d372-4590-9322-0fe867f1872e.aspx</comments>
      <category>BI;BI/DataWarehousing;Tips &amp; Tricks</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,b85543f6-d372-4590-9322-0fe867f1872e.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=a81e3c13-cba1-4a2c-94e6-4adee48daca5</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,a81e3c13-cba1-4a2c-94e6-4adee48daca5.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,a81e3c13-cba1-4a2c-94e6-4adee48daca5.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=a81e3c13-cba1-4a2c-94e6-4adee48daca5</wfw:commentRss>
      <slash:comments>2</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p align="left">
          <em>This post is the fifth (and last, I think) in a <a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx">series</a></em>
          <em> about HR
BI systems. In the <a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx">previous
post</a></em>
          <em> I've shown how to make the BI system smarter by cleaning records
and finding Super-Transactions. In this post I'll complete a few last points and introduce
you to some interesting reports I built.</em>
        </p>
        <p align="left">
          <u>Finding Super Transactions</u>
        </p>
        <p align="left">
In the last post, I introduced you the concept of Super Transactions (ST). I explained
how to make the ETL process find them in the raw data and how to model them in the
DWH and let the end-users use them in their analysis. However, I didn't explain how
to find them in the initial design phase.
</p>
        <p align="left">
          <strong>Objective:</strong> Find the material processes that happens to many people.<br /><strong>The main problem:</strong> There are many people and we don't know where to
start looking! We should think about a common denominator to all the processes in
order to find them.<br /><strong>The key observation:</strong><strong>People are boring</strong>. Meaning,
in terms of their statuses or various trailers. Think about your personal diary (in
terms of what we talked about in the previous posts), or the "log" of your life. Think
how many times you changed your martial status, salary, workplace status, etc. Yes,
you have no more than a few records. You change something only once in a couple of
years (in the best case).<br />
On the other hand, when an employee undergoes a material change (ST), there are many
changes that happens (almost) at once in different appendixes (or subjects). When
the student changes his position in the university, he changes in short time his position,
salary, courses and more.<br />
Let's show it in a simple graph. The X axis is the time line and the Y axis is the
number of appendixes that changed:<br /></p>
        <center>
          <img src="http://www.miky-schreiber.com/Blog/content/binary/ST_occurence_graph.JPG" />
        </center>
        <br />
This shows us that most of the time our life is static and suddenly you find there
are "peaks" that indicates that something "big" happened. This is where we'll look
for the STs.<br /><strong>How we'll find them?</strong> I'll take you back to the advanced modeling
of the project. For every record in the diary we added end_effective_date column which
has the date of the next record minus one day. That enable us to write between statements
and make our life more simple. Back to our case: using analytic functions, I added
a field that says how much time passed between the current and the previous record.
As I said, the number are very large in most cases. The cases where there's some records
with small numbers are the cases we are looking for. It says that something important
happended. Using this method I found eight out of nine (!!) which implemented in the
system. 
<p /><p align="left"><u>Incremental Load</u></p><p align="left">
In an early stage of the project we realized that it will be very "unwise" to load
all the people all the time. As I mentioned, most of the people don't change every
week. The problem is that incremental load of the diary ("log") of people is not simple
at all. The first reason is that we need to update the end_effective_date and ind_current
of the last row, but that's not difficult. The second and main reason is that updating
and adding one record is not enough: in case that the man was in a middle of a ST
we need to re-calculate the ST. This is why we decided to completely delete the records
of the people that changed (in one or more appendixes) and rebuild them from scratch.
This is why we run the ETL only on for the people that changed. This solution is very
simple and easy to implement. All we need to do is to delete these people's records
from all three diaries and then run the ETL only on them. Thus, the ETL process is
very short (half hour total) in every night and only the first run (on all people)
is (very...) long. 
</p><p align="left"><u>Interesting Reports</u></p><p align="left">
I'll show here some interesting reports I built for the users in order to stimulate
your thinking: 
</p><ul><li><strong>Organizational Pyramid</strong> - the ratio, over the years, between the number
of different populations. For example, the ratio of senior lecturers, lecturers and
students. The report is shown as stacked graph bar. The graph answers the basic question:
Do we have enough students to make them enough teachers in the coming years? Do we
have enough lecturers to make them enough senior lecturers in the coming future? In
this picture I built a very similar graph in Excel that shows how many students of
every year we have. 
<br /><center><img src="http://www.miky-schreiber.com/Blog/content/binary/Organizational_Pyramid_Graph.JPG" /></center><br /></li><li><strong>Students Rift</strong> - For every faculty, the gap between the number of
students that the university wanted to have and the actual number of students. You
can change the time parameter and see the screen as it was on another date. You can
also drill-down on every faculty and see the raw numbers.</li><li><strong>Number of Promotions &amp; Retirements</strong> - For every year, how many
promotions &amp; retirements were made. You can drill-down the numbers and see the
names and details of the people. 
</li><li><strong>Grades Graph</strong> - Classic line graphs othat shows grades for a lecturer
or faculty over the timeline. Excellent tool to discover interesting trends such as
professors who write too difficult exams or decrease in the academic level of a faculty. 
</li></ul><p /><p align="left"><strong>To conclude:</strong> I hope you enjoyed the posts about BI HR. I got some
mails regarding these posts and I'm glad to hear that I'm helping people here in Israel
and even over the world. 
</p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=a81e3c13-cba1-4a2c-94e6-4adee48daca5" /><xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/UV-kZ_a7zbc" height="1" width="1" /></body>
      <title>HR BI - Last Post</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,a81e3c13-cba1-4a2c-94e6-4adee48daca5.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/UV-kZ_a7zbc/PermaLink,guid,a81e3c13-cba1-4a2c-94e6-4adee48daca5.aspx</link>
      <pubDate>Mon, 07 Mar 2011 12:31:10 GMT</pubDate>
      <description>&lt;p align=left&gt;
&lt;em&gt;This post&amp;nbsp;is the&amp;nbsp;fifth (and last, I think)&amp;nbsp;in a &lt;a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx"&gt;series&lt;/a&gt;&lt;/em&gt;&lt;em&gt;&amp;nbsp;about&amp;nbsp;HR
BI systems. In the &lt;a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx"&gt;previous
post&lt;/a&gt;&lt;/em&gt;&lt;em&gt;&amp;nbsp;I've shown how to make the BI system smarter by cleaning records
and finding Super-Transactions. In this post I'll complete a few last points and introduce
you to some interesting reports I built.&lt;/em&gt;
&lt;/p&gt;
&lt;p align=left&gt;
&lt;u&gt;Finding Super Transactions&lt;/u&gt;
&lt;/p&gt;
&lt;p align=left&gt;
In the last post, I introduced you the concept of Super Transactions (ST). I explained
how to make the ETL process find them in the raw data and how to model them in the
DWH and let the end-users use them in their analysis. However, I didn't explain how
to find them in the initial design phase.
&lt;/p&gt;
&lt;p align=left&gt;
&lt;strong&gt;Objective:&lt;/strong&gt; Find the material processes that happens to many people.&lt;br&gt;
&lt;strong&gt;The main problem:&lt;/strong&gt; There are many people and we don't know where to
start looking! We should think about a common denominator to all the processes in
order to find them.&lt;br&gt;
&lt;strong&gt;The key observation:&lt;/strong&gt; &lt;strong&gt;People are boring&lt;/strong&gt;. Meaning,
in terms of their statuses or various trailers. Think about your personal diary (in
terms of what we talked about in the previous posts), or the "log" of your life. Think
how many times you changed your martial status, salary, workplace status, etc. Yes,
you have no more than a few records. You change something only once in a couple of
years (in the best case).&lt;br&gt;
On the other hand, when an employee undergoes a material change (ST), there are many
changes that happens (almost) at once in different appendixes (or subjects). When
the student changes his position in the university, he changes in short time his position,
salary, courses and more.&lt;br&gt;
Let's show it in a simple graph. The X axis is the time line and the Y axis is the
number of appendixes that changed:&lt;br&gt;
&lt;center&gt;&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/ST_occurence_graph.JPG"&gt;
&lt;/center&gt;
&lt;br&gt;
This shows us that most of the time our life is static and suddenly you find there
are "peaks" that indicates that something "big" happened. This is where we'll look
for the STs.&lt;br&gt;
&lt;strong&gt;How we'll find them?&lt;/strong&gt; I'll take you back to the advanced modeling
of the project. For every record in the diary we added end_effective_date column which
has the date of the next record minus one day. That enable us to write between statements
and make our life more simple. Back to our case: using analytic functions, I added
a field that says how much time passed between the current and the previous record.
As I said, the number are very large in most cases. The cases where there's some records
with small numbers are the cases we are looking for. It says that something important
happended. Using this method I found eight out of nine (!!) which implemented in the
system. 
&lt;p&gt;
&lt;/p&gt;
&lt;p align=left&gt;
&lt;u&gt;Incremental Load&lt;/u&gt;
&lt;/p&gt;
&lt;p align=left&gt;
In an early stage of the project we realized that it will be very "unwise" to load
all the people all the time. As I mentioned, most of the people don't change every
week. The problem is that incremental load of the diary ("log") of people is not simple
at all. The first reason is that we need to update the end_effective_date and ind_current
of the last row, but that's not difficult. The second and main reason is that updating
and adding one record is not enough: in case that the man was in a middle of a ST
we need to re-calculate the ST. This is why we decided to completely delete the records
of the people that changed (in one or more appendixes) and rebuild them from scratch.
This is why we run the ETL only on for the people that changed. This solution is very
simple and easy to implement. All we need to do is to delete these people's records
from all three diaries and then run the ETL only on them. Thus, the ETL process is
very short (half hour total) in every night and only the first run (on all people)
is (very...) long. 
&lt;/p&gt;
&lt;p align=left&gt;
&lt;u&gt;Interesting Reports&lt;/u&gt;
&lt;/p&gt;
&lt;p align=left&gt;
I'll show here some interesting reports I built for the users in order to stimulate
your thinking: 
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Organizational Pyramid&lt;/strong&gt; - the ratio, over the years, between the number
of different populations. For example, the ratio of senior lecturers, lecturers and
students. The report is shown as stacked graph bar. The graph answers the basic question:
Do we have enough students to make them enough teachers in the coming years? Do we
have enough lecturers to make them enough senior lecturers in the coming future? In
this picture I built a very similar graph in Excel that shows how many students of
every year we have. 
&lt;br&gt;
&lt;center&gt;&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/Organizational_Pyramid_Graph.JPG"&gt;
&lt;/center&gt;
&lt;br&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Students Rift&lt;/strong&gt; - For every faculty, the gap between the number of
students that the university wanted to have and the actual number of students. You
can change the time parameter and see the screen as it was on another date. You can
also drill-down on every faculty and see the raw numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number of Promotions &amp;amp; Retirements&lt;/strong&gt; - For every year, how many
promotions &amp;amp; retirements were made. You can drill-down the numbers and see the
names and details of the people. 
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grades Graph&lt;/strong&gt; - Classic line graphs othat shows grades for a lecturer
or faculty over the timeline. Excellent tool to discover interesting trends such as
professors who write too difficult exams or decrease in the academic level of a faculty. 
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;p align=left&gt;
&lt;strong&gt;To conclude:&lt;/strong&gt; I hope you enjoyed the posts about BI HR. I got some
mails regarding these posts and I'm glad to hear that I'm helping people here in Israel
and even over the world. 
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=a81e3c13-cba1-4a2c-94e6-4adee48daca5" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,a81e3c13-cba1-4a2c-94e6-4adee48daca5.aspx</comments>
      <category>BI;BI/DataWarehousing;BI/HR</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,a81e3c13-cba1-4a2c-94e6-4adee48daca5.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=d03645f4-3c2a-4971-83f7-357ffb5757ca</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=d03645f4-3c2a-4971-83f7-357ffb5757ca</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p align="left">
          <em>This post is the fourth in a <a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx">series</a></em>
          <em> about HR
BI systems. In the <a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx">previous
post</a></em>
          <em> I've shown to advanced model of the solution, the ERD of the
datawarehouse and I shown how to answer most of the business questions using that
model. In this post I'll show a more complex issue which is smart records filtering
and Super Transactions. I will introduce you to this subject and show you how I implemented
it.</em>
        </p>
        <p align="left">
          <strong>To better understand this post, I highly recommend you to read the previous
post first</strong>. Let's summarize the last post: We reached to the conclusion that
in order to answer most of the business question regarding HR, we need to build a
very wide Fact table (meaning that it will have many fields) that contains all the
relevant fields from the appendixes. I introduced you to the problem of irrelevance
and I shown the way to solve it using the N/A value instead of null when the appendix
hasn't started yet.
</p>
        <p align="left">
Actually, after doing what I described in the last post we finished 90% of the project.
Now, only the other 90% remains... ;-) The thing is that another main requirement
remains (although basically it was our idea). The reason for which I didn't mention
it in the previous posts is that I didn't want to bring in this complex issue in the
first steps. I wanted you, my reader, to stay focused on the main structure of the
solution. The requirement is to clean and improve (from the data point-of-view) the
personal diary records and identify personal processes. I will explain later what
this means. Note that at this point the example of the university is not so relevant
so I apologize for the weird examples.
</p>
        <p align="left">
          <u>Cleaning Records</u>
        </p>
        <p align="left">
For example, let's say that we have one lecturer (or more) that has swift changes
in his salary: During the month his salary changes but in the end of the month it
always goes back to the original salary, meaning that he always gets the same salary
in his bank account. I know it sounds weird, but I've seen a lot more confusing cases
in financial systems. If we'll bring the lecturer's data to the DWH as it is, we'll
have a number of serious problems:
</p>
        <ol>
          <li>
            <div align="left">In case that the user filter the report on the date where the lecturer
has the temporary salary, the user will get a wrong result because he won't see the
"real" salary ("real" is a problematic word in this case. We'll discuss it later). 
</div>
          </li>
          <li>
            <div align="left">The rapid changes in the lecturer's salary appendix will cause great
growth in its diary. This is problematic because of disk space, but the main problem
is that the diary that will be shown to the end-user will be "junky" with a lot of
unnecessary records which will prevent the end-user from seeing the true changes in
the diary. In other words, we want a cleaner personal diary. 
</div>
          </li>
          <li>
            <div align="left">On the other hand, there are other users (especially business analysts)
that want to analyze exactly these cases so we can't just delete the records... 
</div>
          </li>
        </ol>
        <p align="left">
          <u>Super Transactions</u>
        </p>
        <p align="left">
Super Transaction, or ST, is a personal process or change in the man's life which
reflects some appendixes and can happen on more than one day. For example, let's say
that we have a tutor that got a promotion and becomes a lecturer. By doing that, he
changes his <strong>Position</strong> (tutor -&gt; lecturer), <strong>Courses </strong>(now
he teaches other courses) and the most important of all - the <strong>Salary</strong>.
These changes won't have to happen exactly on the same day. Its position can be changed
when he got the announcement but the salary will change only in the end of the month.
From the technical point of view, we can say that the man only had some changes in
some of his appendixes. But from the business point of view, he has a serious career
change aka he been over a process, or a ST.
</p>
        <p align="left">
The requirement is to identify and flood these STs and even let the users investigate
and filter by them (You can read more about it in <a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247">Kimball's
book</a>).
</p>
        <p align="left">
So, how we'll make it real? In particular, how we'll let the users see both the original
and the cleaner data? First things first, so let's talk about the big picture - the
model of the solution.
</p>
        <p align="left">
          <u>General Model</u>
        </p>
        <p align="left">
To fulfill these requirements we'll define three work modes. The user can switch between
them at any time.
</p>
        <ol>
          <li>
Full Mode - In this mode, the user see exactly the data of the source systems. The
records filtering and the ST detection won't take part here. 
</li>
          <li>
Filtered Mode - In this mode the user will have cleaner records and he will see the
ST. 
</li>
          <li>
Integrated Mode - The user will see both the original data and the filtered records,
so he'll can compare them and have a better understanding of how the system works.</li>
        </ol>
        <p>
Let's wear the hat of the DWH analyst. If we'll want to add a filter to the table
according to the work mode that user has chosen, we'll have a major problem. Every
record can be belonging to more than one mode. A record that won't be filtered belongs
to every mode. A filtered record only belongs to the filtered and the integrated
mode. ST record (we'll talk about it later) also belongs to the filtered and the integrated
mode. The naive solution says that we'll add three indicators (boolean columns) that
will say for which work mode this record belongs and we'll filter the table using
these columns according to the work mode the user has chosen. The problem is it will
cause great performance problems so we have to create indexes for these columns. But,
creating indexes for boolean records is not very affective because the variance is
extremely small.
</p>
        <p>
The solution we found is to split the diary into three tables, one table for every
work mode. When the user changes the work mode, the connection will change and the
system will take the data from another table. The great benefit in this solution is
the performance gain. The query is mapped on the appropriate table and we don't
need to filter according to the work mode. In order to implement this, you need a
reporting tool that can dynamically change the connection, but I believe that any
reporting tool can do it after some workarounds. In Reporting Services, you can do
it by defining a Stored Procedure as the Data Source. The SP will get the required
schema/table as a parameter and use it in the query (you can see code <a href="http://stackoverflow.com/questions/848656/sql-server-2005-report-builder-how-to-dynamically-change-data-source">here</a>).
In OBIEE, you can do it by using value-based fragmentation (instructions <a href="http://gerardnico.com/wiki/dat/obiee/fragmentation_content">here</a>).
The disadvantage is that you take unnecessary disk space, but we prefer performance
over space.
</p>
        <p>
The ETL process now has three targets, one for each work mode. The filters inside
the ETL process define in which table, or work mode every record will be loaded.
</p>
        <p>
          <u>Cleaning Records</u>
        </p>
        <p>
Let's start from the most important thing: It's very important to make the customers
understand that our BI system is not a artificial intelligence system and it can't
guess which records need to be filtered. We need to make clear that the system will
work by <strong>predefined rules</strong>. This is why the preliminary step is the
most difficult: We need to define the rules, test them, improve them, test again and
so on. A good example is that: If a person changes its status (in almost any appendix)
and then changes it back or change it into another status, this is a "dirty" record
which needed to be cleaned. This simple rule can filter a lot of noise in the data.
It happens a lot of times that a user of the source system changes something, see
that he has mistaken and turning things back. The problem is that some of the system
only add rows and don't update them is such cases. This is why it makes sense to filter
these records.
</p>
        <p>
After we make sure that we and the customers understand each other, we can start with
the implementation. This can be very easy or very difficult, depends on the ETL tool
you're using and especially on the complexity of the rules you defined. While the
ETL tools usually look on one record at a time, we need to compare the record with
its predecessor, the record after it and sometimes even more. In this step, I recommend
you to stop, take a paper &amp; pen and start to think how to implement it before
you're opening your ETL tool. The most important thing is to check that the results
satisfy your expectations. Maybe you'll have performance problems here, but you can
come back and handle them in the final steps of the project (personally, I prefer
dealing with data correctness in the first steps and performance in the last
steps). As I said, after you decide (in the ETL process) whether it's a regular,
filtered or ST record, you'll route it to the appropriate target(s).
</p>
        <p>
          <u>Super Transactions (ST)</u>
        </p>
        <p>
This is the most difficult part of the project. Take a deep breath.<br />
I'll remind you what it is from the business view and this time also from the technical
point of view. ST is a personal change, or process that happened to a man. This change
will often be reflected in more than one appendix and <strong>can</strong> happen
on more than one day. Remind yourself with the tutor that became lecturer. The salary
change happened weeks after the position change. The requirement is to flood that
a big change happened here and not only some changes in some appendixes.<br />
Let's go on with the technical stuff. There are some basic questions regarding ST:
</p>
        <ol>
          <li>
What will happen with the records that are contained inside the ST? contained record
is a record which is a part of the change (salary change or course change in our example).
If all the changes happened on the same day, will we mark the record as a ST record
or maybe we'll add a new ST record? If the changes happened on more than one day,
will we add a new ST record or maybe we'll mark one of the records as the ST record?
Which one? What will happen with the rest of the contained records? 
</li>
          <li>
What will happen with records that occurred on the same time as the ST but don't belong
to it? For example, let's say that when the tutor became lecturer he also became
a father (Mazal Tov!). The diary will have a new record for this change, but it doesn't
belong to the ST. What we'll do with this record? 
</li>
          <li>
Will we run the ST detection before or after the record cleaning? It will change the
results dramatically.</li>
        </ol>
        <p>
Enough with questions. let's start with the answers. We thought and got to the conclusion
that there is a strong connection between the work modes and the ST.<br />
We defined the full mode as the mode where the data is exactly the same as the source
data. This is why we won't clean records here and we won't mark any ST records. We
want to keep the balance with the source systems. Our greatest threat is that the
users will say: "Hey, this is not what I see in my (source) system. This whole BI
is crap!!!"
</p>
        <p>
On the other hand, the filtered mode is where our system becomes smarter. Here we
can add the record filtering and the ST detection. This is why here we'll <strong>mark</strong> ST
records. Why mark and not add a new record? Because this will break our most basic
rule - one man cannot have two records with the same date. This is not just a technical
problem (we can add Is_ST column and add it to the table's primary key). This is a
conceptual principal. We added a new column named ST_Name.(If it's null then the record
is not a ST record). 
</p>
        <p>
Which record will we mark as the ST record? Very simple: As part of the design stage,
we'll decide (along with the customers) for each ST which is the leading record and
we'll mark it. Often, this is the record that will help us to detect the ST. Going
back to our example, the ST record is the one where the tutor changed its position.
In this record, we'll make ST_Name=Position_Change. Don't forget to mark the contained
records as filtered and in the what_changed columns in the ST record, add the other
contained appendixes. Back to the example, Write in the what_changed column that also
the courses and salary changed as well. We'll also update these appendixes' column
in the ST record with the new values.
</p>
        <p>
What about not-contained records (such as the family status change)? Our solution
completely covers that. The ST detection only handles with the appendixes that we
defined as contained in the ST. Even if the change happened in the same date as the
position change, it won't affect anything at all.<br />
Regarding the order of the actions (ST detection or records filtering) - We thought
about it and came to the conclusion that it is better to first clean the records because
the ST detection process needs to see the best and most clear picture as it can for
a better ST detection. We've been glad to see it's right in the tests we did.
</p>
        <p>
          <u>Summary</u>
        </p>
        <p>
The record cleaning and the ST detection that the BI system does is a quantum leap.
The system does not only reflect the source data, but it also add an "artificial intelligence"
layer which helps the user navigate the mixture of data. As a matter of fact, this
makes the system <strong>turn data into knowledge</strong>. On the other hand, it
doesn't forces itself on the end-user and it enables him to see the original source
data. It's important to say that the design is a dynamic and "breathing" process that
doesn't end on Day 1 of the system. As time goes by, things and business rules are
getting clearer and better so we come back and sharp the rules.
</p>
        <p>
In the next post (the last of the series in my opinion) I'll discuss even more interesting
issues such as differential load, how to find ST in the design phase and some of the
most interesting reports you can do with our HR DWH. Stay tuned.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=d03645f4-3c2a-4971-83f7-357ffb5757ca" />
      <xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/tiw8mN_u11g" height="1" width="1" /></body>
      <title>HR BI - Turning Data Into Knowledge</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/tiw8mN_u11g/PermaLink,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx</link>
      <pubDate>Wed, 05 Jan 2011 09:29:09 GMT</pubDate>
      <description>&lt;p align=left&gt;
&lt;em&gt;This post&amp;nbsp;is the&amp;nbsp;fourth in a &lt;a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx"&gt;series&lt;/a&gt;&lt;/em&gt;&lt;em&gt;&amp;nbsp;about&amp;nbsp;HR
BI systems. In the &lt;a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx"&gt;previous
post&lt;/a&gt;&lt;/em&gt;&lt;em&gt;&amp;nbsp;I've shown to advanced model of the solution, the ERD of the
datawarehouse and I shown how to answer most of the business questions using that
model. In this post I'll show a more complex issue which is smart records filtering
and Super Transactions. I will introduce you to this subject and show you how I implemented
it.&lt;/em&gt;
&lt;/p&gt;
&lt;p align=left&gt;
&lt;strong&gt;To better understand this post, I highly recommend you to read the previous
post first&lt;/strong&gt;. Let's summarize the last post: We reached to the conclusion that
in order to answer most of the business question regarding HR, we need to build a
very wide Fact table (meaning that it will have many fields) that contains all the
relevant fields from the appendixes. I introduced you to the problem of irrelevance
and I shown the way to solve it using the N/A value instead of null when the appendix
hasn't started yet.
&lt;/p&gt;
&lt;p align=left&gt;
Actually, after doing what I described in the last post we finished 90% of the project.
Now, only the other 90% remains... ;-) The thing is that another main requirement
remains (although basically it was our idea). The reason for which I didn't mention
it in the previous posts is that I didn't want to bring in this complex issue in the
first steps. I wanted you, my reader, to stay focused on the main structure of the
solution. The requirement is to clean and improve (from the data point-of-view) the
personal diary records and identify personal processes. I will explain later what
this means. Note that at this point the example of the university is not so relevant
so I apologize for the weird examples.
&lt;/p&gt;
&lt;p align=left&gt;
&lt;u&gt;Cleaning Records&lt;/u&gt;
&lt;/p&gt;
&lt;p align=left&gt;
For example, let's say that we have one lecturer (or more) that has swift changes
in his salary: During the month his salary changes but in the end of the month it
always goes back to the original salary, meaning that he always gets the same salary
in his bank account. I know it sounds weird, but I've seen a lot more confusing cases
in financial systems. If we'll bring the lecturer's data to the DWH as it is, we'll
have&amp;nbsp;a number of&amp;nbsp;serious problems:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;div align=left&gt;In case that the user filter the report on the date where the lecturer
has the temporary salary, the user will get a wrong result because he won't see the
"real" salary ("real" is a problematic word in this case. We'll discuss it later). 
&lt;/div&gt;
&lt;li&gt;
&lt;div align=left&gt;The rapid changes in the lecturer's salary appendix will cause great
growth in its diary. This is problematic because of disk space, but the main problem
is that the diary that will be shown to the end-user will be "junky" with a lot of
unnecessary records which will prevent the end-user from seeing the true changes in
the diary. In other words, we want a cleaner personal diary. 
&lt;/div&gt;
&lt;li&gt;
&lt;div align=left&gt;On the other hand, there are other users (especially business analysts)
that want to analyze exactly these cases so we can't just delete the records... 
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p align=left&gt;
&lt;u&gt;Super Transactions&lt;/u&gt;
&lt;/p&gt;
&lt;p align=left&gt;
Super Transaction, or ST, is a personal process or change in the man's life which
reflects some appendixes and can happen on more than one day. For example, let's say
that we have a tutor that got a promotion and becomes a lecturer. By doing that, he
changes his &lt;strong&gt;Position&lt;/strong&gt; (tutor -&amp;gt; lecturer), &lt;strong&gt;Courses &lt;/strong&gt;(now
he teaches other courses) and the most important of all - the &lt;strong&gt;Salary&lt;/strong&gt;.
These changes won't have to happen exactly on the same day. Its position can be changed
when he got the announcement but the salary will change only in the end of the month.
From the technical point of view, we can say that the man only had some changes in
some of his appendixes. But from the business point of view, he has a serious career
change aka he been over a process, or a ST.
&lt;/p&gt;
&lt;p align=left&gt;
The requirement is to identify and flood these STs and even let the users investigate
and filter by them (You can read more about it in &lt;a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247"&gt;Kimball's
book&lt;/a&gt;).
&lt;/p&gt;
&lt;p align=left&gt;
So, how we'll make it real? In particular, how we'll let the users see both the original
and the cleaner data? First things first, so let's talk about the big picture - the
model of the solution.
&lt;/p&gt;
&lt;p align=left&gt;
&lt;u&gt;General Model&lt;/u&gt;
&lt;/p&gt;
&lt;p align=left&gt;
To fulfill these requirements we'll define three work modes. The user can switch between
them at any time.
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
Full Mode - In this mode, the user see exactly the data of the source systems. The
records filtering and the ST detection won't take part here. 
&lt;li&gt;
Filtered Mode - In this mode the user will have cleaner records and he will see the
ST. 
&lt;li&gt;
Integrated Mode - The user will see both the original data and the filtered records,
so he'll can compare them and have a better understanding of how the system works.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
Let's wear the hat of the&amp;nbsp;DWH analyst. If we'll want to add a filter to the table
according to the work mode that user has chosen, we'll have a major problem. Every
record can be belonging to more than one mode. A record that won't be filtered belongs
to every mode. A&amp;nbsp;filtered record only belongs to the filtered and the integrated
mode. ST record (we'll talk about it later) also belongs to the filtered and the integrated
mode. The naive solution says that we'll add three indicators (boolean columns) that
will say for which work mode this record belongs and we'll filter the table using
these columns according to the work mode the user has chosen. The problem is it will
cause great performance problems so we have to create indexes for these columns. But,
creating indexes for boolean records is not very affective because the variance is
extremely small.
&lt;/p&gt;
&lt;p&gt;
The solution we found is to split the diary into three tables, one table for every
work mode. When the user changes the work mode, the connection will change and the
system will take the data from another table. The great benefit in this solution is
the performance gain. The query is mapped on the&amp;nbsp;appropriate table and we don't
need to filter according to the work mode. In order to implement this, you need a
reporting tool that can dynamically change the connection, but I believe that any
reporting tool can do it after some workarounds. In Reporting Services, you can do
it by defining a Stored Procedure as the Data Source. The SP will get the required
schema/table as a parameter and use it in the query (you can see code &lt;a href="http://stackoverflow.com/questions/848656/sql-server-2005-report-builder-how-to-dynamically-change-data-source"&gt;here&lt;/a&gt;).
In OBIEE, you can do it by using value-based fragmentation (instructions &lt;a href="http://gerardnico.com/wiki/dat/obiee/fragmentation_content"&gt;here&lt;/a&gt;).
The disadvantage is that you take unnecessary disk space, but we prefer performance
over space.
&lt;/p&gt;
&lt;p&gt;
The ETL process now has three targets, one for each work mode. The filters inside
the ETL process define in which table, or work mode&amp;nbsp;every record will be loaded.
&lt;/p&gt;
&lt;p&gt;
&lt;u&gt;Cleaning Records&lt;/u&gt;
&lt;/p&gt;
&lt;p&gt;
Let's start from the most important thing: It's very important to make the customers
understand that our BI system is not a artificial intelligence system and it can't
guess which records need to be filtered. We need to make clear that the system will
work by &lt;strong&gt;predefined rules&lt;/strong&gt;. This is why the preliminary step is the
most difficult: We need to define the rules, test them, improve them, test again and
so on. A good example is that: If a person changes its status (in almost any appendix)
and then changes it back or change it into another status, this is a "dirty" record
which needed to be cleaned. This simple rule can filter a lot of noise in the data.
It happens a lot of times that a user of the source system changes something, see
that he has mistaken and turning things back. The problem is that some of the system
only add rows and don't update them is such cases. This is why it makes sense to filter
these records.
&lt;/p&gt;
&lt;p&gt;
After we make sure that we and the customers understand each other, we can start with
the implementation. This can be very easy or very difficult, depends on the ETL tool
you're using and especially on the complexity of the rules you defined. While the
ETL tools usually look on one record at a time, we need to compare the record with
its predecessor, the record after it and sometimes even more. In this step, I recommend
you to stop, take a paper &amp;amp; pen and start to think how to implement it before
you're opening your ETL tool. The most important thing is to check that the results
satisfy your expectations. Maybe you'll have performance problems here, but you can
come back and handle them in the final steps of the project (personally, I prefer
dealing with data&amp;nbsp;correctness in the first steps and performance in the last
steps). As I said, after you decide (in the ETL process) whether it's&amp;nbsp;a regular,
filtered or ST record, you'll route it to the appropriate target(s).
&lt;/p&gt;
&lt;p&gt;
&lt;u&gt;Super Transactions (ST)&lt;/u&gt;
&lt;/p&gt;
&lt;p&gt;
This is the most difficult part of the project. Take a deep breath.&lt;br&gt;
I'll remind you what it is from the business view and this time also from the technical
point of view. ST is a personal change, or process that happened to a man. This change
will often be reflected in more than one appendix and &lt;strong&gt;can&lt;/strong&gt; happen
on more than one day. Remind yourself with the tutor that became lecturer. The salary
change happened weeks after the position change. The requirement is to flood that
a big change happened here and not only some changes in some appendixes.&lt;br&gt;
Let's go on with the technical stuff. There are some basic questions regarding ST:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
What will happen with the records that are contained inside the ST? contained record
is a record which is a part of the change (salary change or course change in our example).
If all the changes happened on the same day, will we mark the record as a ST record
or maybe we'll add a new ST record? If the changes happened on more than one day,
will we add a new ST record or maybe we'll mark one of the records as the ST record?
Which one? What will happen with the rest of the contained records? 
&lt;li&gt;
What will happen with records that occurred on the same time as the ST but don't belong
to it? For example, let's say that when the tutor became lecturer he also&amp;nbsp;became
a father (Mazal Tov!). The diary will have a new record for this change, but it doesn't
belong to the ST. What we'll do with this record? 
&lt;li&gt;
Will we run the ST detection before or after the record cleaning? It will change the
results dramatically.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
Enough with questions. let's start with the answers. We thought and got to the conclusion
that there is a strong connection between the work modes and the ST.&lt;br&gt;
We defined the full mode as the mode where the data is exactly the same as the source
data. This is why we won't clean records here and we won't mark any ST records. We
want to keep the balance with the source systems. Our greatest threat is that the
users will say: "Hey, this is not what I see in my (source) system. This whole BI
is crap!!!"
&lt;/p&gt;
&lt;p&gt;
On the other hand, the filtered mode is where our system becomes smarter. Here we
can add the record filtering and the ST detection. This is why here we'll &lt;strong&gt;mark&lt;/strong&gt; ST
records. Why mark and not add a new record? Because this will break our most basic
rule - one man cannot have two records with the same date. This is not just a technical
problem (we can add Is_ST column and add it to the table's primary key). This is a
conceptual principal. We added a new column named ST_Name.(If it's null then the record
is not a ST record). 
&lt;/p&gt;
&lt;p&gt;
Which record will we mark as the ST record? Very simple: As part of the design stage,
we'll decide (along with the customers) for each ST which is the leading record and
we'll mark it. Often, this is the record that will help us to detect the ST. Going
back to our example, the ST record is the one where the tutor changed its position.
In this record, we'll make ST_Name=Position_Change. Don't forget to mark the contained
records as filtered and in the what_changed columns in the ST record, add the other
contained appendixes. Back to the example, Write in the what_changed column that also
the courses and salary changed as well. We'll also update these appendixes' column
in the ST record with the new values.
&lt;/p&gt;
&lt;p&gt;
What about not-contained records (such as the family status change)? Our solution
completely covers that. The ST detection only handles with the appendixes that we
defined as contained in the ST. Even if the change happened in the same date as the
position change, it won't affect anything at all.&lt;br&gt;
Regarding the order of the actions (ST detection or records filtering) - We thought
about it and came to the conclusion that it is better to first clean the records because
the ST detection process needs to see the best and most clear picture as it can for
a better ST detection. We've been glad to see it's right in the tests we did.
&lt;/p&gt;
&lt;p&gt;
&lt;u&gt;Summary&lt;/u&gt;
&lt;/p&gt;
&lt;p&gt;
The record cleaning and the ST detection that the BI system does is a quantum leap.
The system does not only reflect the source data, but it also add an "artificial intelligence"
layer which helps the user navigate the mixture of data. As a matter of fact, this
makes the system &lt;strong&gt;turn data into knowledge&lt;/strong&gt;. On the other hand, it
doesn't forces itself on the end-user and it enables him to see the original source
data. It's important to say that the design is a dynamic and "breathing" process that
doesn't end on Day 1 of the system. As time goes by, things and business rules are
getting clearer and better so we come back and sharp the rules.
&lt;/p&gt;
&lt;p&gt;
In the next post (the last of the series in my opinion) I'll discuss even more interesting
issues such as differential load, how to find ST in the design phase and some of the
most interesting reports you can do with our HR DWH. Stay tuned.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=d03645f4-3c2a-4971-83f7-357ffb5757ca" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx</comments>
      <category>BI;BI/DataWarehousing;BI/HR</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,d03645f4-3c2a-4971-83f7-357ffb5757ca.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=2beab113-6d5f-45fb-9a20-83de2f978793</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=2beab113-6d5f-45fb-9a20-83de2f978793</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
          <em>This post is the third in a </em>
          <a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx">
            <em>
              <strong>
                <font color="#53afe0">series</font>
              </strong>
            </em>
          </a>
          <em> about HR
BI systems. In the </em>
          <a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx">
            <em>
              <strong>
                <font color="#53afe0">previous
post</font>
              </strong>
            </em>
          </a>
          <em> I've showed the basic design problems. In this post
I'll show how we solved them. I recommend reading the previous post before reading
this one.</em>
        </p>
        <p>
I finished the previous post with a lot of questions and zero answers. I have
shown that the classic star-schema won't work here because it can't answer the most
basic questions in the HR business world. At this point we decided to stop and make
the most reasonable move - use the wisdom of others which dealt with similar problems.
We opened the <a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247">classic
book of Kimball</a>, who's considered to be the father of DataWarehousing. I
sat for a while and read the HR chapter. Between the lines, I recognized that Kimball
had the same problems we did, which brought him to model the DWH as follows:
</p>
        <p>
Instead of classic star-schema, we'll build here a very wide table which we'll call
the Diary. This Diary will contain all the appendixes. For example, in our case (look
at the previous post) the fields are: ID, start_effective_date, end_effective_date,
first name, last name, more static fields (which don't change with time), department,
more fields from the department appendix, position, more fields from the position
appendix, course, more fields from the course appendix, payroll, more fields from
the payroll appendix, IsCurrentRow. The basic logic of this idea is that each row
was created because there's a record in one of this person's appendixes which was
created in that date (or maybe in more than one appendix). We take the data from the
record in the corresponding appendix and bring the relevant data for that date from
the other appendixes. For a better understanding, I'll show a little example. Here
are the four appendixes: (Note that for the sake of the example, a student/lecturer
can have only one course at a time)
</p>
        <table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid">
          <tbody>
            <tr>
              <td colspan="3">
                <u>Department Appendix</u>
              </td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/01/2000</td>
              <td>
History</td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/06/2005</td>
              <td>
Computers</td>
            </tr>
          </tbody>
        </table>
        <table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid">
          <tbody>
            <tr>
              <td colspan="3">
                <u>Position Appendix</u>
              </td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/01/2000</td>
              <td>
Student</td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/01/2006</td>
              <td>
Lecturer</td>
            </tr>
          </tbody>
        </table>
        <table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid">
          <tbody>
            <tr>
              <td colspan="3">
                <u>Courses Appendix</u>
              </td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/01/2000</td>
              <td>
Middle Ages</td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/06/2005</td>
              <td>
Computer Basics</td>
            </tr>
          </tbody>
        </table>
        <table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid">
          <tbody>
            <tr>
              <td colspan="3">
                <u>Payroll Appendix</u>
              </td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/02/2006</td>
              <td>
Basic</td>
            </tr>
          </tbody>
        </table>
        <p>
After making the "merge" of the appendixes this is how the Diary will look like:
</p>
        <table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid">
          <tbody>
            <tr>
              <th>
ID</th>
              <th>
Start Date</th>
              <th>
What Changed</th>
              <th>
Name</th>
              <th>
Payroll</th>
              <th>
Department</th>
              <th>
Position</th>
              <th>
Courses</th>
              <th>
End Date</th>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/01/2000</td>
              <td>
Department<br />
Position<br />
Courses</td>
              <td>
Miky Schreiber</td>
              <td>
              </td>
              <td>
History</td>
              <td>
Student</td>
              <td>
Middle Ages</td>
              <td>
31/05/2005</td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/06/2005</td>
              <td>
Department<br />
Courses</td>
              <td>
Miky Schreiber</td>
              <td>
              </td>
              <td>
Computers</td>
              <td>
Student</td>
              <td>
Computer Basics</td>
              <td>
31/12/2005</td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/01/2006</td>
              <td>
Position</td>
              <td>
Miky Schreiber</td>
              <td>
              </td>
              <td>
Computers</td>
              <td>
Lecturer</td>
              <td>
Computer Basics</td>
              <td>
31/01/2006</td>
            </tr>
            <tr>
              <td>
012</td>
              <td>
01/02/2006</td>
              <td>
Payroll</td>
              <td>
Miky Schreiber</td>
              <td>
Basic</td>
              <td>
Computers</td>
              <td>
Lecturer</td>
              <td>
Computer Basics</td>
              <td>
31/12/2999</td>
            </tr>
          </tbody>
        </table>
        <p>
          <u>So what have we done here?</u>
        </p>
        <p>
We built a diary that combines all the other appendixes. For each record, or point
in time, you can see what was the situation of the person from every appendix's perspective.
That is, if on date X only the department appendix was changed, you can still see
what was the situation in the position appendix. Hence, you can ask business questions
that combines several appendixes even when these questions are aggregative. For example:
how many students which have the lecturer position get paid basic salary? I invite
the reader to read the business questions in the previous post and see how you can
answer them using this model. <strong>You must understand that</strong> with this
model you must always fetch <strong>only one row per person</strong> or otherwise
you'll have wrong results. In order to do that you need to filter the table on a particular
day (using expression like: where myDate between StartDate and EndDate) or use IsCurrentRow=1
(you can calculate this column in the ETL).
</p>
        <p>
          <u>How did we do that?</u>
        </p>
        <p>
I won't cover here the entire ETL design, but I'll explain the key principles: First,
we build the avenue. The avenue is a small table with only two columns - the ID of
the person and the StartDate. We make distinct query on all the appendixes and get,
for every person, all the start date from all the appendixes and this is how we build
the avenue. Once we have the avenue, we join it with every appendix, and this is how
we bring the values from the appendixes themselves. Now, for each start date we have
the data of the appendixes where this start date came from. In order to "complete"
the data from the other appendixes, we "spread" the missing value from the previous
row. For example, on 1/6/2005 the position status hasn't changed so after the join
the Position field will be null, so we take the value from the previous row.
</p>
        <p>
          <u>The problem of Irrelevance</u>
        </p>
        <p>
After making the above, we had a problem that floated very quickly. As always, each
column has a foreign key to its dimension table, so the values cannot be null. The
problem is that sometimes the values must be null - it happens in the first rows of
each person, when not all appendixes has their values yet. For example, look at the
payroll column in the "diary" table above. It gets a value only in the last row. We
can't have null in this field because we want to have foreign key but we don't have
any value yet. What can we do? The solution we found is a "N/A" value which we added
in such situations. Note that you need to add a new row with the "N/A" key in every
dimension table.
</p>
        <p>
Is that all? of course not! Using this "diary" table we can answer 90% of the business
question we faced. This table is basically the whole project. But, there are more
challenges we have to solve. I'll leave it for the future posts, where I'll show how
we can make the HR BI system smarter.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=2beab113-6d5f-45fb-9a20-83de2f978793" />
      <xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/w716PVJBii4" height="1" width="1" /></body>
      <title>HR BI - Advanced Design</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/w716PVJBii4/PermaLink,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx</link>
      <pubDate>Sun, 14 Nov 2010 13:11:48 GMT</pubDate>
      <description>&lt;p&gt;
&lt;em&gt;This post&amp;nbsp;is the&amp;nbsp;third in a &lt;/em&gt;&lt;a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx"&gt;&lt;em&gt;&lt;strong&gt;&lt;font color=#53afe0&gt;series&lt;/font&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt;&lt;em&gt; about&amp;nbsp;HR
BI systems. In the &lt;/em&gt;&lt;a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx"&gt;&lt;em&gt;&lt;strong&gt;&lt;font color=#53afe0&gt;previous
post&lt;/font&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt;&lt;em&gt; I've showed the basic design problems. In this post
I'll show how we solved them.&amp;nbsp;I recommend reading the previous post before reading
this one.&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
I&amp;nbsp;finished the previous post with a lot of questions and zero answers. I have
shown that the classic star-schema won't work here because it can't answer the most
basic questions in the HR business world. At this point we decided to stop and make
the most reasonable move - use the wisdom of others which dealt with similar problems.
We opened the &lt;a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247"&gt;classic
book of Kimball&lt;/a&gt;, who's considered to be the father of DataWarehousing.&amp;nbsp;I
sat for a while and read the HR chapter. Between the lines, I recognized that Kimball
had the same problems we did, which brought him to model the DWH as follows:
&lt;/p&gt;
&lt;p&gt;
Instead of classic star-schema, we'll build here a very wide table which we'll call
the Diary. This Diary will contain all the appendixes. For example, in our case (look
at the previous post) the fields are: ID, start_effective_date, end_effective_date,
first name, last name, more static fields (which don't change with time), department,
more fields from the department appendix, position, more fields from the position
appendix, course, more fields from the course appendix, payroll, more fields from
the payroll appendix, IsCurrentRow. The basic logic of this idea is that each row
was created because there's a record in one of this person's appendixes which was
created in that date (or maybe in more than one appendix). We take the data from the
record in the corresponding appendix and bring the relevant data for that date from
the other appendixes. For a better understanding, I'll show a little example. Here
are&amp;nbsp;the four appendixes: (Note that for the sake of the example, a student/lecturer
can have only one course at a time)
&lt;/p&gt;
&lt;table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td colspan=3&gt;
&lt;u&gt;Department Appendix&lt;/u&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/01/2000&lt;/td&gt;
&lt;td&gt;
History&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/06/2005&lt;/td&gt;
&lt;td&gt;
Computers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td colspan=3&gt;
&lt;u&gt;Position Appendix&lt;/u&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/01/2000&lt;/td&gt;
&lt;td&gt;
Student&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/01/2006&lt;/td&gt;
&lt;td&gt;
Lecturer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td colspan=3&gt;
&lt;u&gt;Courses Appendix&lt;/u&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/01/2000&lt;/td&gt;
&lt;td&gt;
Middle Ages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/06/2005&lt;/td&gt;
&lt;td&gt;
Computer Basics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td colspan=3&gt;
&lt;u&gt;Payroll Appendix&lt;/u&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/02/2006&lt;/td&gt;
&lt;td&gt;
Basic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;
After making the "merge" of the appendixes this is how the Diary will look like:
&lt;/p&gt;
&lt;table style="BORDER-TOP-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-BOTTOM-STYLE: solid"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;
ID&lt;/th&gt;
&lt;th&gt;
Start Date&lt;/th&gt;
&lt;th&gt;
What Changed&lt;/th&gt;
&lt;th&gt;
Name&lt;/th&gt;
&lt;th&gt;
Payroll&lt;/th&gt;
&lt;th&gt;
Department&lt;/th&gt;
&lt;th&gt;
Position&lt;/th&gt;
&lt;th&gt;
Courses&lt;/th&gt;
&lt;th&gt;
End Date&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/01/2000&lt;/td&gt;
&lt;td&gt;
Department&lt;br&gt;
Position&lt;br&gt;
Courses&lt;/td&gt;
&lt;td&gt;
Miky Schreiber&lt;/td&gt;
&lt;td&gt;
&lt;/td&gt;
&lt;td&gt;
History&lt;/td&gt;
&lt;td&gt;
Student&lt;/td&gt;
&lt;td&gt;
Middle Ages&lt;/td&gt;
&lt;td&gt;
31/05/2005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/06/2005&lt;/td&gt;
&lt;td&gt;
Department&lt;br&gt;
Courses&lt;/td&gt;
&lt;td&gt;
Miky Schreiber&lt;/td&gt;
&lt;td&gt;
&lt;/td&gt;
&lt;td&gt;
Computers&lt;/td&gt;
&lt;td&gt;
Student&lt;/td&gt;
&lt;td&gt;
Computer Basics&lt;/td&gt;
&lt;td&gt;
31/12/2005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/01/2006&lt;/td&gt;
&lt;td&gt;
Position&lt;/td&gt;
&lt;td&gt;
Miky Schreiber&lt;/td&gt;
&lt;td&gt;
&lt;/td&gt;
&lt;td&gt;
Computers&lt;/td&gt;
&lt;td&gt;
Lecturer&lt;/td&gt;
&lt;td&gt;
Computer Basics&lt;/td&gt;
&lt;td&gt;
31/01/2006&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
012&lt;/td&gt;
&lt;td&gt;
01/02/2006&lt;/td&gt;
&lt;td&gt;
Payroll&lt;/td&gt;
&lt;td&gt;
Miky Schreiber&lt;/td&gt;
&lt;td&gt;
Basic&lt;/td&gt;
&lt;td&gt;
Computers&lt;/td&gt;
&lt;td&gt;
Lecturer&lt;/td&gt;
&lt;td&gt;
Computer Basics&lt;/td&gt;
&lt;td&gt;
31/12/2999&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;
&lt;u&gt;So what have we done here?&lt;/u&gt;
&lt;/p&gt;
&lt;p&gt;
We built a diary that combines all the other appendixes. For each record, or point
in time, you can see what was the situation of the person from every appendix's perspective.
That is, if on date X only the department appendix was changed, you can still see
what was the situation in the position appendix. Hence, you can ask business questions
that combines several appendixes even when these questions are aggregative. For example:
how many students which have the lecturer position get paid basic salary? I invite
the reader to read the business questions in the previous post and see how you can
answer them using this model. &lt;strong&gt;You must understand that&lt;/strong&gt; with this
model you must always fetch &lt;strong&gt;only one row per person&lt;/strong&gt; or otherwise
you'll have wrong results. In order to do that you need to filter the table on a particular
day (using expression like: where myDate between StartDate and EndDate) or use IsCurrentRow=1
(you can&amp;nbsp;calculate this column in the ETL).
&lt;/p&gt;
&lt;p&gt;
&lt;u&gt;How did we do that?&lt;/u&gt;
&lt;/p&gt;
&lt;p&gt;
I won't cover here the entire ETL design, but I'll explain the key principles: First,
we build the avenue. The avenue is a small table with only two columns - the ID of
the person and the StartDate. We make distinct query on all the appendixes and get,
for every person, all the start date from all the appendixes and this is how we build
the avenue. Once we have the avenue, we join it with every appendix, and this is how
we bring the values from the appendixes themselves. Now, for each start date we have
the data of the appendixes where this start date came from. In order to "complete"
the data from the other appendixes, we "spread" the missing value from the previous
row. For example, on 1/6/2005 the position status hasn't changed so after the join
the Position&amp;nbsp;field will be null, so we take the value from the previous row.
&lt;/p&gt;
&lt;p&gt;
&lt;u&gt;The problem of Irrelevance&lt;/u&gt;
&lt;/p&gt;
&lt;p&gt;
After making the above, we had a problem that floated very quickly. As always, each
column has a foreign key to its dimension table, so the values cannot be null. The
problem is that sometimes the values must be null - it happens in the first rows of
each person, when not all appendixes has their values yet. For example, look at the
payroll column in the "diary" table above. It gets a value only in the last row. We
can't have null in this field because we want to have foreign key but we don't have
any value yet. What can we do? The solution we found is a "N/A" value which we added
in such situations. Note that you need to add a new row with the "N/A" key in every
dimension table.
&lt;/p&gt;
&lt;p&gt;
Is that all? of course not! Using this "diary" table we can answer 90% of the business
question we faced. This table is basically the whole project. But, there are more
challenges we have to solve. I'll leave it for the future posts, where I'll show how
we can make the HR BI system smarter.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=2beab113-6d5f-45fb-9a20-83de2f978793" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx</comments>
      <category>BI;BI/DataWarehousing;BI/HR</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,2beab113-6d5f-45fb-9a20-83de2f978793.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=7821afca-321f-44eb-a960-5b00addc2440</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=7821afca-321f-44eb-a960-5b00addc2440</wfw:commentRss>
      <slash:comments>1</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
          <em>This post is the second one in a </em>
          <a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx">
            <em>series</em>
          </a>
          <em> about HR
BI systems. The </em>
          <a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,f874a87a-a82e-41f4-93d1-08c11160ef3a.aspx">
            <em>previous
post</em>
          </a>
          <em> dealt with pricing and prioritization of the project. In this post
I will describe the initial modeling process - familiarity with source systems, the main/major
problems and their effect on the ERD model.</em>
        </p>
        <p>
When starting with a new BI system design, the first question to be asked
is: What are the business questions that interests the customer? It depends,
of course, on who is the client, but that was covered in the previous
post. Unfortunately I can't reveal the content world for whom I built the system
and therefore to I'll use a University's content world as example.
Almost all the questions that the customers wanted to get answered had
totals that combine different subjects. For example:
</p>
        <ul>
          <li>
What is the average salary in each department? 
</li>
          <li>
How many professors and students were in each department in each year in
the last five years? 
</li>
          <li>
What is the ratio of students / lecturers in each department in the last years?
(Graph) 
</li>
          <li>
How many transitions between departments (by students and lecturers) took
place over the years? Which department is the most abandoned and which is the
most popular? 
</li>
          <li>
Display events of lecturer / student from admission to the university until this
day</li>
        </ul>
This list is only a small part of what our customers wanted to know. Like always,
there were no answers to these questions because it required integration
between multiple sources. While getting familiar with the content world, I noticed
several key points that will be treated later on in the project (and in future posts): 
<ul><li>
There's a very difficult problem of terminology in the organization ("one
truth"). For example, the most basic terms - a student and lecturer are not consensus.
For example, what happens when a student is also a lecturer? What is the lecturer's
department if he lectures in more than one department? If a student moved
to another department and returned after a semester is it considered to be one, two
transitions or maybe none? Of course, different people had different answers
to these questions. 
</li><li>
All sources came in the same format - appendix. For those who not familiar, appendix
is a table containing the ID, date and nature of change (kind of a diary on a given
subject, "Nigreret" in Hebrew). That is, each record shows only the new
/ current situation (but the table is historical). When you want to ask a question
relating to only one appendix (salary, for example) there is no problem. The
problems arise when asking questions that combine several appendixes (salary
and department, for example). There is a real need for carrying out the integration
of the appendixes. Well, This is what Datawarehouse meant to do. 
</li><li>
We did not find the Fact table. On one hand it seems that every appendix is
a fact table because it describes changes / events that took place in the
world, but ... Where are the measures? On the other hand, appendix looks like
a Slowly Changing Dimension (SCD) but ... Where's the fact table? No doubt that this
is a Factless Fact.</li></ul><p>
So we're got familiar with the main questions, the sources of information
(same format - easier for us) and the unique attributes of this project.
Now comes the hard question - how we'll design the data warehouse? Our initial
solution looks like this: 
</p><p><img src="http://www.miky-schreiber.com/Blog/content/binary/BI_HR_Pre_ERD.JPG" border="0" /></p><p /><p>
In other words - classic Star Schema. The appendixes are SCD and the Fact table is
in the center of the star. The fact will contain only one record per
person and it will contain only static data of the person, such as ID, name and date
of birth. This model takes us back to the questions about how the linkage between
dimensions and fact will be made and what is the measure the fact. We tried
to think how we answer the most basic questions required by the system (by the
way, this is the way to see the quality of the ERD):
</p><ol><li>
For each lecturer, what is the salary, department and current position? - We'll
take from every dimension/appendix the current record of the lecturer. The link
between the dimensions will be carried out by the fact (lecturer's record) using
the ID column 
</li><li>
What is the salary, department and position of the lecturer on date X? - From every
dimension we'll take only the record which its starting date is from before
X and nearest to X and again we'll link the dimensions through the fact table. Hence,
we should calculate analytically (analytical functions) for each record its end date
(the day before the start of the next record) and then it's easy to query with those
kind of questions by using "between" statements. 
</li><li>
Average of the salaries in each department - we'll pull out the list of lecturers
in each department (from the department appendix). We'll choose a date that
interests us, and for each person we'll pull only the record which contains that date.
Then we'll do group by. Ostensibly it's not trivial in every classical reporting
tool, but let's think for a moment: what do we do here? Group by members of a particular
dimension by another dimension counting, where dimensions are linked through the fact
table. Looks fine. So - What's the problem? The problem is that we are trying to aggregate/sum
by a dimension's attribute and not by a measure, which is not possible in every reporting
tool. (a good solution will work with many tools and not just with dedicated
tool for a particular ERD) 
</li><li>
Questions involving only one appendix - very easy and do not require datawarehouse
at all 
</li><li>
Show events from the first day in the university until that day (of a certain
person) - not possible in classic report tool. Let's say we want to see the "event
log" of the lecturer Y, sorted by starting date in ascending order. This requires
a union of all the appendixes / dimensions and sort of all records
by starting date, where there may be several different appendix records
that start in the same date (even very possible. change in the position usually
leads to a raise)</li></ol>
At that point we realized that we needed to retreat. That looked like a good
ERD at first but the last question is very basic and our model proved wrong. How do
we show the event log which is a combination of the appendixes? How can we identify
processes when they occurring, such as increase in the amount of courses and salary
which leads to a better position? These are all questions that the system should
give them an answer and a classic star schema can't help us here. 
<p /><p>
What have we done? Continued next post ...
</p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=7821afca-321f-44eb-a960-5b00addc2440" /><xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/FjIJxDAbpKk" height="1" width="1" /></body>
      <title>HR BI - Preliminary Design</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/FjIJxDAbpKk/PermaLink,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx</link>
      <pubDate>Wed, 13 Oct 2010 14:39:33 GMT</pubDate>
      <description>&lt;p&gt;
&lt;em&gt;This post&amp;nbsp;is the second one in a &lt;/em&gt;&lt;a href="http://www.miky-schreiber.com/Blog/CategoryView,category,BI%2cHR.aspx"&gt;&lt;em&gt;series&lt;/em&gt;&lt;/a&gt;&lt;em&gt; about&amp;nbsp;HR
BI systems. The &lt;/em&gt;&lt;a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,f874a87a-a82e-41f4-93d1-08c11160ef3a.aspx"&gt;&lt;em&gt;previous
post&lt;/em&gt;&lt;/a&gt;&lt;em&gt; dealt with pricing and prioritization of the project. In this post
I will describe the initial modeling process - familiarity with source systems, the&amp;nbsp;main/major
problems and their&amp;nbsp;effect on the ERD model.&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
When&amp;nbsp;starting with&amp;nbsp;a new BI system design, the first question to be asked
is: What are the business questions that interests the customer?&amp;nbsp;It depends,
of course,&amp;nbsp;on who is the client, but&amp;nbsp;that was&amp;nbsp;covered in the previous
post. Unfortunately I&amp;nbsp;can't reveal the content world for whom I built the system
and therefore to&amp;nbsp;I'll&amp;nbsp;use a&amp;nbsp;University's content world as example.
Almost&amp;nbsp;all the&amp;nbsp;questions that the customers wanted to get answered&amp;nbsp;had
totals that combine different subjects. For example:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
What is the average salary in each department? 
&lt;li&gt;
How many&amp;nbsp;professors and students were in each department in&amp;nbsp;each year in
the&amp;nbsp;last five years? 
&lt;li&gt;
What is the ratio of students / lecturers in each department in the&amp;nbsp;last years?
(Graph) 
&lt;li&gt;
How many&amp;nbsp;transitions between&amp;nbsp;departments (by students and lecturers) took
place over the years?&amp;nbsp;Which department is the most abandoned and which is the
most popular? 
&lt;li&gt;
Display events of lecturer / student from admission to the university&amp;nbsp;until this
day&lt;/li&gt;
&lt;/ul&gt;
This list is only a small part of what our customers wanted to know.&amp;nbsp;Like always,
there&amp;nbsp;were no answers to these questions because it&amp;nbsp;required integration
between multiple sources. While getting familiar with the&amp;nbsp;content world, I noticed
several key points that will be treated later on in the project (and in future posts): 
&lt;ul&gt;
&lt;li&gt;
There's a very difficult problem of terminology&amp;nbsp;in the&amp;nbsp;organization ("one
truth"). For example, the most basic&amp;nbsp;terms - a student and lecturer are not consensus.
For example, what happens when a student is also a lecturer? What is the lecturer's
department if he lectures in&amp;nbsp;more than one&amp;nbsp;department? If a student&amp;nbsp;moved
to another department and returned after a semester is it considered to be one, two
transitions or&amp;nbsp;maybe none? Of course, different people&amp;nbsp;had different answers
to these questions. 
&lt;li&gt;
All sources came in the same format - appendix. For those who not familiar,&amp;nbsp;appendix
is a table containing the ID, date and nature of change (kind of a diary on a given
subject, "Nigreret" in Hebrew). That is, each record&amp;nbsp;shows only&amp;nbsp;the new
/ current situation (but the table is historical).&amp;nbsp;When you want to ask a question
relating to only one&amp;nbsp;appendix (salary, for example) there is no problem. The
problems arise when asking questions that combine several&amp;nbsp;appendixes (salary
and department, for example). There is a real need for carrying out the integration
of the appendixes. Well,&amp;nbsp;This is what&amp;nbsp;Datawarehouse meant to do. 
&lt;li&gt;
We did not find the Fact table. On&amp;nbsp;one hand it seems that&amp;nbsp;every appendix&amp;nbsp;is
a fact table&amp;nbsp;because it&amp;nbsp;describes changes / events that took place in the
world, but ... Where are the measures? On the other hand,&amp;nbsp;appendix looks like
a Slowly Changing Dimension (SCD) but ... Where's the fact table? No doubt that this
is&amp;nbsp;a Factless Fact.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
So we're got familiar with the main&amp;nbsp;questions,&amp;nbsp;the sources of information
(same format -&amp;nbsp;easier for us) and the unique&amp;nbsp;attributes of this project.
Now comes the hard question - how&amp;nbsp;we'll&amp;nbsp;design the data warehouse? Our initial
solution looks like this: 
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/BI_HR_Pre_ERD.JPG" border=0&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;p&gt;
In other words - classic Star Schema. The appendixes&amp;nbsp;are SCD and the Fact table&amp;nbsp;is
in the center of the star.&amp;nbsp;The fact&amp;nbsp;will contain only one&amp;nbsp;record per
person and it will contain only static data of the person, such as ID, name and date
of birth. This model takes us back to the questions about how the linkage between
dimensions&amp;nbsp;and fact will be made&amp;nbsp;and what is the measure the fact. We tried
to think how we answer the most basic questions required&amp;nbsp;by the system (by the
way, this is&amp;nbsp;the way to see&amp;nbsp;the quality of&amp;nbsp;the ERD):
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
For each lecturer, what is the salary, department&amp;nbsp;and current position? -&amp;nbsp;We'll
take from&amp;nbsp;every dimension/appendix the current record of the lecturer. The link
between the dimensions will be carried out by the fact (lecturer's record)&amp;nbsp;using
the ID column 
&lt;li&gt;
What is the salary, department and&amp;nbsp;position of the lecturer on date X? -&amp;nbsp;From&amp;nbsp;every
dimension we'll take only the record&amp;nbsp;which its&amp;nbsp;starting date is from before
X and nearest to X and again we'll link the dimensions through the fact table. Hence,
we should calculate analytically (analytical functions) for each record its end date
(the day before the start of the next record) and then it's easy to query with those
kind of questions&amp;nbsp;by using "between" statements. 
&lt;li&gt;
Average of the salaries in each department - we'll pull out the list of&amp;nbsp;lecturers
in&amp;nbsp;each department (from the department appendix).&amp;nbsp;We'll choose a date&amp;nbsp;that
interests us, and for each person we'll pull only the record which contains that date.
Then we'll do group by.&amp;nbsp;Ostensibly it's not trivial in every classical reporting
tool, but let's think for a moment: what do we do here? Group by members of a particular
dimension by another dimension counting, where dimensions are linked through the fact
table. Looks fine. So - What's the problem? The problem is that we are trying to aggregate/sum
by a dimension's attribute and not by a measure, which is not possible in every reporting
tool. (a&amp;nbsp;good solution will work with many tools and not just with dedicated
tool for a particular ERD) 
&lt;li&gt;
Questions involving only one&amp;nbsp;appendix - very&amp;nbsp;easy and do not require datawarehouse
at all 
&lt;li&gt;
Show events from&amp;nbsp;the first day in the university until that day&amp;nbsp;(of a certain
person) - not possible in classic report tool. Let's say we want to see the "event
log" of the lecturer Y, sorted by starting date in ascending order. This requires
a union of all the&amp;nbsp;appendixes / dimensions and sort of&amp;nbsp;all&amp;nbsp;records
by&amp;nbsp;starting date, where there may be several different&amp;nbsp;appendix records
that start in the same date (even very possible. change in the&amp;nbsp;position usually
leads to a raise)&lt;/li&gt;
&lt;/ol&gt;
At that point we realized that we needed to retreat. That looked like a&amp;nbsp;good
ERD at first but the last question is very basic and our model proved wrong. How do
we&amp;nbsp;show the event log which is a combination of the appendixes? How can we identify
processes when they occurring, such as increase in the amount of courses and salary
which&amp;nbsp;leads to a better position? These are all questions that the system should
give them an answer and a classic star schema&amp;nbsp;can't help us here. 
&lt;p&gt;
&lt;/p&gt;
&lt;p&gt;
What have we done? Continued next post ...
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=7821afca-321f-44eb-a960-5b00addc2440" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx</comments>
      <category>BI;BI/DataWarehousing;BI/HR</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,7821afca-321f-44eb-a960-5b00addc2440.aspx</feedburner:origLink></item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=54a70fb6-5c6f-4583-bb80-3ba13edfa58c</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,54a70fb6-5c6f-4583-bb80-3ba13edfa58c.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,54a70fb6-5c6f-4583-bb80-3ba13edfa58c.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=54a70fb6-5c6f-4583-bb80-3ba13edfa58c</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Hey, look! A post about OBIEE with no Javascript!! ;-)
</p>
        <p>
OBIEE lets you have either actual values <strong>or</strong> percent in the data labels
in pie charts. I've found a trick to show both: add a new column (I called it concat)
in the criteria tab which is the concatenation of the entity (shop name, for example)
and its measure value. This is a string column, so we'll need to convert the
measure into string. It'll look something like that: LOGICAL_TABLE.ENTITY_NAME ||
' ' || cast(FACT_TABLE.MY_MEASURE as char). Edit the graph/pie view and make the concat
column be the legend axis and the measure be the pie values axis. In the "Chart Type
Special" menu (the button with the star) choose "Show values as Percentage of total".
The result will be that the pie values will have both entity name, actual value (because
now, for OBIEE, its part of the enitity name) and the percent.
</p>
        <p>
Enjoy!
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=54a70fb6-5c6f-4583-bb80-3ba13edfa58c" />
      <xhtml:img xmlns:xhtml="http://www.w3.org/1999/xhtml" src="http://feeds.feedburner.com/~r/MikySchreiberBlog/~4/XfioHWZDPBA" height="1" width="1" /></body>
      <title>Actual Values and Percent in OBIEE Pie Chart's Data Labels</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,54a70fb6-5c6f-4583-bb80-3ba13edfa58c.aspx</guid>
      <link>http://feedproxy.google.com/~r/MikySchreiberBlog/~3/XfioHWZDPBA/PermaLink,guid,54a70fb6-5c6f-4583-bb80-3ba13edfa58c.aspx</link>
      <pubDate>Wed, 13 Oct 2010 09:34:04 GMT</pubDate>
      <description>&lt;p&gt;
Hey, look! A post about OBIEE with no Javascript!! ;-)
&lt;/p&gt;
&lt;p&gt;
OBIEE lets you have either actual values &lt;strong&gt;or&lt;/strong&gt; percent in the data labels
in pie charts. I've found a trick to show both: add a new column (I called it concat)
in the criteria tab which is the concatenation of the entity (shop name, for example)
and&amp;nbsp;its measure value. This is a string column, so we'll need to convert the
measure into string. It'll look something like that: LOGICAL_TABLE.ENTITY_NAME ||
' ' || cast(FACT_TABLE.MY_MEASURE as char). Edit the graph/pie view and make the concat
column be the legend axis and the measure be the pie values axis. In the "Chart Type
Special" menu (the button with the star) choose "Show values as Percentage of total".
The result will be that the pie values will have both entity name, actual value (because
now, for OBIEE,&amp;nbsp;its part of the enitity name) and the percent.
&lt;/p&gt;
&lt;p&gt;
Enjoy!
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=54a70fb6-5c6f-4583-bb80-3ba13edfa58c" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,54a70fb6-5c6f-4583-bb80-3ba13edfa58c.aspx</comments>
      <category>BI;BI/Oracle;BI/Oracle/OBIEE</category>
    <feedburner:origLink>http://www.miky-schreiber.com/Blog/PermaLink,guid,54a70fb6-5c6f-4583-bb80-3ba13edfa58c.aspx</feedburner:origLink></item>
  </channel>
</rss>

