<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>SQLBI</title>
	
	<link>http://www.sqlbi.com</link>
	<description>We love Business Intelligence with Microsoft SQL Server</description>
	<lastBuildDate>Wed, 22 Feb 2012 23:52:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/sqlbi_blog" /><feedburner:info uri="sqlbi_blog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Incremental Processing in Tabular Using Process Add</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/Z1fds9GOlvU/</link>
		<comments>http://www.sqlbi.com/articles/incremental-processing-in-tabular-using-process-add/#comments</comments>
		<pubDate>Sat, 11 Feb 2012 05:31:36 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AMO]]></category>
		<category><![CDATA[Process]]></category>
		<category><![CDATA[SSAS]]></category>
		<category><![CDATA[Tabular]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=760</guid>
		<description><![CDATA[In Analysis Services 2012 you can process a table in a Tabular model by several ways: you can process the&#8230; <a href="http://www.sqlbi.com/articles/incremental-processing-in-tabular-using-process-add/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>In Analysis Services 2012 you can process a table in a Tabular model by several ways: you can process the whole table, you can split the table in several partitions and process a single partition, you can merge partitions and you can incrementally process a single partition by using ProcessAdd, which is the topic of this article.</p>
<p><span id="more-760"></span>
<p>The ProcessAdd command allows you to add only a few rows to an existing partition and to do that you have to specify the query that reads data from the data source, applying the necessary WHERE condition in a SQL query or by using any other SQL statement to this purpose. The ProcessAdd command available in Process Partition(s) dialog box in SQL Server Management Studio (SSMS) does not allow you to specify a custom query for the process operation, in order to filter only new rows that have to be added to the partition. However, this is not a big issue: in fact, if you need to use ProcessAdd, probably you need to automate that command in a batch process. Thus, a programmatic approach is required. A XMLA script command is required and you will see how to programmatically obtain it by using AMO and PowerShell.</p>
<p>It is out of scope of this article describing how you should define the SQL command that only returns the new rows to be added to the table in a Tabular model. Remember that it is your responsibility avoiding duplicate rows in the destination table. There is no automatic detection of duplicates and if the table does not have unique columns you would obtain row duplicates in your table as a result, otherwise the process operation will stop with an error if a unique condition for a column is violated by loading new data.</p>
<p>Technically, when a ProcessAdd runs internally Analysis Services creates a new partition, process the whole partition and then merge it to the target partition (the one on which ProcessAdd command has been executed). You can use this same approach by using separate operations, but ProcessAdd can be more optimized for this specific activity.</p>
<p>In the following sections you will see how to execute and automate ProcessAdd by using different tools.</p>
<h1>ProcessAdd with XMLA Script</h1>
<p>In order to process a table partition in Tabular, you have to issue a process command to a partition of a measure group in the corresponding Multidimensional model that Analysis Services publish in order to make it queryable by any existing OLAP client tool. The Batch element contains a Process command, which specify the target partition / measure group / cube / database, followed by a Bindings element that replaces the existing query binding on the partition with a different query, which will be used just for this process command.</p>
<pre class="brush: xml; highlight: [20,21,22]; title: ; toolbar: false; notranslate">&lt;Batch xmlns=&quot;http://schemas.microsoft.com/analysisservices/2003/engine&quot;&gt;
    &lt;Process&gt;
        &lt;Object&gt;
            &lt;DatabaseID&gt;AdventureWorks Tabular Model SQL 2012&lt;/DatabaseID&gt;
            &lt;CubeID&gt;Model&lt;/CubeID&gt;
            &lt;MeasureGroupID&gt;Internet Sales_78de3956-70d9-429f-9857-c407f7902f1e&lt;/MeasureGroupID&gt;
            &lt;PartitionID&gt;Internet Sales_797b5664-d3d8-441f-ab29-b3cc76cdc1ff&lt;/PartitionID&gt;
        &lt;/Object&gt;
        &lt;Type&gt;ProcessAdd&lt;/Type&gt;
        &lt;WriteBackTableCreation&gt;UseExisting&lt;/WriteBackTableCreation&gt;
    &lt;/Process&gt;
    &lt;Bindings&gt;
        &lt;Binding&gt;
            &lt;DatabaseID&gt;AdventureWorks Tabular Model SQL 2012&lt;/DatabaseID&gt;
            &lt;CubeID&gt;Model&lt;/CubeID&gt;
            &lt;MeasureGroupID&gt;Internet Sales_78de3956-70d9-429f-9857-c407f7902f1e&lt;/MeasureGroupID&gt;
            &lt;PartitionID&gt;Internet Sales_797b5664-d3d8-441f-ab29-b3cc76cdc1ff&lt;/PartitionID&gt;
            &lt;Source xsi:type=&quot;QueryBinding&quot; xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; &gt;
                &lt;DataSourceID&gt;d58c09be-3cc9-4688-8d08-026c407f5ad7&lt;/DataSourceID&gt;
                &lt;QueryDefinition&gt;
                    SELECT * FROM FactInternetSales WHERE OrderDateKey &amp;gt;= 20120215
                &lt;/QueryDefinition&gt;
            &lt;/Source&gt;
        &lt;/Binding&gt;
    &lt;/Bindings&gt;
&lt;/Batch&gt;
</pre>
<p>The names that you can see in MeasureGroupID, PartitionID and DataSourceID elements contain a GUID in the name that has been automatically generated by the editor when you created the data model. In order to find the right values you have to investigate the object properties in SQL Server Management Studio, otherwise you have to use the AMO or PowerShell approach that are described later in this article, which automatically find the right name starting from the display name of the object.</p>
<p>The QueryDefinition element contains the query that will be used by the ProcessAdd operation. You can use any valid SQL statement here that will produce the same columns required by the partition that you are going to incrementally update. In the example a SELECT statement has been used, filtering all the orders having an order date greater than or equal to February 15<sup>th</sup> 2012. It is up to you to define a safe filter condition and to avoid loading the same rows multiple times (for example, with this filter condition if an order has a date of February 16<sup>th</sup> and the day after you filter orders greater than or equal to February 16<sup>th</sup>, that order row will be loaded twice).</p>
<h1>ProcessAdd with Integration Services</h1>
<p>The XMLA script described in the previous section can be generated by using the Analysis Services Processing Task component in Integration Services. Cathy Dumas described how to use this component in a <a href="http://cathydumas.com/2012/02/14/doing-a-processadd-in-ssis/">step-by-step article in her blog</a>. This is probably the simplest user interface that makes you able to obtain the corresponding XMLA Script for a Process Add. After you correctly defined the task in Integration Services, you can capture the XMLA Script command by running the task when a SQL Profiler trace is active on the Analysis Services server. The Command Begin event class contains in TextData property the XMLA batch command that you need. Getting XMLA in this way might be useful if you just need to replace a parameter in the SQL query that is specified in the QueryDefinition element of the XMLA script.</p>
<h1>ProcessAdd with AMO</h1>
<p>Using Analysis Management Objects (AMO) you can generate the same Process Add command you have seen in XMLA Script without worrying about internal GUIDs. You can read a description of the usage of AMO commands in a <a href="http://cathydumas.com/2012/01/23/doing-a-processadd-in-amo/">blog post written by Cathy Dumas</a>. The XMLA script you have seen before in this article can be obtained and executed by using the following C# code (you can capture the XMLA Script by using the <a href="http://technet.microsoft.com/en-us/library/ms345082(v=sql.110).aspx#CaptureLog">CaptureLog class and CaptureXML attribute</a>).</p>
<pre class="brush: cpp; title: ; toolbar: false; notranslate">namespace AmoAutomation {
    class Program {
        static void Main(string[] args) {
            Server server = new Server();
            server.Connect(@&quot;localhostTABULAR&quot;);
            Database db = server.Databases[&quot;AdventureWorks Tabular Model SQL 2012&quot;];
            DataSourceView dsv = db.DataSourceViews.GetByName(&quot;Sandbox&quot;);
            Cube cube = db.Cubes.GetByName(&quot;Model&quot;);
            MeasureGroup measureGroup = cube.MeasureGroups.GetByName(&quot;Internet Sales&quot;);
            Partition partition = measureGroup.Partitions.GetByName(&quot;Internet Sales&quot;);
            partition.Process(
                ProcessType.ProcessAdd,
                new QueryBinding(
                        dsv.DataSourceID,
                        &quot;SELECT * FROM FactInternetSales WHERE OrderDateKey &gt;= 20120215&quot;));
            server.Disconnect();
        }
    }
}
</pre>
<h1>ProcessAdd with PowerShell</h1>
<p>Once you know how to create the desired process command with AMO, you can easily translate that code in a PowerShell script. For example, the AMO code you have seen in the previous section can be translated into the following PowerShell script:</p>
<pre class="brush: cpp; title: ; toolbar: false; notranslate">[Reflection.Assembly]::LoadWithPartialName(&quot;Microsoft.AnalysisServices&quot;)
$server = New-Object Microsoft.AnalysisServices.Server
$server.connect(&quot;localhostK12&quot;)
$db = $server.Databases.Item(&quot;AdventureWorks Tabular Model SQL 2012&quot;)
$dsv = $db.DataSourceViews.GetByName(&quot;Sandbox&quot;)
$cube = $db.Cubes.GetByName(&quot;Model&quot;)
$measureGroup = $cube.MeasureGroups.GetByName(&quot;Internet Sales&quot;)
$partition = $measureGroup.Partitions.GetByName(&quot;Internet Sales&quot;)
$queryBinding = New-Object Microsoft.AnalysisServices.QueryBinding( $dsv.DataSourceID, &quot;SELECT * FROM FactInternetSales WHERE OrderDateKey &gt;= 20120215&quot; )
$partition.Process( &quot;ProcessAdd&quot;, $queryBinding )
$server.Disconnect()
</pre>
<h1>Conclusion</h1>
<p>Incremental processing of tables in a Tabular model is possible but it requires using commands on Multidimensional entities. This is because by now an API over the real Tabular model is not available. This article showed several ways to execute an incremental ProcessAdd command on a table in a Tabular model, providing a temporary query binding that identifies only the rows to load in the ProcessAdd batch, without changing underlying Tabular structure or views in SQL Server. You can use the same pattern, choosing the technique that better adapts to your periodic process batch operation.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=Z1fds9GOlvU:IJ5U8rIg_gY:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=Z1fds9GOlvU:IJ5U8rIg_gY:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=Z1fds9GOlvU:IJ5U8rIg_gY:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=Z1fds9GOlvU:IJ5U8rIg_gY:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/Z1fds9GOlvU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/incremental-processing-in-tabular-using-process-add/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/incremental-processing-in-tabular-using-process-add/</feedburner:origLink></item>
		<item>
		<title>Memory Settings in Tabular Instances of Analysis Services</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/kv8uTtRx5XE/</link>
		<comments>http://www.sqlbi.com/articles/memory-settings-in-tabular-instances-of-analysis-services/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 11:36:20 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[SSAS]]></category>
		<category><![CDATA[Tabular]]></category>
		<category><![CDATA[VertiPaq]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=747</guid>
		<description><![CDATA[Memory settings in Analysis Services might be an important tuning of a correct setup, both for production and developer machines.&#8230; <a href="http://www.sqlbi.com/articles/memory-settings-in-tabular-instances-of-analysis-services/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Memory settings in Analysis Services might be an important tuning of a correct setup, both for production and developer machines. Analysis Services 2012 can be installed in different ways and the new Tabular instance has new memory settings that are important to know.</p>
<p><span id="more-747"></span>
<p>I expect that many servers (and developer’s workstations) will have at least two instances: one for BISM Multidimensional and the other for BISM Tabular. A Multidimensional has all the memory settings that were already in previous versions of Analysis Services. You can find more info about <a href="http://msdn.microsoft.com/en-us/library/ms174514(v=SQL.110).aspx">Memory Properties</a> on MSDN Books On Line. In case you have more instances running on the same machine, you might want to consider altering the default settings, especially on a production machine where the two instances are used concurrently (it might be less important on a developer workstation where only one of the two instances is used at a particular time).</p>
<p>The more important settings (including their default value) are:</p>
<ul>
<li>Memory / HardMemoryLimit – 0</li>
<li>Memory / LowMemoryLimit – 65</li>
<li>Memory / TotalMemoryLimit – 80</li>
</ul>
<p>A Tabular instance shares these same settings and adds new properties that are specific to the Vertipaq engine:</p>
<ul>
<li>Memory / VertiPaqMemoryLimit – 60</li>
<li>Memory / VertiPaqPagingPolicy – 0</li>
</ul>
<p>The value of <b>VertiPaqPagingPolicy</b> defines how paging works with Tabular.</p>
<ul>
<li>Mode 0 – Locked – Lock all VertiPaq data in memory</li>
<li>Mode 1 – Allow system paging – Lock only hash dictionaries in memory, and allow VertiPaq data to exceed total physical memory</li>
</ul>
<p>In case you choose mode 0, the <b>VertiPaqMemoryLimit</b> defines the total amount of memory VertiPaq is allowed to lock in the working set (and hence the total that can be used for in-memory databases – remember that the Analysis Services service might use more memory for other reasons). In mode 1, it defines a limit for the physical memory that is used by VertiPaq, allowing paging for the remaining memory (virtual committed memory) above this limit.</p>
<p>The VertiPaqPagingPolicy setting provides a way to prevent VertiPaq data from interacting badly with the memory cleaning subsystem. In mode 1, it causes the cleaner subsystem to ignore memory allocated for VertiPaq data beyond VertiPaqMemoryLimit when calculating the price of memory.  In this mode, the server’s total memory usage will be permitted to exceed physical memory, constrained primarily by total virtual memory, paging data out to system pagefile.</p>
<p>If you want to reduce memory for an instance of Analysis Services, it makes sense to set VertiPaqMemoryLimit to a number that is lower than LowMemoryLimit.</p>
<p>Now, some more details about other memory settings in Tabular and their interaction with VertiPaqPagingPolicy and VertiPaqMemoryLimit regarding to cache eviction in VertiPaq. A big thank you to Brad Daniels and Akshai Mirchandani for sharing many details you can find in this article.</p>
<p><b>HardMemoryLimit</b> is the maximum memory the system can ever have. If SSAS exceed the hard memory limit, the system will aggressively kill active sessions in order to reduce memory usage. Sessions killed for this reason will receive an error about being cancelled due to memory pressure. In PowerPivot or BISM mode with a VertiPaqPagingPolicy other than 0, it is also the limit for the maximum working set of the process. If HardMemoryLimit is set to 0, it will use a default value of midway between the high memory limit and total physical memory (or total virtual address space, if you’re on a 32 bit machine where physical memory exceeds virtual memory).
<p><b>TotalMemoryLimit</b> is the high memory limit – it is used to calculate the memory price for normal cache eviction and memory cleaning. If it exceeds the total memory limit, the memory manager will evict all cached data which is not currently in use. TotalMemoryLimit must always be less than HardMemoryLimit.
<p><b>LowMemoryLimit</b> is the point where the system will begin cleaning memory out of caches. As memory usage increases above the low memory limit, SSAS get more aggressive about evicting cached data until it hits the high/total memory limit, at which point it evicts everything that isn’t pinned.
<p>Since cache eviction decisions and hard limit enforcement are normally done based on the process’s total memory usage, it has been necessary to change that calculation when allowing databases to exceed physical memory. Therefore, when VertiPaqPagingPolicy is nonzero, indicating that memory can grow beyond total physical, the system will track the total memory used by VertiPaq as a separate quantity (as reported in the MemoryVertiPaq* counters), and if the total memory used by VertiPaq exceeds the VertiPaqMemoryLimit, the memory used by VertiPaq in excess of the limit will be ignored for the purposes of determining what to evict.</p>
<p>
For example, say VertiPaqMemoryLimit is 100GB, LowMemoryLimit is 110GB, and TotalMemoryLimit is 120GB.&nbsp; Now assume that VertiPaq data structures are using 210GB of memory, and the process’ total memory usage is 215GB.&nbsp; This number is well above the TotalMemoryLimit (and probably above HardMemoryLimit), so ignoring VertiPaqMemoryLimit, the cleaning would be very aggressively, and would kill sessions. However, when PagingPolicy is 1, the memory used by VertiPaq in excess of the limit is ignored for the purpose of computing memory pressure. This means that the number used is &lt;Total Memory&gt; (215GB) – &lt;Total VertiPaq Memory&gt;(210GB) + &lt;VertiPaqMemoryLimit&gt;(100GB) = 105GB, which is below the LowMemoryLimit, so the cache is not cleaned at all.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=kv8uTtRx5XE:aXH6NZTdkJ8:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=kv8uTtRx5XE:aXH6NZTdkJ8:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=kv8uTtRx5XE:aXH6NZTdkJ8:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=kv8uTtRx5XE:aXH6NZTdkJ8:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/kv8uTtRx5XE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/memory-settings-in-tabular-instances-of-analysis-services/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/memory-settings-in-tabular-instances-of-analysis-services/</feedburner:origLink></item>
		<item>
		<title>Optimizing High Cardinality Columns in VertiPaq</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/pcif0NBg4Z8/</link>
		<comments>http://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 06:45:00 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[Tabular]]></category>
		<category><![CDATA[VertiPaq]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=737</guid>
		<description><![CDATA[VertiPaq is the internal column-based database engine used by PowerPivot and BISM Tabular models. High cardinality columns might be the&#8230; <a href="http://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>VertiPaq is the internal column-based database engine used by PowerPivot and BISM Tabular models. High cardinality columns might be the more expensive parts of a table. If you cannot remove a high cardinality column from a table, by using the techniques described in this article you might optimize its storage saving up to more than 90% of original space.</p>
<p><span id="more-737"></span>
<p>Because of its nature, in VertiPaq every table is stored by column instead than by row. For each column it creates a dictionary of distinct values and a sort of bitmap index that references the dictionary. Such a bitmap index can be highly compressed and high cardinality columns might have a dictionary cost that represents more than 90% of the cost for the column. You can save this space by splitting the column in two or more columns with a smaller number of distinct values. The resulting dictionary will be a fraction of the original one and you will be able to obtain the original value by combining the two values. You can use different techniques depending on the data type of the column you want to reduce. The following examples are based on a hypothetical SQL Server data source, you can easily adapt them to other data sources but you have to remember that these transformations have to be applied before data is imported by VertiPaq: creating calculated columns in a table would not save the space required by the original column.</p>
<p><strong>WARNING:</strong> By splitting a column in more column, you lose the ability to obtain a DISTINCTCOUNT calculation over a single column. You can obtain the same result by using COUNTROWS(SUMMARIZE(table,col1,col2,&#8230;,colN)) but performance can be much slower and there will be a high memory pressure at query time, especially when a large number of rows is involved in calculation.</p>
<h1>DATETIME Columns</h1>
<p>If you have a DATETIME column containing a sort of timestamp of an event (i.e. both date and time), it is more efficient to split it into two columns, one for the date and one for the time. You might use DATE and TIME data type in SQL Server, but in VertiPaq you will use always the same Date data type. The date column will have always the same time and the time column will have always the same Date. In this way you will have a maximum number of rows for date which is 365 multiplied by the number of years stored, and a maximum number of rows for time that depends on time granularity (for example, you have 86,400 seconds per day).
<p>For example, in SQL Server you might have a EventLog DATETIME column extracted in this way:
<pre class="brush: sql; title: ; toolbar: false; notranslate">SELECT
    EventLog,
    Action,
    Value
FROM Log</pre>
<p>You can split the EventLog column into two columns in this way:
<pre class="brush: sql; highlight: [2,3]; title: ; toolbar: false; notranslate">SELECT
    CAST( EventLog AS DATE ) AS EventDate,
    CAST( EventLog AS TIME( 0 ) ) AS EventTime,
    Action,
    Value
FROM Log</pre>
<p>This is an important optimization for any column of DATETIME data type. You can avoid importing the TIME column in VertiPaq in case you are not interested in this part. In the sample query above, the EventTime is rounded down to the second, so that you will not have more than 86,400 distinct values in that column.<br />
<h1>INT and BIGINT Columns</h1>
<p>A 32-bit or 64-bit integer value in SQL Server is stored in VertiPaq by using the same 64-bit integer format. The size used by VertiPaq is not relevant, because the actual storage cost is related to the dictionary, which depends on the number of distinct values. If you have a column with a large number of distinct values in an integer column, you can split it into two or more columns by using a smaller range of values. A typical scenario for this need is the transaction ID in a fact table, which might be required in order to retrieve a single transaction. For example, imagine you have a table with a TransactionID using all the numbers from 1 to 100,000,000. You have 100 millions of distinct values in the TransactionID dictionary.
<pre class="brush: sql; highlight: [2]; title: ; toolbar: false; notranslate">SELECT
    TransactionID,
    Quantity,
    Price
FROM Fact</pre>
<p>You can split this number into two number ranging from 1 to 10,000.
<pre class="brush: sql; highlight: [2,3]; title: ; toolbar: false; notranslate">SELECT
    TransactionID / 10000 AS TransactionHighID,
    TransactionID % 10000 AS TransactionLowID,
    Quantity,
    Price
FROM Fact</pre>
<p>As you can see, the TransactionID is no longer imported and you will be able to obtain it by using the following DAX measure:
<pre class="brush: dax; title: ; toolbar: false; notranslate">Fact[TransactionID] :=
IFERROR(
    VALUES( Fact[TransactionHighID] ) * 10000 + VALUES( Fact[TransactionLowID] ),
    BLANK()
)</pre>
<p>In case you want to apply a filter over a table, you have to split the TransactionID you are looking for into two parts (just replace &lt;TransactionID&gt; with the actual value in the following syntax):
<pre class="brush: dax; highlight: [3,4]; title: ; toolbar: false; notranslate">CALCULATETABLE(
    Fact,
    Fact[TransactionHighID] = INT( @TransactionID / 10000 ),
    Fact[TransactionLowID] = MOD( @TransactionID, 10000 )
)</pre>
<p>The original cost of the TransactionID column for 100 million values is near to 3GB in VertiPaq, whereas the splitted version requires less than 200MB. This is more than 90% saving!
<p>It is possible to further reduce this number by splitting the original column in more than two columns. For example, you can obtain the 3 column version by using this SQL syntax:
<pre class="brush: sql; highlight: [2,3,4]; title: ; toolbar: false; notranslate">SELECT
    TransactionID / 1000000 AS TransactionHighID,
    (TransactionID / 1000) % 1000 AS TransactionMidID,
    TransactionID % 1000 AS TransactionLowID,
    Quantity,
    Price
FROM Fact</pre>
<p>And you can also go further with the same technique, up to one column per each digit. We tested the split up to 8 columns and it is interesting to consider no only the storage cost, but also the processing time. The following table shows the results measure on a 8-core server.<br />
<table border="0" cellspacing="0" cellpadding="2" width="435">
<tbody>
<tr>
<td valign="top" width="158"><strong>Number of Columns</strong></td>
<p><strong><br />
</strong>
<td valign="top" width="107"><strong>Process Time</strong></td>
<p><strong><br />
</strong>
<td valign="top" width="88"><strong>Cores Used</strong></td>
<td valign="top" width="80"><strong>Disk Size</strong></td>
</tr>
<tr>
<td valign="top" width="158">1 (original)</td>
<td valign="top" width="107">02:48</td>
<td valign="top" width="88">1</td>
<td valign="top" width="80">
<p align="right">2,811 MB</p>
</td>
</tr>
<tr>
<td valign="top" width="158">2</td>
<td valign="top" width="107">03:21</td>
<td valign="top" width="88">up to 8</td>
<td valign="top" width="80">
<p align="right">191 MB</p>
</td>
</tr>
<tr>
<td valign="top" width="158">3</td>
<td valign="top" width="107">03:49</td>
<td valign="top" width="88">up to 8</td>
<td valign="top" width="80">
<p align="right">129 MB</p>
</td>
</tr>
<tr>
<td valign="top" width="158">4</td>
<td valign="top" width="107">04:01</td>
<td valign="top" width="88">up to 8</td>
<td valign="top" width="80">
<p align="right">97 MB</p>
</td>
</tr>
<tr>
<td valign="top" width="158">8</td>
<td valign="top" width="107">05:32</td>
<td valign="top" width="88">up to 8</td>
<td valign="top" width="80">
<p align="right">105 MB</p>
</td>
</tr>
</tbody>
</table>
<p>As you can see, the processing of a single column is a single-thread operation. Processing the multiple-column version requires more resources and only having multiple cores you will see similar execution times. With a smaller number of available cores, processing would have required longer processing times. However, more complex evaluations should be done in case partitioning is involved and this is out of the scope for this article.<br />
<h1>String Columns</h1>
<p>You can split a string columns by using the same technique you have seen for INT and BIGINT columns. The only difference is that strings have to be split by using string functions. This is a useful technique in case the TransactionID in your fact table contains non-numeric characters. For example, you can split an alphanumeric TransactionID Column with fixed 10-character length in this way:
<pre class="brush: sql; highlight: [2,3]; title: ; toolbar: false; notranslate">SELECT
    LEFT( TransactionID, 5 ) AS TransactionHighID,
    SUBSTRING( TransactionID, 6, LEN( TransactionID ) - 5 ) AS TransactionLowID,
    Quantity,
    Price
FROM Fact
</pre>
<p>The split algorithm for a string column should consider the distribution of the final result, in order to populate two smaller dictionaries in order to obtain a good space saving.<br />
<h1>Conclusion</h1>
<p>High cardinality columns in PowerPivot and BISM Tabular models can be particularly expensive. The best practice is to remove them from the model, especially when these columns are not relevant for data analysis, such as a GUID or timestamp of a SQL Server table. However, whenever the information they contain is required, you can optimize these columns by splitting the value in two or more columns with a smaller number of distinct values. This will require some more effort when accessing the column value, but the saving can be so high in large tables that it could definitely worth the effort.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=pcif0NBg4Z8:0yxxbdHbnGk:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=pcif0NBg4Z8:0yxxbdHbnGk:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=pcif0NBg4Z8:0yxxbdHbnGk:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=pcif0NBg4Z8:0yxxbdHbnGk:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/pcif0NBg4Z8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/</feedburner:origLink></item>
		<item>
		<title>Budget and Other Data at Different Granularities in PowerPivot</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/qlFFdZ7A2ns/</link>
		<comments>http://www.sqlbi.com/articles/budget-and-other-data-at-different-granularities-in-powerpivot/#comments</comments>
		<pubDate>Sat, 07 Jan 2012 12:02:23 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BISM]]></category>
		<category><![CDATA[DAX]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[Tabular]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=710</guid>
		<description><![CDATA[PowerPivot and BISM Tabular handle relationships between tables by using a single column that requires to be a unique column&#8230; <a href="http://www.sqlbi.com/articles/budget-and-other-data-at-different-granularities-in-powerpivot/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>PowerPivot and BISM Tabular handle relationships between tables by using a single column that requires to be a unique column in the lookup table. In this article you will see how to handle relationships between tables that have different granularities, which is a typical scenario whenever you have a monthly budget table and a daily level sales table.</p>
<p><span id="more-710"></span>
<p>The starting point is a <a href="http://www.sqlbi.com/wp-content/uploads/Budget-PowerPivot2.zip">PowerPivot workbook</a> containing sales divided by Product, Date and Territory (it is an extraction of AdventureWorks sample database). The budget will be assigned to Product Category, Year and Month. Thus, you do not have the same granularity between Sales and Budget, because Territory is not present in budget and both Dates and Product tables are referenced at different granularities.</p>
<p>This is the <strong>Product</strong> table.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image2.png"><img style="display: inline; background-image: none;" title="Product table" border="0" alt="Product table" src="http://www.sqlbi.com/wp-content/uploads/image_thumb2.png" width="888" height="158"></a></p>
<p>This is the <strong>Dates</strong> table.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image3.png"><img style="display: inline; background-image: none;" title="Dates table" border="0" alt="Dates table" src="http://www.sqlbi.com/wp-content/uploads/image_thumb3.png" width="611" height="216"></a></p>
<p>This is the <strong>Territory</strong> table.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image4.png"><img style="display: inline; background-image: none;" title="Territory table" border="0" alt="Territory table" src="http://www.sqlbi.com/wp-content/uploads/image_thumb4.png" width="498" height="268"></a></p>
<p>And this is the <strong>Sales</strong> table.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image5.png"><img style="display: inline; background-image: none;" title="Sales table" border="0" alt="Sales table" src="http://www.sqlbi.com/wp-content/uploads/image_thumb5.png" width="474" height="213"></a></p>
<p>As you can see, every row in the Sales table is related to a product to a single date, other than to a single territory. You can define the following relationships.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/SNAGHTMLb0b444.png"><img style="margin: 0px; display: inline; background-image: none;" title="SNAGHTMLb0b444" border="0" alt="SNAGHTMLb0b444" src="http://www.sqlbi.com/wp-content/uploads/SNAGHTMLb0b444_thumb.png" width="244" height="160"></a></p>
<p>After you created the required relationships, you can browse data in a PivotTable. For instance, you can see OrderQuantity value grouped by Product Category and Month in the following screenshot.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image6.png"><img style="display: inline; background-image: none;" title="OrderQuantity by Product Category and Month" border="0" alt="OrderQuantity by Product Category and Month" src="http://www.sqlbi.com/wp-content/uploads/image_thumb6.png" width="278" height="565"></a></p>
<p>In an Excel table you define the <strong>Budget</strong> table with Category, Year, Month and Budget columns (the Budget value will be compared to OrderQuantity).</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image7.png"><img style="display: inline; background-image: none;" title="Budget table" border="0" alt="Budget table" src="http://www.sqlbi.com/wp-content/uploads/image_thumb7.png" width="287" height="504"></a></p>
<p>At this point you cannot create a relationship between Budget and Dates or Product tables. The identity key in Dates table is the Date and in Product table is the Product, whereas you have a budget defined at an higher hierarchical level. But we do not have the notion of hierarchies in PowerPivot and BISM Tabular. An Analysis Services developer at this point might argue that a Multidimensional model can define relationships between fact and dimensions at different granularities, but this feature is not available here because the notion of attribute relationships inside a dimension simply doesn’t exist in Tabular. Thus, we have to rely on some DAX measure in order to make the comparison between OrderQuantity and Budget in a the same PivotTable, possibly without the need of modifying the data we already imported. In fact, a common solution is to create a table derived by Sales that group data by Product Category, Year and Month, resulting in a table that has the same granularity of the Budget one. However, this solution would require some form of ETL and is less flexible than the one proposed in this article.</p>
<p>If you consider the existing data, you might create a relationship between Budget[Category] and Product[Category] columns. This is not physically possible, but we will use this “virtual” relationship later in some DAX expression. A similar “virtual” relationship exists between Budget and Dates table, by using Year and Month columns. In order to to simplify the following DAX expression, it is better having a single column that allows a “virtual” relationship between these tables. You can create the YearMonth column in the Dates table by using the following DAX expression.</p>
<p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; toolbar: false; notranslate">=Dates[Year]*100+Dates[MonthNumber]</pre>
</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image8.png"><img style="display: inline; background-image: none;" title="YearMonth definition in Dates table" border="0" alt="YearMonth definition in Dates table" src="http://www.sqlbi.com/wp-content/uploads/image_thumb8.png" width="766" height="111"></a></p>
<p>And then you can create the YearMonth column in the Budget table in this way.</p>
<p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; toolbar: false; notranslate">=Budget[Year]*100+Budget[Month]</pre>
</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image9.png"><img style="display: inline; background-image: none;" title="image" border="0" alt="image" src="http://www.sqlbi.com/wp-content/uploads/image_thumb9.png" width="487" height="116"></a></p>
<p>Now you can write the <strong>BudgetCalc</strong> measure by using the following DAX expression.</p>
<p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">BudgetCalc :=
CALCULATE(
    SUM( Budget[Budget] ),
    FILTER( ALL( Budget[YearMonth] ),
            COUNTROWS( FILTER( VALUES( Dates[YearMonth] ), Dates[YearMonth] = Budget[YearMonth] ) ) &gt; 0 ),
    FILTER( ALL( Budget[Category] ),
            COUNTROWS( FILTER( VALUES( Product[category] ), Product[Category] = Budget[Category] ) ) &gt; 0 )
)</pre>
</p>
<p>The CALCULATE statement calculates the sum of Budget values for all the corresponding rows in Budget table that have the same Year, Month and Product Category. Year and Month selection are handled by using the single YearMonth column, which simplify the writing of the required filter condition. In the new version of PowerPivot (SQL Server 2012) and in BISM Tabular, you can use a simpler syntax by using the new CONTAINS functions, such as in the following example.</p>
<p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">BudgetCalc :=
CALCULATE(
    SUM( Budget[Budget] ),
    FILTER( ALL( Budget[YearMonth] ),
            CONTAINS( VALUES( Dates[YearMonth] ), Dates[YearMonth], Budget[YearMonth] ) ),
    FILTER( ALL( Budget[Category] ),
            CONTAINS( VALUES( Product[Category] ), Product[Category], Budget[Category] ) )
)
</pre>
</p>
<p>At this point you can add the BudgetCalc measure to the previous PivotTable obtaining a comparison between OrderQuantity and Budget values.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image10.png"><img style="display: inline; background-image: none;" title="BudgetCalc result browsed in a PivotTable" border="0" alt="BudgetCalc result browsed in a PivotTable" src="http://www.sqlbi.com/wp-content/uploads/image_thumb10.png" width="354" height="565"></a></p>
<p>However, you have some issue if you drill down data at the Subcategory level: the BudgetCalc is repeated for each Subcategory and this might misinterpreted by reading the result.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image11.png"><img style="display: inline; background-image: none;" title="image" border="0" alt="image" src="http://www.sqlbi.com/wp-content/uploads/image_thumb11.png" width="383" height="236"></a></p>
<p>The same problem arises if you try to drilldown data by any Territory attribute (such as Region, Country, or Group), and by Day or Date attributes of the Dates table.</p>
<p>In order to avoid this issue, it would be better to display a BLANK value for budget whenever you are navigating the PivotTable at a detail level that is not represented in the Budget table. In a Multidimensional model (formerly known as UDM in previous versions of Analysis Services), the IgnoreUnrelatedDimensions property helps you in obtaining this behavior. In order to do that in a Tabular model, you can create an <strong>IsBudgetValid</strong> measure that will return TRUE if the current selection in the PivotTable has a valid correspondent budget definition, FALSE otherwise.</p>
<p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">IsBudgetValid :=
(
    COUNTROWS( Sales )
=
    CALCULATE( COUNTROWS( Sales ),
               ALL( Sales ),
               VALUES( Dates[YearMonth] ),
               VALUES( Product[Category] )
    )
)
</pre>
<p>The IsBudgetValid measure checks that the number of rows that are active in the Sales table corresponds to the number of rows that would be active by removing from the filter context any column that is not part of the “virtual” relationship with the Budget table. Because we have just Category and YearMonth “virtual” relationships from Budget to Product and Dates table, the CALCULATE used to make the comparison replaces the filter context on Sales by removing the existing one using ALL and then inserting the filters on Category and YearMonth by using the VALUES function. In order to maintain this formula, you just have to insert a VALUES call for every column used as “virtual” relationship in the Budget table, without worrying about other changes in other part of the model.</p>
<p>As you can see in the following screenshot, the IsBudgetValid measure is FALSE for every row containing a wrong budget measure.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image12.png"><img style="display: inline; background-image: none;" title="image" border="0" alt="image" src="http://www.sqlbi.com/wp-content/uploads/image_thumb12.png" width="477" height="239"></a></p>
<p>As a side note, a much more complex alternative is to count the active rows for every dimension (every table but Sales), but this approach would require much more work to maintain the measure if other tables was added to the model.</p>
<p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">IsBudgetValidAlternative :=
(
    COUNTROWS( Product )
    = CALCULATE( COUNTROWS( Product ),
                 ALLEXCEPT( Product, Product[Category] ) )
) &amp;&amp; (
    COUNTROWS( Dates )
    = CALCULATE( COUNTROWS( Dates ),
                 VALUES( Dates[YearMonth] ),
                 ALL( Dates ) )
) &amp;&amp; (
    COUNTROWS( Territory )
    = COUNTROWS( ALL( Territory ) )
) </pre>
</p>
<p>The IsBudgetValidAlternative measure checks that the number of rows that are active in every table of the PowerPivot workbook corresponds to the number of rows in the same table that corresponds to the granularity of the Budget table in the same filter context. For example, the number of rows in Product table is calculated in a CALCULATE function that alters the filter context in the Product table by using the ALLEXCEPT function, removing any filter other than the Category column, which is the only reference to the Product table from the Budget table. Such a number is compared with the number of rows active in Product obtained by calling a simple COUNTROWS. The same process has to be repeated for every table related to Sales, with a difference for Dates table caused by a particular handling of such table by PowerPivot engine. Finally, all unrelated tables should be tested by comparing the number of rows in the filter context with the number of rows in the whole table. As you can understand, this approach is much more expensive to write and to maintain.</p>
<p>Finally, you can create the <strong>Budget</strong> measure that shows the BudgetCalc value only when IsBudgetValid returns TRUE.</p>
<p>
<pre class="brush: dax; gutter: false; title: ; toolbar: false; notranslate">Budget :=
IF( [IsBudgetValid], [BudgetCalc], BLANK() )</pre>
</p>
<p>The following screenshot show the Budget value browser by Month, Category and Model: as you can see, at the Model level the Budget is blank because the comparison cannot be made.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/image13.png"><img style="display: inline; background-image: none;" title="image" border="0" alt="image" src="http://www.sqlbi.com/wp-content/uploads/image_thumb13.png" width="388" height="298"></a></p>
<p>In this article you have seen how to create DAX measures that compare data stored in tables at different granularities. You should use regular relationships in the PowerPivot or BISM Tabular model for the table with the higher granularity, and you can use DAX measures in order to display values from other tables that store data at an higher level of granularity. In the DAX expression you can define condition that produces the same effect of a “virtual” relationship defined using columns in a table that are not unique.</p>
<p><strong>Acknowledgment</strong>: I want to thank Gerhard Brückl for inspiring me in writing this article.</p>
<p>Download demo file: <a href="http://www.sqlbi.com/wp-content/uploads/Budget-PowerPivot2.zip">Budget-PowerPivot2.zip</a></p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=qlFFdZ7A2ns:4n3WTUhgBpk:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=qlFFdZ7A2ns:4n3WTUhgBpk:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=qlFFdZ7A2ns:4n3WTUhgBpk:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=qlFFdZ7A2ns:4n3WTUhgBpk:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/qlFFdZ7A2ns" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/budget-and-other-data-at-different-granularities-in-powerpivot/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/budget-and-other-data-at-different-granularities-in-powerpivot/</feedburner:origLink></item>
		<item>
		<title>Execute DAX queries through OLE DB and ADOMD.NET</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/OX5Vg2nOJ9M/</link>
		<comments>http://www.sqlbi.com/articles/execute-dax-queries-through-ole-db-and-adomd-net/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 13:05:43 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BISM]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[DAX]]></category>
		<category><![CDATA[MDX]]></category>
		<category><![CDATA[Tabular]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=664</guid>
		<description><![CDATA[If you want to execute a DAX query from your code to a Tabular model, you might think that an&#8230; <a href="http://www.sqlbi.com/articles/execute-dax-queries-through-ole-db-and-adomd-net/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>If you want to execute a DAX query from your code to a Tabular model, you might think that an OLE DB connection should be the right way to get your data. It works, but if you your code is written in .NET a better solution is using the ADOMD.NET library. In this article you will see why ADOMD.NET is better in this scenario.</p>
<p><span id="more-664"></span>
<p>The OLE DB MSOLAP provider can be used to send MDX, DAX and DMX queries to an Analysis Services database. The result can be only a scalar or a table, so you cannot use an MDX query returning results on more than two axes. When you use a DAX query, the result will always fit well in a table, so you can iterate a DataReader to get the result. The following is a simple program you can write in C# to read the result of a DAX query (in this example the DAX query is sent to the Adventure Works DW Tabular model, you can replace the query with any valid DAX query for your model).</p>
<p>
<pre class="brush: cpp; title: ; toolbar: false; notranslate">using System;
using System.Data.OleDb;

class Program {
    static void Main(string[] args) {
        string connectionString =
@&quot;Provider=MSOLAP;Data Source=localhost;Catalog=AdventureWorks Tabular Model SQL 2012&quot;;
        string queryString = @&quot;
EVALUATE
CALCULATETABLE(
    VALUES('Product Subcategory'[Product Subcategory Name]),
    'Product Category'[Product Category Name] = &quot;&quot;Bikes&quot;&quot; )
&quot;;
        using (var connection = new OleDbConnection(connectionString)) {
            connection.Open();
            using (var command = new OleDbCommand(queryString, connection)) {
                using (var reader = command.ExecuteReader()) {
                    while (reader.Read()) {
                        Console.WriteLine(reader[0]);
                    }
                }
            }
        }

    }
}
</pre>
</p>
<p>The result returned in the DataReader is correctly typed (which is important if you have to manipulate the value of a number returned from a query without worrying about number formatting). The issue with OLE DB MSOLAP provider is related to the parameters. In fact, if you want to pass a parameter to the previous query, you will get an error. For example, consider the following example, in which the modified/added lines have been highlighted.</p>
<p>
<pre class="brush: cpp; highlight: [12,17]; title: ; toolbar: false; notranslate">using System;
using System.Data.OleDb;

class Program {
    static void Main() {
        string connectionString =
@&quot;Provider=MSOLAP;Data Source=localhost;Catalog=AdventureWorks Tabular Model SQL 2012&quot;;
        string queryString = @&quot;
EVALUATE
CALCULATETABLE(
    VALUES('Product Subcategory'[Product Subcategory Name]),
    'Product Category'[Product Category Name] = @Category )
&quot;;
        using (var connection = new OleDbConnection(connectionString)) {
            connection.Open();
            using (var command = new OleDbCommand(queryString, connection)) {
                command.Parameters.AddWithValue(&quot;@Category&quot;, &quot;Bikes&quot;);
                using (var reader = command.ExecuteReader()) {
                    while (reader.Read()) {
                        Console.WriteLine(reader[0]);
                    }
                }
            }
        }
    }
}
</pre>
</p>
<p>As you can see, the constant “Bikes” have been replaced by the @Category parameter in the DAX query and the AddWithValue function called on command.Parameters add the corresponding parameter to the OleDbCommand object. However, if you execute this code you get an error as a result.</p>
<p>
<pre class="brush: xml; gutter: false; title: ; toolbar: false; notranslate">Unhandled Exception: System.Data.OleDb.OleDbException: The following system error occurred:  The parameter is incorrect.
Error Code = 0x80070057, External Code = 0x00000000:.
   at System.Data.OleDb.OleDbCommand.ProcessResults(OleDbHResult hr)
   at System.Data.OleDb.OleDbCommand.ApplyParameterBindings(ICommandWithParameters commandWithParameters, tagDBPARAMBINDINFO[] bindInfo)
   at System.Data.OleDb.OleDbCommand.CreateAccessor()
   at System.Data.OleDb.OleDbCommand.InitializeCommand(CommandBehavior behavior, Boolean throwifnotsupported)
   at System.Data.OleDb.OleDbCommand.ExecuteCommand(CommandBehavior behavior, Object&amp;amp; executeResult)
   at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
   at System.Data.OleDb.OleDbCommand.ExecuteReader(CommandBehavior behavior)
   at System.Data.OleDb.OleDbCommand.ExecuteReader()
   at Program.Main() in C:DevsrcQueryTabularProgram.cs:line 17</pre>
</p>
<p>The problem is related to a problem in System.Data.OleDb assembly in .NET, as described in a <a href="https://connect.microsoft.com/SQLServer/feedback/details/251601/run-mdx-query-with-parameters-using-oledb">Connect bug</a> that has never been fixed despite the “Fixed” state of the issue. The only possible workaround is using ADOMD.NET instead of OLE DB. It might seem a strange approach, because MD means MultiDimensional and a query in DAX is all but Multidimensional. However, because ADOMD.NET allows you to put parameters in an AdomdCommand and to read the result in a DataReader, you can easily translate the previous example in the following one, which uses ADOMD.NET (which is a separate download, look for it in the <a href="http://it.bing.com/search?q=sql+server+feature+pack">more recent SQL Server Feature Pack</a>) instead of OLE DB.</p>
<p>
<pre class="brush: cpp; highlight: [2,14,17,18]; title: ; toolbar: false; notranslate">using System;
using Microsoft.AnalysisServices.AdomdClient;

class Program {
    static void Main(string[] args) {
        string connectionString =
@&quot;Provider=MSOLAP;Data Source=localhost;Catalog=AdventureWorks Tabular Model SQL 2012&quot;;
        string queryString = @&quot;
EVALUATE
CALCULATETABLE(
    VALUES('Product Subcategory'[Product Subcategory Name]),
    'Product Category'[Product Category Name] = @Category )
&quot;;
        AdomdConnection connection = new AdomdConnection();
        connection.ConnectionString = connectionString;
        connection.Open();
        AdomdCommand cmd = new AdomdCommand(queryString);
        cmd.Parameters.Add(&quot;Category&quot;, &quot;Bikes&quot;);
        cmd.Connection = connection;
        using (var reader = cmd.ExecuteReader()) {
            while (reader.Read()) {
                Console.WriteLine(reader[0]);
            }
        }
    }
}</pre>
</p>
<p>In conclusion, you should always consider using ADOMD.NET whenever you want to query an Analysis Services database from your own .NET code, regardless your query is written in DAX, MDX od DMX.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=OX5Vg2nOJ9M:C2nWOmQI7RA:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=OX5Vg2nOJ9M:C2nWOmQI7RA:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=OX5Vg2nOJ9M:C2nWOmQI7RA:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=OX5Vg2nOJ9M:C2nWOmQI7RA:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/OX5Vg2nOJ9M" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/execute-dax-queries-through-ole-db-and-adomd-net/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/execute-dax-queries-through-ole-db-and-adomd-net/</feedburner:origLink></item>
		<item>
		<title>Understanding Circular Dependencies in Tabular and PowerPivot</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/YfrEfEkAPqQ/</link>
		<comments>http://www.sqlbi.com/articles/understanding-circular-dependencies/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 08:00:00 +0000</pubDate>
		<dc:creator>Alberto Ferrari</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BISM]]></category>
		<category><![CDATA[DAX]]></category>
		<category><![CDATA[Tabular]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/articles/understanding-circular-dependencies/</guid>
		<description><![CDATA[When you design a data model for Tabular you should pay attention to a though topic, which is that of&#8230; <a href="http://www.sqlbi.com/articles/understanding-circular-dependencies/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>When you design a data model for Tabular you should pay attention to a though topic, which is that of circular dependencies in formulas. It is very important to learn how to handle circular dependencies now because in SQL 2012 (and in PowerPivot 2012) there will be a stronger validation of circular dependencies. Some of the checks described in this article have been introduced with the release RC0 of SQL Server 2012 and were not present before. <span id="more-630"></span>Thus, you might author formulas in previous versions of SQL 2012 (or PowerPivot) and end up with a non-working solution when the final bits will be available, due to these additional (and perfectly understandable) constraints.<br />
Before to start speaking about circular dependencies, it is worth to introduce simple, linear dependencies. Let us look at an example with the following calculated column on the classical AdventureWorks DimProduct table</p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate"> DimProduct[Profit] := DimProduct[ListPrice] - DimProduct[StandardCost] </pre>
<p>The formula depends on two columns. In such a case, we say that the column Profit depends from ListPrice and StandardCost. You might then create a new column, like ProfitPct with the formula:</p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate"> DimProduct[ProfitPct] := DimProduct[Profit] / DimProduct[ListPrice] </pre>
<p>Now, ProfitPct depends from Profit and ListPrice. Thus, when Analysis Services will need to compute the calculated columns, it knows that ProfitPct will need to be computed only after Profit has been calculated and stored. Otherwise, it will not be able to recover a valid value for the formula.</p>
<p>Linear dependency is not something you should normally worry about. It is used internally by the SSAS engine to detect the correct order of computation of calculated columns during the processing of the database. On a normal Tabular data model, with many calculated columns, the dependency of calculations turns into a complex graph which, again, SSAS handles gracefully.</p>
<p>Circular dependency, on the other hand, is a situation that happens when a loop appears in this graph. For example, a clear situation where circular dependency appears is if you try to modify the definition of Profit to this formula:</p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate"> DimProduct[Profit] := DimProduct[ProfitPct] * DimProduct[StandardCost] </pre>
<p>Because ProfitPct depends on Profit and, in this new formula, Profit depends on ProfitPct, SSDT refuses to modify the formula and shows the error “<em>A circular dependency was detected</em>”</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/F11PV13.png"><img style="border: 0px currentColor; display: inline; background-image: none;" title="Circular Dependency Error" src="http://www.sqlbi.com/wp-content/uploads/F11PV13_thumb.png" alt="Circular Dependency Error" width="570" height="109" border="0" /></a></p>
<p>All this describes what circular dependencies are from the point of view of columns, i.e. you have detected the existence of a dependency looking at the expression, without paying attention to the table structure. Nevertheless, there is a more subtle and complex type of dependency which is introduced by the usage of CALCULATE and/or filters inside any expression. Let us see the topic with an example, starting from a subset of columns of DimProduct in AdventureWorks:</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/F11PV14.png"><img style="border: 0px currentColor; display: inline; background-image: none;" title="Starting data model" src="http://www.sqlbi.com/wp-content/uploads/F11PV14_thumb.png" alt="Starting data model" width="423" height="194" border="0" /></a></p>
<p>We are interested in understanding the dependency list for a new calculated column which makes use of the CALCULATE function, like the following one:</p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate"> SumOfListPrice := CALCULATE (SUM (DimProduct[ListPrice])) </pre>
<p>At first glance, it may seem that the column depends from ListPrice only, as this is the only column used in the formula. Nevertheless, by using CALCULATE, we are asking to transform the current row context into a filter context and this modifies the dependency list. If we expand the meaning of the CALCULATE call, the formula really says:</p>
<blockquote><p>Sum the value of ListPrice for all the row in the DimProduct table which have the same value for ProductKey, ProductAlternateKey, StandardCost and ListPrice.</p></blockquote>
<p>Reading the formula in this way, it is now clear that the formula depends on all of the columns of DimProduct because, the newly introduced filter context will filter all the columns of the table, thus depending on their value. Nevertheless, creating the column, everything goes fine and you get this nice result</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/F11PV15.png"><img style="border: 0px currentColor; display: inline; background-image: none;" title="Data model with first column defined" src="http://www.sqlbi.com/wp-content/uploads/F11PV15_thumb.png" alt="Data model with first column defined" width="583" height="160" border="0" /></a></p>
<p>Now, we might try to define a new calculated column, using the very same formula, in the same table. Thus, we try to add NewSumOfListPrice with the following formula, which is identical to the previous one.</p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate"> NewSumOfListPrice := CALCULATE (SUM (DimProduct[ListPrice])) </pre>
<p>Surprisingly, SSDT refuses to create this new formula and returns an error</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/F11PV16.png"><img style="border: 0px currentColor; display: inline; background-image: none;" title="Circular dependency error" src="http://www.sqlbi.com/wp-content/uploads/F11PV16_thumb.png" alt="Circular dependency error" width="608" height="116" border="0" /></a></p>
<p>Analysis Services has detected a circular dependency in the formula, which was not detected before. And, because we did not change anything in the formula, the error seems a very strange one. Something has changed indeed, and it is the number of columns in the table. If we were able to add the NewSumOfListPrice to the table, we would reach a situation where the two formulas have these meanings:</p>
<ul>
<li><strong>SumOfListPrice</strong>: “Sum the value of ListPrice for all the row in the DimProduct table which have the same value for ProductKey, ProductAlternateKey, StandardCost, ListPrice and <strong>NewSumOfListPrice</strong>”.</li>
<li><strong>NewSumOfListPrice</strong>: “Sum the value of ListPrice for all the row in the DimProduct table which have the same value for ProductKey, ProductAlternateKey, StandardCost, ListPrice and <strong>SumOfListPrice</strong>”.</li>
</ul>
<p>Having added the calculated column, these columns become part of the filter context introduced by CALCULATE and, as a consequence, they are part of the dependency list. Reading the previous definition, it is clear that there is a circular dependency between the two formulas and this is the reason why Analysis Services refuses to allow us to create the NewSumOfListPrice column.</p>
<p>Understanding this error is not very easy. But, on the other hand, finding a solution is pretty straightforward, even if not very intuitive. The problem is that any calculated column containing CALCULATE (or a call to any measure, which adds an automatic CALCULATE) creates a dependency from all of the columns of the table.</p>
<p>The scenario would be different if the table had a row identifier (a primary key, in SQL terms). If the table has a column which acts as a row identifier, then all columns containing a CALCULATE could depend from the row identifier, thus reducing their dependency list to a single column which, by the way, is not a calculated one. Guess what? Tabular is smart enough to behave this way!</p>
<p>In the DimProduct table there is such a column: it is ProductKey. In order to mark the ProductKey as a row identifier you have two options:</p>
<ol>
<li>You can create a relationship from any table into DimProduct using ProductKey as the destination column. Performing this operation will ensure that ProductKey is a unique value for DimProduct.</li>
<li>You can manually set the property of Row Identifier for ProductKey to TRUE using the Properties window inside SSDT or the corresponding feature in PowerPivot</li>
</ol>
<p>One of these operations will make the engine learn that the table has a row identifier and, in such a scenario, you will be able to define the NewSumOfListPrice column avoiding circular dependency.</p>
<p>As a final advice, it is always a good idea to set the Row Identifier property of a table if such a row exists in the data model, because the Vertipaq engine will make use of this information to optimize all calculations. Nevertheless, row identifiers occupy space in the data model and their memory usage is pretty high because they have the maximum number of distinct values (a different value for each row). Thus, if a row identifier is not needed inside a table (for example for fact tables), it is always a good idea to avoid loading it inside the Tabular data model. Then, in case you face the circular dependency issue, it might be necessary to add the row identifier column to the table, so that the problem is solved.</p>
<p>There are other techniques to avoid circular dependencies. For example, using ALLEXCEPT to remove the calculated columns from the set of columns that become part of the dependency list is a viable option, but it makes all formulas more complicated. On the other hand, using ALLEXCEPT might be useful for very big tables, where the addition of a row identifier would cause memory footprint to grow too much.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=YfrEfEkAPqQ:ahs_5ZJuhpQ:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=YfrEfEkAPqQ:ahs_5ZJuhpQ:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=YfrEfEkAPqQ:ahs_5ZJuhpQ:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=YfrEfEkAPqQ:ahs_5ZJuhpQ:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/YfrEfEkAPqQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/understanding-circular-dependencies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/understanding-circular-dependencies/</feedburner:origLink></item>
		<item>
		<title>From SQL to DAX: String Comparison</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/xIF1ObcMKH8/</link>
		<comments>http://www.sqlbi.com/articles/from-sql-to-dax-string-comparison/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 06:34:22 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[DAX]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=588</guid>
		<description><![CDATA[In DAX string comparison requires you more attention than in SQL, for several reasons: DAX doesn’t offer the same set&#8230; <a href="http://www.sqlbi.com/articles/from-sql-to-dax-string-comparison/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>In DAX string comparison requires you more attention than in SQL, for several reasons: DAX doesn’t offer the same set of features you have in SQL, a few text comparison functions in DAX are only case-sensitive and others only case-insensitive, and performance might have a relevant impact in your query according to the comparison technique you use.</p>
<p><span id="more-588"></span>
<p>If you compare two strings by using common operators (=, &lt;&gt;, &lt;, &gt;, &lt;=, &gt;=) the string comparison can be case-insensitive, according to the collation setting of your Analysis Services instance (for PowerPivot <a href="http://powerpivotgeek.com/2010/03/24/collation-localization-and-globalization/">it depends on the workbook</a>). However, functions like FIND and SUBSTITUTE are always case-sensitive, whereas SEARCH is always case-insensitive. FIND can be 10% faster than SEARCH just because it is case-sensitive.</p>
<p>One issue you might have if you come from SQL is the different operator you have to perform somewhat similar to a regular expression. In fact, the LIKE operators in SQL has a similar correspondent function in DAX: the <a href="http://technet.microsoft.com/en-us/library/ee634235(SQL.110).aspx">SEARCH</a> function has a slightly different semantic because it returns the position found instead of a True/False value. Moreover, it has different wildcard characters, as you will see later in this article.</p>
<p>For example, consider the following syntax in SQL:</p>
<p>
<pre class="brush: sql; gutter: false; title: ; notranslate">Name LIKE '%SQLBI%'</pre>
</p>
<p>In Tabular and PowerPivot v2 (for SQL Server 2012) you can write the following equivalent syntax:</p>
<p>
<pre class="brush: dax; gutter: false; title: ; notranslate">SEARCH( &quot;SQLBI&quot;, Table[Name], 1, 0 ) &gt; 0</pre>
</p>
<p>The condition above returns 0 if SQLBI is not found in the Name column of Table. The search starts at first character (the third parameter) and in case of no match you get 0 as a result (defined by the fourth parameter). If you are using PowerPivot for SQL Server 2008 R2 (or PowerPivot v1), you have to wrap SEARCH into an IFERROR because the fourth parameter have been introduced only in SQL Server 2012 release and an error is thrown in PowerPivot version 1.</p>
<p>
<pre class="brush: dax; gutter: false; title: ; notranslate">IFERROR( SEARCH( &quot;*SQLBI*&quot;, Table[Name], 1 ), 0 ) &gt; 0</pre>
</p>
<p>Unfortunately, the presence of IFERROR has a big impact on performance and you can observe a 4x slower execution times with this syntax. Thus, avoid IFERROR if possible.</p>
<p>In case you need to define a more complex filter, you have to adapt to the different wildcard characters. Instead of using % and _ you would use in the SQL LIKE operator, you have to use * and ? in the DAX SEARCH function.</p>
<p>For example, consider the following condition in SQL:</p>
<p>
<pre class="brush: sql; gutter: false; title: ; notranslate">Name LIKE '%SQLBI%Methodology%at%work%'</pre>
</p>
<p>The correspondent syntax in DAX is:</p>
<p>
<pre class="brush: dax; gutter: false; title: ; notranslate">SEARCH( &quot;SQLBI*Methodology*at*work&quot;, Table[Name], 1, 0 ) &gt; 0</pre>
</p>
<p>However, if you change the SQL condition to:</p>
<p>
<pre class="brush: sql; gutter: false; title: ; notranslate">Name LIKE '%SQLBI%Methodology%at%work'</pre>
</p>
<p>you do not have an equivalent syntax in DAX, because you cannot check that the string ends with “work” based only on the return value of the SEARCH call. Moreover, if you want to improve performance, you should avoid SEARCH whenever possible. For example, using LEFT instead of SEARCH in order to check whether a string begins with a particular text might improve performance of a 5x-10x factor. Here are a few hints in order to translate LIKE in the best pattern.</p>
<table border="0" cellspacing="0" cellpadding="2" width="630">
<tbody>
<tr>
<td valign="top" width="246"><strong>SQL</strong></td>
<p><strong><br /></strong></p>
<td valign="top" width="382"><strong>DAX</strong></td>
</p>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE 'SQLBI'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">Table[Name] = &quot;SQLBI&quot;</pre>
</td>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE 'SQLBI%'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">LEFT( Table[Name], 5 ) = &quot;SQLBI&quot;</pre>
</td>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE '%SQLBI'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">RIGHT( Table[Name], 5 ) = &quot;SQLBI&quot;</pre>
</td>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE '%SQLBI%'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">SEARCH( &quot;SQLBI&quot;, Table[Name], 1, 0 ) &gt; 0</pre>
<p>
for PowerPivot v1:</p>
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">IFERROR( SEARCH( &quot;SQLBI&quot;, Table[Name], 1 ), 0 ) &gt; 0</pre>
</td>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE 'SQLBI%Methodology'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">LEFT( Table[Name], 5 ) = &quot;SQLBI&quot;
&amp;&amp; RIGHT( Table[Name], 11 ) = &quot;Methodology&quot;</pre>
</p>
</td>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE 'SQLBI%Methodology%'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">LEFT( Table[Name], 5 ) = &quot;SQLBI&quot;
&amp;&amp; SEARCH( &quot;Methodology&quot;, Table[Name], 1, 0 ) &gt; 0</pre>
</p>
</td>
</tr>
<tr>
<td valign="top" width="246">
<pre class="brush: sql; class-name: noborder; gutter: false; title: ; notranslate">Name LIKE '%SQLBI%Methodology%'</pre>
</td>
<td valign="top" width="382">
<pre class="brush: dax; class-name: noborder; gutter: false; title: ; notranslate">SEARCH( &quot;SQLBI*Methodology&quot;, Table[Name], 1, 0 ) &gt; 0</pre>
</td>
</tr>
</tbody>
</table>
<p>In conclusion, in DAX it is better to avoid the use of SEARCH unless you really need to search a pattern using wildcards, and it is highly suggested to avoid using IFERROR for performance reasons (you pay the penalty for every error raised – using IFERROR when the errors are really rare is fine).</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=xIF1ObcMKH8:DPvzBckpwnI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=xIF1ObcMKH8:DPvzBckpwnI:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=xIF1ObcMKH8:DPvzBckpwnI:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=xIF1ObcMKH8:DPvzBckpwnI:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/xIF1ObcMKH8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/from-sql-to-dax-string-comparison/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/from-sql-to-dax-string-comparison/</feedburner:origLink></item>
		<item>
		<title>From SQL to DAX: Projection</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/CMLiV2O-sOI/</link>
		<comments>http://www.sqlbi.com/articles/from-sql-to-dax-projection/#comments</comments>
		<pubDate>Tue, 27 Dec 2011 17:33:00 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[DAX]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=581</guid>
		<description><![CDATA[Using DAX as a query language you will lack a fundamental feature like projection. In fact, DAX allows you add&#8230; <a href="http://www.sqlbi.com/articles/from-sql-to-dax-projection/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Using DAX as a query language you will lack a fundamental feature like projection. In fact, DAX allows you add columns to an existing table, but you cannot simply remove some column from a table. In order to do that, you have to use SUMMARIZE, or you can use ADDCOLUMNS starting from a column that you want to include in the query output and that has to be unique for each row of the result.</p>
<p><span id="more-581"></span>
<p>The solution based on SUMMARIZE is better from a performance point of view, but it is interesting also to look at a possible solution based on ADDCOLUMNS for educational purposes.</p>
<p>Let’s start with the classical SELECT in SQL:</p>
<pre class="brush: sql; title: ; toolbar: false; notranslate">SELECT *
FROM Product</pre>
<p>It corresponds to this DAX query:</p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">EVALUATE Product</pre>
<p>Now, a common projection consists in selecting just a few columns from the source table. For example, the following SQL query only gets 3 rows from Product table.</p>
<pre class="brush: sql; title: ; toolbar: false; notranslate">SELECT [Product Id], [Product Name], [List Price]
FROM Product</pre>
<p>In DAX you can obtain the same result in two ways. You can leverage on the uniqueness of the Product Id column, which uniquely identify any row in the Product table, and you can query such column adding the other columns by using the ADDCOLUMN function. In order to retrieve the corresponding value for each product, you can use CALCULATE and VALUES (so that you will get an error in case Product Id wouldn’t be a unique key of the table).</p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">EVALUATE
ADDCOLUMNS(
    DISTINCT( Product[Product Id] ),
    &quot;Product Name&quot;, CALCULATE( VALUES( Product[Product Name] ) ),
    &quot;List Price&quot;, CALCULATE( VALUES( Product[List Price] ) )
)
</pre>
<p>The technique based on ADDCOLUMNS allows you to rename the columns after the first one. However, from the point of view of performance and easiness of read, a better solution is the one based on SUMMARIZE, even if you lose the ability to rename columns in this way.</p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">EVALUATE
SUMMARIZE(
    Product,
    Product[Product Id],
    Product[Product Name],
    Product[List Price]
)
</pre>
<p>Technically, the solution based on SUMMARIZE is equivalent to a SELECT DISTINCT statement like the following one:
<pre class="brush: sql; title: ; toolbar: false; notranslate">SELECT DISTINCT [Product Id], [Product Name], [List Price]
FROM Product
</pre>
<p>In general, I always suggest using SUMMARIZE in order to perform a projection over a table, because it is easier to read and it has better performance than other options you have in DAX.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=CMLiV2O-sOI:uPmGd7ysKWQ:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=CMLiV2O-sOI:uPmGd7ysKWQ:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=CMLiV2O-sOI:uPmGd7ysKWQ:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=CMLiV2O-sOI:uPmGd7ysKWQ:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/CMLiV2O-sOI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/from-sql-to-dax-projection/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/from-sql-to-dax-projection/</feedburner:origLink></item>
		<item>
		<title>Ratio Over Subtotals with Normalized Tables in DAX</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/OE7z4-nl1t8/</link>
		<comments>http://www.sqlbi.com/articles/ratio-over-subtotals-with-normalized-tables-in-dax/#comments</comments>
		<pubDate>Mon, 26 Dec 2011 06:57:41 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[DAX BISM]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=559</guid>
		<description><![CDATA[Calculating a ratio in DAX is relatively simple in case the underlying data model is a star schema, but you&#8230; <a href="http://www.sqlbi.com/articles/ratio-over-subtotals-with-normalized-tables-in-dax/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Calculating a ratio in DAX is relatively simple in case the underlying data model is a star schema, but you have to consider additional complexities whenever you have a more normalized model, even just a snowflake schema. In this article, you will see how to perform the right computation.</p>
<p><span id="more-559"></span>
<p>Consider the following requirement: in Adventure Works DW, you need to query all the products belonging to a selection of product models and, for each product, you want to see the category it belongs to and the ratios of Sales Amount over the corresponding product model and product category. The model is a column of the Product Table (Model Name), whereas the category is a column of a related table (Product Category, related to Product through an intermediate Product Sub-Category table).</p>
<p>The initial DAX query can be the following one, that displays Product Name, Model Name and Category Name for each product of model Sport-100 and Road-650, which belong to Bikes and Accessories categories, respectively:</p>
<p>
<pre class="brush: dax; highlight: [9,10,11]; title: ; toolbar: false; notranslate">EVALUATE
CALCULATETABLE(
    SUMMARIZE( 'Internet Sales',
               Product[Product Name],
               Product[Model Name],
               'Product Category'[Product Category Name],
               &quot;Product Amount&quot;, SUM( 'Internet Sales'[Sales Amount] )
    ),
    FILTER( Product,
            Product[Model Name] = &quot;Sport-100&quot;
            || Product[Model Name] = &quot;Road-650&quot; )
)
ORDER BY Product[Product Name]
</pre>
</p>
<p>In order to calculate the ratio of each product over the corresponding model and category, you need to obtain the value of Sales Amount for the model and the category in each row. The Model Amount is pretty straightforward, because it can be calculated by using the ALLEXCEPT function, which keeps just the filter on Model Name over the Product table. By filtering the Product table, you override the existing filter over products and obtain the required value.</p>
<p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">MEASURE Product[Model Amount]
    = CALCULATE( SUM( 'Internet Sales'[Sales Amount] ),
                 ALLEXCEPT( Product, Product[Model Name] ) )
</pre>
</p>
<p>You cannot use the same approach for the category. The Product Category Name is a column of a the Product Category table, which is related to Product through the Product Sub-Category table. Because ALLEXCEPT can operate on just one table, we need to filter all the products belonging to the same Category in another way. The first approach is by using a FILTER condition that returns all those Product that belongs to the same category by using a CONTAINS condition.</p>
<p>
<pre class="brush: mdx; highlight: [5,6,7,8]; title: ; toolbar: false; notranslate">MEASURE Product[Category Amount]
    = CALCULATE(
          SUM( 'Internet Sales'[Sales Amount] ),
          ALL( 'Internet Sales' ),
          FILTER( ALL( Product ),
                  CONTAINS( RELATEDTABLE( 'Product Category' ),
                            'Product Category'[Product Category Name],
                            VALUES( 'Product Category'[Product Category Name] ) ) ) )</pre>
</p>
<p>
The ALL( &#8216;Internet Sales&#8217; ) is required in order to consider all the products of the same Category even if they belongs to other models (remember that there is a filter over Model Name in the initial query). The measure above produces the expected result, but it has two strong requirements: first, it requires an iterative FILTER operation over all the products; second, it requires that Product Category Name column of Product Category is part of the columns grouped by the SUMMARIZE function in the initial query. In order to avoid both constraints, a better approach can be used, even if it is less intuitive. </p>
<p>Instead of filtering the Product table with an explicit filter, you can leverage on filter contexts that are automatically propagated through relationships by using a function that returns just the Product Category that you need to filter. For example, you might define this measure by using VALUES over Product Category Name:</p>
<p>
<pre class="brush: dax; highlight: [5]; title: ; toolbar: false; notranslate">MEASURE Product[Category Amount]
    = CALCULATE(
          SUM( 'Internet Sales'[Sales Amount] ),
          ALL( 'Internet Sales' ),
          VALUES( 'Product Category'[Product Category Name] ) )</pre>
</p>
<p>Such a definition will work our initial query that contains the Product Category Name in its results, as you can see in the following example.</p>
<p>
<pre class="brush: dax; title: ; toolbar: false; notranslate">DEFINE
    MEASURE Product[Model Amount]
        = CALCULATE(
              SUM( 'Internet Sales'[Sales Amount] ),
              ALLEXCEPT( Product, Product[Model Name] ) )
    MEASURE Product[Category Amount]
        = CALCULATE(
              SUM( 'Internet Sales'[Sales Amount] ),
              ALL( 'Internet Sales' ),
              FILTER( ALL( Product ),
                      CONTAINS( RELATEDTABLE( 'Product Category' ),
                                'Product Category'[Product Category Name],
                                 VALUES( 'Product Category'[Product Category Name] ) ) ) )
EVALUATE
    CALCULATETABLE(
        SUMMARIZE(
            'Internet Sales',
            Product[Product Name],
            Product[Model Name],
            'Product Category'[Product Category Name],
            &quot;Product Amount&quot;, SUM( 'Internet Sales'[Sales Amount] ),
            &quot;Model Amount&quot;, [Model Amount],
            &quot;Category Amount&quot;, [Category Amount]
        ),
        FILTER( Product,
                Product[Model Name] = &quot;Sport-100&quot;
                || Product[Model Name] = &quot;Road-650&quot; )
    )
ORDER BY Product[Product Name]
</pre>
</p>
<p>The query above produces the following result:</p>
<table border="0" cellspacing="0" cellpadding="2" width="500">
<tbody>
<tr>
<td valign="top" width="108">Product[Product Name]</td>
<td valign="top" width="24">Product[Model Name]</td>
<td valign="top" width="66">Product Category[Product Category Name]</td>
<td valign="top" width="66">[Product Amount] </td>
<td valign="top" width="66">[Model Amount]</td>
<td valign="top" width="101">[Category Amount]</td>
</tr>
<tr>
<td valign="top" width="108">Road-650 Black, 44</td>
<td valign="top" width="24">Road-650</td>
<td valign="top" width="66">Bikes    </td>
<td valign="top" width="66">47566.6422</td>
<td valign="top" width="66">645379.5038</td>
<td valign="top" width="101">28318144.6507</td>
</tr>
<tr>
<td valign="top" width="108">Road-650 Black, 48</td>
<td valign="top" width="24">Road-650</td>
<td valign="top" width="66">Bikes</td>
<td valign="top" width="66">45553.2394 </td>
<td valign="top" width="66">645379.5038</td>
<td valign="top" width="101">28318144.6507</td>
</tr>
<tr>
<td valign="top" width="108">…</td>
<td valign="top" width="24">&nbsp;</td>
<td valign="top" width="66">&nbsp;</td>
<td valign="top" width="66">&nbsp;</td>
<td valign="top" width="66">645379.5038    </td>
<td valign="top" width="101">28318144.6507</td>
</tr>
<tr>
<td valign="top" width="108">Road-650 Red, 62</td>
<td valign="top" width="24">Road-650    </td>
<td valign="top" width="66">Bikes    </td>
<td valign="top" width="66">57381.9812    </td>
<td valign="top" width="66">645379.5038    </td>
<td valign="top" width="101">28318144.6507</td>
</tr>
<tr>
<td valign="top" width="108">Sport-100 Helmet, Black    </td>
<td valign="top" width="24">Sport-100    </td>
<td valign="top" width="66">Accessories    </td>
<td valign="top" width="66">72954.15</td>
<td valign="top" width="66">225335.6</td>
<td valign="top" width="101">700759.96</td>
</tr>
<tr>
<td valign="top" width="108">Sport-100 Helmet, Blue    </td>
<td valign="top" width="24">Sport-100    </td>
<td valign="top" width="66">Accessories    </td>
<td valign="top" width="66">74353.75</td>
<td valign="top" width="66">225335.6</td>
<td valign="top" width="101">700759.96</td>
</tr>
<tr>
<td valign="top" width="108">Sport-100 Helmet, Red    </td>
<td valign="top" width="24">Sport-100    </td>
<td valign="top" width="66">Accessories    </td>
<td valign="top" width="66">78027.7</td>
<td valign="top" width="66">225335.6</td>
<td valign="top" width="101">700759.96</td>
</tr>
</tbody>
</table>
<p>However, if you remove the Product Category Name column from the SUMMARIZE output, the Category Amount measure definition would raise an error because it would try to perform a CONTAINS operation over all the product categories that are included in products filtered by the query, whereas the CONTAINS requires just one value to operate.</p>
<p>Because you need to obtain the Product Category from the “current” product for every line, you can use another SUMMARIZE call, which uses the Product table as a starting point to group data and returns just the Product Category Name as a result. Because this measure will be evaluated for every row of the complete DAX query, only the categories that are referenced by products considered in the output row will be considered, and if the output row contains just one product, you are sure that only its corresponding categories will be used. </p>
<p>
<pre class="brush: dax; highlight: [5]; title: ; toolbar: false; notranslate">MEASURE Product[Category Amount]
    = CALCULATE(
          SUM( 'Internet Sales'[Sales Amount] ),
          ALL( 'Internet Sales' ),
          SUMMARIZE( Product, 'Product Category'[Product Category Name] ) )</pre>
</p>
<p>
With the correct Category Amount you can now calculate the corresponding ratios, as you can see in the following final query.</p>
<p>
<pre class="brush: dax; highlight: [21,22]; title: ; toolbar: false; notranslate">DEFINE
    MEASURE Product[Product Amount] = SUM( 'Internet Sales'[Sales Amount] )
    MEASURE Product[Model Amount]
        = CALCULATE(
              SUM( 'Internet Sales'[Sales Amount] ),
              ALLEXCEPT( Product, Product[Model Name] ) )
     MEASURE Product[Category Amount]
        = CALCULATE(
              SUM( 'Internet Sales'[Sales Amount] ),
              ALL( 'Internet Sales' ),
              SUMMARIZE( Product, 'Product Category'[Product Category Name] ) )
EVALUATE
    CALCULATETABLE(
        SUMMARIZE(
            'Internet Sales',
            Product[Product Name],
            Product[Model Name],
            &quot;Product Amount&quot;, [Product Amount],
            &quot;Model Amount&quot;, [Model Amount],
            &quot;Category Amount&quot;, [Category Amount],
            &quot;Ratio Model&quot;, [Product Amount] / [Model Amount],
            &quot;Ratio Category&quot;, [Product Amount] / [Category Amount]
        ),
        FILTER( Product,
                Product[Model Name] = &quot;Sport-100&quot;
                || Product[Model Name] = &quot;Road-650&quot; )
    )
ORDER BY Product[Product Name]
</pre>
</p>
<p>This is the result produced.</p>
<table border="0" cellspacing="0" cellpadding="2" width="500">
<tbody>
<tr>
<td valign="top" width="90">Product[Product Name]</td>
<td valign="top" width="80">Product[Model Name]</td>
<td valign="top" width="57">[Product Amount] </td>
<td valign="top" width="57">[Model Amount]</td>
<td valign="top" width="57">[Category Amount]</td>
<td valign="top" width="57">[Ratio Model]</td>
<td valign="top" width="57">[Ratio Category]</td>
</tr>
<tr>
<td valign="top" width="57">Road-650 Black, 44</td>
<td valign="top" width="57">Road-650</td>
<td valign="top" width="57">47566.6422</td>
<td valign="top" width="57">645379.5038</td>
<td valign="top" width="57">28318144.6507</td>
<td valign="top" width="57">0.0737</td>
<td valign="top" width="57">0.0017</td>
</tr>
<tr>
<td valign="top" width="57">Road-650 Black, 48</td>
<td valign="top" width="57">Road-650</td>
<td valign="top" width="57">45553.2394 </td>
<td valign="top" width="57">645379.5038</td>
<td valign="top" width="57">28318144.6507</td>
<td valign="top" width="57">0.0706</td>
<td valign="top" width="57">0.0016</td>
</tr>
<tr>
<td valign="top" width="57">…</td>
<td valign="top" width="57">&nbsp;</td>
<td valign="top" width="57">&nbsp;</td>
<td valign="top" width="57">645379.5038 </td>
<td valign="top" width="57">28318144.6507</td>
<td valign="top" width="57">&nbsp;</td>
<td valign="top" width="57">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="57">Road-650 Red, 62</td>
<td valign="top" width="57">Road-650 </td>
<td valign="top" width="57">57381.9812 </td>
<td valign="top" width="57">645379.5038 </td>
<td valign="top" width="57">28318144.6507</td>
<td valign="top" width="57">0.0889</td>
<td valign="top" width="57">0.0020</td>
</tr>
<tr>
<td valign="top" width="57">Sport-100 Helmet, Black </td>
<td valign="top" width="57">Sport-100 </td>
<td valign="top" width="57">72954.15</td>
<td valign="top" width="57">225335.6</td>
<td valign="top" width="57">700759.96</td>
<td valign="top" width="57">0.3237</td>
<td valign="top" width="57">0.1041</td>
</tr>
<tr>
<td valign="top" width="57">Sport-100 Helmet, Blue </td>
<td valign="top" width="57">Sport-100 </td>
<td valign="top" width="57">74353.75</td>
<td valign="top" width="57">225335.6</td>
<td valign="top" width="57">700759.96</td>
<td valign="top" width="57">0.3300</td>
<td valign="top" width="57">0.1061</td>
</tr>
<tr>
<td valign="top" width="57">Sport-100 Helmet, Red </td>
<td valign="top" width="57">Sport-100</td>
<td valign="top" width="57">78027.7</td>
<td valign="top" width="57">225335.6</td>
<td valign="top" width="57">700759.96</td>
<td valign="top" width="57">0.3463</td>
<td valign="top" width="57">0.1113</td>
</tr>
</tbody>
</table>
<p>It is interesting to note that in DAX you have to write different queries according to the underlying data model in case you want to obtain values of a group for which an item belongs to. This is not required when working with MDX, even with the same data model, as you can see in the following query.</p>
<pre class="brush: mdx; title: ; toolbar: false; notranslate">WITH
    MEMBER Measures.[Model Amount]
        AS ( Measures.[Internet Total Sales],
             [Product].[Product Name].[All] )
    MEMBER Measures.[Category Amount]
        AS ( Measures.[Internet Total Sales],
             [Product].[Product Name].[All],
             [Product].[Model Name].[All] )
    MEMBER Measures.[Ratio Model]
        AS Measures.[Internet Total Sales] / Measures.[Model Amount], FORMAT=&quot;Percent&quot;
    MEMBER Measures.[Ratio Category]
        AS Measures.[Internet Total Sales] / Measures.[Category Amount], FORMAT=&quot;Percent&quot;
SELECT
    { Measures.[Internet Total Sales],
      Measures.[Model Amount],
      Measures.[Category Amount],
      Measures.[Ratio Model],
      Measures.[Ratio Category] } ON 0,
    NonEmpty( [Product].[Product Name].[Product Name].MEMBERS
              * { [Product].[Model Name].[Sport-100], [Product].[Model Name].[Road-650] }
              * [Product Category].[Product Category Name].[Product Category Name],
              Measures.[Internet Total Sales] ) ON 1
FROM [Internet Sales]</pre>
<p>In MDX the calculation of Model Amount and Category Amount does not require different formulas and is not affected from the underlying data model. This is an important difference between these two languages. The flexibility of DAX, which allows more flexibility over calculation even when relationships are not defined in the data model, comes at the cost of requiring a greater knowledge about the data model in order to define the correct and more efficient DAX query.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=OE7z4-nl1t8:OiKthG1uE8I:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=OE7z4-nl1t8:OiKthG1uE8I:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=OE7z4-nl1t8:OiKthG1uE8I:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=OE7z4-nl1t8:OiKthG1uE8I:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/OE7z4-nl1t8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/ratio-over-subtotals-with-normalized-tables-in-dax/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/ratio-over-subtotals-with-normalized-tables-in-dax/</feedburner:origLink></item>
		<item>
		<title>Sorting Dates Columns in PowerPivot v2 and BISM Tabular</title>
		<link>http://feedproxy.google.com/~r/sqlbi_blog/~3/UZ74TkEyJXY/</link>
		<comments>http://www.sqlbi.com/articles/sorting-dates-columns-in-powerpivot-v2-and-bism-tabular/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 10:51:33 +0000</pubDate>
		<dc:creator>Marco Russo</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BISM]]></category>
		<category><![CDATA[PowerPivot]]></category>

		<guid isPermaLink="false">http://www.sqlbi.com/?p=534</guid>
		<description><![CDATA[Sorting a column by using another column is one of the nice features of the new version of PowerPivot introduced&#8230; <a href="http://www.sqlbi.com/articles/sorting-dates-columns-in-powerpivot-v2-and-bism-tabular/" class="more-link">Learn More &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Sorting a column by using another column is one of the nice features of the new version of PowerPivot introduced with SQL Server 2012 (we call it PowerPivot v2, but it really v11 because it is aligned with SQL Server). For example, this allows you to sort the Month name in the right order (January, February, …) instead than in alphabetical order. Usually, you should have a month number column that allows you to correctly sort the month name. However, sometime it is necessary to build a calculated column in order to provide the right sort, as you will see in this article.</p>
<p><span id="more-534"></span><br />
<h1>Sorting Dates Columns in PowerPivot</h1>
<p>First of all, we will use a Calendar table as an example, also because it is a very common scenario in which you will want to use this feature. In order to create a Calendar table you can use the <a href="https://datamarket.azure.com/dataset/1542c52d-d466-4094-a801-9ef8fd7c6485">DateStream</a> data source on Windows Azure Marketplace. In order to get just a range of years, you have to manually change the URL by using the <a href="http://www.bp-msbi.com/2011/10/range-queries-with-azure-datamarket-feeds/">URL/OData syntax described by Boyan Penev</a> in his blog.</p>
<p>Now, in order to sort the <strong>MonthName</strong>, you click the Sort by Column button in the Home ribbon of the PowerPivot window after you selected the column to sort. For example, in the following picture you can see how to sort the MonthName column by the MonthOfYear value.</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/SortByColumn-01.png"><img style="display: inline; background-image: none;" title="Sort by Column Dialog Box" border="0" alt="Sort by Column Dialog Box" src="http://www.sqlbi.com/wp-content/uploads/SortByColumn-01_thumb.png" width="665" height="527"></a></p>
<p>At this point, when you will browse your data with a PivotTable or Power View you will see the months sorted in the proper order. However, you can discover that it is not so easy to set the proper sort for the <strong>MonthInCalendar</strong> column, which contains 12 months for each year and has to be sorted by Year and Month Number.</p>
<p>The problem is that we do not have a corresponding column to use for the sorting. We might try to use:</p>
<ul>
<li><strong>DateKey</strong> that has a different value for each day. Thus, different rows for the same month and year have different values for DateKey and you get the error “Cannot sort MonthInCalendar by DateKey because at least one value in MonthInCalendar has multiple distinct values in DateKey. For example, you can sort [City] by [Region] because there is only one region for each city, but you cannot sort [Region] by [City] because there are multiple cities for each region”, as you can see in the following screenshot.</li>
</ul>
<p><a href="http://www.sqlbi.com/wp-content/uploads/SortByColumn-02.png"><img style="display: inline; background-image: none;" title="Sort by Column Error" border="0" alt="Sort by Column Error" src="http://www.sqlbi.com/wp-content/uploads/SortByColumn-02_thumb.png" width="706" height="559"></a></p>
<p><strong><a href="http://www.sqlbi.com/wp-content/uploads/SortByColumn-03.png"><img style="margin: 30px 150px 0px 50px; float: right; display: inline; background-image: none;" title="Wrong Sort by Column Result" border="0" alt="Wrong Sort by Column Result" align="right" src="http://www.sqlbi.com/wp-content/uploads/SortByColumn-03_thumb.png" width="314" height="331"></a></strong></p>
<ul>
<li><strong>MonthOfYear</strong> that has a total of 12 numbers corresponding to the months, but does not have the year. You do not get an error, but as a result you would see January month for all the years in your calendar, then all February months and so on, as you can see in the following picture.</li>
</ul>
<p>&nbsp;</p>
<ul>
<li><strong>YearKey</strong> that has a value for each year, but toes not have the month and produces as a result that months are sorted in alphabetical order within each year</li>
</ul>
<p>In order to solve this issue, we need a column that has the same granularity as the MonthInCalendar column (year and month). We can create a calculated column <strong>YearMonth</strong> with the formula [YearKey] * 100 + [MonthOfYear], as you can see in the following picture:</p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/SortByColumn-04.png"><img style="display: inline; background-image: none;" title="Define YearMonth Column" border="0" alt="Define YearMonth Column" src="http://www.sqlbi.com/wp-content/uploads/SortByColumn-04_thumb.png" width="585" height="139"></a></p>
<p><a href="http://www.sqlbi.com/wp-content/uploads/SortByColumn-05.png"><img style="margin: 0px 300px 0px 50px; float: right; display: inline; background-image: none;" title="Correct Sort Order by Year and Month" border="0" alt="Correct Sort Order by Year and Month" align="right" src="http://www.sqlbi.com/wp-content/uploads/SortByColumn-05_thumb.png" width="96" height="307"></a>Such a column has to be specified as the By column in the Sort by Column dialog box that you have seen before, producing as a result the expected sort order.</p>
<p>The same technique can be applied to other composite columns such as, for instance, <strong>QuarterInCalendar</strong>. These columns are important in case you want to display, for example, a chart where data are plotted month by month. Using a column such as MonthInCalendar is also useful when you define a hierarchy Year-Month-Day. In fact, you have to remember that in a report it could be not obvious what is the year to which a month belongs to, and having a column containing both month name and year could be fundamental in order to make the report easy to read.</p>
<h1>Sorting Dates Columns in BISM Tabular</h1>
<p>The Sort by Column problem that you have seen in PowerPivot is identical in BISM Tabular. The only difference is that the user interface for specifying the Sort by Column of an attribute is included in the Properties pane, but you can also use the same Sort by Column dialog box that you have seen in PowerPivot by selecting the <strong>Column / Sort / Sort by Column…</strong> menu item after you selected the column of which you want to control the sort order.</p>
<h1>Conclusions</h1>
<p>In this article you have seen how to control the sort order of a column in PowerPivot and BISM Tabular. The lesson learned is that you may have to create a calculated column in order to provide the sort order information with the same granularity of the column that you want to sort in a custom way.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=UZ74TkEyJXY:Mot3HV5NeAg:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=UZ74TkEyJXY:Mot3HV5NeAg:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/sqlbi_blog?a=UZ74TkEyJXY:Mot3HV5NeAg:WJiHqkZLMAk"><img src="http://feeds.feedburner.com/~ff/sqlbi_blog?i=UZ74TkEyJXY:Mot3HV5NeAg:WJiHqkZLMAk" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/sqlbi_blog/~4/UZ74TkEyJXY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.sqlbi.com/articles/sorting-dates-columns-in-powerpivot-v2-and-bism-tabular/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.sqlbi.com/articles/sorting-dates-columns-in-powerpivot-v2-and-bism-tabular/</feedburner:origLink></item>
	</channel>
</rss>

