I recently implemented Power BI models by extracting data from Azure Data Lake Storage (ADLS) Gen 2, and I wanted to share a lesson learned when using this connector.

In this project, I had tens of thousands of CSV files stored in the data lake and organized in different folders according to the Folder Path.

The user interface of Power Query provides the user with two options:

- File System View
- CDM Folder View

The CDM Folder View is not useful for plain CSV files. By providing the URL of the storage and choosing the File System View, you get access to all the files stored in the data lake regardless of their hierarchical structure. The interesting thing is that when you provide a path, you get the list of all the files included in any subfolder. The list is a plain list, and the hierarchical folder is simply an attribute in this result (Folder Path).

With Power Query you can apply filters to the list obtained by the File System View option, thus restricting the access to only those files (or a single file) you are interested in. However, there is no query folding of this filter. Every time you refresh the data source, Power Query reads the list of all these files; the filters on the folder path and the file name defined in the M query are then applied to this list only client-side. This is very expensive because the entire list is also downloaded when the expression is initially evaluated, just to get the structure of the result of the transformation.

A better way to manage the process is to specify in the URL the complete folder path to traverse the hierarchy, and get only the files that are interesting for the transformation – or the exact path of the file, if you are able to do that. For example, the data lake I used had one file for each day, stored in a folder structure organized by yyyy/mm, so every folder holds up to 31 files (one month).

By specifying the folder for April 2020 (2020/04), I get only the 30 files in the list – instead of the tens of thousands obtained when specifying the root folder for the data lake.

In my project, I modified the transformation of a single CSV file into an M function, so the Sample File transformation was extracting a single file from the data lake; it did so by applying a filter to the list of thousands of file names read from the data lake.

This was the initial code to extract the 15.csv file from a certain folder of the data lake:

let
    Source = AzureStorage.DataLake ( "https://<storagename>.dfs.core.windows.net/" ),
    #"Filtered Rows" = Table.SelectRows (
        Source,
        each ( [Folder Path] = "https://<storagename>.dfs.core.windows.net/weblogs/2020/04/" )
            and ( [Name] = "15.csv" )
    ),
    Navigation1 = #"Filtered Rows"{0}[Content]
in
    Navigation1

This is an improved version of the code, where the filter over the folder has been moved to the initial URL:

let
    Source = AzureStorage.DataLake ( "https://<storagename>.dfs.core.windows.net/weblogs/2020/04/" ),
    #"Filtered Rows" = Table.SelectRows ( Source, each ( [Name] = "15.csv" ) ),
    Navigation1 = #"Filtered Rows"{0}[Content]
in
    Navigation1

This way, we are still reading all the files of April 2020 in the data lake, corresponding to the final 2020/04 branch of the URL. This step alone solved the performance issue – but if you need a single file and you know its name, there is a better way to do that. In the connection string, specify the complete path including the filename.

This way, the list only has one file.

In my code, I specified the complete path including the filename in the argument of *AzureStorage.DataLake*:

let
    Source = AzureStorage.DataLake ( "https://<storagename>.dfs.core.windows.net/weblogs/2020/05/01.csv" ),
    CsvContent = Source[Content]{0}
in
    CsvContent

The performance difference is huge, especially when you use this transformation to create an M function in Power Query. The same code must be executed for every file, so reading three years of data requires calling the same function over 1,000 times. Reading a list of one file 1,000 times is much better than reading a list of thousands of files 1,000 times!
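When this transformation becomes an M function invoked once per file, the same idea applies: build the exact URL from the function arguments instead of filtering a listing of the whole lake. The sketch below is hypothetical – the function name and the yyyy/mm/dd.csv layout simply mirror the folder structure described earlier:

```m
// Hypothetical helper, assuming one file per day stored as weblogs/yyyy/mm/dd.csv.
// Building the complete URL makes the connector return a one-row file list,
// instead of the full listing of the data lake.
let
    GetDailyFile = ( year as text, month as text, day as text ) =>
        let
            Url =
                "https://<storagename>.dfs.core.windows.net/weblogs/"
                    & year & "/" & month & "/" & day & ".csv",
            Source = AzureStorage.DataLake ( Url ),
            Content = Source[Content]{0}
        in
            Content
in
    GetDailyFile
```

Each invocation – for example GetDailyFile ( "2020", "05", "01" ) – then downloads only the metadata of the single file it needs.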

The lesson is that when using the *AzureStorage.DataLake* connector in Power Query you should always provide a path that restricts the files returned, or just one file if you know the filename.

This article explains how to show different customers with the same name in a Power BI report by using zero-width spaces, thus simplifying the presentation without adding visible characters to make the names unique.

Sometimes, we deal with cases where two or more entities of the same type have the same name. If we take the example of customers, the data model stores both the customer code and the customer name to correctly identify each customer. The customer code is what identifies each of the two customers with the same name as being unique.

The Contoso sample database contains many customers with the same name. For example, look at the following table: there are two customers named Jordan Allen. They have the same name but a different customer code. Indeed, they live in different states and their customer history is likely to be very different.

If we just leave the customer name in the table and remove all the columns that make them different, Power BI merges the two customers, showing misleading figures. Looking at the following matrix, the user may incorrectly conclude that the customer Jordan Allen purchased 810.94 USD worth of products. However, this never happened because that number is the sum of the purchases of two different customers with the same name.

The query generated by Power BI groups by customer name. If two customers share the same name, then they are shown as a single customer.

One option to solve the scenario is to append the customer code to the customer name, to guarantee that customer names are always unique. Using the customer code is an effective technique, but the result is unpleasant to the eye because we pollute the name column with the customer code.

There is another technique that we can use to make the names unique, while keeping the visualization of the name the same. We can use the zero-width space, which is a very special kind of space character in the Unicode set.

Zero-width spaces are special characters that affect how words are printed in some writing systems. In this article, we are not interested in their typographic usage, but rather in their unique characteristic of being invisible to the eye. You can learn more at https://en.wikipedia.org/wiki/Whitespace_character. Among the many zero-width spaces, we chose the zero-width non-joiner, Unicode character 8204.

The idea is the following: if we add a zero-width space at the end of the second Jordan Allen, it looks the same as the first Jordan Allen, but it is different because his name contains the hidden space at the end. What if we have more than two customers with the same name? Well, DAX is there for us. We can author a new calculated column for the customer name, where we append to the original name a suitable number of zero-width spaces to guarantee that – no matter what – the customer name is unique.

To create the column, we must identify the customers with the same name, rank them using the customer code, and then add spaces: zero spaces to the first customer, one space to the second customer, two spaces to the third customer, and so on. In general, once we have the ranking position of the customer, we add as many zero-width spaces as its ranking position minus one. The *Name Unique* calculated column implements this technique:

Name Unique =
VAR CustomersWithSameName =
    CALCULATETABLE (
        SUMMARIZE ( Customer, Customer[Customer Code], Customer[Name] ),
        ALLEXCEPT ( Customer, Customer[Name] )
    )
VAR Ranking =
    RANKX ( CustomersWithSameName, Customer[Customer Code],, ASC, DENSE )
VAR Blanks =
    REPT ( UNICHAR ( 8204 ), Ranking - 1 )
VAR Result = Customer[Name] & Blanks
RETURN
    Result

To human eyes, the result in the *Name Unique* column is identical to the original *Name* column. However, DAX sees the result differently because there are zero-width spaces at the end of the name of different customers with the same name. Indeed, the *Name Unique* column in the matrix produces the visualization below.

The highlighted customers with the same name are displayed in different rows, despite having the same name. The zero-width space appended to the customer name does not increase the visual length of the name itself. Indeed, the Contoso database contains 385 customers with no name. If we appended a regular space, the length of the last of those empty customers’ names would be enormous: 384 spaces. By using a zero-width space, the name still looks empty, even though the longest one contains 384 characters.

To simplify the user’s browsing experience, it may be worth showing the *Name Unique* column, hiding the original *Name* column, and renaming them so that the user sees only a *Name* column containing the unique names. This way, the expected behavior of separating customers with the same name is the default behavior. That said, the original *Customer[Name]* column is still useful. For example, the original *Customer[Name]* column is necessary to check whether a customer name is empty or not. The *Customer[Name Unique]* column never contains an empty string, whereas *Customer[Name]* might contain an empty string.
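For example, a simple measure such as the following – the measure name is ours – must rely on the original column to count the customers with an empty name, because the zero-width spaces make every value in *Name Unique* non-empty:

```dax
Customers Without Name :=
COUNTROWS (
    FILTER ( Customer, Customer[Name] = "" )
)
```

The same filter over Customer[Name Unique] would return blank, since even the "empty" names contain at least one zero-width non-joiner after the first one.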

This technique works just fine with most local cultures and languages. Be mindful that the zero-width space has a specific meaning in some languages. Therefore, it is worth checking that you can safely add a zero-width space at the end of a name in your specific culture without changing its visualization. However, if you can afford to use it, this technique opens up some exciting opportunities in Power BI that are not limited to just separating the names in a column. For example, this technique is useful to implement advanced sorting techniques in hierarchies, as we will show in a future article.

DAX Studio is the best tool to analyze the performance of DAX queries. In this webinar recorded for powerbi.tips, Marco Russo shows how to collect metrics in DAX Studio and investigate the storage engine and formula engine cost of some sample queries.

In this article, we share an idea to reduce the friction in using Power BI, by extending the DAX language with new visual calculations.

At SQLBI, we teach DAX. We have been teaching DAX to thousands of people all over the world. Both in-person and online, we have met an extremely diverse crowd of students: different cultures, ages, backgrounds, experiences, and requirements. If there is a single statement all our students would agree on, it is the following: learning DAX is much harder than one would think. Moreover, we think anybody – even non-SQLBI students – would agree on this: DAX is not simple. We feel the same way! It took both of us years of hard work and study before we felt we had reached proficiency in DAX.

This article is not a rant about DAX. At SQLBI, we love DAX. Instead, the topic of this article is an analysis of the reasons why DAX is hard to learn; this article searches for ways to make DAX easier. We conclude with a proposal: adding visual calculations to the language would greatly reduce the friction in using Power BI. Be prepared: this is a long article. Not hard to read, but quite long. There are many topics to cover, and we hope the article is just the beginning of an interesting conversation with our readers and with the Microsoft engineers.

Before diving into the reasons why DAX is hard, let us see if that applies to any programming language. If all programming languages were hard and DAX was no exception, there would be nothing to discover. Being hard would just be a standard.

It turns out that programming can be easy. Think about the Excel formula language: everybody uses Excel. Not everybody is a master of Excel, and yet they use the language at whatever level of complexity is required for their specific needs. You start using Excel with simple math, and then you discover ranges; maybe you delve into tables, arrays, and CSE formulas. Anyway, you do not need to master all the details of the language to build gorgeous reports. You start learning, and you stop when your needs are met. At every step on your Excel learning curve, you feel powerful; the sense of satisfaction for each of your small victories is huge.

Excel is somewhat unique in this sense. It relies on an extremely powerful language, which is also surprisingly easy to use. In our opinion, the secret ingredient of Excel is that the language is *visual*. In Excel, there is a direct connection between what you see on the screen and the result you want to obtain. Do you want to sum two values? Click on the first value, hit plus, click on the second value, job done. Throughout the entire process, you see what you are doing. If the number is wrong, you immediately see the result.

Let us elaborate on this with an example. Imagine you have some data in an Excel workbook, and you want to compute a sales running total. The solution is straightforward: the value of the previous row plus the current row.
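Assuming a hypothetical layout where the sales amounts are in cells C3:C6 and the running total goes in column D (the article’s own example starts at D3), the formulas would be:

```text
D3:  =C3         first row: just the current value
D4:  =D3+C4      previous running total plus the current value
D5:  =D4+C5      copy & paste adjusts the references automatically
D6:  =D5+C6
```

Two clicks and a plus sign per cell; nothing else to learn.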

Do you see how visual Excel is? Using colors, it drives your eyes easily to the cells referenced in code. The result is satisfying, to say the least: you compute a running total by knowing only how to sum two cells, and how to copy & paste to repeat the formula in all the rows where you want the result to appear.

Visual programming goes one step further than that. Not only do you see the result immediately, based on numbers visible on the screen; the code for the cell is visible in the same place too. You do not need to author the formula in a different environment: formulas, parameters, and results are all found within the same worksheet.

If you want to achieve the same result in Power BI, things are quite different. Frustrating, to put it mildly. Let us analyze what you need to do:

- First, you need to know that you want to write a measure. This is not trivial: if you want a result in a Power BI report, you need a measure.
- Second, you need to know your model. In our case, we have a single table only, with four rows. In the real world, things are very different. The table contains columns and values that might be different, in name and content, than what you see on the screen.
- Third, you need to figure out an algorithm that works the same way on all the rows. A single DAX expression is evaluated in every cell of the report. In our Excel example, the first cell (D3) has a specific formula: being the first, its value is just the current value. You sum the previous value with the current value starting from the second row. In Excel, two cells in the same table can thus have different formulas, whereas Power BI does not offer this feature: all the cells are controlled by the same formula.

Let us say that you come up with the correct algorithm, either on your own or by searching on the Web; you find out that the formula is, “sum the values of sales that are found before or equal to the current date”.

The last step is the hardest: how do you write the code? Here is a possible answer:

Running Total :=
VAR CurrentDate = MAX ( Sales[Date] )
VAR PreviousSales =
    FILTER ( ALL ( Sales ), Sales[Date] <= CurrentDate )
VAR Result =
    SUMX ( PreviousSales, Sales[Amount] )
RETURN
    Result

And here is what you need to know, to author it:

- You need MAX in order to find the current date, because you want the last date visible in the current filter context; the latter happens to be a filter applied to the Sales table by the visual when you are slicing by date.
- You need ALL over Sales, in order to get rid of the filter context. It would otherwise limit the number of rows visible in Sales.
- You need FILTER in order to restrict the result of ALL to the sales that took place before the current date.
- You need SUMX in order to iterate over the result of FILTER; this in turn retrieves the value of Amount in the row context of the iteration introduced by SUMX.

None of these concepts is intuitive. Most newbies fail in authoring a simple running total because they are not familiar with things like row context, filter context, iteration. It does not matter how easy it is for an experienced DAX developer to author this formula. In the self-service BI world, anybody should be able to author a running total easily. No PhD required, just some googling and a hint of common sense.

There is more. The entities that you manipulate in Excel are cells. Yes, you can work with arrays, tables, and more advanced structures. But by just referencing cells and ranges, you can already leverage tremendous analytical power. In Power BI, there are no cells. There are tables, columns, relationships, and measures. You do not see these entities in your report. Tables and relationships are the foundation of the model you are working on – they are mostly invisible in a report. In a report, you see the result of calculations and aggregations. Tables are in the background, they are not what you see. Still, measures work on tables and columns.

Bear in mind that we are not preaching that Excel is better than Power BI. We are not comparing them at all. All we are doing is trying to understand what makes Excel so much easier than Power BI, because we want to steal the secrets of Excel to make Power BI a better tool. Being visual, in our opinion, is the most powerful of those secret ingredients of Excel.

Simplicity in Excel comes with limitations. The very fact that you can mix cells with different formulas in the same table comes at a price. Indeed, Excel presents severe limitations when it comes to the number of rows allowed in a table. Theoretically, you can reach a limit of one million rows in a worksheet. In practice, as soon as you reach a few hundred thousand rows, performance becomes so bad that you start rethinking your report. Power BI, on the other hand, handles tens of millions of rows in the blink of an eye. Billions are also a viable option – not scary at all when using Power BI Premium.

Similarly, Excel is super-easy as long as you work with a single table. If you need to work on a model containing multiple tables, you end up relying on VLOOKUP – again with a huge performance hit. Power BI, on the other hand, handles models with multiple tables seamlessly.

Therefore, we can conclude that programming need not be hard. DAX can be made simpler by adding some visual elements to it. That said, making DAX more visual should not disrupt its performance. After all, we do not want to go back to Excel; we want to improve DAX.

The reason DAX is not visual like Excel lies deep in the roots of the tool. Power BI is the result of merging multiple tools into a single product. Over time, Microsoft developed Analysis Services Tabular (AS), Power View, and Power Query. Merging these three products was the genesis of Power BI. Although the old tools were merged into a single product, they remain different tools based on different technologies:

- AS is the engine that stores your data: it is based on tables, relationships, measures, and columns. It is an extremely powerful database that provides Power BI with its unprecedented power and speed.
- Power Query is the ETL tool introduced in Power Pivot first. It has a great ETL language, and it can mashup data from virtually any source. It provides Power BI with its connectivity to external data sources.
- Power View was originally introduced in Excel and SharePoint as the next data visualization tool. Only when it was moved into Power BI did Power View really spread its wings and start to fly, at the heights we see now. Power View lost its name and identity – nobody talks about Power View anymore – but it gained the unconditional love of Power BI users.

Now the thing is that these three products are integrated together in Power BI, but they are not a single tool: they are still three souls in one body. For sure, they communicate together. Power Query loads from external sources and provides data to AS, which stores it in its database. Power View queries AS to produce visuals. AS is the core of Power BI: it sits in the middle between the data sources and the final visualization. AS is also responsible for any calculations over the data. This is the reason why measures are stored in AS.

Although AS is the core, you do not interact with AS. What you see in the end is Power View, the tool that produces the visuals in your reports and dashboards. These visuals are generated by Power View out of queries executed by AS. AS has no clue whether the result of a query lands in a matrix, a line chart or a slicer. AS receives a query, computes it, and returns the result. At the cost of repeating ourselves: AS is responsible for calculations, Power View for the visualization.

In AS calculations occur through measures. Measures are entities stored and executed by AS. AS speaks tables and relationships. AS does not even know about the existence of visuals. This is the reason why a measure cannot reference values in a visual. A measure lives in a different world: the world of AS. In other words, there is no way a measure can reference something like “the previous row in a matrix”, or “the subtotal at the year level”. These are visual representations of values, but AS is not visual at all.

So far, we have seen that in order for a language to be easy, it needs to be visual. Moreover, we also saw that DAX, Power BI’s programming language, cannot be visual. Should we conclude that DAX will never be a visual language – therefore we need to live with its complexities? Well, the purpose of this article is quite the opposite: we would like to build extensions to the DAX language to get the best of both worlds: a language that can speak both “tables and relationships” and “visuals, rows and columns”. The extension we would like to discuss is *Visual Calculations*. Let us elaborate on this.

The data in each visual is populated by sending a query to the underlying model in AS. The result of this query is a table containing the most detailed data gathered to produce the visual. Let us call this table *ReportData*.

You can think of ReportData as the result of exporting the data of a matrix in Power BI to a .CSV file, for later massaging with Excel.

ReportData in and of itself is just a name for the result of a query. What if we extend DAX to add calculations on top of this table? For example, the percentage against the grand total could be expressed this way:

% Total := DIVIDE ( ReportData[Sales Amount], TOTAL ( ReportData[Sales Amount] ) )

There is no TOTAL function in DAX. Think about it as a new function that returns the total of the table resulting from the query that the matrix sent to the engine.

The same calculation in DAX would be:

% Total :=
DIVIDE (
    SUM ( Sales[Sales Amount] ),
    CALCULATE ( SUM ( Sales[Sales Amount] ), ALLSELECTED () )
)

If the new functionality were restricted to only totals, it would not be very useful. What about the percentage over the category total? We bet many DAX developers would have to think for a while before coming up with the correct answer:

% CategoryTotal :=
DIVIDE (
    SUM ( Sales[Sales Amount] ),
    CALCULATE (
        SUM ( Sales[Sales Amount] ),
        ALLSELECTED (),
        VALUES ( Product[Category] )
    )
)

Whereas by working with ReportData, the calculation could be as easy as using a hypothetical PARENTTOTAL function:

% Category := DIVIDE ( ReportData[Sales Amount], PARENTTOTAL ( ReportData[Sales Amount] ) )

Before we move on, you can notice that the new syntax we are suggesting works regardless of the columns used in the report. Indeed, by using the content of ReportData and a higher-level function like PARENTTOTAL, it would work on the report – returning the percentage against the parent total no matter which columns are being used for slicing and dicing. The DAX code, on the other hand, works in a report that is slicing by category; it would fail if you sliced by other columns.

What about the running total we were showing at the beginning of the article? In this visual language, it could look something like:

Running Total := ReportData[Sales] + PREVIOUS ( ReportData[Running Total] )

It looks like a recursive definition. It is not. Using PREVIOUS, you could reference the previous row in ReportData. The engine can be smart enough to implement it using a windowing function, or an iteration.

Are we suggesting a new programming language? Not at all. The idea is to extend DAX, not to replace it. This feature could be implemented as part of DAX, with a user interface on top of it to make it simpler. What follows is neither DAX nor a new project being developed, it is just an example of what such DAX extension might look like:

EVALUATE
ADDVISUALCOLUMNS (
    SUMMARIZECOLUMNS (
        'Product'[Category],
        'Product'[Subcategory],
        "Amt", [Sales Amount]
    ),
    "Pct", DIVIDE ( [Amt], PARENTTOTAL ( [Amt] ) )
)

The ADDVISUALCOLUMNS function processes the result of SUMMARIZECOLUMNS by adding new columns. In the new column definition, you may access specialized functions to navigate through the table returned by the inner SUMMARIZECOLUMNS.

The advantage of this approach is that the calculations in ADDVISUALCOLUMNS reference values that are directly shown on the screen in a Power BI visual. Remember: any number you see in a visual is returned by a SUMMARIZECOLUMNS query executed on AS. The visual knows the names of these columns and their content. This extension of the language would let the visual know which value you are clicking on when you select any cell of the visual.

DAX would not become totally visual. It would still be structured with the usual tables and relationships. Yet the existence of ReportData tables makes certain calculations possible, without the names of the tables and columns in the model being known: you only need to point and click on a cell in the visual. Like you would do in Excel.

Obviously, we do not want all users to learn this syntax. A good user interface in Power BI could let users define the set of visual calculations in a simpler way, with a code generator that implements the more intricate DAX syntax.

The goal of visual calculations is not to replace DAX. DAX is still there for power users and model builders. More casual users can experience great satisfaction by building non-trivial calculations in a visual way. If and when they are ready to move to the next level, the full AS DAX is waiting for them.

On the other hand, developers and more experienced users could use this new feature as an enhancement of DAX to author queries and/or complex calculations. Indeed, ADDVISUALCOLUMNS would be extremely handy for everyone. Just think, for example, that you could use this feature to precompute a table and then perform further calculations on the resulting table, reducing the need to compute certain values again and again – better usage of the cache, better performance.

Of course, it is just an idea. A lot of work needs to be done in defining the features of visual calculations. For instance, we need a definition of a set of features. It should include the navigation through the table – previous row, next row, running sums – and maybe some time intelligence logic because time intelligence is almost everywhere. Do we need ADDVISUALROWS, in addition to ADDVISUALCOLUMNS? We do not have an answer yet; it is something to think about.

Moreover, the result of any query in DAX consists of a table with many rows. Power BI visuals oftentimes pivot their data, showing for example the year on columns instead of rows. Do we need to be able to refer to the previous column, too? In some way, the goal of this feature is to express calculations using the entities seen in the visual. Therefore, the concepts of row, column, and subtotal should be present in the navigation functions.

For example, should subtotal rows be present in the ReportData table or not? For non-additive measures, the presence of sub-totals would be needed, along with more complexity in the language. Avoiding them might prevent some calculations, keeping the language simpler.

We could go on and show further ideas on this feature for days, but the goal of this post is not to define a new feature. Instead, we want to show the generic idea and – most importantly – ask for your help.

We published a new idea on ideas.powerbi.com (DAX calculations at visual level of reports), referencing this article and providing a brief description of the idea described here. There are at least two ways you can help:

- Use the comments on this article to express your opinion. What kind of functions do you think would be useful to have for the visual calculation? Does the idea sound interesting? Do you think you or your colleagues may benefit from visual calculations?
- Vote for the idea if you like it, and spread the word so that – if it reaches a good number of votes – Microsoft engineers might be tempted to actively think about visual calculations and finally develop the feature.

Finally, if you honestly believe that this idea would only mess up Power BI by adding a useless layer of complexity, then say that. We are truly interested in unbiased opinions from you. Together, we can better understand the implications of visual calculations and – hopefully – apply pressure to bring it to life in the best way.

Starting from the May 2020 version of Power BI Desktop, DAX always uses standard separators by default, regardless of the Windows locale settings. This change does not affect most Power BI users around the world; but if you use the comma as a decimal separator and 10,000 is just ten and not ten thousand, then you are affected by this change.

First of all, you can restore the previous behavior as I describe in this post, but the recommendation and the default are now to use the standard DAX separators. I want to describe why this (I think good) choice was made and how I contributed to making this happen.

If you used Power BI Desktop before the May 2020 version and your Windows locale settings use list and decimal separators other than comma (,) and period (.), you always had a different user experience when writing DAX code. In all examples and articles, you have been seeing code like this:

Taxes := SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ) * 0.10

In your Power BI Desktop, this code should have been written as:

Taxes := SUMX ( Sales; Sales[Quantity] * Sales[Net Price] ) * 0,10

Over time, Power BI Desktop inherited from Excel this idea of customizing separators in a formula. But Excel goes beyond that and translates the function names. While this could be a good idea for the user interface, when applied to a language it has the negative side effect of making the code samples available on the Internet harder to use with various languages.

Luckily, Power BI never went so far and only limited the translation to the separators. The more common situation is having the semicolon (;) as a list separator and the comma (,) as a decimal separator. However, having a different presentation of the DAX formulas in the user interface creates several issues:

- Internally, DAX is not translated according to locale settings. The engine uses the period character (.) as a decimal separator and the comma (,) as a list separator. The PBIX file always uses this DAX syntax. Therefore, the Power BI user interface must translate the expressions back and forth by changing these separators.
- There could be bugs in this conversion. I have seen several of these bugs over the years, and some of them were never fixed because of the ambiguity of certain expressions when using the comma as a decimal point separator.
- The copy & paste of an expression from a web page is often a pain when there is a different separator convention at play. This “conversion tax” does not help people be more productive.
- Tools like DAX Studio and DAX Formatter had to deploy an additional effort to support options for separators. The time spent developing and maintaining these features could have been better spent on more productive features.

In October 2019, I suggested implementing an option to choose separator characters in Power BI Desktop regardless of the Windows locale settings.

As part of the Power BI Contributor program, I have been able to work on this feature from December 2019 to April 2020. While the feature did not require months of development, the process of defining requirements, getting approval, implementing the code, and testing use cases requires coordination with many people – and this takes time.

Starting from May 2020, by default Power BI Desktop uses what I call “the standard separators” and ignores the Windows locale settings. We finally have one DAX syntax. You can copy and paste a code sample without having to worry about separators.

In order to minimize the impact, the following message appears the first time you edit DAX code in Power BI.

To be more accurate, this message dialog box is shown when:

- You enter the DAX formula bar to edit a measure, a calculated column, or a calculated table, **AND**
- the separators used by the Windows locale settings are different than the “standard” ones for DAX (comma as a list separator, period as a decimal separator), **AND**
- it is the first time you enter the DAX formula bar, **AND**
- you have never clicked the “Dismiss” button in the message dialog box.

So if you close the window by clicking on the close button, the same message appears again when you edit another DAX formula. Once you click “Dismiss”, the message is no longer displayed when you edit a DAX formula.

At this point, you should be happy. Really, you should!

Now, what if you prefer to keep your old habits? What if you just need to help Power BI users who do not like the new behavior, and you cannot convince them that it is right? You can restore the previous behavior of Power BI, which respects the localized separators. The Regional Settings dialog box shows the DAX separators section you see below, if and only if the Windows locale settings provide a different choice of list and decimal separators other than the standard ones used by DAX.

You cannot see the “DAX separators section” if your Windows locale settings are set to “English US”.

DAX Formatter and DAX Studio will continue to support the conversion of separators. However, going forward the feature should be taken as is – no more bug fixes, no more improvements, more time available for other features in the backlog!

This article shows how to use the USERELATIONSHIP function in DAX to change the active relationship in a CALCULATE function.

If two tables are linked by more than one relationship, you can decide which relationship to activate by using USERELATIONSHIP. Indeed, you can only have one active relationship between any two tables. For example, you start with a model where *Sales* is related to *Date* through the *Date* column, like in the following picture.

If you create a second relationship, based on the *Delivery Date* column, the second relationship is an inactive relationship.

Inactive relationships are – by nature – non-active. Consequently, if you filter by *Year* the report shows the sales whose *Order Date* belongs to the filtered year. *Delivery Date* is not used as part of the default filtering. In order to filter *Sales* based on *Delivery Date*, you can temporarily activate a relationship using USERELATIONSHIP.

USERELATIONSHIP is a CALCULATE modifier, which instructs CALCULATE to temporarily activate the relationship. When CALCULATE has computed its result, the default relationship becomes active again.

The *Delivery Amount* measure below computes the amount delivered within a certain time period, in contrast with the ordered amount returned by the *Sales Amount* measure:

Delivery Amount :=
CALCULATE (
    [Sales Amount],
    USERELATIONSHIP ( Sales[Delivery Date], 'Date'[Date] )
)

You can compare *Delivery Amount* with *Sales Amount* in the following figure.

The arguments of USERELATIONSHIP are the two columns that form the relationship. The order of the columns is not relevant, even though it is common practice to use the column on the many-side of the relationship (*Sales* in our example) first, and the column on the one-side (*Date* in the example) second. With that said, it is just a common convention; the opposite order does not alter the result.

The relationship must already be in the model. You cannot use USERELATIONSHIP to create a relationship on the fly. USERELATIONSHIP can only activate an existing relationship.
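As a quick recap of the argument order discussed above, both of the following definitions – hypothetical variants of the *Delivery Amount* measure – return the same result:

```dax
-- Many-side column first (the common convention)
Delivery Amount (A) :=
CALCULATE (
    [Sales Amount],
    USERELATIONSHIP ( Sales[Delivery Date], 'Date'[Date] )
)

-- One-side column first: same behavior, just a different argument order
Delivery Amount (B) :=
CALCULATE (
    [Sales Amount],
    USERELATIONSHIP ( 'Date'[Date], Sales[Delivery Date] )
)
```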


This article shows how to use DAX and conditional formatting together to highlight the minimum and maximum values in a matrix in Power BI.

Conditional formatting is a handy feature in Power BI; combined with DAX skills, conditional formatting can produce excellent results. In this example, we show how to highlight relevant information in a matrix by formatting the desired cells.

We look at the sales of Contoso in different countries and years. If we look for geographical areas where some brands do not sell well, we can start the investigation with a matrix like the one below.

If we were to search for specific kinds of information by just looking at the numbers, we would be on our own. The matrix itself is not highlighting the most relevant information, nor guiding our eyes where it matters. The conditional feature in Power BI can improve data visualization, highlighting the higher and lower values with different background colors.

However, there are cases where we want to identify the best and worst figures in a matrix full of numbers. In other words, we want to highlight the cell with the maximum value – the best combination of continent and brand in the previous picture – and the minimum value – the worst combination of brand and continent. The following picture shows a matrix that highlights the best/worst cases.

We can obtain this result by mixing conditional formatting of the background with a measure that returns a number indicating whether the current cell is the minimum or the maximum of all the cells in the matrix. As we discuss later, we can use the same technique to highlight cells with an arbitrary business rule.

We need a *MinMax* measure that rebuilds the full result of the matrix in a temporary table. It then identifies the minimum and maximum values among all the ones computed. Finally, it checks if the current value of the measure is the minimum, the maximum, or any other value:

MinMax =
VAR Vals =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE ( Sales, 'Product'[Brand], Store[Continent] ),
            "@SalesAmt", [Sales Amount]
        ),
        ALLSELECTED ()
    )
VAR MinValue = MINX ( Vals, [@SalesAmt] )
VAR MaxValue = MAXX ( Vals, [@SalesAmt] )
VAR CurrentValue = [Sales Amount]
VAR Result =
    SWITCH (
        TRUE (),
        CurrentValue = MinValue, 1, -- 1 for MIN
        CurrentValue = MaxValue, 2  -- 2 for MAX
    )
RETURN Result

The *MinMax* measure must return a number because otherwise, it cannot be used in the conditional formatting panel. The *MinMax* measure returns 1 only for the cell with the minimum value, 2 for the cell with the maximum value, and blank otherwise.

We configure the conditional formatting of the matrix so that it uses the information provided by the *MinMax* measure to color the cell background with the following rule.

By using this configuration, we obtain the desired result.

We used this technique to highlight the minimum and maximum values in the matrix. In other reports, you can use the same approach to highlight cells based on different business rules. For example, you can highlight the presence of large transactions in the underlying data even though the report only shows aggregated values, or you can highlight a difference with a reference value that goes beyond a predefined threshold.
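For example, a hedged sketch of such a rule-based measure might look like the following; the measure name and the 10,000 threshold are hypothetical, and the *Sales[Quantity]* and *Sales[Net Price]* columns are borrowed from earlier examples:

```dax
-- Returns 1 when the visible aggregate hides at least one
-- individual transaction above a (hypothetical) threshold
Has Large Transaction =
IF (
    COUNTROWS (
        FILTER (
            Sales,
            Sales[Quantity] * Sales[Net Price] > 10000  -- hypothetical threshold
        )
    ) > 0,
    1
)
```

As with *MinMax*, the measure returns a number so that it can drive a conditional formatting rule.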

Because the measure must internally rebuild the exact values shown in the matrix, the measure used for the conditional formatting strongly depends on the matrix where it is being used. For example, if a user removes the *Continent* attribute from the columns and replaces it with the *Year* attribute, then the custom formatting rule stops working or could return unexpected results.

As you can see, simple measures like *MinMax* can make a significant difference in the usability of a report. Therefore, they are the perfect candidate for report measures, built in a specific report for the sole purpose of improving the consumption user experience.

This article describes how to optimize a slow Power BI report with a large number of card visuals, by obtaining the same graphical layout with a smaller number of visuals.

Every visual element in a Power BI report must complete a number of tasks to provide its result. Visuals showing data must generate one or more DAX queries to retrieve the required measures by applying the correct filters. The execution of these queries increases the wait for the end user and increases the workload on the server – especially when multiple users access a report at the same time. In order to improve the performance and scalability of a report, the best practice is to reduce the number of visuals consuming data on any one page of a report.

The focus is on a single page of the report. Power BI only fetches data and builds the visualizations required for the active page of a report. When the user switches their focus to a different page, the waiting time only depends on the visuals of the new page. The content of the other pages of that report is not relevant as far as performance is concerned. The goal is to reduce the number of visuals on a single page of a report. This can make it challenging to achieve the desired report layout; however, we can look for the right visualization once we realize that the number of visuals on the same page is negatively affecting the user experience.

For example, the following report shows one card visual for each box representing the sales of a brand in a different color.

The report contains 33 card visuals, one column chart, and one matrix, each populated by a different DAX query. There are another four text boxes and one image that do not come from DAX queries. The rendering time of this page executed on a very powerful desktop is 2.7 seconds. We consider this to be unbelievably bad performance, since the slowest DAX query in the report runs in 22ms. You can see the exact timing on your computer by enabling the **Performance Analyzer** pane. You then:

- Click on **Start recording**.
- Click **Refresh visuals**.
- Sort by **Total time** in a **Descending** order.

By clicking on **Refresh visuals** again, we could see a different sort order; still, the overall execution time should be similar. In this example, the slowest visual in the page was the “Red – Contoso – Card” which displays 44.50K as the *Sales Amount* of red products for the Contoso brand. By expanding this node in the list provided in the Performance Analyzer pane, you can see that most of the time is spent in “Other”.

The meaning of “Other” is that this visual had to wait for other tasks to be completed before being able to execute the DAX query and render its result. Because the report contains 40 visuals, 35 of which execute at least one DAX query, some visuals must wait for other visuals to complete their task before they are able to perform any action.

You can also export this result in a JSON file using the Export link, and then analyze the events in more detail. For example, the actual cost of the DAX query execution in the Analysis Services (AS) engine is just 123ms, whereas the overall cost of the aggregated DAX query events is 539ms. The difference between these numbers is the overhead generated by 35 requests made to the AS engine. Benchmark data is available in the PerformanceOriginalCards.pbix report included in the samples files.

How can we improve the performance while retaining the same layout? We need a single visual that produces the same visual effect as many cards. For example, by using Card with States by OKVIZ we can create a grid of cards where a measure is repeated for each value of a column. This custom visual is available for free in the AppSource marketplace, and the latest private version is also available at https://okviz.com/card-with-states/ . Even though the marketplace displays the message “May require additional purchase”, all the features are free. All you need to do is register on the OKVIZ website for your free lifetime license.

We replaced each set of card visuals for one of the colors (red, black, and blue) with a single instance of Card with States. We do this by specifying the corresponding color measure (*Red Sales* for the red color) and the *Brand* column as a Category.

By setting the properties of the visuals properly, we obtain an identical rendering for our report.

The real difference is the performance obtained by reducing the number of visuals in the report. If we repeat the Refresh visuals operation in Performance Analyzer, we see that the slowest visual is refreshed in 310ms and that we now have only 5 visuals being generated by DAX queries.

Because we removed 30 visuals from the report, first we greatly reduced the workload on the AS engine, and second we improved the scalability of our report when executed by multiple concurrent users. Last but not least, the refresh of the page went down from 2.7 seconds to 0.3 seconds, saving 88% in execution time. This is a massive improvement of one order of magnitude, achieved without changing any part of the data model.

If you export that result in a JSON file using the Export link, the detailed analysis (included in PerformanceCardWithStates.pbix) shows the following:

- the actual cost of the DAX query execution in the AS engine is now 12ms (formerly 123ms),
- the overall cost of the aggregated DAX query events is 21ms (formerly 539ms).

If you look at the previous screenshots, you might notice that the rendering time of Card with States is 181ms – see Visual display – which is much more than the original card (13ms). However, the aggregated rendering time of all the visuals in the new report is now 764ms; that same operation in the original report took an aggregated rendering time of 1,331ms.

The following table compares the metrics between the original and the optimized reports.

| | Original report | Optimized report |
|---|---|---|
| Card visual | Card (default) | Card with States |
| Number of card visuals | 33 | 3 |
| Number of DAX queries | 35 | 5 |
| Execute Query (AS) event duration | 123ms | 12ms |
| Execute DAX Query event duration | 539ms | 21ms |
| Render event duration | 1,331ms | 764ms |
| User waiting time duration | 2,700ms | 310ms |

You may wonder how it is possible that the report completes the operations in 310ms – which is half of the total rendering time in the optimized report. Keep in mind that Power BI can parallelize the execution of different tasks. There is a limit to this parallelism, and 35 visuals cannot be executed and rendered in parallel. When you have only 5 visuals generating queries in a report, the parallelism is more effective.

This article shows that the number of visuals in a report can degrade the performance by one order of magnitude, compared to the results obtained by using a smaller number of visuals. At the time of writing, Power BI does not have a generic “small multiples” feature to automatically create multiple copies of the same visual based on the value of an attribute, such as the *Brand* attribute used in this article. The AppSource marketplace already offers several custom visuals with a specialized “small multiples” option, like Small Multiple Line Chart and Sparkline. The latest version of Card with States by OKVIZ is the first visual providing the “small multiples” option to replace multiple instances of the Card visual, which is a very common source of performance issues in Power BI.

This article describes how to use conditional formatting with a DAX expression to color the rows of a table in Power BI based on the order number instead of using the alternate rows color formatting option.

Power BI offers the capability to show tables with alternating colored rows, to make it easier to read the content of a report.

You might want to use a similar alternate coloring style for a different purpose: highlighting all the lines of one same order. If you look carefully at the figure above, you can discover that there are indeed five orders visible, each with one or more products. But there is no visual indication of the rows belonging to the same order. A better format using alternate background colors would be the following.

The background color of the rows depends on *Sales[Order Number]*. The background color switches between white and light gray every time the order number changes, so all the rows of the same order have the same background color and can be easily identified. You cannot obtain this visualization by only using a Power BI style, because the coloring of a row depends on the actual data in it. You can achieve this goal by using the conditional formatting feature in Power BI. You can set the background color of a cell according to the value of a measure. Therefore, you need a DAX formula that returns two values: one for the white rows and one for the gray rows. The value returned by the measure must alternate between those two values with each consecutive order number.

A possible (but incomplete) solution would be a measure that returns 0 for even order numbers and 1 for odd order numbers. However, this formula does not guarantee that the value changes for every order: if two consecutive orders in the report both have even numbers, they would be colored the same. If you look at the figures, all the order numbers are even! Nevertheless, the idea of using odd and even numbers looks promising. We only need to transform the order number into an integer that increases by one for each order, regardless of the actual order number.

The RANKX function in DAX serves exactly this purpose. We can rank the rows in the report by their order number, and then color the table in the report depending on whether the rank is odd or even. Besides, due to its simplicity this formula is a perfect opportunity to expand on one of the lesser known arguments of RANKX: its third parameter.

Let us start with the code:

RankOrder =
VAR OrderNumbers =
    CALCULATETABLE (                     -- By using ALLSELECTED ( Sales[Order Number] )
        VALUES ( Sales[Order Number] ),  -- we would get all the order numbers
        ALLSELECTED ()                   -- if there were no explicit filter on the
    )                                    -- order numbers. Therefore, we use
                                         -- VALUES / ALLSELECTED to reduce the number
                                         -- of orders evaluated in the ranking.
VAR CurrentOrder =
    SELECTEDVALUE ( Sales[Order Number] )
VAR Result =
    IF (
        HASONEVALUE ( Sales[Order Number] ), -- Check that only one order is visible.
        RANKX (
            OrderNumbers,                -- Table to build the lookup values.
            Sales[Order Number],         -- Expression to use during the iteration.
            CurrentOrder,                -- Value to rank. We must use SELECTEDVALUE
            ASC                          -- to retrieve the current row in the report.
        )
    )
RETURN Result

The *OrderNumbers* variable stores the order numbers that are visible in the table visualization. Be mindful that we did not use *ALLSELECTED( Sales[Order Number] )* because it would return all the values of the *Sales[Order Number]* column. Indeed, ALLSELECTED as a table function returns the filtered values only if the column is being actively filtered. We retrieve the order numbers visible in the current visual by using the VALUES function evaluated with ALLSELECTED as a CALCULATE modifier.

The *CurrentOrder* variable is important too. Each row of the table visual is showing exactly one order number. Therefore, each cell has a filter context that filters exactly one order. In this scenario, SELECTEDVALUE returns the value of the order number currently being filtered. You cannot omit SELECTEDVALUE, because the *Sales[Order Number]* column reference alone requires a row context which is not present in our formula.

The *Result* variable evaluates the ranking if and only if there is only one order number visible, to avoid any calculation for the total and for any subtotal that groups two or more orders. If there is only one order in the filter context, then RANKX provides the ranking.

If you have used RANKX in the past, you were likely ranking a measure like *Sales Amount*. When you apply RANKX to a measure, you typically only use the first two arguments of RANKX. RANKX evaluates the measure during the iteration over the table provided in the first argument; because of the context transition, RANKX evaluates the expression of the measure in a filter context obtained by the row being iterated. After the iteration, RANKX re-evaluates the same measure in the original filter context to obtain the value to rank.
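For example, a conventional ranking over a measure that uses only the first two arguments could look like this sketch; the *Brand* column and the *Sales Amount* measure are borrowed from earlier examples, and the measure name is hypothetical:

```dax
-- Rank each brand by its sales; context transition turns each iterated
-- brand into a filter when [Sales Amount] is evaluated
Brand Rank =
RANKX (
    ALLSELECTED ( 'Product'[Brand] ),
    [Sales Amount]
)
```

After the iteration, RANKX evaluates *[Sales Amount]* again in the original filter context to obtain the value to rank, which is why no third argument is needed here.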

In our case, we are ranking a column value and not the result of a measure. Therefore, we do not rely on a context transition. During the iteration, RANKX evaluates the *Sales[Order Number]* expression within a row context corresponding to the row being iterated. At the end of the iteration, RANKX must obtain the value to rank, but can no longer use the *Sales[Order Number]* expression because the row context is missing in the original evaluation context.

The third argument of RANKX is now useful: RANKX uses the third argument to get the value to rank. If the third argument is missing, RANKX evaluates the same expression passed in the second argument in the original evaluation context, which is why you have probably never used it. In our example, we provide *CurrentOrder* as the third argument of RANKX, getting the rank of the order number displayed in the report.

You can see the result of the *RankOrder* measure in the following picture.

*RankOrder* is a number increasing by one each time the row shows a different order number, regardless of the actual *Order Number* value. We can create the *BandNumber* measure to transform this number into 0 or 1:

BandNumber := MOD ( [RankOrder], 2 )
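If you prefer a single hidden measure instead of two, the ranking and the parity check can be combined. This is just an equivalent sketch with a hypothetical name, not a requirement of the technique:

```dax
BandNumber (Single) =
VAR OrderNumbers =
    CALCULATETABLE (
        VALUES ( Sales[Order Number] ),
        ALLSELECTED ()
    )
VAR CurrentOrder =
    SELECTEDVALUE ( Sales[Order Number] )
RETURN
    IF (
        HASONEVALUE ( Sales[Order Number] ),
        -- 0 or 1, alternating with each consecutive order number
        MOD (
            RANKX ( OrderNumbers, Sales[Order Number], CurrentOrder, ASC ),
            2
        )
    )
```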

The *BandNumber* measure is easy to use in the conditional formatting options of Power BI to change the background of all the columns you want to format. Unfortunately, you have to apply this format to every column shown in the visual because it is not possible to set conditional formatting for the entire row.

The report now shows alternate colors depending on the order number. You can hide the *RankOrder* and *BandNumber* measures; indeed, they are just internal measures whose sole purpose is to improve data visualization.

As you see, DAX is not only useful to compute super-complex expressions. You can also rely on DAX to format your report the way you like, by combining the power of DAX with the flexibility of conditional formatting.

VertiPaq Analyzer is a set of tools and libraries to analyze VertiPaq storage structures for a data model in Power BI and Analysis Services Tabular. In this webinar recorded for powerbi.tips, Marco Russo shows how VertiPaq Analyzer has been integrated into DAX Studio, explaining how to use these features to optimize the size and the performance of a data model.

This article describes in which conditions the precedence of calculation groups might return unexpected results when filtering calculation items in both the visuals and the measures present in a report.

When there are multiple calculation groups in a model, we must define their precedence. This ensures we get the expected result when applying different calculation groups to a measure. More details are available in the Understanding Calculation Group Precedence article.

Even in a well-designed model, there are scenarios where a poor combination of calculation group precedence, CALCULATE, and context transition in a DAX formula can translate into inaccurate results. This article shows how much care needs to be taken when working with calculation groups.

You create a model with two calculation groups: *Adder* and *Multiplier*. The purpose of *Adder* is to add a number to the selected measure, whereas *Multiplier* multiplies the selected measure by a factor. Each calculation group contains a single calculation item:

--
-- Calculation Group: Adder
-- Calculation Item: Plus5
--
SELECTEDMEASURE () + 5

--
-- Calculation Group: Multiplier
-- Calculation Item: Mul10
--
SELECTEDMEASURE () * 10

Following mathematics rules, we want to apply the multiplication before the addition. Therefore, the *Multiplier* calculation group has a higher precedence of 200, whereas *Adder* has a lower precedence of 100. We also define a measure to complete the data model used in this article: the *Just2* measure simply returns the value 2:

Just2 := 2

The article requires full knowledge of the precedence in calculation groups and the application of calculation items. Before covering the main topic of this article, we warm up your brain and refresh your knowledge of calculation groups with four small puzzles, each one consisting of a piece of DAX code and its result. Your job is to explain why DAX computed that result. Try to explain the result before reading further. The explanation is right after each figure, and these puzzles are super helpful to train your brain on how to think about calculation groups.

Let us start with the first test: why is the result 2, as if the calculation items were not being applied?

The answer is that a calculation item is only applied to a measure reference. If there are no measures in the code, the application item has nowhere to be applied. Because the first argument of CALCULATE is just a constant value, the calculation items are not applied at all. No measure, no application.
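A minimal query reproducing the first puzzle could look like this sketch:

```dax
EVALUATE
{
    CALCULATE (
        2,    -- a constant expression: no measure reference here
        Adder[Addend] = "Plus5",
        Multiplier[Factor] = "Mul10"
    )
}
-- Returns 2: with no measure reference, no calculation item is applied
```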

Warming up? Good. Now it is time for the second puzzle: why is the result 70 and not 25?

The precedence of calculation groups defines the order of **application** of calculation items. The application is not an evaluation.

--
-- This is the initial expression
--
CALCULATE (
    [Just2],
    Adder[Addend] = "Plus5",
    Multiplier[Factor] = "Mul10"
)
--
-- The first calculation item being applied is Multiplier
--
CALCULATE (
    [Just2] * 10,
    Adder[Addend] = "Plus5"
)
--
-- The second calculation item being applied is Adder
--
( [Just2] + 5 ) * 10

The order of the calculation items in CALCULATE does not affect the result. Calculation items are only applied following the precedence of their corresponding calculation groups.

Now, the third difficult question. Can we force the order of evaluation of calculation items by using CALCULATE? Why does DAX not follow the order we try to force by using nested CALCULATE functions?
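The third puzzle corresponds to a query like the following sketch, which tries – and fails – to force *Adder* to apply first by nesting CALCULATE:

```dax
EVALUATE
{
    CALCULATE (
        CALCULATE (
            [Just2],
            Multiplier[Factor] = "Mul10"
        ),
        Adder[Addend] = "Plus5"
    )
}
-- Still returns 70, not 25
```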

The answer requires a bit more attention here, although the explanation is the same as the previous ones. A calculation item is applied on a measure reference. The outer calculation item from *Adder* does not have any effect on the inner CALCULATE, because it is applied to the *Just2* measure only inside the inner CALCULATE. Therefore, this code is identical to the previous puzzle. Nesting CALCULATE functions is not a way to change the order of application of calculation items, at least not in this example. Things can be much more complicated if we have multiple measure references instead of a simple CALCULATE. Nevertheless, the goal of this article is not to write more complex code. Instead, we want to show how to correctly understand the result of an expression when calculation groups are involved.

Let us complicate the scenario in the fourth puzzle. We define a new *Just2Times10* measure:

Just2Times10 := CALCULATE ( [Just2], Multiplier[Factor] = "Mul10" )

Then, we run the following query: we finally obtain 25 as a result, but why?

The answer is still the same: calculation items are applied on a measure reference. In this case, the measure reference is no longer *Just2*. Because the CALCULATE function evaluates the *Just2Times10* measure, the calculation item is applied to the *Just2Times10* measure. Therefore, *Adder* works on *Just2Times10* and *Multiplier* kicks in only later, when *Just2Times10* is being evaluated:

--
-- This is the initial expression
--
CALCULATE (
    [Just2Times10],
    Adder[Addend] = "Plus5"
)
--
-- The only calculation item being applied is Adder
--
[Just2Times10] + 5
--
-- Just2Times10 invokes another calculation item
--
CALCULATE (
    [Just2],
    Multiplier[Factor] = "Mul10"
) + 5
--
-- Only now does Multiplier kick in
--
( [Just2] * 10 ) + 5

The code we have seen so far is only a refresher of calculation groups. This warm-up is the longer part of the article. The end goal is to explain that there is a big difference between using a calculation item in CALCULATE and defining a measure that applies the calculation item. When a measure formula applies a calculation item to another measure reference, it overrides the precedence of the calculation groups – whereas when the same CALCULATE applies multiple calculation items, the DAX engine chooses the precedence based on the precedence of the calculation groups involved.

For example, the following query contains two EVALUATE statements. The first EVALUATE groups by *Adder[Addend]* and evaluates the *Just2Times10* measure, whereas the second EVALUATE evaluates the *Just2* measure grouped by both *Adder[Addend]* and *Multiplier[Factor]*. The results of the two EVALUATE statements are different:

The first EVALUATE groups by *Adder[Addend]*. For this reason, *Just2Times10* is included in an implicit CALCULATE that applies a different calculation item from the *Adder* calculation group to each row of the result. The result produced by the *Just2Times10* measure reference is 25, as in our last puzzle.

The second EVALUATE groups by both *Adder[Addend]* and *Multiplier[Factor]*, so there is only one CALCULATE with the two calculation items applied to the *Just2* measure. DAX applies the calculation items according to the precedence of the corresponding calculation groups and the result is 70.

At the risk of sounding pedantic, let us repeat the concept once more. If you define a measure that applies a calculation item, you are not doing the same operation as if you add a calculation item to a slicer or rows or columns of a visual. By using a measure, you enforce the application order. In contrast, by using a report filter, you are letting the engine choose the application order based on the precedence setting of the calculation groups. These are not the same thing.

As you have seen, the scenario is rather complex to analyze and understand. Yet, as a DAX expert, you want to handle this complexity successfully. So approach any model you build by making sure that precedence, measures, and calculations are all fine-tuned to provide a smooth user experience. As soon as users decide to create their own measure in a report by using a calculation group with CALCULATE, they have the power to break your well-engineered calculations.

Multiple calculation groups in a model can be extremely complex to manage. Before using them in your models, make sure you take the time to understand how they work in depth, and to understand what the implications of using them can be. Well, do not just take the time. Invest lots of time! The good news is that calculation groups are extremely powerful once you master them. Remember: with great power comes great responsibility!

This article explains how to control the result provided by a formula in a DAX measure when the measure is being used in a report with granularity below the granularity supported by the formula.

In the article Obtaining accurate totals in DAX, we described the granularity of a formula as being the level of detail that the formula uses to compute its result. If you are not familiar with formula granularity, please read that article first, because this is the natural continuation.

The starting point is a measure computing the number of months where *Sales Amount* is greater than 30,000:

High Months =
SUMX (
    VALUES ( 'Date'[Calendar Year Month] ),
    IF ( [Sales Amount] > 30000, 1 )
)

In a report at the month level, the formula produces the expected results.

Nevertheless, when expanded at the date level that same formula produces incorrect figures. As you can see in the next figure, the *High Months* measure returns 1 for those days where *Sales Amount* is greater than 30,000.

The formula is designed to work at the level of the month and above – like quarter and year. Below the month level – like day – the results are inaccurate. At this point, the important question is what the result below the granularity of the formula should be. There are several possible answers; all of them are sort-of correct:

- It is fine as it is, no need to fix it; users should not use the measure at the day level.
- It always needs to show 1, because each individual date belongs to a month that satisfies the condition.
- It must show a blank, because that granularity is not supported.

If you are ok with the first option, then the remaining part of the article is useless. Still, you would be quitting the game too early: sooner or later, a user will read the values at the unsupported granularity. A real DAX pro does not let their formulas compute an incorrect value. Under. Any. Circumstances.

The second option, which proves to be much better than the first one, requires some deep reasoning about the filter context. For example, when the report is showing data at the day level, the measure must move the granularity of the calculation to the corresponding month level by removing the filter on the day column, while keeping the filter on the month only. The right way to obtain this behavior is by using ALLEXCEPT in CALCULATE:

High Months (2) =
SUMX (
    VALUES ( 'Date'[Calendar Year Month] ),
    CALCULATE (
        IF ( [Sales Amount] > 30000, 1 ),
        ALLEXCEPT ( 'Date', 'Date'[Calendar Year Month] )
    )
)

ALLEXCEPT removes any filter from the *Date* table, except the filter that is currently being created by the context transition invoked by CALCULATE during the iteration over the months. Please note that in this case ALLEXCEPT is the correct solution – though we often suggest you use ALL/VALUES instead. Indeed, this version of the formula would not work:

High Months (Wrong) =
SUMX (
    VALUES ( 'Date'[Calendar Year Month] ),
    CALCULATE (
        IF ( [Sales Amount] > 30000, 1 ),
        ALL ( 'Date' ),
        VALUES ( 'Date'[Calendar Year Month] )
    )
)

The reason why this formula is wrong is quite subtle: when VALUES is being evaluated, the context transition has not yet taken place. Therefore, the filter on the year/month is not effective yet. This is not an issue at the month or day level inside the matrix, but it provides an incorrect result at the grand total level, where no filter is active. ALLEXCEPT being a modifier, it executes after the context transition; it efficiently removes any filter except the one on *Date*[Calendar Year Month] introduced by the context transition.

The third and last option requires blanking the value when the granularity of the report is below the granularity of the formula. The DAX code is not too complicated, but the reasoning behind it requires some special attention.

The formula must detect the granularity of the cell in the report before completing the calculation. If the cell is below the supported granularity, the result must be a blank. In order to detect the granularity of the cell, we can compare the number of rows in the *Date* table at the cell granularity (the filter context of the cell being evaluated) with the supported granularity (the granularity defined by the formula).

Because the formula should work at the month granularity, we create two variables. The *DaysAtCurrentGrain* variable contains the number of visible rows. The *DaysAtFormulaGrain* variable contains the number of rows in a new filter context where we enforce the month granularity. If the two variables return the very same number, then DAX is working in a filter context at the month granularity level or above that (such as the year). If the two numbers are different, it means that the formula is computing the result below the supported granularity, and the result must be a blank:

High Months (3) =
VAR DaysAtCurrentGrain = COUNTROWS ( 'Date' )
VAR DaysAtFormulaGrain =
    CALCULATE (
        COUNTROWS ( 'Date' ),
        ALL ( 'Date' ),
        VALUES ( 'Date'[Calendar Year Month] )
    )
VAR IsSafeToCompute = ( DaysAtCurrentGrain = DaysAtFormulaGrain )
VAR Result =
    IF (
        IsSafeToCompute,
        SUMX (
            VALUES ( 'Date'[Calendar Year Month] ),
            IF ( [Sales Amount] > 30000, 1 )
        )
    )
RETURN Result

The *High Months (3)* measure returns a blank whenever the report goes below the supported granularity – guaranteeing that if a number is shown, it is the correct one.

As you can see, computing the right values at any granularity proves to be challenging. Building useful measures in a data model always requires thinking in terms of granularity and accuracy of the results.

DAX offers all the tools that are necessary to obtain the correct result. It is up to you to think in advance about the conditions under which your formula might fail. Once you have identified these critical conditions, you must define what the expected result at the given granularity is, and eventually implement the solution in DAX.

This article describes how to compute visual totals accurately, in case a measure does not provide the right result at the total level but works correctly in other rows of the report.

In simple DAX measures, the total of a report is the sum of its individual rows. For more sophisticated measures, the total might seem wrong because the formula does not include an aggregation over the rows that are visible in the report. For example, if the total of a measure must be the sum of the values displayed in the rows of a report, we consider the expected result a “visual total”, which is a total that corresponds to the visual aggregation of its values sliced by different rows in the report.

Let us elaborate on the topic with a simple example: a measure counts the number of days within a specific time period where *Sales Amount* is greater than 30,000. The measure below checks the condition and returns 1 if *Sales Amount* is greater than 30,000:

High Days := IF ( [Sales Amount] > 30000, 1 )

The measure works fine in a report that shows one day in each row. However, the sub-totals and the total look wrong because they show 1 instead of the sum of the individual days.

Be mindful that the total is correct, given the definition of the formula. Yet, it does not correspond to the “visual total” we would like to get. The reason is that the measure is evaluated for each cell of the report. The Total row does not correspond to the day level; it is a row that groups all the days into a single calculation. When DAX shows a value that looks wrong, it is not DAX that is not working. DAX is answering a question, perhaps the wrong one! Let us elaborate on this.

The measure checks whether the sales amount is greater than 30,000. It returns 1 if that is the case, and 0 otherwise. Because Sales Amount is greater than 30,000 in the Total row, the formula returns 1 as requested. The same happens in January. DAX answers the wrong question correctly!

The feeling that the total is wrong comes from the visual perception that we – as humans – obtain from the report. Let us look at the same report without the details at the day level.

The results are identical, and we still have the feeling that the yearly total is wrong. The perception is now that the Total should be the sum of the individual months. Not seeing the day rows, we no longer expect the Total to be the sum of individual days.

DAX does not see the report the way we do. Therefore, we need to rephrase the question to obtain the desired result: We modify the formula granularity, which is the level of detail the formula needs to analyze.

The first question to ask yourself is: at what granularity should I check the total? It might be at the day level, the month level, or any other time period. Let us say we set the granularity at the day level. The question becomes: *at the day level, I want to check if the total is greater than 30,000. At any level above the day level, I want the sum of the values computed at the day level*. Asking the question this way gets us closer to the correct formulation, but it is not perfect just yet.

Before getting to the correct formula, one further step is needed. The above description is bottom-up; we started at the detail level, and then the next step was to provide the aggregated result. In DAX, you need to reason top-down: first the aggregation, then the details. In other words, the same question should be stated as: *for each day in the current time period, sum the result of this formula: if the sales of the day are greater than 30,000 then return 1, otherwise return 0.* The question is the same. This time, we first provide the aggregation, and then we specify the calculation to perform at the detail level.

All this reasoning leads to the right formula:

High Days Correct := SUMX ( 'Date', IF ( [Sales Amount] > 30000, 1 ) )

The formula iterates the *Date* table and computes the result for each day. Finally, SUMX aggregates the daily results into the sum.

It is worth noting that there is no need to specify a different formula to compute a value at the day and the aggregated levels. Indeed, when a single day is selected, the formula behaves correctly: it only performs one iteration. Over any longer period of time, the number of iterations is the number of days in the period. Moreover, because the formula contains the desired granularity, it works no matter the granularity of the report. For example, the following report shows the right data at the month and the year levels, although the individual days are not visible.

Once the question is phrased the right way, DAX behaves nicely. The granularity of the calculation depends on the definition of the formula. Each business and user might have different requirements. For example, to identify months with sales greater than 30,000, the granularity must be the month: *for each month in the current period, sum the result of this formula: if the sales of that month are greater than 30,000 then return 1, otherwise return 0.* The formula requires the same structure as the previous version; it just replaces the day with the month to perform the calculation at the right granularity.

The formula flows logically:

High Months = SUMX ( VALUES ( 'Date'[Calendar Year Month] ), IF ( [Sales Amount] > 30000, 1 ) )

Indeed, when used in the report at the month level, the results are correct.

The topic of granularity is not yet complete. There is still an issue with the formula above: the problem appears below the granularity supported by the formula. Indeed, the formula works at the month level – but what happens if we expand the month at the day level, showing details where the formula is not designed to work?

The result is correct at the month level, but the *High Months* measure also returns values at the day level. The scenario is similar to the previous one: when the formula works at a granularity it is not designed for, the result is counterintuitive. Fixing this issue requires more deep reasoning about what to do at a higher level of detail – that is, at a finer granularity. We save this interesting topic for a new article in the coming weeks.

For now, here is our advice: if the total looks wrong, it might be because the DAX formula does not perform the calculation at the right granularity. Think about the business problem, define the right question, and implement the correct DAX expression.

]]>This article describes how to control format strings in calculation groups. Before starting, we suggest you read the previous articles in this series.

Each calculation item can change the result of a measure through the DAX code in the *Expression* property. The author of a calculation item can also control the display format of the result, overwriting the standard behavior of the measure where the calculation item is applied. The *Format String Expression* property can contain a DAX expression that returns the format string to use after applying a calculation item to a measure reference.

For example, the dynamic format string can display the result of the **YOY%** calculation item as a percentage – that is the year-over-year as a percentage.

In this specific example, we are looking at sales volumes. The *Format String Expression* property of the **YOY%** calculation item is always a percentage, regardless of the original format of the underlying measure.

The model in this example has a **Currency Conversion** calculation group that applies a currency exchange conversion to all the measures representing currency amounts. The details of the calculation are not important for the purpose of this article. Instead, we focus on the goal of displaying the converted amount by using a different format for each currency available in the report.

The conversion should not affect measures that are not displaying sales values. For example, in the following report only the *Sales Amount* and *Margin* measures get the currency conversion and the corresponding format, whereas *# Quantity* and *Margin %* are not affected by the **Currency Conversion** calculation group.

The **Report Currency** calculation item applies a custom format string depending on the selected currency, checking whether the measure selected should be converted or not. This is done by the DAX expression included in the *Format String Expression* property.

The complete code of the Format String Expression property is the following:

VAR MeasureName =
    SELECTEDVALUE ( Metric[Metric], SELECTEDMEASURENAME () )
VAR SkipConversion =
    ( SEARCH ( "#", MeasureName, 1, 0 ) > 0 )
        || ( SEARCH ( "%", MeasureName, 1, 0 ) > 0 )
VAR CurrencyFormat =
    SELECTEDVALUE ( 'Currency'[Currency Format], "#,0.00" )
RETURN
    IF (
        SkipConversion,
        SELECTEDMEASUREFORMATSTRING (),
        CurrencyFormat
    )

The *MeasureName* variable retrieves the measure selected by looking at the *Metric[Metric]* selection, because the **Metric** selection can override the measure selected in the report. If the **Metric** calculation group is not used or does not have a valid selection, then the SELECTEDMEASURENAME function returns the name of the active measure.

The *SkipConversion* variable controls whether the measure should be converted based on the measure name.

The *CurrencyFormat* variable retrieves the format string from the corresponding column in the *Currency* table, where only one value should be selected to ensure the currency conversion can take place. In case the currency conversion is not needed, the SELECTEDMEASUREFORMATSTRING function returns the format string of the current measure. This format string may already have been defined by another calculation item with higher precedence.

There are three calculation groups in the model used in this example, with the following precedence order:

- **Metric**: 10
- **Time Intelligence**: 20
- **Currency Conversion**: 30

The calculation group with the highest precedence order is applied first. This is valid for both the *Expression* and the *Format String Expression* properties of calculation items.

This is important considering how the *MeasureName* variable is evaluated in the **Report Currency** calculation item. Because **Report Currency** has a higher precedence, it is applied to the measure reference while the **Metric** calculation group is still in the filter context and has not yet been applied. For this reason, it is possible to retrieve the current selection of *Metric[Metric]* from the filter context. In this case this technique is required to make sure that the behavior of the **Report Currency** calculation item is consistent regardless of the technique used to choose the measure to display in a report: both direct measure selection and **Metric** calculation group selection should work seamlessly.

The *Format String Expression* property enables the control of the format string of any measure in a dynamic way. The SELECTEDMEASUREFORMATSTRING function allows access to the existing format string – that could be the one originally defined by the measure or the one already modified by another calculation item. Controlling the format string is vital to provide a good user experience when using calculation items that can change the numeric representation of the result of a measure.

I am glad to announce that we just released an additional module about Calculation Groups to the Mastering DAX video course. All current students of the video course immediately get access to this new section.

We covered calculation groups in The Definitive Guide to DAX book published last year. Yet we waited until the release of the feature in Analysis Services 2019 before recording the video on this section. This way, we were able to show the user interface required to create calculation groups in Tabular Editor. This was not a priority in the book, whose purpose is to explain the language and not the user interface. For avid readers, we have an entire section dedicated to Calculation Groups on SQLBI.com. The content is more up to date than the book, especially the references to the user interface in Tabular Editor.

We decided not to show examples using SQL Server Data Tools for Visual Studio as an editor. The reason is that the user interface to manage calculation groups is far from ideal. You can safely edit a BIM file to add calculation groups in Tabular Editor and go back to Visual Studio if you want. If you want more details about the user interface in Visual Studio you should turn to the official calculation groups page by Microsoft.

When will this feature be available on Power BI Desktop? We hope very soon, but in the meantime continue to push the request with Microsoft by voting to Support creating calculation groups in Power BI Desktop.

**UPDATE 2020-03-28**

Just a few hours after our announcement, Christian Wade (Principal Program Manager in Microsoft) announced that Calculation Groups will be available in Power BI Desktop by this summer (2020). During the presentation of XMLA endpoint for Power BI Premium, Christian revealed that Power BI Desktop is going to be more integrated with external tools such as DAX Studio and Tabular Editor. As a consequence, Calculation Groups will be available in edit mode in Power BI Desktop by using Tabular Editor. And there is more, watch the entire Tabular Editor demo from minute 00:34 to 00:47.

A common requirement in authoring DAX code is to provide a default value for the result of a calculation that might otherwise be blank. The classic approach to solving this requirement is by using an IF statement. For example, the following code returns the value of either the *Sales[Net Price]* column or the *Sales[Unit Price]* column, whichever is first to not be blank:

IF ( ISBLANK ( Sales[Net Price] ), Sales[Unit Price], Sales[Net Price] )

When the expressions considered are measures instead of column references, it is a best practice to save the result of the first measure in a variable. The variable improves the performance by ensuring that the measure is evaluated only once:

VAR MyResult = [MyMeasure]
VAR DefaultValue = [Default Measure]
VAR Result =
    IF ( ISBLANK ( MyResult ), DefaultValue, MyResult )
RETURN Result

The alternative to this expression requires two references to the same *MyMeasure* measure, which could result in a double evaluation of the same expression to obtain the same result. The following expression evaluates *MyMeasure* a first time to test whether its value is blank or not, and then a second time to determine the result of the IF function if *MyMeasure* returns a non-blank value:

IF ( ISBLANK ( [MyMeasure] ), [Default Measure], [MyMeasure] )

By using the COALESCE function the code is more readable. COALESCE implements the same logic as in the previous examples, evaluating its arguments and returning the first argument that is not a blank:

COALESCE ( [MyMeasure], [Default Measure] )

COALESCE accepts multiple arguments and returns the first argument that is not blank. If the first argument is blank, COALESCE returns the value provided by the expression in the second argument, and so on. If all the arguments are blank, COALESCE returns blank as well.
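For instance, extending the earlier example with a constant third argument guarantees a non-blank result (the measure name *Safe Result* is ours, used here only for illustration):

```dax
-- Returns [MyMeasure] if it is not blank, otherwise [Default Measure];
-- if both are blank, the constant 0 guarantees a non-blank result
Safe Result := COALESCE ( [MyMeasure], [Default Measure], 0 )
```

The arguments are evaluated in order, so the cheapest or most specific expression should come first.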

For example, consider a measure showing the Price of a product, using the average of *Sales[Net Price]* or using the *Product[Unit Price]* column if there are no sales transactions. Without COALESCE, the measure looks like this:

Price =
VAR AvgPrice = AVERAGE ( Sales[Net Price] )
VAR UnitPrice = SELECTEDVALUE ( 'Product'[Unit Price] )
VAR Result =
    IF ( ISBLANK ( AvgPrice ), UnitPrice, AvgPrice )
RETURN Result

By using COALESCE, the code becomes much simpler:

Price =
VAR AvgPrice = AVERAGE ( Sales[Net Price] )
VAR UnitPrice = SELECTEDVALUE ( 'Product'[Unit Price] )
VAR Result =
    COALESCE ( AvgPrice, UnitPrice )
RETURN Result

Performance-wise, COALESCE implements an IF condition. For this reason, you should not expect the latter code based on COALESCE to be faster than the former example based on IF. Nevertheless, using COALESCE improves readability, and this is already a great reason to start using it whenever needed. Moreover, the behavior of COALESCE may be further optimized in future versions of the DAX engine, so that the COALESCE function also ends up improving performance.

**2020-03-15 UPDATE**: The original version of this article published in 2011 did not mention SELECTCOLUMNS, which was introduced in 2015. This article was rewritten in 2020 to provide updated coverage of projection functions in DAX.

The simplest projection is the classic SELECT in SQL:

SELECT * FROM Product

It corresponds to this DAX query:

EVALUATE 'Product'

A common projection consists of selecting just a few columns from the source table. For example, the following SQL query only fetches 3 columns from the Product table:

SELECT [ProductKey], [Product Name], [Unit Price] FROM Product

In DAX you can obtain the same result by using the SELECTCOLUMNS function, which requires you to specify the column name for each column of the result:

EVALUATE
SELECTCOLUMNS (
    'Product',
    "ProductKey", 'Product'[ProductKey],
    "Product Name", 'Product'[Product Name],
    "Unit Price", 'Product'[Unit Price]
)

Another common projection in SQL adds one or more columns to all the columns of a table. For example, the following SQL query adds the *Unit Margin* column to all the columns of the *Product* table:

SELECT *, [Unit Price] - [Unit Cost] AS [Unit Margin] FROM Product

You can get the same result in DAX by using the ADDCOLUMNS function:

EVALUATE
ADDCOLUMNS (
    'Product',
    "Unit Margin", 'Product'[Unit Price] - 'Product'[Unit Cost]
)

Both SELECTCOLUMNS and ADDCOLUMNS keep the duplicated rows included in the original table expression. Applying a DISTINCT condition can be done in one of two ways. Consider the following SQL query:

SELECT DISTINCT [ProductKey], [Product Name], [Unit Price] FROM Product

The more efficient way to get the same result is by using the SUMMARIZE function:

EVALUATE
SUMMARIZE (
    'Product',
    'Product'[ProductKey],
    'Product'[Product Name],
    'Product'[Unit Price]
)

If SUMMARIZE cannot be used – for example, because the first argument is a complex table expression rather than the simple *Product* table reference of this example – you can apply DISTINCT to the result of SELECTCOLUMNS. However, the following expression should only be used when SUMMARIZE is not an option:

EVALUATE
DISTINCT (
    SELECTCOLUMNS (
        'Product',
        "ProductKey", 'Product'[ProductKey],
        "Product Name", 'Product'[Product Name],
        "Unit Price", 'Product'[Unit Price]
    )
)

By using SUMMARIZE you cannot change the column names. If you need to rename a column it is advisable to use a SELECTCOLUMNS consuming the result of a SUMMARIZE, in order to achieve the best possible performance. For example, consider the following SQL query:

SELECT DISTINCT [ProductKey] AS [Key], [Product Name] AS [Name], [Unit Price] AS [Price] FROM Product

The corresponding DAX version is the following:

EVALUATE
SELECTCOLUMNS (
    SUMMARIZE (
        'Product',
        'Product'[ProductKey],
        'Product'[Product Name],
        'Product'[Unit Price]
    ),
    "Key", 'Product'[ProductKey],
    "Name", 'Product'[Product Name],
    "Price", 'Product'[Unit Price]
)

Only if SUMMARIZE cannot be used, should you resort to this alternative, slower technique:

EVALUATE
DISTINCT (
    SELECTCOLUMNS (
        'Product',
        "Key", 'Product'[ProductKey],
        "Name", 'Product'[Product Name],
        "Price", 'Product'[Unit Price]
    )
)

In general, we always suggest using SUMMARIZE as an equivalent of SELECT DISTINCT in SQL, whereas ADDCOLUMNS and SELECTCOLUMNS are the DAX functions to use to get a projection without removing duplicated rows in the result.

This article describes how to implement currency conversion for reporting purposes in Power BI.

Currency conversion applied to reporting can include many different scenarios.

- Data in multiple currencies, report with a single currency
- Data in multiple currencies, report with multiple currencies
- Data in a single currency, report with multiple currencies

The rule of thumb is to apply the currency exchange conversion upfront. Therefore, it is a good idea to solve all the scenarios requiring a single currency in the report by converting all amounts at the time of data import into the data model. When you need to create a report in multiple currencies, computing data in advance could be challenging and expensive. Therefore, a dynamic solution based on DAX measures and a properly designed data model makes more sense.

In this article, we only consider the third scenario, “Data in a single currency, report with multiple currencies”. The second scenario could be implemented by transforming data so that it is imported in the model in a single currency, moving the challenge over to the very scenario we describe in this article. An extended description of the three scenarios with other dynamic solutions is included in a chapter of the Analyzing Data with Microsoft Power BI and Power Pivot for Excel book.

The data model we consider includes a *Sales* table with all transactions recorded in USD as a currency. There is also a hidden *ExchangeRate* table that contains the exchange rate for every currency in every month, assigned to the first day of each month. The *AverageRate* column has the value we consider the average exchange rate of the month that must be applied as a conversion rate to all the transactions included in the same month. The following diagram shows the tables that are relevant to this description.

The complete model also includes other tables such as *Product* and *Customer*. The following report shows the *Sales Amount* measure for the sales of Cell phones in 2008. All the amounts are in USD, which is the single currency used to record transactions in the *Sales* table.

The requirement is to enable the user to choose a different reporting currency by using a slicer, obtaining something like the following report where the selected reporting currency is EUR.

The *Sales Currency* measure applies the required currency conversion to the result of the *Sales Amount* measure. In order to achieve good performance in DAX, it is better to aggregate all amounts found in the same currency and exchange rate, and then apply the conversion rate to that total, than to apply the conversion to every single transaction. Because the exchange rate is at the monthly granularity in this scenario, the *Sales Currency* measure is implemented by grouping *Sales Amount* by month, as shown in the following code:

Sales Currency :=
IF (
    ISCROSSFILTERED ( 'Currency' ),
    VAR SelectedCurrency =
        SELECTEDVALUE ( 'Currency'[Currency Code] )
    VAR DatesExchange =
        SUMMARIZE (
            ExchangeRate,
            'Date'[Calendar Year Month Number],
            'ExchangeRate'[AverageRate]
        )
    VAR Result =
        IF (
            NOT ISBLANK ( SelectedCurrency ),
            IF (
                SelectedCurrency = "USD",
                [Sales Amount],
                SUMX (
                    DatesExchange,
                    [Sales Amount] * 'ExchangeRate'[AverageRate]
                )
            )
        )
    RETURN Result,
    [Sales Amount]
)

The *DatesExchange* variable creates a table in memory that has one row for every month. The assumption is that there is a single exchange rate for that month and a single currency selected. If this is not the case, this measure will return inaccurate numbers. It is thus important you validate the assumptions before implementing this calculation in your model.
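One way to validate that assumption is a temporary diagnostic measure such as the following sketch (the measure name is ours, not part of the model described here). A result greater than 1 means that at least one month carries more than one exchange rate for the currencies in the filter context:

```dax
-- Diagnostic sketch: maximum number of distinct exchange rates
-- found in any single month for the currencies in the filter context
Max Rates Per Month :=
MAXX (
    SUMMARIZE ( ExchangeRate, 'Date'[Calendar Year Month Number] ),
    CALCULATE ( DISTINCTCOUNT ( ExchangeRate[AverageRate] ) )
)
```

Run it in a card or a matrix filtered by one currency at a time before trusting the conversion logic.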

The *Result* variable computes the conversion by – for each month – applying the exchange rate to the result of *Sales Amount* for that month. In case there is a multiple selection of currencies, the result is blank, whereas the value of *Sales Amount* is returned without performing any conversion in case *SelectedCurrency* is USD.

You might want to modify the business logic of *Sales Currency* to better adapt the formula to your specific requirements. For example, in case the user selects two or more currencies, the current implementation returns blank, but you might want to raise an error or display a different result in that case.
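For example, one possible variant – a sketch under our own naming, not part of the original model – fails loudly instead of returning blank when the currency selection is ambiguous:

```dax
-- Hypothetical variant: raise an error instead of returning blank
-- when multiple currencies are selected in the report
Sales Currency Strict :=
IF (
    ISCROSSFILTERED ( 'Currency' )
        && ISBLANK ( SELECTEDVALUE ( 'Currency'[Currency Code] ) ),
    ERROR ( "Select a single reporting currency." ),
    [Sales Currency]
)
```

An explicit error is harder to ignore than a blank cell, which users may misread as "no data".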

Defining the currency conversion with a specific measure like *Sales Currency* has two disadvantages:

- **Duplicated code and measures**: The currency conversion should be applied to all the measures displaying a currency amount. If you have 3 measures (*Sales Amount*, *Total Cost*, and *Margin*) you have to create another three measures to display the value in the required reporting currency.
- **Format string**: The format string of the result of the *Sales Currency* measure does not include the currency symbol. When dealing with a report with a currency selected, it could be hard to understand the currency used for the report by just looking at the numbers.

By implementing the currency conversion in a calculation group, it is possible to solve both problems. At the time of writing, calculation groups are not available in Power BI. Thus, currently the remaining part of this article can only be implemented in Azure Analysis Services or Analysis Services 2019.

The expression of a calculation item that applies the currency conversion to any measure is similar to the definition of the *Sales Currency* measure. It just replaces the *Sales Amount* measure reference with a call to the SELECTEDMEASURE function:

--
-- Calculation Group: Currency Conversion
-- Calculation Item : Report Currency
--
VAR SelectedCurrency =
    SELECTEDVALUE ( 'Currency'[Currency Code], "USD" )
VAR DatesExchange =
    ADDCOLUMNS (
        SUMMARIZE ( ExchangeRate, 'Date'[Calendar Year Month Number] ),
        "@ExchangeAverageRate",
            CALCULATE ( SELECTEDVALUE ( 'ExchangeRate'[AverageRate] ) )
    )
VAR Result =
    IF (
        ISBLANK ( SelectedCurrency ) || SelectedCurrency = "USD",
        SELECTEDMEASURE (),
        -- Single currency non-USD selected
        SUMX (
            DatesExchange,
            SELECTEDMEASURE () * [@ExchangeAverageRate]
        )
    )
RETURN Result

Because this conversion could be applied to any measure, including percentages and non-currency measures such as *# Quantity*, it is a good practice to check whether the measure should be converted or not. In order to minimize future maintenance requirements when new measures are added to the data model, we can define a naming convention for measure names:

- **#** – If the measure name includes “#” then the measure represents a quantity and not a currency value. The currency conversion is not applied.
- **%** – If the measure name includes “%” then the measure represents a percentage and not a currency value. The currency conversion is not applied.

The calculation item definition is longer but more flexible this way:

--
-- Calculation Group: Currency Conversion
-- Calculation Item : Report Currency
--
VAR MeasureName = SELECTEDMEASURENAME ()
VAR SkipConversion =
    NOT ISCROSSFILTERED ( 'Currency' )
        || ( SEARCH ( "#", MeasureName, 1, 0 ) > 0 )
        || ( SEARCH ( "%", MeasureName, 1, 0 ) > 0 )
RETURN
    IF (
        SkipConversion,
        SELECTEDMEASURE (),
        VAR SelectedCurrency =
            SELECTEDVALUE ( 'Currency'[Currency Code] )
        VAR DatesExchange =
            SUMMARIZE (
                ExchangeRate,
                'Date'[Calendar Year Month Number],
                'ExchangeRate'[AverageRate]
            )
        VAR Result =
            IF (
                NOT ISBLANK ( SelectedCurrency ),
                IF (
                    SelectedCurrency = "USD",
                    SELECTEDMEASURE (),
                    SUMX (
                        DatesExchange,
                        SELECTEDMEASURE () * 'ExchangeRate'[AverageRate]
                    )
                )
            )
        RETURN Result
    )

By using calculation groups, it is also possible to dynamically control the format string through the Format String Expression property of the calculation item. For example, the following expression uses the *Currency[Currency Format]* column as a format string if there is a selection on the *Currency* table:

--
-- Calculation Group: Currency Conversion
-- Calculation Item: Report Currency
-- Format String Expression
--
VAR MeasureName = SELECTEDMEASURENAME ()
VAR SkipConversion =
    NOT ISCROSSFILTERED ( 'Currency' )
        || ( SEARCH ( "#", MeasureName, 1, 0 ) > 0 )
        || ( SEARCH ( "%", MeasureName, 1, 0 ) > 0 )
VAR CurrencyFormat =
    SELECTEDVALUE ( 'Currency'[Currency Format], "#,0.00" )
VAR Result =
    IF (
        SkipConversion,
        SELECTEDMEASUREFORMATSTRING (),
        CurrencyFormat
    )
RETURN Result

Using the previous format string expression, we get the following result.

Currency conversion should be applied to the imported data whenever possible, thus reducing the complexity of the calculation required at query time. When it is necessary to select the reporting currency at query time, pre-calculating all the reporting currencies could be impractical, so a dynamic currency conversion can be implemented in a DAX measure. You can also implement the currency conversion in a calculation group in order to avoid the proliferation of measures (with and without currency conversion) and to dynamically modify the format string of a measure.

This article describes the types of relationships available in Power BI and Analysis Services, clarifying the differences in cardinality and filter propagation of physical relationships.

Power BI and Analysis Services rely on a semantic model based on the Tabular Object Model (TOM). Even though there are different tools to manipulate these models – we suggest using Tabular Editor – the underlying concepts are identical because the engine and the data model are the same: we call it the Tabular model. This article is based on a section of The Definitive Guide to DAX (second edition) and on the Different types of many-to-many relationships in Power BI video.

In a Tabular model, a physical relationship connects two tables through an equivalence over a single column. Under certain conditions that we will describe later, a relationship can define a constraint on the content of a table, even though such a constraint is different from the “foreign key constraint” many SQL developers are used to. However, the main purpose of a relationship in a Tabular model is to transfer a filter while querying the model. Because transferring a filter is also possible by writing DAX code, we define two categories of relationships:

- **Virtual relationships**: these are relationships that are not defined in the data model but can be described in the logical model. Virtual relationships are implemented through DAX expressions that transfer the filter context from one table to another. A virtual relationship can rely on more than one column and can be related to a logical expression that is not just a column identity. Virtual relationships can be used to define multi-column relationships and relationships based on a range.
- **Physical relationships**: these are the relationships defined in a Tabular model. The engine automatically propagates the filter context in a query according to the filter direction of the active relationship. Inactive relationships are ignored, but there are DAX functions that can manipulate the state and the filter direction of physical relationships: USERELATIONSHIP and CROSSFILTER.

In this article we first show virtual relationships. We then discuss in detail the different types of physical relationships, also introducing the concept of weak relationships. Finally, we introduce many-to-many relationships, describing the different kinds of many-to-many relationships available in a model and in Power BI, thus clarifying the differences between the available options and briefly discussing performance implications.

A virtual relationship is a filter applied to the filter context in a CALCULATE or CALCULATETABLE function. In order to propagate a filter from A to B, you have to read the values active in the filter context on A and apply a corresponding filter to B.

For example, imagine having to transfer the filter from *Customer* to *Sales*. The optimal way to do this is by using TREATAS:

CALCULATE (
    [measure],
    TREATAS (
        VALUES ( Customer[CustomerKey] ),  -- Read filter from Customer
        Sales[CustomerKey]                 -- Apply filter to Sales
    )
)
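Conceptually, TREATAS takes the values visible in one column and applies them as a filter on another column. A minimal sketch of this filter transfer in Python (hypothetical table data; the engine does not actually materialize rows this way):

```python
# Simulate transferring a filter from Customer to Sales, as TREATAS does:
# the customer keys visible in the filter context become a filter on Sales.
customers_in_filter = {1, 3}          # visible Customer[CustomerKey] values
sales = [                             # hypothetical Sales rows
    {"CustomerKey": 1, "Amount": 100},
    {"CustomerKey": 2, "Amount": 250},
    {"CustomerKey": 3, "Amount": 50},
]

# Keep only the Sales rows whose key appears among the visible customer keys
filtered_sales = [r for r in sales if r["CustomerKey"] in customers_in_filter]
total = sum(r["Amount"] for r in filtered_sales)
print(total)  # 150
```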

The TREATAS function can also implement a logical relationship based on multiple columns. For example, the following code transfers the filter from a *Sales* table over to a *Delivery* table using *Order Number* and *Order Line Number*:

CALCULATE (
    [measure],
    TREATAS (
        SUMMARIZE (
            Sales,
            Sales[Order Number],        -- Read filter from Sales
            Sales[Order Line Number]    -- based on multiple columns
        ),
        Delivery[Order Number],         -- Apply filter to Delivery
        Delivery[Order Line Number]     -- specifying corresponding columns
    )
)

**NOTE**: creating a virtual relationship based on multiple columns is not optimal from a performance standpoint. A physical, single-column relationship is orders of magnitude faster than a virtual relationship.

Because TREATAS is a relatively new function, there are other common techniques used to implement virtual relationships. We include these alternatives as a reference; remember that TREATAS is the preferred way to create a virtual relationship based on column equivalency.

The first alternative technique is using INTERSECT:

CALCULATE (
    [measure],
    INTERSECT (
        ALL ( Sales[CustomerKey] ),       -- Apply filter to Sales
        VALUES ( Customer[CustomerKey] )  -- Read filter from Customer
    )
)

The second alternative technique is using IN:

CALCULATE (
    [measure],
    Sales[CustomerKey] IN VALUES ( Customer[CustomerKey] )
)

Another alternative technique is using FILTER and CONTAINS, which is older and slower than INTERSECT:

CALCULATE (
    [measure],
    FILTER (
        ALL ( Sales[CustomerKey] ),            -- Apply filter to Sales
        CONTAINS (
            VALUES ( Customer[CustomerKey] ),  -- Read filter from Customer
            Customer[CustomerKey],
            Sales[CustomerKey]
        )
    )
)

A virtual relationship can be based on a range, like the technique used to implement a dynamic segmentation:

CALCULATE (
    [measure],
    FILTER (
        ALL ( Sales[Net Price] ),   -- Apply filter to Sales
        NOT ISEMPTY (               -- filtering prices
            FILTER (                -- within the active ranges
                PriceRange,
                PriceRange[Min Price] <= Sales[Net Price]
                    && Sales[Net Price] < PriceRange[Max Price]
            )
        )
    )
)
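The effect of this range-based filter can be sketched in Python (hypothetical price ranges and values; purely illustrative of the logic, not of how the engine evaluates it):

```python
# Simulate a range-based virtual relationship: keep only the net prices
# that fall inside at least one of the currently visible price ranges.
price_ranges = [                      # hypothetical PriceRange rows
    {"Min Price": 0,   "Max Price": 10},
    {"Min Price": 100, "Max Price": 500},
]
net_prices = [5, 50, 120, 499, 500]   # distinct Sales[Net Price] values

# A price qualifies when Min Price <= price < Max Price for some range
filtered = [
    p for p in net_prices
    if any(r["Min Price"] <= p < r["Max Price"] for r in price_ranges)
]
print(filtered)  # [5, 120, 499]
```

Note that the upper bound is exclusive, matching the `<` comparison in the DAX expression: 500 is excluded even though it equals a Max Price.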

A relationship can be *strong* or *weak*. In a **strong** relationship the engine knows that the one-side of the relationship contains unique values. If the engine cannot guarantee that the one-side of the relationship contains unique values for the key, then the relationship is *weak*.

A weak relationship is not used as part of table expansion. Power BI has supported composite models since 2018; in a composite model, it is possible to create tables in a model containing data both in Import mode (a copy of the data from the data source is preloaded and cached in memory using the VertiPaq engine) and in DirectQuery mode (the data source is only accessed at query time).

A single data model can contain tables stored in VertiPaq and tables stored in DirectQuery. Moreover, tables in DirectQuery can originate from different data sources, generating several DirectQuery data islands.

In order to differentiate between data in VertiPaq and data in DirectQuery, we say that data is either in the *continent* (VertiPaq) or in the *islands* (DirectQuery data sources).

The VertiPaq store is nothing but another data island. We call it the continent only because it is the most frequently used data island.

A relationship links two tables. If both tables belong to the same island, then the relationship is an intra-island relationship. If the two tables belong to different islands, then it is a cross-island relationship. Cross-island relationships are always weak relationships. Therefore, table expansion never crosses islands.

Relationships have a cardinality, of which there are three types. The differences between them are both technical and semantic. Here we do not cover the reasoning behind those relationships because it would involve many data modeling digressions that are outside the scope of this article. Instead, we need to cover the technical details of physical relationships and the impact they have on the DAX code.

These are the three types of relationship cardinality available:

- **One-to-many relationships:** This is the most common type of relationship cardinality. On the one-side of the relationship the column must have unique values; on the many-side the value can (and usually does) contain duplicates. Some client tools differentiate between one-to-many relationships and many-to-one relationships. Still, they are the same type of relationship. It all depends on the order of the tables: a one-to-many relationship between *Product* and *Sales* is the same as a many-to-one relationship between *Sales* and *Product*.
- **One-to-one relationships:** This is a rather uncommon type of relationship cardinality. On both sides of the relationship the columns need to have unique values. A more accurate name would be “zero-or-one”-to-“zero-or-one” relationship, because the presence of a row in one table does not imply the presence of a corresponding row in the other table.
- **Many-to-many relationships:** On both sides of the relationship the columns can have duplicates. This feature was introduced in 2018, and unfortunately its name is somewhat confusing. Indeed, in common data modeling language “many-to-many” refers to a different kind of implementation, created by using pairs of one-to-many and many-to-one relationships. It is important to understand that in this scenario many-to-many does not refer to the many-to-many relationship, but instead to the many-to-many cardinality of the relationship.

In order to avoid ambiguity with the canonical terminology which uses many-to-many for a different kind of implementation, we use acronyms to describe the cardinality of a relationship:

- One-to-many relationships: We call them **SMR**, which stands for Single-Many-Relationship.
- One-to-one relationships: We use the acronym **SSR**, which stands for Single-Single-Relationship.
- Many-to-many relationships: We call them **MMR**, which stands for Many-Many-Relationship.

Another important detail is that an MMR relationship is always weak, regardless of whether the two tables belong to the same island or not. If the developer defines both sides of the relationship as the many-side, then the relationship is automatically treated as a weak relationship with no table expansion taking place.
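These rules can be condensed into a small classification sketch (Python, purely illustrative; the SMR/SSR/MMR labels are the acronyms defined above):

```python
# Classify a relationship as strong or weak, following the rules described:
# a relationship is weak when it crosses data islands OR when it has
# many-to-many (MMR) cardinality; otherwise it is strong.
def relationship_strength(cardinality: str, same_island: bool) -> str:
    if cardinality == "MMR":
        return "weak"                 # MMR is always weak, even intra-island
    return "strong" if same_island else "weak"

print(relationship_strength("SMR", True))   # strong
print(relationship_strength("SMR", False))  # weak (cross-island)
print(relationship_strength("MMR", True))   # weak (MMR is always weak)
```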

In addition, each relationship has a cross-filter direction. The cross-filter direction is the direction used by the filter context to propagate its effect. The cross-filter can be set to one of two values:

- **Single:** The filter context is always propagated in one direction of the relationship and not the other way around. In a one-to-many relationship, the direction is always from the one-side of the relationship to the many-side. This is the standard and most desirable behavior.
- **Both:** The filter context is propagated in both directions of the relationship. This is also called a **bidirectional cross-filter** and sometimes just a bidirectional relationship. In a one-to-many relationship, the filter context still retains its feature of propagating from the one-side to the many-side, but it also propagates from the many-side to the one-side.

The cross-filter directions available depend on the type of relationship.

- In an **SMR** relationship one can always choose single or bidirectional.
- An **SSR** relationship always uses bidirectional filtering. Because both sides of the relationship are the one-side and there is no many-side, bidirectional filtering is the only option available.
- In an **MMR** relationship both sides are the many-side. This scenario is the opposite of the SSR relationship: both sides can be the source and the target of a filter context propagation. Thus, one can choose the cross-filter direction to be bidirectional, in which case the propagation always goes both ways. Or, if the developer chooses single propagation, they also must choose which table to start the filter propagation from. As with all other relationships, single propagation is the best practice.

The following table summarizes the different types of relationships with the available cross-filter directions, their effect on filter context propagation, and the options for weak/strong relationship.

| Type of Relationship | Cross-filter Direction | Filter Context Propagation | Weak & Strong Type |
| --- | --- | --- | --- |
| SMR | Single | From one-side to many-side | Weak if cross-island, strong otherwise |
| SMR | Both | Bidirectional | Weak if cross-island, strong otherwise |
| SSR | Both | Bidirectional | Weak if cross-island, strong otherwise |
| MMR | Single | Must choose the source table | Always weak |
| MMR | Both | Bidirectional | Always weak |

When two tables are linked through a strong relationship, the table on the one-side might contain the additional blank row in case the relationship is invalid – VertiPaq Analyzer reports a Referential Integrity violation when this happens. Thus, if the many-side of a strong relationship contains values not present in the table on the one-side, then a blank row is appended to the table on the one-side. The additional blank row is never added to tables involved in a weak relationship.
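The blank-row mechanism can be sketched in Python with a hypothetical *Product*/*Sales* pair (illustrative only; the engine handles this internally on the compressed columns):

```python
# Simulate the blank row added to the one-side of a strong relationship
# when the many-side contains keys missing from the one-side.
products = [                      # hypothetical one-side table
    {"ProductKey": 1, "Name": "A"},
    {"ProductKey": 2, "Name": "B"},
]
sales_keys = [1, 2, 2, 99]        # many-side keys; 99 has no matching product

# Detect the referential integrity violation
missing = set(sales_keys) - {p["ProductKey"] for p in products}
if missing:
    # A single blank row is appended; it collects ALL unmatched keys
    products = products + [{"ProductKey": None, "Name": None}]

print(len(products))  # 3
```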

A physical relationship can automatically propagate a filter in the filter context based on the filter propagation direction. Moreover, a relationship defines a unique constraint on columns that are on the one-side of a relationship.

A logical many-to-many relationship between two tables representing two different business entities – also known as dimension tables in dimensional modeling – cannot be implemented using a single physical relationship.

For example, consider the case of two tables, *Customer* and *Account* in a bank. Every customer can have multiple accounts, and every account can be owned by several customers. The relationship between the *Customer* and *Account* tables requires a third table, *AccountsCustomers*, which contains one row for each existing relationship between *Account* and *Customer*.

The logical many-to-many relationship between Customer and Account is implemented through two physical SMR relationships. In order to enable the propagation of the filter context from *Customer* to *Account*, a bidirectional filter must be activated in the data model on the relationship between *Customer* and *AccountsCustomers*.

However, in order to avoid ambiguity caused by bidirectional filters in the data model, it is considered best practice to enable the bidirectional filter only in DAX measures that require that filter propagation. For example, if the relationship between *Customer* and *AccountsCustomers* is defined with a single direction filter, the bidirectional filter can be activated using CROSSFILTER as in the following measure:

SumOfAmt :=
CALCULATE (
    SUM ( Transactions[Amount] ),
    CROSSFILTER ( AccountsCustomers[AccountKey], Accounts[AccountKey], BOTH )
)

It is important to note that a many-to-many relationship between business entities does not use any MMR relationship; the MMR relationship is instead required to solve a different modeling issue, related to different granularities.

A logical relationship can exist between two tables whose granularity is not compatible with a physical SMR relationship. For example, consider the following model where the granularity of the *Budget* table is defined at the *Customer[CountryRegion]* and *Product[Brand]* level. The *Customer* table has one row for each customer, so there could be multiple rows with the same value for the *CountryRegion* column. Similarly, there is one row for each *Product* and several rows can have the same value for the *Brand* column.

The cardinality of *Product* and *Customer* is the right one for the *Sales* table. In order to create a report that aggregates all the rows in *Budget* for a given product brand or customer country, one option is to apply two virtual relationships in a measure:

Budget Amount :=
CALCULATE (
    SUM ( Budget[Budget] ),
    TREATAS ( VALUES ( 'Product'[Brand] ), Budget[Brand] ),
    TREATAS ( VALUES ( Customer[CountryRegion] ), Budget[CountryRegion] )
)

However, any technique that transfers the filter using a virtual relationship implemented through a DAX expression might suffer a performance penalty compared to a solution based on physical relationships.

In order to use the more efficient SMR relationships, we can create a *CountryRegions* calculated table that contains all the values existing in either *Customer[CountryRegion]* or *Budget[CountryRegion]*, and a *Brands* calculated table containing all the values existing in either *Product[Brand]* or *Budget[Brand]*:

CountryRegions =
DISTINCT (
    UNION (
        DISTINCT ( Customer[CountryRegion] ),
        DISTINCT ( Budget[CountryRegion] )
    )
)

Brands =
DISTINCT (
    UNION (
        DISTINCT ( Product[Brand] ),
        DISTINCT ( Budget[Brand] )
    )
)
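The intermediate table is nothing more than the distinct union of the two columns. A sketch of the same set logic in Python (hypothetical region values):

```python
# Build the intermediate table as the distinct union of the two columns,
# mirroring DISTINCT ( UNION ( ... ) ) in the DAX calculated table.
customer_regions = ["US", "Italy", "US"]      # Customer[CountryRegion]
budget_regions = ["Italy", "France"]          # Budget[CountryRegion]

country_regions = sorted(set(customer_regions) | set(budget_regions))
print(country_regions)  # ['France', 'Italy', 'US']
```

Because the result contains every value appearing on either side, each of the two SMR relationships built on top of it has a valid one-side with unique values.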

The *CountryRegions* and *Brands* tables can be connected using simple SMR relationships to the *Customer*, *Budget*, and *Product* tables. By enabling the bidirectional filter on the relationship between *Product* and *Brands*, we materialize through two physical relationships the virtual relationship that transfers the filter from *Product* to *Budget* using the *Brand* column in both tables. In a similar way, *Customer* transfers the filter to *Budget* through the bidirectional filter between *Customer* and *CountryRegions*.

Although we recommend not using bidirectional filters in a data model, this is one of the few cases where this practice is safe. Indeed, the calculated tables and the two physical SMR relationships we created are just an artifact to implement a virtual relationship between two tables; the resulting virtual relationships transfer the filter in a single direction: *Customer* filters *Budget* and *Product* filters *Budget*. There are no ambiguities. It is also important to note that the *Brands* and *CountryRegions* calculated tables do not add any new information to the data model. They are simply a way to materialize a table whose cardinality allows the creation of two SMR relationships for each virtual relationship we need.

The following picture shows the granularity of the three tables involved in the virtual relationship between *Product* and *Budget*.

The virtual relationship between *Product* and *Budget* does not have a one-side. We can say that such a virtual relationship has a many-to-many cardinality. The artifact we just created can be obtained using an MMR relationship with a single filter, as shown in the following model.

The relationship between *Product* and *Budget* must be defined with a many-to-many cardinality and a Single cross-filter direction. Power BI also displays a warning when you create an MMR relationship. It is important to keep the cross-filter direction as Single in order to avoid ambiguities in the data model.

Even though the MMR relationship is a single entity in the data model, its performance is not as good as the artifact that requires one calculated table and two SMR relationships. Indeed, there are no indexes created to support an MMR relationship in the storage engine. While an MMR relationship is faster than a virtual relationship implemented using TREATAS, it is not as good as an artifact based on two SMR relationships. The difference can be ignored when the cardinality of the column used in the relationship is in the range of 10 to 100 unique values. When there are thousands or more unique values in the column defining a relationship, you should consider creating the calculated table and the two SMR relationships instead of an MMR relationship.

The Tabular model allows users to create two different types of many-to-many relationships:

- The **many-to-many relationships between dimensions**: these are the “classic” many-to-many relationships between dimensions in a dimensional model. These relationships require a bridge table containing data coming from the data source. The bridge table holds the information that defines the existing relationships between the entities. Without the bridge table, the relationships cannot be established.
- The **relationships at different granularities**: these relationships use columns that do not correspond to the identity of the table, so the cardinality is “many” on both ends of the virtual relationship. A physical implementation of this type of relationship can use two SMR relationships connected to an intermediate table populated with the unique values of the column defining the relationship. The Tabular model enables users to create this relationship using an MMR relationship, which is always a weak relationship.

The “classic” many-to-many relationship is implemented using two SMR relationships in the order one-many/many-one. In order to implement a filter propagation from *Customer* to *Account*, the bidirectional filter must be enabled on the relationship connecting the *Account* table to the bridge table – *AccountsCustomers* in this example.

The relationship at different granularities is implemented using two SMR relationships in the order many-one/one-many, or by using a single MMR relationship in a Tabular model. In both cases, the virtual relationship should propagate the filter in a single direction. If the relationship is implemented using two SMR relationships, in order to propagate the filter from *Product* to *Budget*, the bidirectional filter must be enabled on the relationship connecting the Product table to the intermediate table – *Brands* in this example.

A single MMR relationship can simplify the creation of a relationship at different granularities. It is considered best practice to specify the correct direction of the relationship, which should be Single. By default, Power BI creates these relationships with a bidirectional filter propagation, which can create ambiguities in the data model.

There are different ways to implement a logical relationship in a Tabular model. DAX provides flexible techniques to implement any kind of virtual relationship between entities, but only physical relationships in the data model provide the best results in terms of performance. Filter propagation direction and granularity of the relationships are key concepts to create the proper physical implementation of a logical relationship.

The classic notion of “many-to-many” relationships in Dimensional modeling does not correspond to what is referred to as a “many-to-many cardinality relationship” in a Tabular model. The latter is just an alternative to an artifact – based on two physical relationships and a calculated table – that enables users to define a relationship between entities with a granularity different from the granularity of the tables. For this reason, when referring to a Tabular model it is better to use the acronyms SMR and MMR to identify Single-Many-Relationship and Many-Many-Relationship, respectively. The term “many-to-many” relationship should only be used to describe the logical relationship between dimensions, which is implemented through two SMR relationships in the physical model.

This article explains the behavior of LASTNONBLANK, LASTNONBLANKVALUE, and similar DAX functions, also providing patterns for performance optimization.

The DAX language can operate on aggregated values and on individual rows of tables in the data model, but it does not offer an immediate syntax to perform visual-level calculations over the cells visible in a report. The reason is that a DAX measure must work in any condition and in any report, which is why the notion of row context and filter context exists in the DAX language.

When you need to apply the notion of “first”, “previous”, “next”, “last” while iterating a table, you might have to struggle with these concepts in order to get the result you want. There are a few DAX functions that can simplify the syntax, but they must be used with caution in order to not hurt the query’s performance:

- FIRSTNONBLANK
- FIRSTNONBLANKVALUE
- LASTNONBLANK
- LASTNONBLANKVALUE

This article explains how these functions work, how to use them correctly, and how to apply optimizations resulting in DAX code that does not use these functions at all!

The Contoso model we use contains sales transactions where a customer has different orders on different dates. Each order can include multiple transactions and there can be multiple orders from the same customers on any one date. We analyze two scenarios:

- **Sales PD**: Compare the *Sales Amount* of one day with the **previous day**.
- **Sales PO**: Compare the *Sales Amount* of one order with the **previous order**.

Because these calculations are dynamic, by filtering one customer the comparison only occurs considering the days and orders where there are transactions for that filtered customer. The dynamic nature of this calculation makes it very expensive to pre-calculate such amounts.

The first example compares one day with the previous day available by filtering the transactions of one customer.

The challenge is to find the previous day, and then retrieve the value of that day. Because one day can have multiple orders from the same customer, the calculation must operate at the day granularity level.

The second example is a similar report where we have one row for each order instead of one row for each day. The following visualization only shows the first orders made by the customer.

In this case, the calculation must be done working at the order granularity level, which is still an aggregation of one or more transactions.

The **Sales PD** scenario requires implementing the following business logic in a measure:

- Retrieve the previous date when there are transactions in the current filter context.
- Evaluate Sales Amount filtering the date found in the previous step.

We can implement the first step using the LASTNONBLANK function as shown in the *Prev. Order Date v1* measure:

Prev. Order Date v1 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR FirstVisibleDate = MIN ( 'Date'[Date] )
    VAR PreviousDate =
        CALCULATE (
            LASTNONBLANK ( 'Date'[Date], [Sales Amount] ),
            'Date'[Date] < FirstVisibleDate
        )
    VAR Result =
        IF ( NOT ISEMPTY ( Sales ), PreviousDate )
    RETURN Result
)

The LASTNONBLANK function iterates all the values in the column provided as the first argument. For each value it evaluates the expression provided in the second argument, returning the largest value included in the first argument that generated a non-blank result as it was evaluating the expression provided in the second argument. In this case, the *Sales Amount* measure is evaluated for each date in the filter context. Because we need to retrieve the last date with a transaction made by one same customer filtered in the report, we have to modify the filter context for the *Date* table; otherwise, for each cell of the report we would only consider the date of the current line in the report. This is the reason why the LASTNONBLANK function is wrapped inside a CALCULATE function that filters only the date before the date visible in the cell of the report:

CALCULATE (
    LASTNONBLANK ( 'Date'[Date], [Sales Amount] ),
    'Date'[Date] < FirstVisibleDate
)

When LASTNONBLANK is executed providing a column reference as the first argument, DAX implicitly rewrites the expression by retrieving the values using DISTINCT in a potential context transition. The previous code is internally rewritten as:

CALCULATE (
    LASTNONBLANK (
        CALCULATETABLE ( DISTINCT ( 'Date'[Date] ) ),
        [Sales Amount]
    ),
    'Date'[Date] < FirstVisibleDate
)

You can also provide a table as a first argument of LASTNONBLANK. Thus, the same expression could have been written as:

LASTNONBLANK (
    CALCULATETABLE (
        DISTINCT ( 'Date'[Date] ),
        'Date'[Date] < FirstVisibleDate
    ),
    [Sales Amount]
)

The FIRSTNONBLANK function is identical to LASTNONBLANK, with the only difference that it returns the smallest value instead of the largest value included in the first argument that generated a non-blank result when evaluating the expression provided in the second argument.
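The semantics of LASTNONBLANK can be sketched in Python (illustrative only; hypothetical date keys and a lookup standing in for the measure, which returns None where the DAX measure would return blank):

```python
# Simulate LASTNONBLANK: scan the candidate values and return the largest
# one for which the expression evaluates to a non-blank (non-None) result.
def last_non_blank(values, expression):
    result = None
    for v in sorted(values):
        if expression(v) is not None:
            result = v               # keep the largest qualifying value
    return result

# Hypothetical sales-by-date lookup: dates without sales yield None (blank)
sales_by_date = {2: 100, 5: 80}
print(last_non_blank([1, 2, 3, 4, 5, 6], sales_by_date.get))  # 5
```

FIRSTNONBLANK would be the same scan keeping the smallest qualifying value instead of the largest.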

The following screenshot shows the result of the *Prev. Order Date v1* measure. As expected, this is simply the date of the previous line in the report.

However, if you give some thought to the work that must be done, LASTNONBLANK and FIRSTNONBLANK could potentially iterate a large number of rows. When we filter all the dates before a certain date, the number of rows considered could be in the order of thousands if you have a Date table spanning 10 years, and this must be repeated for every cell in the report. In order to avoid this unnecessary work, the *Prev. Order Date v1* measure offers an important optimization: it executes the search if and only if the current filter context has rows in the *Sales* table:

IF ( NOT ISEMPTY ( Sales ), ...

This means that in the previous report the LASTNONBLANK function is only executed 16 times instead of hundreds of times, despite the several years of dates available in the model. This way, we also avoid displaying a previous order date for dates where there are no transactions.

Now we can implement the second step of the calculation. The date obtained by the *Prev. Order Date v1* measure can be used as a filter to retrieve the *Sales Amount* value on that date. The *Sales Previous Date v1* measure retrieves the amount by using the result of LASTNONBLANK as a filter to compute *Sales Amount*:

Sales Previous Date v1 :=
VAR FirstVisibleDate = MIN ( 'Date'[Date] )
VAR PreviousOrderDate =
    CALCULATETABLE (
        LASTNONBLANK ( 'Date'[Date], [Sales Amount] ),
        'Date'[Date] < FirstVisibleDate
    )
VAR Result =
    CALCULATE ( [Sales Amount], PreviousOrderDate )
RETURN Result

In this case the *Sales Previous Date v1* measure does not check whether there are *Sales* rows visible in the filter context. This is because the measure is hidden and only used by other measures, so it is more efficient to only implement such a condition in the measure shown to the user. This is the definition of the *Sales PD v1* and *Diff. PD v1* measures:

Sales PD v1 :=
IF ( NOT ISEMPTY ( Sales ), [Sales Previous Date v1] )

Diff. PD v1 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR CurrentAmount = [Sales Amount]
    VAR PreviousAmount = [Sales Previous Date v1]
    VAR Result = CurrentAmount - PreviousAmount
    RETURN Result
)

We have seen that the result provided by LASTNONBLANK and FIRSTNONBLANK is a table with just one column and one row that can be used as a filter in a CALCULATE function. In our example, we are evaluating *Sales Amount* by filtering only the date obtained by evaluating the same *Sales Amount* for all the dates passed to LASTNONBLANK. In other words, we evaluate *Sales Amount* twice for the same date. The LASTNONBLANKVALUE function can slightly simplify the code required.

The FIRSTNONBLANKVALUE and LASTNONBLANKVALUE functions have the same behavior as FIRSTNONBLANK and LASTNONBLANK, with the exception that the value returned is the result of the expression evaluated for the item that has been found by FIRSTNONBLANKVALUE or LASTNONBLANKVALUE.

We can imagine that this code:

CALCULATE (
    [Sales Amount],
    LASTNONBLANK ( 'Date'[Date], [Sales Amount] )
)

can be written this way:

LASTNONBLANKVALUE ( 'Date'[Date], [Sales Amount] )

In theory, the engine could reuse the value computed for *Sales Amount*, thus avoiding the double evaluation and resulting in faster execution. At the very least, we write less code. However, this new syntax does not magically solve all performance issues. The cardinality of the iterator is the same, so the possible performance bottlenecks are similar. The advantage of reusing one measure is usually minimal.
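Continuing the earlier Python sketch of LASTNONBLANK, LASTNONBLANKVALUE performs the same scan but returns the expression's result rather than the scanned value (illustrative only; hypothetical data):

```python
# Simulate LASTNONBLANKVALUE: like LASTNONBLANK, but return the expression
# result for the largest value that produced a non-blank (non-None) result.
def last_non_blank_value(values, expression):
    result = None
    for v in sorted(values):
        r = expression(v)
        if r is not None:
            result = r               # keep the result, not the scanned value
    return result

sales_by_date = {2: 100, 5: 80}      # hypothetical: date key -> sales amount
print(last_non_blank_value([1, 2, 3, 4, 5, 6], sales_by_date.get))  # 80
```

Note that only one evaluation of the expression per value is needed here, which is exactly the double-evaluation saving the paragraph above describes.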

The new definitions of the measures in the report can be identified by the suffix v2; they return the same result as the v1 versions:

```dax
Sales Previous Date v2 :=
VAR FirstVisibleDate = MIN ( 'Date'[Date] )
VAR Result =
    CALCULATE (
        LASTNONBLANKVALUE ( 'Date'[Date], [Sales Amount] ),
        'Date'[Date] < FirstVisibleDate
    )
RETURN Result

Sales PD v2 :=
IF ( NOT ISEMPTY ( Sales ), [Sales Previous Date v2] )

Diff. PD v2 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR CurrentAmount = [Sales Amount]
    VAR PreviousAmount = [Sales Previous Date v2]
    VAR Result = CurrentAmount - PreviousAmount
    RETURN Result
)
```

Using LASTNONBLANK and LASTNONBLANKVALUE can provide the required result in a relatively easy way, but it might be costly. These functions are iterators and they evaluate within a row context the expression provided in the second argument; this oftentimes requires a context transition as is the case every time you evaluate a measure in an iterator. Therefore, there is an impact in execution time and in memory required at query time – because of the materialization required by the context transition.

These functions can be used without a deeper knowledge of the data model, but the cost for this could be significant. In order to achieve another improvement, it is necessary to get rid of the unnecessary context transitions. This requires an implementation that does not use LASTNONBLANK and LASTNONBLANKVALUE at all!

Most of the time, we can achieve better performance by making a simple assumption: we do not need to evaluate *Sales Amount* for every date; we can simply check whether the *Sales* table has at least one row for that customer and date. If it does, we assume that *Sales Amount* returns a non-blank value. If you can make that assumption, then the following implementation results in much faster execution and requires less memory at query time.

We can retrieve the date of the previous order by just applying a MAX aggregation over the *Sales[Order Date]* column. Because the Sales table only has rows for the existing orders, this implicitly provides the date of the previous order once we apply the proper filter context – just using the same filter for the previous date we also applied in v1 and v2 versions of the measures:

```dax
Prev. Order Date v3 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR FirstVisibleDate = MIN ( 'Date'[Date] )
    VAR PreviousDate =
        CALCULATE (
            MAX ( Sales[Order Date] ),
            'Date'[Date] < FirstVisibleDate
        )
    VAR Result = IF ( NOT ISEMPTY ( Sales ), PreviousDate )
    RETURN Result
)
```

In order to compute the Sales Amount, we leverage TREATAS to change the data lineage of the result of the PreviousDate variable, and we then apply this filter to compute *Sales Amount*. The advantage of this approach is the reduced materialization, which is very important when you consider how a measure can scale out with multiple concurrent users:

```dax
Sales Previous Date v3 :=
VAR FirstVisibleDate = MIN ( 'Date'[Date] )
VAR PreviousDate =
    CALCULATE (
        MAX ( Sales[Order Date] ),
        'Date'[Date] < FirstVisibleDate
    )
VAR FilterPreviousDate =
    TREATAS ( { PreviousDate }, 'Date'[Date] )
VAR Result =
    CALCULATE ( [Sales Amount], FilterPreviousDate )
RETURN Result
```

The structure of the definition of Sales PD v3 and Diff. PD v3 is identical to the v1 and v2 versions; the only difference is in the name of the internal measures referenced.
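Based on that description, the v3 wrappers can be sketched as follows – a sketch that mirrors the v1 definitions shown earlier, only swapping the name of the internal measure referenced:

```dax
Sales PD v3 :=
IF ( NOT ISEMPTY ( Sales ), [Sales Previous Date v3] )

Diff. PD v3 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR CurrentAmount = [Sales Amount]
    VAR PreviousAmount = [Sales Previous Date v3]
    VAR Result = CurrentAmount - PreviousAmount
    RETURN Result
)
```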

It is a common misconception that LASTNONBLANK and LASTNONBLANKVALUE can only be used with columns containing dates. While this is probably the most common scenario, there is no such restriction in DAX: you can use any data type for the column provided in the first argument, including strings. This can be useful to implement the **Sales PO** scenario we previously described. The report we want to obtain has one row for each order.

In this case the first task is to retrieve the previous order number. The following implementation uses LASTNONBLANK and removes the filter from *Date* in order to make sure the previous order is found even when it was placed on a different date:

```dax
Prev. Order Number v1 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR FirstVisibleOrder = MIN ( Sales[Order Number] )
    VAR PreviousOrder =
        CALCULATE (
            LASTNONBLANK ( DISTINCT ( Sales[Order Number] ), [Sales Amount] ),
            REMOVEFILTERS ( 'Date' ),
            Sales[Order Number] < FirstVisibleOrder
        )
    VAR Result = IF ( NOT ISEMPTY ( Sales ), PreviousOrder )
    RETURN Result
)
```

From a performance standpoint, we prefer the implementation that uses a simple MAX aggregation:

```dax
Prev. Order Number v3 :=
IF (
    NOT ISEMPTY ( Sales ),
    VAR FirstVisibleOrder = MIN ( Sales[Order Number] )
    VAR PreviousOrder =
        CALCULATE (
            MAX ( Sales[Order Number] ),
            REMOVEFILTERS ( 'Date' ),
            Sales[Order Number] < FirstVisibleOrder
        )
    VAR Result = IF ( NOT ISEMPTY ( Sales ), PreviousOrder )
    RETURN Result
)
```

In order to retrieve the Sales Amount of the previous order, the implementation based on LASTNONBLANKVALUE reduces the amount of code required:

```dax
Sales Previous Order v2 :=
VAR FirstVisibleOrder = MIN ( Sales[Order Number] )
VAR Result =
    CALCULATE (
        LASTNONBLANKVALUE ( DISTINCT ( Sales[Order Number] ), [Sales Amount] ),
        REMOVEFILTERS ( 'Date' ),
        Sales[Order Number] < FirstVisibleOrder
    )
RETURN Result
```

This time the optimized version is more verbose, but this could provide a very important performance optimization!

```dax
Sales Previous Order v3 :=
VAR FirstVisibleOrder = MIN ( Sales[Order Number] )
VAR PreviousOrder =
    CALCULATE (
        MAX ( Sales[Order Number] ),
        REMOVEFILTERS ( 'Date' ),
        Sales[Order Number] < FirstVisibleOrder
    )
VAR FilterPreviousOrder =
    TREATAS ( { PreviousOrder }, Sales[Order Number] )
VAR Result =
    CALCULATE (
        [Sales Amount],
        REMOVEFILTERS ( 'Date' ),
        FilterPreviousOrder
    )
RETURN Result
```

The difference in performance between the different solutions depends on the cardinality applied by LASTNONBLANK (*Date[Date]* and *Sales[Order Number]* in our two scenarios) and the complexity of the measure being evaluated (*Sales Amount*). By filtering a single customer as we did in the examples shown in this article, it is hard to measure a difference because the execution time to refresh the report is below 10-20ms. For this reason, we created a benchmark query by removing the filter by customer so that we have more valid dates to compute. The measures considered are *Sales PD* and *Sales PO*, related to the corresponding scenarios. For each measure we used three versions:

- **v1**: using LASTNONBLANK
- **v2**: using LASTNONBLANKVALUE
- **v3**: using custom DAX

In order to benchmark the **Sales PD** scenario, we used the following query:

```dax
EVALUATE
TOPN (
    501,
    SUMMARIZECOLUMNS (
        'Date'[Date],
        "Sales_Amount", 'Sales'[Sales Amount],
        "Sales_PD", 'Sales'[Sales PD v1] -- Change v1, v2, v3
    ),
    'Date'[Date], 1
)
ORDER BY 'Date'[Date]
```

The result obtained for *Sales PD v1* using LASTNONBLANK is the following one. The cost of the storage engine is minimal and most of the operations are executed in the formula engine. A more complex measure or a bigger *Sales* table would affect the storage engine cost (the *Sales* table used in this example only contains 100,000 rows).

The same benchmark for *Sales PD v2* using LASTNONBLANKVALUE displays an unexpectedly slower execution time. Thus, a simpler syntax does not necessarily translate to better performance.

Finally, the benchmark for *Sales PD v3* (using custom DAX expressions) provides the best performance in terms of execution time.

The custom DAX formula reduces the execution time by 20% compared to v1 (LASTNONBLANK) and by 35% compared to v2 (LASTNONBLANKVALUE).

For the **Sales PO** scenario, we used the following query to perform the benchmark:

```dax
EVALUATE
TOPN (
    501,
    SUMMARIZECOLUMNS (
        'Sales'[Order Number],
        'Date'[Date],
        __DS0FilterTable,
        "Sales_Amount", 'Sales'[Sales Amount],
        "Sales_PO", 'Sales'[Sales PO v1] -- Change v1, v2, v3
    ),
    'Sales'[Order Number], 1,
    'Date'[Date], 1
)
ORDER BY
    'Sales'[Order Number],
    'Date'[Date]
```

Because of the granularity of the *Sales[Order Number]* column, the execution is much slower in absolute terms compared to the previous scenario, showing a bigger difference between the different versions of the measures.

The result obtained for *Sales PO v1* using LASTNONBLANK is the following one.

The result obtained for *Sales PO v2* using LASTNONBLANKVALUE is the following one.

Finally, the Sales PO v3 measure provides the best execution time.

In this case the performance differences are only visible with a large granularity. The custom DAX formula reduces the execution time by 19% compared to v1 (LASTNONBLANK) and by 32% compared to v2 (LASTNONBLANKVALUE).

The differences shown might vary for different models and different queries. However, we do not expect to see cases where LASTNONBLANK or LASTNONBLANKVALUE are faster than the custom DAX solution. In some cases, the differences are too small to be perceived by the report users.

The DAX functions FIRSTNONBLANK, FIRSTNONBLANKVALUE, LASTNONBLANK, and LASTNONBLANKVALUE are iterators that could materialize data at the granularity of the column provided as a first argument.

Most of the time, it is possible to get the same result by writing custom DAX code – code that might provide better performance and scalability thanks to a reduced level of materialization. In order to do so, it is necessary to make an assumption about the data model: the presence of transactions in the fact table (Sales in our example) is enough to consider the iterated element valid, without having to evaluate a measure to check whether it does or does not return a blank value.

While the difference in query performance may be small in simple reports, there could be a larger impact in complex reports.


DAX Formatter was created in 2013 to help users format their code based on the Rules for DAX Code Formatting published in October 2012. Because the main goal is to make the code immediately readable and recognizable, DAX Formatter purposely does not provide many options to customize the format. But we are now considering a small change.

Over the years, we only added three options:

- **Separators**: the choice between the **comma** (,) and **semicolon** (;) separators is automatic, but you can override this setting manually. This option is required to accommodate the different formatting applied by Excel and Power BI with certain locale settings.
- **Lines**: how much code to include within a line of code. When we publish DAX samples in books and articles, for educational purposes we favor the **short lines** setting, which tries to not include more than one argument in each line. The **long lines** setting allows short function calls to be formatted inline without breaking arguments off into different lines. We suggest considering long lines as a default in real-world projects.
- **Light / Dark mode**: you can switch the style of the website to dark mode, which changes your background to black.

All these settings are persisted in your cookies if you allow them, so you do not have to change the configuration every time you open www.daxformatter.com.

Defining the formatting rules required weeks of discussion and allowed the adoption of a standard way to write DAX, making it easier to read DAX code written by someone else. However, when we defined the rules, most of the DAX editors available used proportional fonts, which is an unusual choice for any programming or query language. Reading DAX was harder than necessary because the opening round bracket was too close to the last letter of the function name. Therefore, we included this rule:

- Always put a space before parentheses ‘(‘ and ‘)’

The (intentional) consequence of the rule is that we have a space between the function name and the opening round bracket:

CALCULATE ( [measure], table[column] = 5 )

These days, most of the DAX editors (in Power BI, DAX Studio, and Tabular Editor) use non-proportional fonts, making this space unnecessary and not easy to include when writing code with IntelliSense. We are now considering changing the formatting rule into this:

- Always put a space before parentheses ‘(‘ and ‘)’, with the exception of the open bracket after a function name.

By adopting this rule we get this formatted code:

CALCULATE( [measure], table[column] = 5 )

Should we adopt this change? We would like your feedback.

We are almost convinced of our decision, but we want to make sure we are considering all the elements at stake. For example, changing one of the rules might have an impact on people using version control and batch processes that automatically format the code at regular intervals or when there is a new release of a model. We invite you to try the new format and tell us why we should not change it. The main obstacle we have found so far is simply that we are used to reading code this way, which is not enough to stop the change.

You can start using this standard today. Both DAX Formatter and DAX Studio already provide an option to change this setting.

This is the new settings area available in DAX Formatter.

The new option is:

- **Spacing**: the **Best Practice** setting applies what we consider the best practice in the official formatting rules. You can choose to manually enforce the original rule (**A space after function**) or the proposed change (**No space after function**). At the time of writing, Best Practice corresponds to the original rule, but when we decide to apply the change to the rules, the behavior of Best Practice will change accordingly.

A similar option is available in the DAX Studio settings.

By default, the checkbox is unchecked, but future versions of DAX Studio might have the checkbox checked by default after we decide to apply the change to the rules.

It is now your turn to evaluate the change and provide your feedback.

Thanks in advance – Enjoy DAX!

This article describes how to write efficient DAX expressions that compute the compound interest of incremental investments made throughout the holding period, using the PRODUCTX function.

If you make an investment and hold it for a period of time, the future value of your investment depends on the rate of return over the investment holding period.

For instance, consider an investment of $100 in a simple debt instrument with an annual fixed interest rate of 10.0% and a five-year maturity, where all owed amounts (principal plus interest) are paid in one lump sum at the end of the five-year period. The significance of this type of debt instrument is tied to the fact that interest income generated each year is effectively reinvested or added to the amount that is owed, thereby enabling the interest income to **compound** over the holding period.

In fact, there is a simple math equation for determining the future value of such an instrument:

*FV = PV (1 + r)^n*

or more specifically:

*Future Value = Present Value x (1 + Rate)^(number of periods/years)*

In our case:

*Future Value = $100 x (1 + 10%)^5 = $161.05*

In other words, if we paid $100 today (the “Present Value”), at the end of Year 5 we would receive $161.05 (the “Future Value”), generating $61.05 of total interest income.

If, on the other hand, we received the interest income from the debt instrument at the end of each year (i.e., the interest income is not reinvested and consequently not allowed to compound), we would receive $10 at the end of the first four years ( = 10% x $100 for each year) plus $110 at the end of the fifth year ($10 of interest income for the fifth year plus repayment of the initial $100 principal amount). Summarizing this second example, if we invested $100 today, we would receive $150 by the end of Year 5.
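The non-compounding case can also be checked with the standard simple-interest formula, using the same symbols as the compound formula above:

```latex
FV = PV \, (1 + r \cdot n) = \$100 \times (1 + 10\% \times 5) = \$150
```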

By effectively reinvesting the interest income in the first example, we are able to generate $11.05 more of total interest income ( = $61.05 – $50.00) than if we were to receive our interest income in a distribution at the end of each year, even though both examples accrue interest at the same rate. Of course, in the latter case we receive the money sooner, which has its own considerations. In any event, these very simple cases demonstrate the power of **compounding**.

Coincidentally, both debt instrument examples are what is known as “bullet” loans, where the entire principal amount ($100) is repaid in one lump sum at maturity (at the end of Year 5). In the first example the interest income payments are deferred until maturity, thereby allowing the interest to compound over the holding period. In the second example, the interest income payments are made at the end of each year, which means that the amount of debt accruing interest each year is always the same ($100).

Now let us consider a slightly more complex investment with compounding interest where the interest rate differs year-to-year. Because the interest rate varies, you can’t use the simple formula above (or its FV function equivalent in Excel). Rather, you must effectively stack each year on top of the preceding year and calculate year-by-year. See the table below.
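In symbols, stacking the years turns the single power of the fixed-rate formula into a product of the yearly growth factors, where *r<sub>i</sub>* is the rate in year *i* – this is exactly the quantity that PRODUCTX computes later in the article:

```latex
FV = PV \times \prod_{i=1}^{n} (1 + r_i)
```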

This scenario is straightforward using Excel, because you can simply take the value of the previous year (which is typically in the previous row) as the starting value for the next year to compute the next year’s interest income.

Performing the calculation of compound interest in DAX is challenging, because there is no way to reference the result value in the previous year as we can easily do in Excel. Moreover, in a real data model it is possible that incremental investments are made throughout the holding period. Accordingly, a report could look like the following scenario where an incremental investment is added on the first day of each year over the five-year period (different investment dates during each year would require a different calculation).

In this case, the value at the end of Year 2 is computed as the sum of $110.00 (value at the end of Year 1) and $80.00 (the value invested on day one of Year 2) increased by 9.0% (the corresponding interest rate for the period).
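As a worked check of that description:

```latex
(\$110.00 + \$80.00) \times (1 + 9.0\%) = \$190.00 \times 1.09 = \$207.10
```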

The sample model we are using for this demo is the following.

The *Investments* table contains one row for each investment made over the years, whereas the *Rates* table contains the different interest rates on a yearly basis. *Date* is just a simple calendar table.

The main idea to solve the scenario is to split the larger problem into simpler problems. Focus on a year, for example 2022. In 2022 there is an investment of $120. Additionally, in 2022, there are also investments made earlier in 2020 and 2021. Each of these investments has been active for a different number of years: the $100 invested in 2020 have been active for 2 years in 2022, while the $80 invested in 2021 have only been active for one year in 2022.

If we focus on a couple of years – the beginning year and the end year of the investment – the calculation is straightforward: it is enough to multiply the investment amount by *(1+interest rate)* for every year when the investment is active. This produces the interests gained by the investment over the years when it is active.

With that in mind, the formula to compute the *Value End of Year *measure can be written as follows, also using the PRODUCTX function:

```dax
Value End of Year :=
VAR SelectedYear =
    SELECTEDVALUE (
        'Date'[Year],        -- Find the current year in the report
        MAX ( 'Date'[Year] ) -- default to the last year available
    )
VAR PreviousYears =          -- PreviousYears contains the
    FILTER (                 -- years BEFORE the current one
        ALL ( 'Date'[Year] ),
        'Date'[Year] <= SelectedYear
    )
VAR PreviousInvestments =    -- PreviousInvestments contains
    ADDCOLUMNS (             -- the amount of all the investments
        PreviousYears,       -- made in previous years
        "@InvestedAmt", CALCULATE ( SUM ( Investments[Amount] ) )
    )
VAR Result =                 -- For each previous investment
    SUMX (                   -- calculate the compound interest
        PreviousInvestments, -- over the years and sum the results
        VAR InvestmentYear = 'Date'[Year]
        VAR InvestmentAmt = [@InvestedAmt]
        VAR YearsRateActive =
            FILTER (
                ALL ( Rates ),
                VAR YearRate = YEAR ( Rates[Date] )
                RETURN
                    YearRate >= InvestmentYear
                        && YearRate <= SelectedYear
            )
        VAR CompundInterestRateForInvestment =
            PRODUCTX (
                YearsRateActive,
                1 + Rates[InterestRate]
            )
        RETURN
            InvestmentAmt * CompundInterestRateForInvestment
    )
RETURN Result
```

Although quite long, the formula should be fast enough in most scenarios. Indeed, its complexity does not depend on the size of the *Investments* table, because that table is scanned only once to perform the grouping by year while computing the *PreviousInvestments* variable.

The speed of the measure depends solely on the number of years: *PreviousInvestments* contains one row per year; for each of these years, the formula performs an iteration over all the years up to the selected date.

Therefore, the complexity is around N squared, where N is the number of years of the investment – in real-world scenarios, it should perform well. This approach is possible because **the investment is made on the first day of each year** – a more complex calculation is required if investment dates are different, but the principle is similar and the calculation over the years should be made at the year level in order to be efficient.

Anyway, the interesting part of this formula is not in its performance. The most relevant detail is that in order to solve the scenario, you need to rethink the algorithm. Trying to solve the formula by thinking in an iterative way – computing the value one year after the other – does not work well in DAX. Rethinking it in terms of evaluating the current value of each investment provides a straight path to the solution that is also more efficient.

The previous calculation is valid assuming that each investment is made on the first day of each year. If you have a more complex scenario where each investment can have a different date, then you can reuse the previous logic by assuming that the amount you have at the beginning of each year is the amount of the investments made in the previous year added to the interest income at the end of the previous year. This approach also handles capital expenditure reductions, which are recorded as negative transactions in the *Investments* table. The following screenshot shows an additional investment of $80.00 made on July 1, 2021 and a disinvestment of $120.00 made on January 1, 2022.

The following screenshot shows the expected result.

There are several approaches we can use to implement this solution. A simple one could be to create a measure that computes, at the end of each year, the accrued amount of the investments made in that year, and then to reuse that measure with the logic of the previous formula. However, this approach results in a query plan with nested iterators, which is not ideal in DAX. Therefore, we implemented a different approach that guarantees a better query plan – though it requires a single, more verbose measure.

The following implementation of the *Value End of Year* measure calculates the sum of the investments at the day level only once. Most of the calculation takes place in the formula engine. However, because the complexity is around D + (N squared) where D is the number of days with investments and N is the number of years, we experienced much better performance with this verbose approach:

```dax
Value End of Year :=
VAR SelectedYear =
    SELECTEDVALUE (
        'Date'[Year],        -- Find the current year in the report
        MAX ( 'Date'[Year] ) -- default to the last year available
    )
VAR YearlyInterestRates =
    ADDCOLUMNS (
        CALCULATETABLE (
            ADDCOLUMNS (
                SUMMARIZE ( Rates, 'Date'[Year] ),
                "@InterestRate", CALCULATE ( SELECTEDVALUE ( Rates[InterestRate] ) )
            ),
            REMOVEFILTERS ( 'Date' )
        ),
        "@DailyRate",
            VAR DaysInYear = CALCULATE ( COUNTROWS ( 'Date' ) )
            VAR DailyRate = [@InterestRate] / DaysInYear
            RETURN DailyRate
    )
VAR LastDateInReport =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR InvestmentsByDate =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE ( Investments, 'Date'[Date], 'Date'[Year] ),
            "@InvestmentAtDate", CALCULATE ( SUM ( Investments[Amount] ) )
        ),
        'Date'[Date] <= LastDateInReport
    )
VAR InvestmentsByDateWithRates =
    NATURALLEFTOUTERJOIN ( InvestmentsByDate, YearlyInterestRates )
VAR PreviousYears =          -- PreviousYears contains the
    SELECTCOLUMNS (          -- years INCLUDING the current one
        FILTER (
            YearlyInterestRates,
            'Date'[Year] <= SelectedYear
        ),
        "@Year", 'Date'[Year]
    )
VAR PreviousInvestments =    -- PreviousInvestments contains
    ADDCOLUMNS (             -- for each year the amount of
        PreviousYears,       -- all the investments made in the
        "@AccruedAmt",       -- previous year with interest applied
            VAR CurrentYear = [@Year]
            VAR StartOfNextYear = DATE ( CurrentYear + 1, 1, 1 )
            VAR InvestmentsInCurrentYear =
                FILTER (
                    InvestmentsByDateWithRates,
                    'Date'[Year] = CurrentYear
                )
            VAR AccruedAmount =      -- For each investment within the year
                SUMX (               -- sum the amount and accrued interest
                    InvestmentsInCurrentYear,
                    VAR InvestedAmount = [@InvestmentAtDate]
                    VAR DaysInvestment = StartOfNextYear - 'Date'[Date]
                    VAR EffectiveRate = [@DailyRate] * DaysInvestment
                    VAR AccruedInterest = InvestedAmount * EffectiveRate
                    VAR Result = InvestedAmount + AccruedInterest
                    RETURN Result
                )
            RETURN AccruedAmount
    )
VAR Result =                 -- For each previous investment
    SUMX (                   -- calculate the compound interest
        PreviousInvestments, -- over the years and sum the results
        VAR InvestmentYear = [@Year]
        VAR InvestmentAmt = [@AccruedAmt]
        VAR YearsRateActive =
            FILTER (
                YearlyInterestRates,
                'Date'[Year] > InvestmentYear
                    && 'Date'[Year] <= SelectedYear
            )
        VAR CompundInterestRateForInvestment =
            PRODUCTX (                        -- Compute the compound
                YearsRateActive,              -- interest for the
                1 + [@InterestRate]           -- previous years
            )                                 -- or returns 1 if there
                + ISEMPTY ( YearsRateActive ) -- are no previous years
        RETURN InvestmentAmt * CompundInterestRateForInvestment
    )
RETURN Result
```

The previous measure only works for a report at the year granularity. This also guarantees a better optimization. If the report needs to be available at different granularities such as quarter and month, the measure is more complex because it must compute the accrued interest on the last day of the period selected. The following is an example of the desired report.

The formula used to produce this report is longer and slower. We suggest only resorting to this approach when absolutely necessary, because it can produce slower reports:

```dax
Value End of Period :=
VAR LastDateSelected = MAX ( 'Date'[Date] )
VAR LastYearSelected = MAX ( 'Date'[Year] )
VAR YearlyInterestRates =
    ADDCOLUMNS (
        CALCULATETABLE (
            ADDCOLUMNS (
                SUMMARIZE ( Rates, 'Date'[Year] ),
                "@InterestRate", CALCULATE ( SELECTEDVALUE ( Rates[InterestRate] ) )
            ),
            REMOVEFILTERS ( 'Date' )
        ),
        "@DailyRate",
            VAR CurrentYear = 'Date'[Year]
            VAR DaysInYear =
                CALCULATE (
                    COUNTROWS ( 'Date' ),
                    REMOVEFILTERS ( 'Date' ),
                    'Date'[Year] = CurrentYear
                )
            VAR DailyRate = DIVIDE ( [@InterestRate], DaysInYear )
            RETURN DailyRate
    )
VAR LastDateInReport =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR InvestmentsByDate =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE ( Investments, 'Date'[Date], 'Date'[Year] ),
            "@InvestmentAtDate", CALCULATE ( SUM ( Investments[Amount] ) )
        ),
        'Date'[Date] <= LastDateInReport
    )
VAR InvestmentsByDateWithRates =
    NATURALLEFTOUTERJOIN ( InvestmentsByDate, YearlyInterestRates )
VAR PreviousYears =          -- PreviousYears contains the
    SELECTCOLUMNS (          -- years BEFORE the current one
        FILTER (
            YearlyInterestRates,
            'Date'[Year] < LastYearSelected
        ),
        "@Year", 'Date'[Year]
    )
VAR PreviousInvestments =    -- PreviousInvestments contains
    ADDCOLUMNS (             -- for each year the amount of
        PreviousYears,       -- all the investments made in the
        "@AccruedAmt",       -- previous year with interest applied
            VAR CurrentYear = [@Year]
            VAR StartOfNextYear = DATE ( CurrentYear + 1, 1, 1 )
            VAR InvestmentsInCurrentYear =
                FILTER (
                    InvestmentsByDateWithRates,
                    'Date'[Year] = CurrentYear
                )
            VAR AccruedAmount =      -- For each investment within the year
                SUMX (               -- sum the amount and accrued interest
                    InvestmentsInCurrentYear,
                    VAR InvestedAmount = [@InvestmentAtDate]
                    VAR DaysInvestment = StartOfNextYear - 'Date'[Date]
                    VAR EffectiveRate = [@DailyRate] * DaysInvestment
                    VAR AccruedInterest = InvestedAmount * EffectiveRate
                    VAR Result = InvestedAmount + AccruedInterest
                    RETURN Result
                )
            RETURN AccruedAmount
    )
VAR FilterLastYearInvestments =
    FILTER (
        InvestmentsByDateWithRates,
        'Date'[Year] = LastYearSelected
    )
VAR LastYearInvestments =
    SUMX (
        FILTER (
            FilterLastYearInvestments,
            'Date'[Date] <= LastDateSelected
        ),
        VAR InvestedAmount = [@InvestmentAtDate]
        VAR DaysInvestment = LastDateSelected - 'Date'[Date] + 1
        VAR EffectiveRate = [@DailyRate] * DaysInvestment
        VAR AccruedInterest = InvestedAmount * EffectiveRate
        VAR Result = InvestedAmount + AccruedInterest
        RETURN Result
    )
VAR DaysLastYear = LastDateSelected - DATE ( LastYearSelected, 1, 1 ) + 1
VAR ValuePreviousInvestments = -- For each previous investment
    SUMX (                     -- calculate the compound interest
        PreviousInvestments,   -- over the years and sum the results
        VAR InvestmentYear = [@Year]
        VAR InvestmentAmt = [@AccruedAmt]
        VAR YearsRateActive =
            FILTER (
                YearlyInterestRates,
                'Date'[Year] > InvestmentYear
                    && 'Date'[Year] <= LastYearSelected
            )
        VAR CompundInterestRateForInvestment =
            PRODUCTX (            -- Compute the compound
                YearsRateActive,  -- interest for the previous years
                VAR YearRate = 'Date'[Year]
                VAR InterestPreviousYears =
                    [@InterestRate] * ( YearRate <> LastYearSelected )
                VAR InterestLastYear =
                    [@DailyRate] * DaysLastYear * ( YearRate = LastYearSelected )
                VAR Result = 1 + InterestPreviousYears + InterestLastYear
                RETURN Result
            )                                 -- or returns 1 if there
                + ISEMPTY ( YearsRateActive ) -- are no previous years
        VAR FinalInvestmentValue = InvestmentAmt * CompundInterestRateForInvestment
        RETURN FinalInvestmentValue
    )
VAR Result = ValuePreviousInvestments + LastYearInvestments
RETURN Result
```

The first part of this article presented the overall concepts of dealing with compound interest. We then presented an efficient first DAX approach in computing the compound interest for incremental investments made throughout the holding period. In order to achieve high performance, we moved the calculation logic to the year level. This required an assumption that all investments are made on the first day of each year exclusively.

Later on in the article, we showed more complete calculations where every investment can be made at any time and the report can have different granularities. The fundamental idea is the same: compound interest is always calculated at the year level, whereas the value of each investment at the end of the first year is computed with a day granularity. This approach is more verbose, but far more efficient from a computational point of view.

*Special thanks to our friend **Rick Scanlon**, a partner at Innovation Endeavors, who helped us write out the definition of compound interest and reviewed the business logic presented in this article.*

This article describes the reasons why an Excel pivot table may be slow when using the Analyze in Excel feature of Power BI.

Power BI uses the Analysis Services Tabular engine to process queries coming from reports. When you create a Power BI report, the query sent to the engine uses the DAX language, which usually produces more optimized query plans. However, the Analysis Services engine is also able to understand MDX queries sent to a Tabular model. Excel is a client tool that can connect to an Analysis Services database by sending queries in MDX.

Thanks to the bilingual feature (DAX/MDX) in Analysis Services, Microsoft quickly introduced a new server technology in the market: the Tabular model, based on DAX. This was done by leveraging tools that existed for another technology, the Multidimensional model based on MDX. Many optimizations were introduced for DAX queries over the last few years, in order to improve the Power BI user experience; however, sometimes there are no equivalent optimizations for MDX queries, resulting in Excel reports that run slower than equivalent Power BI reports. This article describes what is causing many of these differences; it also presents possible workarounds and highlights features that Microsoft might want to make a priority, in order to improve the experience for Excel users connected to Power BI datasets or Analysis Services Tabular databases.

Power BI generates one or more DAX queries for a single matrix in a report. For example, the following matrix displays a few measures grouped by customer *Country* and sorted by *Sales Amount*.

The measures used in this report are straightforward:

```dax
Quantity # := SUM ( Sales[Quantity] )
Sales Amount := SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
Total Cost := SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
Margin := [Sales Amount] - [Total Cost]
Margin % := DIVIDE ( [Margin], [Sales Amount] )
```

The single query generated by the matrix is the following one:

```dax
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Customer'[Country], "IsGrandTotalRowTotal" ),
        "Sales_Amount", 'Sales'[Sales Amount],
        "Quantity__", 'Sales'[Quantity #],
        "Total_Cost", 'Sales'[Total Cost],
        "Margin__", 'Sales'[Margin %]
    ),
    [IsGrandTotalRowTotal], 0,
    [Sales_Amount], 0,
    'Customer'[Country], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    [Sales_Amount] DESC,
    'Customer'[Country]
```

The execution of that query in DAX Studio with a cold cache shows that the execution is fast and only requires one Storage Engine (SE) query.

This is the performance expected of a report in Power BI, even though we are querying a model that has one million rows in the Sales table. When we try to recreate the same report in Excel, we can see important differences in the way the query is executed. Typically, you publish the Power BI report and then you create an Excel report using the Analyze in Excel feature. However, this would make it impossible to investigate the query plans; we can thus simulate the same behavior by connecting Excel to Analysis Services using the diagnostic port provided by Power BI Desktop, which is shown by DAX Studio when it is connected to Power BI Desktop.

We create the following report using a PivotTable in Excel.

Excel generates a single MDX query to populate a PivotTable, no matter how complex. In this case, the MDX query is relatively simple – the sort order is not included in the query because Excel sorts the result it receives:

```mdx
SELECT
    {
        [Measures].[Quantity #],
        [Measures].[Sales Amount],
        [Measures].[Total Cost],
        [Measures].[Margin %]
    }
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME ON COLUMNS,
    NON EMPTY
    Hierarchize (
        { DrilldownLevel ( { [Customer].[Country].[All] },,, INCLUDE_CALC_MEMBERS ) }
    )
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME ON ROWS
FROM [Model]
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
```

The execution of this MDX query in DAX Studio with a cold cache shows that the execution is still fast, though slower than the previous DAX version. However, the most important difference is the increase in the number of SE queries – we now have 8 instead of 1.

The reason why the MDX query requires a larger number of SE queries compared to the equivalent DAX query is that DAX benefits from one optimization called **“fusion”**; fusion includes multiple aggregations within the same storage engine query. The reports are identical and require the aggregations of the *Quantity #*, *Sales Amount*, and *Total Cost* measures. The following is the xmSQL code of the single SE query required by the DAX version:

```sql
WITH
    $Expr0 := ( PFCAST ( 'Analytics Sales'[Quantity] AS INT )
        * PFCAST ( 'Analytics Sales'[Unit Cost] AS INT ) ),
    $Expr1 := ( PFCAST ( 'Analytics Sales'[Quantity] AS INT )
        * PFCAST ( 'Analytics Sales'[Net Price] AS INT ) )
SELECT
    'Analytics Customer'[Country],
    SUM ( @$Expr0 ),
    SUM ( 'Analytics Sales'[Quantity] ),
    SUM ( @$Expr1 )
FROM 'Analytics Sales'
    LEFT OUTER JOIN 'Analytics Customer'
        ON 'Analytics Sales'[CustomerKey] = 'Analytics Customer'[CustomerKey];
```

The query granularity is at the *Country* level, and the DAX engine computes the grand total by summing the value obtained for each country. Indeed, the measures used in the report display a purely additive behavior – they are just SUM / SUMX functions.

When the report runs in MDX, every measure requires its own SE query. This is the SE query generated for the *Sales Amount* measure:

```sql
WITH
    $Expr0 := ( PFCAST ( 'Analytics Sales'[Quantity] AS INT )
        * PFCAST ( 'Analytics Sales'[Net Price] AS INT ) )
SELECT SUM ( @$Expr0 )
FROM 'Analytics Sales';
```

This is the SE query generated for the *Total Cost* measure:

```sql
WITH
    $Expr0 := ( PFCAST ( 'Analytics Sales'[Quantity] AS INT )
        * PFCAST ( 'Analytics Sales'[Unit Cost] AS INT ) )
SELECT SUM ( @$Expr0 )
FROM 'Analytics Sales';
```

Although the *Margin* measure used by *Margin %* is just a difference between *Sales Amount* and *Total Cost*, there is a specific SE query to compute *Margin*:

```sql
WITH
    $Expr0 := ( PFCAST ( 'Analytics Sales'[Quantity] AS INT )
        * PFCAST ( 'Analytics Sales'[Unit Cost] AS INT ) ),
    $Expr1 := ( PFCAST ( 'Analytics Sales'[Quantity] AS INT )
        * PFCAST ( 'Analytics Sales'[Net Price] AS INT ) )
SELECT
    SUM ( @$Expr0 ),
    SUM ( @$Expr1 )
FROM 'Analytics Sales';
```

Moreover, the SE queries are also duplicated because each SE query runs both at the *Country* granularity level and at the grand total level. For this reason, the queries in lines 10 to 16 of the last screenshot look like duplicated versions of the queries in lines 2 to 8. The difference is visible in the screenshot by comparing line 8 with line 16: the former sums *Quantity* grouped by *Country*, whereas the latter computes the same aggregation at the grand total level.

The impact of the missing “fusion” optimization is not clearly visible in terms of perceived execution time when:

- The measures are trivial.
- There are only a few million rows in the fact table in the model.
- The report does not require more than a few dozen cells in the result.

When one or more of these conditions is no longer true, the difference in performance is much greater.

Consider for example this different, slower version of the measures, where *Slow Amount* applies a 5% discount on transactions with a *Net Price* greater than 100. The code is intentionally inefficient in order to highlight the problem caused by the absence of fusion in MDX queries:

```dax
Quantity # := SUM ( Sales[Quantity] )
Slow Amount :=
SUMX (
    Sales,
    Sales[Quantity]
        * IF ( Sales[Net Price] > 100, Sales[Net Price] * 0.95, Sales[Net Price] )
)
Total Cost := SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
Slow Margin := [Slow Amount] - [Total Cost]
Slow Margin % := DIVIDE ( [Slow Margin], [Slow Amount] )
```

The layout of the reports and the structure of the DAX and MDX queries are the same as before, simply replacing the measures with their slower versions. However, the query plans are different. The DAX request now generates two SE queries, because the measures are no longer guaranteed to be fully aggregatable. Therefore, there is one SE query at the *Country* granularity and another one for the grand total, even though each SE query still computes all the measures because of the fusion optimization. The two SE queries of the DAX measures are visible in the next screenshot.

The presence of the IF condition in the iterator generates a *CallbackDataID* that does not cache the result of the SE query. While this is of limited importance for the DAX query executed by a single user, it has a bigger impact on the MDX version of the query. This is shown in the next screenshot.

All the SE queries in bold include a *CallbackDataID* and do not take advantage of the SE cache. In this simple example, the MDX version of the report is two times slower than the DAX version (126 ms instead of 55 ms), but the effect produced by the increased number of SE queries could be bigger in more complex reports.

In order to improve the performance of an Excel report based on a PivotTable, consider the following best practices:

- Reducing the number of measures included in a PivotTable.
- Optimizing the performance of the measures used in a PivotTable.
- Avoiding any *CallbackDataID*, especially in measures that are referenced multiple times in the same report.
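Where possible, a *CallbackDataID* can be removed by refactoring the conditional logic out of the iterator. As a sketch, the hypothetical *Fast Amount* measure below produces the same result as the *Slow Amount* measure shown earlier, but moves the condition into a CALCULATE filter so that no IF runs inside SUMX:

```dax
-- Hypothetical refactoring of Slow Amount: same result, no IF inside the iterator
Fast Amount :=
SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
    - CALCULATE (
        -- 5% discount applied only to transactions with Net Price > 100
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] * 0.05 ),
        Sales[Net Price] > 100
    )
```

As always, validate with a benchmark that the refactored version is actually faster in your model.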

A better solution would be an MDX implementation of the “fusion” optimization available for DAX queries. Please vote for this suggestion on ideas.powerbi.com if this optimization is important in your scenarios, in order to make Microsoft aware of the requirement; if you work in a larger corporation, consider also involving your Microsoft account team.


Microsoft released 13 new functions in 2019. The first 4 functions are related to the calculation group feature, which is now only available on Azure Analysis Services and Analysis Services 2019:

- ISSELECTEDMEASURE: Returns true if one of the specified measures is currently being evaluated.
- SELECTEDMEASUREFORMATSTRING: Returns the format string for the measure that is currently being evaluated.
- SELECTEDMEASURE: Returns the measure that is currently being evaluated.
- SELECTEDMEASURENAME: Returns the name of the measure that is currently being evaluated.
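As a sketch of how these functions are typically used, a hypothetical “YTD” calculation item in a Time Intelligence calculation group could be defined with an expression like the following, where SELECTEDMEASURE stands in for whatever measure the report is evaluating:

```dax
-- Hypothetical calculation item expression: applies YTD to any measure in the report
CALCULATE (
    SELECTEDMEASURE (),
    DATESYTD ( 'Date'[Date] )  -- assumes a 'Date'[Date] column in a marked date table
)
```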

Chances are, you have never used the functions above. Calculation groups are not available in Power BI yet. They should be available once the Power BI XMLA endpoint is available in read-write mode, but there will still be limited adoption at that point. The real game changer is the availability of calculation groups in Power BI Desktop. The limited adoption of calculation groups is my biggest disappointment for 2019. But sooner or later this will happen – more about this later.

There is nothing groundbreaking about the other 9 functions: they provide results that could already be achieved with a different syntax, or they cover corner cases that should not have a significant impact.

- DISTINCTCOUNTNOBLANK: This is like DISTINCTCOUNT but ignores any blank value.
- FIRSTNONBLANKVALUE: Returns the first non-blank value evaluated by an expression iterating the unique values in a column.
- LASTNONBLANKVALUE: Returns the last non-blank value evaluated by an expression iterating the unique values in a column.
- QUARTER: Returns the number of the quarter from a date.
- CONVERT: Converts an expression to a specific data type.
- REMOVEFILTERS: Removes filters like ALL, but it can only be used as a CALCULATE modifier and not as a table function.
- ALLCROSSFILTERED: Removes all the filters propagated over any kind of relationship, not only those obtained through the expanded table. It can only be used as a CALCULATE modifier, so REMOVE probably would have been a better prefix than ALL. Luckily, this function is rarely required.
- CONTAINSSTRING: Finds a string inside another string using limited wildcards; not case-sensitive.
- CONTAINSSTRINGEXACT: Finds a string inside another string using a case-sensitive comparison; no wildcards allowed.
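For example, REMOVEFILTERS used as a CALCULATE modifier is equivalent to ALL in the same position; the measure names below are hypothetical:

```dax
-- These two definitions are equivalent; REMOVEFILTERS only works as a modifier
All Product Sales := CALCULATE ( [Sales Amount], REMOVEFILTERS ( 'Product' ) )
All Product Sales 2 := CALCULATE ( [Sales Amount], ALL ( 'Product' ) )
```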

This year at SQLBI we delivered 34 public courses on three continents, plus many sessions in conferences, SQL Saturdays, and other community events.

We released the second edition of The Definitive Guide to DAX in August. For several reasons, we experienced delays in delivering the book outside of North America, especially in Europe. The problem seems to be solved by now. You can find more details about the content of the book and the “behind the scenes” on the blog post I wrote when the book was published.

We also launched the second edition of our Mastering DAX video course. New content, new demos, and many new exercises to practice on. Sometimes I receive requests about an advanced course… well, Mastering DAX is advanced, indeed! If you think we missed something, let us know. Thanks to the new organization, we made it possible to add modules over time. And if you are new to DAX, just enroll in the free Introducing DAX video course and only move on to Mastering DAX after you have completed it.

We continued to improve DAX Guide thanks to the contributions of many readers – if your feedback has not been applied yet… don’t worry, we didn’t lose it, it’s in the queue.

Finally, our OKViz spinoff released the first commercial custom visual, Smart Filter Pro. There have been many new versions and improvements of this visual, whereas the work on other new visuals will only be visible in 2020…

I’ve seen the future, and it’s bright – but unfortunately, it’s under NDA!

Well, I need to explain this. Daniele Perilli and I are now part of the Power BI contributor program. We will contribute to some Power BI features. We are not alone. And we all work for the community. Now, use your imagination, dream up something good and imagine it happening.

I know, you want to know more. Well, usually big conferences are used to make important announcements. The Microsoft Business Applications Summit in April should have good news. But if you can go to London for SQLBits 2020 a few weeks before, you might get a nice preview.

Besides, here is what we will do in 2020:

- We will add a Calculation Groups module to the Mastering DAX video course. All the enrolled students will receive it for free.
- We will release a new version of DAX Patterns. We started working on it in 2019 and we will complete the work in 2020. All the existing patterns will be refreshed with Power BI examples, more optimized DAX code, and we will also add new patterns. We wanted to release it in 2019, but the work on The Definitive Guide to DAX took longer than expected, so we had to delay the work on the patterns.
- OKViz will release new custom visuals.
- We will keep on producing content for SQLBI and tools for the community.

If you want to receive updates promptly without missing any news, go ahead and register for our newsletter.

Happy 2020!


Like many languages, DAX enables you to get the same result through different techniques. However, the same semantics may produce different query plans; it is a good idea to know different techniques, because this knowledge may be useful when you want to optimize the performance of a slower expression.

**UPDATE 2020-02-27**: the COALESCE function is available in DAX; check product version compatibility on DAX Guide. You no longer need to implement the alternative approach described in this article if you have a product supporting COALESCE. However, this article is still relevant to implement an alternative to NULLIF, which has not been implemented in DAX yet.

**DISCLAIMER:** In this article, you will see a technique where DIVIDE can replace other conditional statements. Though the resulting code is less readable, sometimes there is a performance advantage to using DIVIDE instead of IF or SWITCH. We do not want to describe the technical reasons why DIVIDE could be more efficient. This depends on several factors. Therefore, you should never assume that DIVIDE is faster than IF or SWITCH; you also should only consider this different way of expressing the same semantic when the performance advantage is measurable and consistent across different queries and tests. Moreover, the DAX engine evolves over time, so any analysis of the differences in query plans could be obsolete very soon. Therefore, do your homework before applying any of these techniques in production, and read Understanding eager vs. strict evaluation in DAX for more details about query plans evaluating expressions that could produce blank values.

The NULLIF function in T-SQL returns a null value if the two specified expressions are equal. The syntax is:

NULLIF ( <expression1>, <expression2> )

NULLIF returns *<expression1>* if the two expressions are not equal. If the expressions are equal, NULLIF returns a null value of the type of *<expression1>*. This is an equivalent DAX code:

```dax
VAR exp1 = <expression1>
VAR exp2 = <expression2>
RETURN
    IF ( exp1 == exp2, BLANK (), exp1 )
```

A better version in DAX avoids the use of IF:

```dax
VAR exp1 = <expression1>
VAR exp2 = <expression2>
RETURN
    DIVIDE ( exp1, NOT ( exp1 == exp2 ) )
```

Using DIVIDE instead of an IF can provide a better execution plan, although this is not guaranteed. In general, whenever we have an IF statement that uses only two arguments and the result is a numeric value, the DIVIDE syntax can provide an alternative that is less readable but sometimes faster. The reason for that is described in the following section.
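As a concrete sketch, the hypothetical measure below blanks out *Sales Amount* whenever it equals *Total Cost* – the DAX counterpart of NULLIF ( [Sales Amount], [Total Cost] ):

```dax
-- Hypothetical measure: returns blank when the margin is zero
Sales NULLIF Cost :=
VAR exp1 = [Sales Amount]
VAR exp2 = [Total Cost]
RETURN
    DIVIDE ( exp1, NOT ( exp1 == exp2 ) )
```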

Consider the following DAX pattern where *<expression>* returns a numeric value:

IF ( <condition>, <expression> )

You can get the same result by writing:

DIVIDE ( <expression>, <condition> )

The result of a logical condition is converted to 0 or 1 in DAX. If the condition is false, then the denominator of DIVIDE is 0 and the result is BLANK. If the condition is true, then the denominator of DIVIDE is 1 and the result is identical to *<expression>*.
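For example, the two hypothetical measures below return *Sales Amount* only when it exceeds a fixed threshold; they are equivalent, but the DIVIDE version avoids the conditional statement:

```dax
-- IF version: blank when the condition is false
Large Sales IF :=
IF ( [Sales Amount] > 100000, [Sales Amount] )

-- DIVIDE version: the condition becomes 0 or 1 in the denominator
Large Sales DIVIDE :=
DIVIDE ( [Sales Amount], [Sales Amount] > 100000 )
```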

The COALESCE function in ANSI SQL returns the current value of the first expression that does not evaluate to NULL. The syntax is:

COALESCE ( <expression1>, <expression2>, <expression3>, … )

We can rewrite this using IF in DAX this way:

```dax
VAR exp1 = <expression1>
RETURN
    IF (
        NOT ISBLANK ( exp1 ),
        exp1,
        VAR exp2 = <expression2>
        RETURN
            IF (
                NOT ISBLANK ( exp2 ),
                exp2,
                VAR exp3 = <expression3>
                RETURN
                    IF ( NOT ISBLANK ( exp3 ), exp3, … )
            )
    )
```

A more readable but potentially less efficient way to write the same code is the following:

```dax
VAR exp1 = <expression1>
VAR exp2 = <expression2>
VAR exp3 = <expression3>
…
RETURN
    SWITCH (
        TRUE,
        NOT ISBLANK ( exp1 ), exp1,
        NOT ISBLANK ( exp2 ), exp2,
        NOT ISBLANK ( exp3 ), exp3,
        …
    )
```

The code using IF and SWITCH works for expressions of any data type. If you are handling numeric expressions and you never want to return a blank, then you can consider this alternative option:

```dax
VAR exp1 = <expression1>
VAR exp2 = <expression2>
VAR exp3 = <expression3>
…
RETURN
    NOT ISBLANK ( exp1 ) * exp1
        + ISBLANK ( exp1 ) * ( NOT ISBLANK ( exp2 ) ) * exp2
        + ISBLANK ( exp1 ) * ISBLANK ( exp2 ) * ( NOT ISBLANK ( exp3 ) ) * exp3
        …
```

In this case, using the multiplication over the result of ISBLANK produces a similar effect to what we described using DIVIDE. The difference is that because we are summing different expressions, the result cannot be blank. The propagation of blank in the result can be obtained by wrapping the expression in a DIVIDE function, like in the following expression:

```dax
VAR exp1 = <expression1>
VAR exp2 = <expression2>
VAR exp3 = <expression3>
…
RETURN
    DIVIDE (
        NOT ISBLANK ( exp1 ) * exp1
            + ISBLANK ( exp1 ) * ( NOT ISBLANK ( exp2 ) ) * exp2
            + ISBLANK ( exp1 ) * ISBLANK ( exp2 ) * ( NOT ISBLANK ( exp3 ) ) * exp3
            …,
        1 - ISBLANK ( exp1 ) * ISBLANK ( exp2 ) * ISBLANK ( exp3 ) * …
    )
```

Whenever all the expressions are blank, the denominator of DIVIDE is 0 and the result is blank.
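As a minimal two-argument sketch, the hypothetical measure below is the DAX counterpart of COALESCE ( [Sales Amount], 0 ), returning zero instead of blank:

```dax
-- Hypothetical measure: replaces a blank Sales Amount with zero
Sales Amount Or Zero :=
VAR Amount = [Sales Amount]
RETURN
    IF ( NOT ISBLANK ( Amount ), Amount, 0 )
```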

The NULLIF and COALESCE functions available in different versions of SQL have equivalent expressions in DAX. There are many DAX alternatives, and small details in the syntax might have different semantic implications, producing different results. The best practice is to use the simplest possible syntax whenever possible, evaluating more complex DAX expressions only when optimal performance is at stake, and validating that an alternative syntax is faster by doing a proper benchmark.

This article describes how you can create a comparison with the previous time period in a visualization, regardless of whether the time periods are consecutive or not.

Time intelligence calculations in DAX are usually created considering consecutive periods in any type of comparisons. This approach might not work well when the requirement is to compare the differences between a selection of non-consecutive periods. In that case, the “previous” element in a visualization might not correspond to the previous element in the data model.

For example, consider the following report where the slicer selects an arbitrary set of months within a year: March, May, August, and October 2008.

The *Sales Diff PM* and *% Sales Diff PM* measures provide the difference between the *Sales Amount* of the month displayed in the row and the month displayed in the previous row of the matrix visualization. These two measures are based on the *Sales PM* measure, which returns the Sales Amount of the previous selected month in the same visualization.

```dax
Sales Diff PM :=
VAR SalesCurrentMonth = [Sales Amount]
VAR SalesPreviousMonth = [Sales PM]
VAR Result =
    DIVIDE (
        SalesCurrentMonth - SalesPreviousMonth,
        ( NOT ISBLANK ( SalesCurrentMonth ) ) * ( NOT ISBLANK ( SalesPreviousMonth ) )
    )
RETURN
    Result

% Sales Diff PM :=
DIVIDE ( [Sales Diff PM], [Sales PM] )
```

The main goal of this article is to describe how to write the *Sales PM* measure of this example. From a DAX standpoint, the previous row of the matrix is not a concept that can be directly expressed in a formula. In order to author a measure that can do so, we have to start with an assumption: each row displays a month, and the months are sorted according to their natural sort order (January, February, March, and so on). However, the previous month in the visualization is not necessarily the previous month in the calendar. In the example we are considering, the selection made on the slicer shows just a few months. We should redefine the concept of “previous month” as “previous month in the selection made outside of the matrix”. Now we can call upon a Power BI concept with a close enough representation in DAX: the ALLSELECTED modifier allows a CALCULATE function to retrieve the filter context defined outside of a visual, which in this case is the selection made on filters and other visuals on the same report page.

Knowing the current month of a cell in the visualization, the previous month is the maximum month number available in the filter context provided by ALLSELECTED excluding the current and following months. This evaluation is made by the *PreviousYearMonth* variable in the *Sales PM* measure. The *PreviousYearMonth* variable is used to filter the *Year Month Number* in the CALCULATE function that evaluates *Sales Amount* for the previous selected month:

```dax
Sales PM :=
VAR CurrentYearMonth =
    SELECTEDVALUE ( 'Date'[Year Month Number] )
VAR PreviousYearMonth =
    CALCULATE (
        MAX ( 'Date'[Year Month Number] ),
        ALLSELECTED ( 'Date' ),
        KEEPFILTERS ( 'Date'[Year Month Number] < CurrentYearMonth )
    )
VAR Result =
    CALCULATE (
        [Sales Amount],
        'Date'[Year Month Number] = PreviousYearMonth,
        REMOVEFILTERS ( 'Date' )
    )
RETURN
    Result
```

The technique shown in this article can be used whenever you need to retrieve a “previous item” displayed in a Power BI visualization where non-consecutive items are filtered and the requirement is to consider strictly the items that are selected and visible.
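The same pattern works at other granularities. For example, a hypothetical *Sales PD* measure could retrieve the previous selected date instead of the previous selected month:

```dax
-- Sketch: previous selected date, assuming a 'Date'[Date] column
Sales PD :=
VAR CurrentDate =
    SELECTEDVALUE ( 'Date'[Date] )
VAR PreviousDate =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' ),
        KEEPFILTERS ( 'Date'[Date] < CurrentDate )
    )
RETURN
    CALCULATE (
        [Sales Amount],
        'Date'[Date] = PreviousDate,
        REMOVEFILTERS ( 'Date' )
    )
```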

A calculation group is like a table in the Tabular model, and a calculation item is a value in a column of the table defined for the calculation group. In order to sort the calculation items, there is an additional column called Ordinal used to manage the “Sort by Column” behavior, like any other column of a Tabular model. Even though the Ordinal column is not mandatory, it is required in order to make the custom ordering work – without the Ordinal column, the calculation items are displayed in alphabetical sort order regardless of the Ordinal property of the calculation item.

Tabular Editor automatically creates the Ordinal column in a new calculation group. If you do not have the Ordinal column in the calculation group because you created the model using a previous version of Tabular Editor, you can fix the model using Tabular Editor 2.9.1 by doing the following:

- Right-click the calc group table > Create New > Data Column
- Set the Source Column property to “Ordinal”
- Set the Data Type property to “Integer / Whole Number (int64)”
- Optionally make the column Ordinal invisible (right-click / Make Invisible)

- Click the column of the corresponding calculation item
- Set the Sort By Column property to “Ordinal”

After that, you can set the Ordinal property of the calculation items by directly assigning a value to that property for each calculation item, or by moving calculation items within the calculation group using drag & drop. This way, Tabular Editor automatically adjusts the Ordinal property so that it reflects the desired order of the column.

This article describes the differences between eager evaluation and strict evaluation in DAX, empowering you to choose the best evaluation type for your data models.

UPDATE 2020-01-31: The new IF.EAGER function can enforce the eager evaluation described in this article without having to refactor the DAX expression using variables. The content of this article is still valid, just consider IF.EAGER instead of IF in those cases where the eager evaluation is more convenient for your formula.

A conditional expression in DAX – like an IF function – can be evaluated in two ways: eager and strict. Understanding the difference between the two evaluation modes can be very useful in finetuning the performance of a measure. The examples in this article cover strict and eager evaluation. The goal is to equip you to make an informed decision in picking the best evaluation mode for a specific data model.

The first example is a simple measure that performs a comparison and then returns one of two different values depending on the outcome of the comparison. Consider the following query:

```dax
EVALUATE
VAR AverageSales =
    [Sales Amount] / DISTINCTCOUNT ( 'Product'[Color] )
RETURN
    ADDCOLUMNS (
        VALUES ( 'Product'[Color] ),
        "@Test", IF (
            [Sales Amount] > AverageSales,
            [Margin],
            [Total Cost]
        )
    )
```

The *@Test* column contains either the *Margin* or the *Total Cost* result, depending on whether the *Sales Amount* of the given color is higher than the *AverageSales* per color or not. This is a useless calculation – we are not interested in the result, but rather in how the DAX engine computes it.

The engine starts by computing the *AverageSales* variable. Then, it has a choice. The first option is to compute the value of the condition at line 8 for each color – checking whether *Sales Amount* for the given color is larger than *AverageSales* – before computing either *Margin* or *Total Cost*, calculating only the one required by the result of the condition. This behavior is the most intuitive and it is known as **strict evaluation** in Analysis Services and Power BI. Another well-known name for this technique is short-circuit evaluation.

There is another way of computing the same result. The engine could compute every measure (*Margin*, *Total Cost*, and *Sales Amount*) for every product color. Once all these values are available, the engine would check the condition for each color and choose whether to use the previously computed value of *Margin* or *Total Cost*. This technique is known as **eager evaluation** and consists of precomputing all the possible values that could be returned by a conditional statement, choosing which value to use case by case at a later time.

Which of the two techniques is better? You already know the answer: it depends. Though strict evaluation may look more efficient, this is not always the case. Before moving further, consider a simple example. What if the expression for the *@Test* column looked like this:

```dax
EVALUATE
VAR AverageSales =
    [Sales Amount] / DISTINCTCOUNT ( 'Product'[Color] )
RETURN
    ADDCOLUMNS (
        VALUES ( 'Product'[Color] ),
        "@Test", IF (
            [Sales Amount] > AverageSales,
            [Sales Amount] * 1.5,
            [Sales Amount] * 2.0
        )
    )
```

In this expression, the same sub-expression *[Sales Amount]* is used in both branches of the IF statement. Strict evaluation requires computing the *Sales Amount* measure once in order to determine which colors satisfy the condition; then the same *Sales Amount* measure is computed a second time to solve the expression chosen by the condition. In other words, the engine needs to scan the *Sales* table multiple times in order to obtain the results required.

By using eager evaluation, the DAX engine computes the value of *Sales Amount* for each color in a single scan; only later does it check the condition defining which of the two expressions to return. This means that the eager evaluation requires fewer scans of the *Sales* table.

However, this does not mean that eager evaluation is always the best choice. It is better in the example outlined, but there are many scenarios where strict evaluation proves to be the best option. Nevertheless, since strict evaluation is intuitively better, we wanted to start with an example where eager evaluation is indeed the best option.

Let us elaborate more on the topic by looking at the actual query plan of the query described at the beginning of this article; it is reported in the following code snippet, which also includes the definition of the measures:

```dax
DEFINE
    MEASURE Sales[Margin] =
        SUMX ( Sales, Sales[Quantity] * ( Sales[Net Price] - Sales[Unit Cost] ) )
    MEASURE Sales[Sales Amount] =
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
    MEASURE Sales[Total Cost] =
        SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
EVALUATE
VAR AverageSales =
    [Sales Amount] / DISTINCTCOUNT ( 'Product'[Color] )
RETURN
    ADDCOLUMNS (
        VALUES ( 'Product'[Color] ),
        "@Test", IF ( [Sales Amount] > AverageSales, [Margin], [Total Cost] )
    )
```

The DAX query generates five Storage Engine (SE) queries.

Two of these five queries are not interesting: one at line 4 is needed to compute *AverageSales* and one at line 6 is required to obtain the full list of the *Product[Color]* values. The other three queries are more interesting. Please note that all queries have been edited to make them easier to read.

The query at line 2 computes *Sales Amount* for every color. It is required to evaluate the condition for all the possible colors:

```sql
SELECT
    'Product'[Color],
    SUM ( 'Sales'[Quantity] * 'Sales'[Net Price] )
FROM 'Sales'
    LEFT OUTER JOIN 'Product'
        ON 'Sales'[ProductKey] = 'Product'[ProductKey]
```

The last two queries at line 8 and line 10 compute *Margin* and *Total Cost* respectively, for a different subset of colors. The DAX Formula Engine (FE) evaluates the condition for the colors using the result of the previous query, and then splits the colors into two lists: one list contains the colors returning true in the IF condition – computing *Margin* in that case – while the other list contains the colors returning false for the same IF condition – computing *Total Cost* as a result:

```sql
SELECT
    'Product'[Color],
    SUM ( 'Sales'[Quantity] * ( 'Sales'[Net Price] - 'Sales'[Product Cost] ) )
FROM 'Sales'
    LEFT OUTER JOIN 'Product'
        ON 'Sales'[ProductKey] = 'Product'[ProductKey]
WHERE 'Product'[Color] IN ( 'Black', 'Silver', 'Blue', 'White', 'Grey' )
```

```sql
SELECT
    'Product'[Color],
    SUM ( 'Sales'[Quantity] * 'Sales'[Product Cost] )
FROM 'Sales'
    LEFT OUTER JOIN 'Product'
        ON 'Sales'[ProductKey] = 'Product'[ProductKey]
WHERE 'Product'[Color] IN ( 'Red', 'Green', 'Orange', ..[11 total values, not all displayed] )
```

The engine does not compute the total cost for all the colors. Every color appears in only one of the two SE queries, depending on the condition evaluated for each color in the previous evaluation.

At this point we are curious to see the same query with eager evaluation. In order to enforce the eager evaluation, we change the code a bit by introducing variables:

```dax
EVALUATE
VAR AverageSales =
    [Sales Amount] / DISTINCTCOUNT ( 'Product'[Color] )
RETURN
    ADDCOLUMNS (
        VALUES ( 'Product'[Color] ),
        "@Test",
            VAR Margin = [Margin]
            VAR SalesAmount = [Sales Amount]
            VAR TotalCost = [Total Cost]
            RETURN
                IF ( SalesAmount > AverageSales, Margin, TotalCost )
    )
```

The values used inside the IF function are now computed before the IF, storing them into variables (*Margin*, *SalesAmount*, and *TotalCost*). We are explicitly telling the engine the execution path that we want: first, compute all the values for every color; then, choose the result between the values computed depending on a condition (*SalesAmount* > *AverageSales*). This way, DAX follows our request and uses eager evaluation.

This results in only three SE queries: two SE queries are identical to the first two described in the previous DAX query, because they are required to compute the list of colors and the *AverageSales* variable. The remaining SE query retrieves the values of the three *Margin*, *Sales Amount*, and *Total Cost* measures with a single request:

```
SELECT
    'Product'[Color],
    SUM ( 'Sales'[Quantity] * 'Sales'[Net Price] ),
    SUM ( 'Sales'[Quantity] * ( 'Sales'[Net Price] - 'Sales'[Product Cost] ) ),
    SUM ( 'Sales'[Quantity] * 'Sales'[Product Cost] )
FROM 'Sales'
    LEFT OUTER JOIN 'Product'
        ON 'Sales'[ProductKey]='Product'[ProductKey]
```

Once the three *Margin*, *SalesAmount*, and *TotalCost* variables are computed for every color, the FE iterates the result of this last SE query, choosing for each color whether to use *Margin* or *TotalCost*, depending on the condition evaluated using the *SalesAmount* of the color and the *AverageSales* computed earlier.

For such simple queries, there is no difference in the overall execution times, because both run well under 10 milliseconds. Nevertheless, on large queries and/or with complex measures, the difference can be very noticeable.

The main elements driving the choice between eager versus strict evaluation are the following:

- **Complexity of the expressions depending on a condition** (like in an IF function). If the expression in the alternative branches of execution is very complex, then strict evaluation should be preferred, because it restricts the calculations to the required rows only.
- **Presence of common sub-expressions in the expressions depending on a condition**. If the same sub-expression is used in alternative branches of execution, then eager evaluation lets the engine optimize the execution by computing the sub-expression only once. With strict evaluation, the sub-expression must be computed multiple times – once for each of the alternative branches of execution.
- **Sparsity of the result**. If the IF function does not have an else branch and it filters just a few rows out of many, then strict evaluation reduces the number of computations to only the ones required.

Trying to simplify: for simple calculations – like SUMX aggregations using simple expressions – or for non-selective conditions, eager evaluation usually works better. For more complex calculations or for very selective conditions, strict evaluation is the better choice.
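The two patterns can be sketched side by side – a minimal sketch in which *StrictMeasure*, *EagerMeasure*, and *[Average Sales]* are placeholder names, not measures defined earlier in this article:

```dax
-- Strict (lazy) evaluation: each branch expression sits inside IF,
-- so it is computed only for the rows where its condition holds
StrictMeasure :=
IF ( [Sales Amount] > [Average Sales], [Margin], [Total Cost] )

-- Eager evaluation: variables push the engine to compute both branches
-- upfront; IF then only picks one of the precomputed values
EagerMeasure :=
VAR Margin = [Margin]
VAR TotalCost = [Total Cost]
RETURN
    IF ( [Sales Amount] > [Average Sales], Margin, TotalCost )
```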

For educational reasons, the code outlined in this article is quite simple. For complex DAX measures or queries, a deeper evaluation is required before choosing one technique over the other.

As an example, in a previous article (Optimizing conditions involving blank values in DAX) we showed a query that could potentially iterate over 15 million combinations, 99 thousand of which actually returned a result.

Using strict evaluation on that query meant computing a table with all 15 million possible rows, just to evaluate the condition for all the possible combinations. In that article we discussed other considerations as well; but the bottom line is that the best possible optimization for that scenario is to use variables in order to force eager evaluation. By using eager evaluation, the engine aggressively computes all the required expressions with a single scan of the fact table. The result includes 99 thousand rows, and the engine knows that none of the remaining combinations can return a useful result. Eager evaluation really shines in that case.

As with any article about optimization, remember that we never provide golden rules. The choice of strict versus eager evaluation is one of the many tools in the DAX guru’s toolbelt. Next time you write an IF function inside an iteration over a large table, consider whether you are relying on strict or eager evaluation.

Remember: never make a choice without proper testing. The best formula for one model might not be the best formula for a different model.

Have fun with DAX!

This article shows a basic technique used to sort months according to a fiscal calendar, by using a couple of calculated columns and the “sort by column” feature of Power BI.

By default, Power BI sorts columns alphabetically. Therefore, the default alphabetical order of the month names in a new Date table is incorrect.

The Sort by Column feature of Power BI can sort the month names by a month number column, which must contain the numbers from 1 to 12 reflecting the correct order.

When the column with the month name is sorted by month number, the report becomes much better:
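If the Date table does not already provide them, the two columns involved can be created as calculated columns – a minimal sketch, assuming the table contains a 'Date'[Date] column:

```dax
-- Month name, used as the visible label in reports
'Date'[Month] = FORMAT ( 'Date'[Date], "mmmm" )

-- Month number from 1 to 12, used only as the sort key
'Date'[Month Number] = MONTH ( 'Date'[Date] )
```

The *Month* column is then marked with Sort by Column pointing to *Month Number*.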

Still, there is an issue if the user replaces the calendar year with the fiscal year in the previous report. Suppose the fiscal year starts in July, with July 2007 being month 1 of fiscal year 2007; consequently, January 2007 belongs to fiscal year 2006. By creating a column for the fiscal year in the Date table, the report evolves into:

The whole red box is in the wrong place. In fact, January to June in fiscal year 2006 should be the end of fiscal year 2006 and January in fiscal year 2007 should be after December 2007. In other words, the entire red box needs to shift down towards the bottom.
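The fiscal year column used in this report can be obtained with a calculated column – a sketch assuming the fiscal year is named after the calendar year it starts in (July 2007 opens fiscal year 2007):

```dax
-- Months from July onwards belong to the fiscal year starting that July;
-- January to June belong to the fiscal year started the previous July
'Date'[Fiscal Year] =
IF (
    MONTH ( 'Date'[Date] ) >= 7,
    YEAR ( 'Date'[Date] ),
    YEAR ( 'Date'[Date] ) - 1
)
```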

Updating the Month number column to reflect a new sorting is an option – however this would mean losing the original sorting of months which is correct for the standard calendar. Instead, the correct solution is to create a new Month column, which is an exact replica of the Month name column, and call it “Fiscal Month”. Then, this new column can be sorted by fiscal month number.

These are the two new calculated columns:

```
'Date'[Fiscal Month] = 'Date'[Month]

'Date'[Fiscal Month Number] =
IF (
    'Date'[Month Number] <= 6,
    'Date'[Month Number] + 6,
    'Date'[Month Number] - 6
)
```

When a report slices data by fiscal year, using Fiscal Month instead of Month produces the following report, with the months sorted from July to June.

This article describes how blank values considered in a DAX conditional expression can affect its query plan and how to apply possible optimizations to improve performance in these cases.

An important metric to consider in optimizing DAX is the cardinality of the data structures iterated by the formula engine. Sometimes the formula engine needs to scan huge datacaches because it cannot leverage the auto-exist logic of DAX. Optimizing these scenarios requires a deep understanding of the DAX engine. This article describes an example of such optimization.

The examples used in this article are not related to specific business problems. We want to focus on the behavior of DAX, not on a specific business problem. Let us start with a sample query:

```
DEFINE
    MEASURE Sales[SimpleSum] = SUM ( Sales[Quantity] )
EVALUATE
{
    COUNTROWS (
        SUMMARIZECOLUMNS (
            'Date'[Date],
            Customer[Company Name],
            'Product'[Color],
            "Test", [SimpleSum]
        )
    )
}
```

The result of this query is the number of combinations of date, company, and color with sales transactions. In the demo database we are using, it returns 99,067 combinations scanning the 12 million rows of the *Sales* table in around 50 milliseconds. Nothing exceptional here – this is the speed expected of Power BI.

Nevertheless, it is important to spend some time analyzing exactly how the DAX engine solves the query. The query plan is extremely simple.

The query plan executes a storage engine (SE) query retrieving the 99,067 combinations from Sales, and then it counts them. Therefore, the SE materializes 99,067 rows scanning Sales. Even though – as humans – we consider this to be a reasonable query plan, the engine performs some complex reasoning to figure out that this is indeed the best way to answer the query. Let us see what happens.

SUMMARIZECOLUMNS requires the engine to group by three columns, belonging to three different tables. SUMMARIZECOLUMNS also requires the engine to perform the cross-join of their values, and then evaluate the measure. Because the columns belong to different tables, the auto-exist behavior does not kick in (see Understanding DAX auto-exist for more details).

Taken individually, the three columns are not large: *Date[Date]* contains 2,556 values, *Customer[Company Name]* contains 386 values, and *Product[Color]* only contains 16 values. However, the full cross-join of these three columns corresponds to the product of their cardinalities: 2,556 * 386 * 16 equals 15,785,856 possible combinations of values.

Be mindful that – semantically – the query is equivalent to the following one, which shows the required materialization more clearly:

```
DEFINE
    MEASURE Sales[SimpleSum] = SUM ( Sales[Quantity] )
EVALUATE
{
    COUNTROWS (
        FILTER (
            ADDCOLUMNS (
                CROSSJOIN (
                    CROSSJOIN (
                        VALUES ( 'Date'[Date] ),
                        VALUES ( Customer[Company Name] )
                    ),
                    VALUES ( 'Product'[Color] )
                ),
                "Test", [SimpleSum]
            ),
            NOT ( ISBLANK ( [Test] ) )
        )
    )
}
```

How does DAX know that it would be useless to scan – and therefore materialize – all the possible combinations, when it can only compute the existing ones instead? The reason is that the measure aggregates the SUM of one column in the *Sales* table. If *Sales* does not contain one combination of the three columns, the sum produces no result. Therefore, the optimizer uses this information to produce an optimal query plan, which scans *Sales* by joining the three dimension tables. This way, the scan only retrieves the rows that might produce a result. The following xmSQL query shows the scan executed by the SE:

```
SELECT
    'Customer'[Company Name],
    'Date'[Date],
    'Product'[Color],
    SUM ( 'Sales'[Quantity] )
FROM 'Sales'
    LEFT OUTER JOIN 'Customer'
        ON 'Sales'[CustomerKey] = 'Customer'[CustomerKey]
    LEFT OUTER JOIN 'Date'
        ON 'Sales'[OrderDate] = 'Date'[Date]
    LEFT OUTER JOIN 'Product'
        ON 'Sales'[ProductKey] = 'Product'[ProductKey];
```

The code executed is equivalent to this DAX query, which is another efficient way to obtain the same result:

```
DEFINE
    MEASURE Sales[SimpleSum] = SUM ( Sales[Quantity] )
EVALUATE
{
    COUNTROWS (
        ADDCOLUMNS (
            SUMMARIZE (
                Sales,
                'Date'[Date],
                Customer[Company Name],
                'Product'[Color]
            ),
            "Test", [SimpleSum]
        )
    )
}
```

Despite everything being rather simple so far, it is useful to do a first recap:

- The query requires a full cross-join between three dimension tables (*Customer*, *Date*, and *Product*) and it returns the sum of one column from the fact table (*Sales*).
- The engine checks that the result can be produced if and only if a combination of values from the dimensions exists in the fact table.
- The optimizer simplifies the query by iterating only the existing combinations; it avoids the scan of 15,000,000 possible combinations, focusing only on the existing 99,000 combinations instead.

In terms of performance, the optimizer does a tremendous job. Indeed, the query containing the explicit CROSSJOIN takes around 8 seconds to run, whereas the optimized version requires a handful of milliseconds.

It is crucial to understand that to apply the optimization, the engine must be sure that the two queries are equivalent.

Slightly changing the query produces very different results. For example, consider the following version where we only changed the measure definition. Instead of computing a sum, the measure now checks if the sum is greater than or equal to zero, returning a Boolean value (TRUE or FALSE):

```
DEFINE
    MEASURE Sales[SimpleBoolean] = SUM ( Sales[Quantity] ) >= 0
EVALUATE
{
    COUNTROWS (
        SUMMARIZECOLUMNS (
            'Date'[Date],
            Customer[Company Name],
            'Product'[Color],
            "Test", [SimpleBoolean]
        )
    )
}
```

The result is now 15,785,856. In other words, the engine produces – and evaluates – the full cross-join. The reason is that non-existing combinations of date, company, and color produce a result. The SUM of these non-existing combinations is blank, because there are no sales for non-existing rows. Nevertheless, the combinations are evaluated and the expression result is TRUE for all of the non-existing combinations.

As you might expect, this query is much slower: it runs in around 2 seconds, because it scans over 15,000,000 rows. We strongly suggest you spend some time reading the query plan of this last query.

It shows that the optimizer is indeed much better than expected. Instead of computing the measure for all the possible combinations, it computes only the existing combinations in *Sales*. It then compares the full cross-join of the three columns with the existing combinations in *Sales* in order to speed up the calculation.

The important detail to remember in this step is that the value of a Boolean condition does not depend on the existence of a combination in the fact table. The Boolean value might be true or false depending on the combination of values from the dimensions. In other words, there are over 15,000,000 possible values to scan. In this case, the engine does a good job by reducing the number of computations.

It turns out that this is not the slowest query we can analyze in this article. We can further mess with the optimizer and force the production of the worst query plan. The following query generates the slowest query plan:

```
DEFINE
    MEASURE Sales[SimpleIf] =
        IF (
            SUM ( Sales[Quantity] ) >= 0,
            SUM ( Sales[Quantity] )
        )
EVALUATE
{
    COUNTROWS (
        SUMMARIZECOLUMNS (
            'Date'[Date],
            Customer[Company Name],
            'Product'[Color],
            "Test", [SimpleIf]
        )
    )
}
```

In this query, for each combination of *Date*, *Company Name*, and *Color*, the measure checks if the result is greater than or equal to zero, only summing positive values. The measure serves no purpose other than to demonstrate when the optimizer starts to be confused. Indeed, this query runs in more than 8 seconds, resulting in horrible performance.

At first sight, it looks like the issue is the presence of IF. This is not correct. IF, by itself, is not a big deal. The real issue is the requirement to evaluate non-existing combinations of attributes of the dimensions. IF is useful to spot the problem, but IF by itself is not the only culprit.

The problem with this last query is in the condition that checks if the sum of quantity is greater than or equal to zero. In a DAX comparison, BLANK is considered as zero. Therefore, a measure that evaluates to blank is treated as zero and any non-existing combination of the dimensions satisfies the IF statement. Therefore, the engine must evaluate more than 15,000,000 combinations, requiring a long execution time.
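The coercion of BLANK to zero is easy to verify with a standalone query – a minimal sketch you can run, for example, in DAX Studio:

```dax
EVALUATE
{
    ( "BLANK >= 0", BLANK () >= 0 ),    -- TRUE: BLANK is treated as zero
    ( "BLANK > 0",  BLANK () > 0 )      -- FALSE: blanks no longer qualify
}
```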

Fixing the code in this scenario is utterly simple. It is enough to tell DAX that we are not interested in zeros – therefore in blanks. If instead of using greater than or equal to (>=) we just check for strictly greater than (>), then blank values are no longer part of the game:

```
DEFINE
    MEASURE Sales[SimpleIf] =
        IF (
            SUM ( Sales[Quantity] ) > 0,
            SUM ( Sales[Quantity] )
        )
EVALUATE
{
    COUNTROWS (
        SUMMARIZECOLUMNS (
            'Date'[Date],
            Customer[Company Name],
            'Product'[Color],
            "Test", [SimpleIf]
        )
    )
}
```

This last query restores the great performance of DAX, evaluating only the relevant rows.

In the example used for these demos, excluding the zero value from the condition solved the issue. More generally, adding a bit of code to explicitly remove blank values from the calculation greatly helps the optimizer find the best path. For example, the following formulation – though more verbose – runs nearly as fast as the previous one:

```
DEFINE
    MEASURE Sales[SimpleIf] =
        IF (
            NOT ISBLANK ( SUM ( Sales[Quantity] ) ),
            IF (
                SUM ( Sales[Quantity] ) > 0,
                SUM ( Sales[Quantity] )
            )
        )
EVALUATE
{
    COUNTROWS (
        SUMMARIZECOLUMNS (
            'Date'[Date],
            Customer[Company Name],
            'Product'[Color],
            "Test", [SimpleIf]
        )
    )
}
```

The code may look more complex, but providing the required information to the engine on how to treat blanks produces a tangible and efficient result.

Finally, it is worth noting that similar results can be obtained by using variables. Variables greatly help the optimizer understand the code we author, producing optimal paths in the execution plan. This last version of the query is the fastest among all the different versions analyzed:

```
DEFINE
    MEASURE Sales[SimpleIf] =
        VAR S = SUM ( Sales[Quantity] )
        RETURN IF ( S >= 0, S )
EVALUATE
{
    COUNTROWS (
        SUMMARIZECOLUMNS (
            'Date'[Date],
            Customer[Company Name],
            'Product'[Color],
            "Test", [SimpleIf]
        )
    )
}
```

Understanding the reason why a variable solves the problem is a bit more intricate. By using a variable, we are telling the optimizer that we want to aggregate all the rows using SUM, regardless of whether the condition in the IF function is met or not. This turns on eager evaluation for the measure – that is, the values are computed in a block, and then the IF condition is used to keep only the positive results. The eager computation of the sum of *Sales[Quantity]* produces only 99,000 values, making it possible to generate an efficient query plan. Describing it further takes an article of its own: you can read Understanding eager vs. strict evaluation in DAX to know more about it.

As it always goes with articles about DAX optimization, we do not want to share any golden rule, mainly because we genuinely know there are none. Nevertheless, when optimizing your code always pay attention to the number of rows iterated by the formula engine. Large iterations are often generated by an inaccurate treatment of blanks, which are often present because of non-existing combinations of dimensions that the engine must iterate to guarantee the equivalence between BLANK and zero.
