Search

Search this blog:

Search This Blog

DAX Approximate Lookup


I've had quite a few questions lately around using DAX to find information in another table based on thresholds or ranges of values. 

Use Cases:

  • Find the [Current Price] based on the Sales[Order Date], using a Historic Pricing table with Effective Date and Unit Price columns
  • Calculate the [Employee Bonus] based on upper and lower limits for Sales[Sales Amount] each month, using a Bonus table with Bonus Percent, Minimum Sales and Maximum Sales thresholds
  • Find the [Pallet ID] for the first pallet that has enough inventory to fulfill the Sales[Order Qty], using an Inventory table with Pallet ID and Stock Qty columns
  • So much more! 
Comment below with what brought you to this post to add to the list of Use Cases for Unrelated Approximate Lookup tables.
 
All these examples have one basic feature in common - there is no defined relationship between the two tables in question, and rarely will we find an exact match. 

Employee Bonus Lookup

Let's focus on the Employee Bonus example for this post. 

Given the Bonus table as follows:

We are asked to calculate the Employee Bonus amount based on their monthly sales figures defined as an aggregation on daily sales as below: 


Looking at this table, we can see that EmployeeID 11115 is consistently earning a 5% bonus as their sales fall between $2,000-$20,000 each month. 

EmployeeID 11117 however only gets 3% bonus for the months of September and October 2019. 

Note: We have subtotals turned on in this visual so that we can see the annual total sales as well as the monthly sales. 

Calculate Bonus %

As we already discussed, there is no defined relationship between these two tables, so if we try to add Bonus[Bonus] to our matrix above, we'll just get the same value for every row. 

Instead, we can use DAX to perform an approximate lookup for us. Depending on your preference for functions and a bit on your business requirements, any of the below will get you close: 

Bonus MAXX =
MAXX ( FILTER ( Bonus, [TotalSales] > Bonus[Min Sales] ), Bonus[Bonus] )

Bonus MINX =
MINX ( FILTER ( Bonus, [TotalSales] < Bonus[Max Sales] ), Bonus[Bonus] )

Bonus Between =
CALCULATE (
    SELECTEDVALUE ( Bonus[Bonus] ),
    FILTER (
        Bonus,
        Bonus[Min Sales] < [TotalSales]
            && Bonus[Max Sales] > [TotalSales]
    )
)



Note the Bonus MINX does not really achieve what we want and all of them have the incorrect value for Bonus in the totals for the year. 

What do we want to display as the annual bonus? In the case of EmployeeID 11116 the decision may be easier, but for EmployeeID 11117 it is not so straightforward. Since the bonuses are calculated and paid monthly, we will opt to return BLANK() when more than one month has been selected. This can be done on any of the Bonus options above simply by nesting it inside an IF statement to check that DimDate[YearMonthKey] has one value only: 

Result

Bonus =
IF (
    HASONEVALUE ( DimDate[YearMonthKey] ),
    MAXX ( FILTER ( Bonus, [TotalSales] > Bonus[Min Sales] ), Bonus[Bonus] )
)

Approximate VLOOKUP


Today we're going to take a closer look at VLOOKUP and the lesser used functionality of Approximate Match. 

Business Problem 

Take the following Bonus table: 

 Min SalesMax Sales Bonus
500  0%
500 2000  3%
2000 20000  5%
20000 100000 7%

We want to use that information to automatically find the Employee Bonus based on monthly sales figures in a table such as this:
 EmployeeIDSalesAmount 
 1200 
 2550
 314500 
 47000 

Business Solution


Now this is a bit tricky because we can't simply search for the SalesAmount in the Bonus table to return the Bonus % because we won't find any matches, so we need to use an approximate search. 

VLOOKUP Function Arguments


Let's first review the Function Arguments for VLOOKUP:

Lookup_value - What is the matching number or value between the two tables?
  • This should be taken from the same table where you are authoring the formula (don't do the lookup work for Excel).
  • This should be a RELATIVE cell reference.
Table_array - What is the name/reference of the table where your answer or additional information is stored?
  • This is the other table (NOT the one where you are authoring the formula). Also called the Lookup Table. 
  • This should be an ABSOLUTE cell reference. (Table names are absolute by default.)
  • The first column of this table MUST be the column containing the Lookup_value, Excel will simply find the matching row.
  • Sort this table - Excel will always take the first answer it finds.
Col_index_num - Which number column has the answer?
  • Take the first column (which contains the Lookup_value) as 1.
  • Count the number of columns until you get to the column containing the answer.
Range_lookup - What type of match should be returned?
  • TRUE - indicates an approximate match.
  • FALSE - indicates an exact match.

VLOOKUP Formula

So, to calculate the Employee Bonus above we would use the following formula:

EmployeeBonus = VLOOKUP([@SalesAmount],Bonus,2,TRUE)


What is this formula doing? 

Let's start with EmployeeID 1. They had monthly SalesAmount of $200, so we'll search the first column of the Bonus table for 200. As soon as we find a value >200, we'll stop and go back to the previous row. 
500 is the first value that's >200, so we need to stop and go back to the row before 500. That puts us in the first row and returns a 0% bonus.

EmployeeID 2 had $550 monthly SalesAmount. 500<550, so we can keep going. 2000 is the first value that's >550, so we'll stop there and go back to the previous row with a bonus of 3%. 

EmployeeIDSalesAmount  Employee Bonus
 1200  0%
 2550 3%
 314500  5%
 47000  5%

The trick to getting this to work is that your Lookup Table, in this case our Bonus table, MUST be sorted properly first. 

Sort Order Matters

If we try the same formula with the Bonus table out of sort order, we'll get different results: 

Min Sales Max Sales Bonus
0 500 0%
2000 20000 5%
20000 100000 7%
500 2000 3%

EmployeeID Sales Amount Bonus
1 200 0%
2 550 0%
3 14500 5%
4 7000 5%

So what's happening here? 

Start with EmployeeID 1 - $200 < 2000 so we'll go back to row 1 and return 0%.
EmployeeID 2 - $550 < 2000 so we'll again go back to row 1 and return 0%. We never get the chance to look at row 4 with 500 and 3% to see that it is a closer match. Excel returns the first match.
EmployeeID 3 - 14500 < 20000 so we'll go up one row to 5%. In this case we happen to have found the correct answer by coincidence. 

Sort order matters!

DAX Context Transition: Why it can be handy to use a [Measure] inside a Measure


Today I want to discuss an advanced reason for using a [Measure] inside another Measure. In beginner classes for DAX, I tell my students that they can use a [Measure] inside another Measure in order to:
  • Make it easier to read your calculations
  • Refer to the same calculation across multiple Measures
Take the below data model as an example: 


Let's say we want to calculate the average quantity of items any given customer (Account) purchases in a single order, based on all their past orders. In order to get AvgQtyPerOrder for a given Account, I simply need to divide their total quantity by their total orders. To calculate this using DAX, I have a few options.

Option A: All calculations in one measure

AvgQtyPerOrder A =
DIVIDE (
    SUM ( SalesOrderDetails[Qty] ),
    DISTINCTCOUNT ( SalesOrderDetails[Order Number] )
)

Option B: Break calculations into three smaller measures

Total Qty =
SUM ( SalesOrderDetails[Qty] )

Total Orders =
DISTINCTCOUNT ( SalesOrderDetails[Order Number] )

AvgQtyPerOrder B =
DIVIDE ( [Total Qty], [Total Orders] )

As you can see in the image below, both options return the same result:


Option A is great because it's transparent and I can easily see exactly what calculations are happening within the Measure, but it's not exactly easy to read, and it's not how my brain works when authoring formulas. I personally find it easier to break the DAX into bite size chunks. 

Option B gives us those bite sized chunks and when all formulas for each measure are viewed side by side like this, it's much easier to digest. 

It's also possible to go on to use [Total Qty] or [Total Orders] in other measures, without needing to rewrite the formula for each measure. For example, we might use [Total Orders]  to calculate the total number of customers (Accounts) who have purchased something from us:

Total Accounts B =
COUNTROWS ( FILTER ( Account, [Total Orders] > 0 ) )

The danger with using Option B is that we lose that transparency and [Total Orders] is now masked in a Measure name. Without checking the formula for [Total Orders] we can't be 100% certain what [Total Accounts B] is actually doing. 

Okay, so we can do Option A instead, right? 

Total Accounts A =
COUNTROWS (
    FILTER ( Account, DISTINCTCOUNT ( SalesOrderDetails[Order Number] ) > 0 )
)

All we've done is replace [Total Orders] with it's definition DISTINCTCOUNT ( SalesOrderDetails[Order Number] ) so we should get the same result, right? 

WRONG! Here's the result we get when we place are two measures in a matrix with Product Name: 


We can see that Option B is giving us the correct result of only the accounts who bought that product, and the Total gives the number of Accounts who have bought anything. Option A simply gives the total number of Accounts in the Accounts table for all rows in the matrix. 

WHY does it matter if we use a measure inside a measure, or simply define the measure instead? 

The answer is due to the Context Transition that happens when we use DAX filter functions

I'm talking about these guys: 
  • FILTER
  • ALL
  • ALLEXCEPT
  • CALCULATE
  • etc

DAX Context

DAX has two contexts that we must ALWAYS take into consideration when authoring DAX formulas: 
  • Row Context
  • Filter Context
When authoring MEASURES, we are always working within the current Filter Context provided by the data model, the visuals, slicers, and report level filters. When authoring COLUMNS, we work with the Row Context provided by the table. And of course, there's lots of cross-over too, like when:
  • using Iterators (such as SUMX) within a MEASURE to change the Filter Context to a Row  Context of the Table defined in the DAX expression
  • using CALCULATE within a COLUMN to change the Row Context of the data model table to a Filter Context that would be present in a MEASURE used within a report matrix or table visual containing the same data
  • using table Filter Functions (such as FILTER) within a MEASURE to change the Filter Context to a Row Context of the Table defined in the DAX expression
We call that cross-over Context Transition

Context Transition with filtering functions

Recall, we're still trying to figure out why the following MEASURES give 2 different results when used in a matrix with Product[Name].


Total Accounts A =
COUNTROWS (
    FILTER ( Account, DISTINCTCOUNT ( SalesOrderDetails[Order Number] ) > 0 )
)

Total Accounts B =
COUNTROWS ( FILTER ( Account, [Total Orders] > 0 ) )

Note, these measures are being used inside a FILTER() function. The FILTER function requires two arguments: 
FILTER(<Table>, <Filter>)

The Table as its first argument applies a Row Context to whatever expression is in the <Filter> argument. We know that Row Context applies in calculated COLUMNS.

To see what's going on in our two MEASURES, we're going to take a quick look at some calculated COLUMNS in order to simulate the Context Transition applied by the FILTER function. 

Both Option A and B are applying a FILTER on the Account table, so let's try jumping into the Account table and create a couple of calculated COLUMNS to see the difference. The difference between Option A and Option B was how to calculate Total Orders: 

Total Orders A = DISTINCTCOUNT ( SalesOrderDetails[Order Number] ) 

Total Orders B = [Total Orders] 

Here's the result in the calculated COLUMNS in Account table: 


Option A using DISTINCTCOUNT provides the same result for every row in the table. This should come as no surprise, we know not to use aggregate functions (such as COUNT) within a calculated column, as it doesn't provide the correct context. 

Option B using the [Total Orders] measure interestingly provides a different result for each row in the table. It does this because the MEASURE provides a Filter Context and our data model relationships allow this transition to happen.

The same thing happens within our original MEASURES; when we use the [Total Orders] measure, we get the context transition we're looking for. When we use DISTINCTCOUNT, we get incorrect Row Context.

Mystery solved!

So in summary, we can now update our beginner rationale for [Measure] reference inside Measures with the more complex rationale: 

Use a [Measure] inside another Measure in order to:
  • Make it easier to read your calculations
  • Refer to the same calculation across multiple Measures
  • Apply a Context Transition when using filtering functions

BONUS: Context Transition with CALCULATE 

Just for fun, we can take this one step further and add an option C into the mix. Let's start with a review of CALCULATE and what it does. From Microsoft Docs regarding CALCULATE:
Definition: Evaluates an expression in a context that is modified by the specified filters.
Remarks: If the data has been filtered, the CALCULATE function changes the context in which the data is filtered, and evaluates the expression in the new context that you specify. For each column used in a filter argument, any existing filters on that column are removed, and the filter used in the filter argument is applied instead.
Okay, so we can use CALCULATE instead of a MEASURE to achieve the desired result with greater transparency of our calculation.

Note that the below calculated COLUMN in the Account table gives the correct numbers, just as using the [Total Orders] measure.

Total Orders C =
CALCULATE ( DISTINCTCOUNT ( SalesOrderDetails[Order Number] ) )


So if we substitute the expression for Total Orders C into our original Measure, we should get the correct numbers: 

Total Accounts C =
COUNTROWS (
    FILTER (
        Account,
        CALCULATE ( DISTINCTCOUNT ( SalesOrderDetails[Order Number] ) ) > 0
    )
)


And we do get the correct numbers! CALCULATE enables us to change the context of the Total Accounts measure and achieve the complex results we're looking for. 

Custom Visual Review: Charticulator

This is not your ordinary custom visual - this is EVERY custom visual. Charticulator puts the power to design and develop custom visuals to ...