Search

Search this blog:

Search This Blog

Reporting Order of Operations


 We all remember learning order of operations in school, whether it was Please Excuse My Dear Aunt Sally, or BEDMAS, the order of operations is vital in understanding mathematics and getting the desired result from your calculations. 

Power BI and Excel are no different. We must understand the order of operations when building our reports in order to achieve the desired results. 


Order of Operations

Power BI/Excel order of operations is basically as follows: 

  1. Raw data source updates/changes 
  2. Power Query operations (M code) 
  3. DAX tables and columns 
  4. Slicers 
  5. DAX measures 

When you click Refresh, it will go back to the Raw data source and check for updates, apply the query changes from Power Query, then calculate that DAX tables and columns and THEN apply the slicer selections. This means that only DAX measures update based on slicer selections, not DAX columns.

Relevance to Real World

Many questions stem from not understanding the order in which Power BI (or Excel) apply the transformations to our data source. It works a bit like an onion, adding layer on top of layer. You cannot use the output of a layer that hasn't yet been applied. 

This problem is most commonly encountered when working with calculated COLUMNS in DAX. Calculated columns are essentially part of the Data Model, and can usually be done in Power Query (which is what I recommend when possible). They are calculated on data refresh, and DO NOT CHANGE based on report filters or slicer selections.

Calculated MEASURES in DAX calculate AFTER all report filters and slicer selections. This makes them more dynamic and able to be manipulated easily by the report viewer or end user. DAX measures are the key to powerful insights. 


Power BI Webinar: Opportunity Analysis


View Power BI Webinar

If you missed the webinar last Friday, you can still tune in via the link above. See how using unrelated tables can help you gain insights into your customers and their habits. Who is your target market? What customers have NOT bought a specific product? Analyze your opportunities with Power BI.


Teams Tip: Keep similar topics together


 Finally! Thank you Microsoft for making this feature more obvious in Teams desktop: 



Despite all my best efforts to stress the importance of replying to existing topics in order to keep like messages together, people still started New conversation instead. This breaks the messages into two (or more) buckets and makes it challenging to follow a cohesive thread. 

Now the 'New conversation' option is a button you must click to access. Hopefully this will make it less confusing and decrease the number of accidental new conversations in Teams. 

Remember, only start a new conversation if you want to discuss a completely new topic. If you do, it's best practice to give your conversation a subject line. Click the A icon at bottom left to do this - it makes things easier to find later. 





What does AVERAGE mean?


 Do you like my pun on this one - Average is Mean, get it? 

Ok, I won't quite my day job for a gig as a comedian, but I do want to take a look at the built in Power BI average functions today. 

Problem

I've had a few questions lately about calculating AVERAGE in Power BI, and not getting the totals you might expect. How can we ensure the that OVERALL average calculates as we expect? 

The solution may lie in AVERAGEX function, but we still need to fully understand what the X means, and also how and when AVERAGE is calculated in Power BI. 

To demonstrate this, I'm going to use a very simple data model for this post. Just three tables: Orders, Customer and a Date table. They are related in a typical star schema, with Orders as the Fact table.

Definition of AVERAGE

So the first thing to note is the definition of AVERAGE. We calculate average of a set of numbers by adding up all the numbers and dividing by the count of numbers. 

If we look at our Orders table:

Let's take the average of Sales Amount.

First I need to add up all the numbers: 

8 + 10 + 20 + 10 + 12 + 25 + 15 + 5 + 15 = 120

Then I need to divide by the count of total orders, which is 9.

So that leaves us with 120/9 = 13.333, therefore the Average Sales Amount = 13.33


That's consistent with what we get in Power BI when we calculate 

Average Sales = AVERAGE(Orders[Sales Amount])

As you can see in the table below, Average Sales is 13.33 as we calculated:


We can even see that at each row level, the same rule applies: Average Sales = Sum of Sales/Count of Orders. So when you look at the table above, you could calculate the 'Total' Average Sales two ways: 

Method 1)

Take the average of the average sales for each row: AVERAGE(8 , 10 , 20 , 10 , 12 , 25 , 15 , 5 , 15) = 120/9

Method 2) 

Take the 'Total' row SUM of Sales and divide by the 'Total' row Count of Sales: 120/9


Both methods give the same result of 13.33.

However, this only applies when we're looking at the same level of granularity as we have in our raw data. You will notice that the table above has one row for each OrderID, just like our Orders table. 

What happens when we change our granularity? Let's say for example I only want to look at the average sales each day?

We can see here that my Average Sales hasn't changed - it's still 13.33. Let's see if we can get this number with our two methods of calculating average: 

Method 1) AVERAGE(8, 15, 15.67, 11.67) = 50.34/4 = 12.585 

Method 2) 120/9 = 13.3333 

Method 2 hasn't changed a bit, but Method 1 is giving a completely different answer. This is because the AVERAGE() function in DAX will always calculate over the entire data model. 

If I want to first calculate the average for each date, and then average those averages, I need to provide a row context for DAX to perform that first calculation of average. We'll use the iterator AVERAGEX to provide that row context. 

In this case, if we want to first calculate the average for each date, I need to compute averages over the Date table: AVERAGEX(Date) A = AVERAGEX(Date, [Average Sales])


In the above table, the AVERAGEX(Date) A provides row context of the Date table and is also being used in a visualization that provides row context for that same Date table, so if we try our two methods for calculating average:

Method 1) AVERAGE(8, 15, 15.67, 11.67) = 50.34/4 = 12.585 

Method 2) 120/9 = 13.3333 

Method 1) works, but Method 2) doesn't. How do we make Method 2) work? You guessed it, provide the row context for calculating the 'total' of the averages instead of just the grand total sum: 

SUMX(Date) = SUMX(Date, [Average Sales])

Note we will need to do the same thing with our Count, and calculate the total number of dates rather than the total number of sales:

Count of Dates = DISTINCTCOUNT(Orders[Date])

Adding these two new measures to our table with Date granularity gives:


That works - now if we look at the last three columns in the above table, we apply both methods successfully: 

Method 1) AVERAGE(8, 15, 15.67, 11.67) = 50.34/4 = 12.585 

Method 2) 50.33/4=12.5825  

Aside from rounding errors, these two numbers are identical. 

CONCLUSION

So what have we learned? Average is calculated by adding up all numbers in a set and dividing by the count of numbers in that set. 

AVERAGEX changes the row context for that calculation and can help change how many numbers we have total, and also change the value of each of those numbers (which may or may not change the total). 

If we use the AVERAGEX in a visualization with the same row context, our totals row will 'make sense' and we can use both methods to calculate the 'total' average, or overall average. 

If we use the AVERAGEX in a visualization with a different row context, our totals row may not 'make sense' and we cannot use Method 1) for calculating the overall average. If you can wrap your head around this concept, you will be much more successful with your DAX formulas and calculations. 

Bonus: 

To see if you understand what's happening with the row context, let's change it one more time. 

If I change the row context and try to put 

AVERAGEX(Date) A = AVERAGEX(Date, [Average Sales])

in a visual with Customer row context, Method 1 won't work. 

In the table above, we have AVERAGEX(Date) A = 7.67+15+20+12 =  54.67/4 = 13.6675   

QUESTION: 

Can you figure out how I calculated SUMX(Customer) to get Method 2) to work? 

ANSWER:

SUMX(Customer) = SUMX(Customer, AVERAGEX(Date, [Average Sales]))

Or, written another way: 

SUMX(Customer) = SUMX(Customer, [AVERAGEX(Date) A])

Note that in the context of the Customer table visualization, SUMX(Customer, [AVERAGEX(Date) A]) = AVERAGEX(Date) A for each customer, only the total row differs.

So what?

So what are we actually viewing when we use AVERAGEX(Date) A in the context of the Customer table? We're taking the Average Sales for each Date broken down by each Customer. We haven't changed the base measure - that's still calculating the AVERAGE(Orders[Sales Amount]). All we have changed is the context and the granularity at which it's calculated. 

Let's build a hierarchy of Customer, Date, OrderID: 

I want to focus on Customer C in the above table, because the AVERAGEX(Date) differs from the Average Sales for Customer C. 

Why? Because Customer C had orders spread UNEVENLY across multiple dates. So when we average the sales for Customer C by date first, that converts the TWO values of 15 on 11 September (for OrderIDs 7 and 9) into ONE value, giving them each a lesser weighting when averaging with the ONE value of 25 on 10 September (for OrderID 6). 

So we can see that the OVERALL average sale for Customer C is 18.33 = (25+15+15)/3. However, when we use AVERAGEX(Date) we are averaging the DAILY average sale for Customer C, which is 20 = (25+15)/2. 

In the image below, you can see a compilation of AVERAGEX statements used in the context of Customer and Date visualizations. 

There are a couple of things to note here in this compilation:

  • AVERAGEX of Total Sales gives a different result than AVERAGEX of Average Sales.
  • In the TOTAL row: The row context of the visualization DOES NOT MATTER in the TOTAL row. Only the row context provided by the DAX MEASURE matters in the TOTAL row. You can see this in the highlighted green totals - they are identical for both Customer and Date visualization row contexts, and this is not a coincidence.
  • AVERAGEX of Average Sales only differs from the Average Sales when the individual transactions are NOT EVENLY distributed as discussed above. The examples of this are highlighted in purple and blue in the image below. 




Custom Visual Review: Charticulator

This is not your ordinary custom visual - this is EVERY custom visual. Charticulator puts the power to design and develop custom visuals to ...