Do you like my pun on this one - Average is Mean, get it?
Ok, I won't quite my day job for a gig as a comedian, but I do want to take a look at the built in Power BI average functions today.
Problem
I've had a few questions lately about calculating AVERAGE in Power BI, and not getting the totals you might expect. How can we ensure the that OVERALL average calculates as we expect?
The solution may lie in AVERAGEX function, but we still need to fully understand what the X means, and also how and when AVERAGE is calculated in Power BI.
To demonstrate this, I'm going to use a very simple data model for this post. Just three tables: Orders, Customer and a Date table. They are related in a typical star schema, with Orders as the Fact table.
Definition of AVERAGE
So the first thing to note is the definition of AVERAGE. We calculate average of a set of numbers by adding up all the numbers and dividing by the count of numbers.
If we look at our Orders table:
Let's take the average of Sales Amount.
First I need to add up all the numbers:
8 + 10 + 20 + 10 + 12 + 25 + 15 + 5 + 15 = 120
Then I need to divide by the count of total orders, which is 9.
So that leaves us with 120/9 = 13.333, therefore the Average Sales Amount = 13.33
That's consistent with what we get in Power BI when we calculate
Average Sales = AVERAGE(Orders[Sales Amount])
As you can see in the table below, Average Sales is 13.33 as we calculated:
We can even see that at each row level, the same rule applies: Average Sales = Sum of Sales/Count of Orders. So when you look at the table above, you could calculate the 'Total' Average Sales two ways:
Method 1)
Take the average of the average sales for each row: AVERAGE(8 , 10 , 20 , 10 , 12 , 25 , 15 , 5 , 15) = 120/9
Method 2)
Take the 'Total' row SUM of Sales and divide by the 'Total' row Count of Sales: 120/9
Both methods give the same result of 13.33.
However, this only applies when we're looking at the same level of granularity as we have in our raw data. You will notice that the table above has one row for each OrderID, just like our Orders table.
What happens when we change our granularity? Let's say for example I only want to look at the average sales each day?
We can see here that my Average Sales hasn't changed - it's still 13.33. Let's see if we can get this number with our two methods of calculating average:
Method 1) AVERAGE(8, 15, 15.67, 11.67) = 50.34/4 = 12.585
Method 2) 120/9 = 13.3333
Method 2 hasn't changed a bit, but Method 1 is giving a completely different answer. This is because the AVERAGE() function in DAX will always calculate over the entire data model.
If I want to first calculate the average for each date, and then average those averages, I need to provide a row context for DAX to perform that first calculation of average. We'll use the iterator AVERAGEX to provide that row context.
In this case, if we want to first calculate the average for each date, I need to compute averages over the Date table: AVERAGEX(Date) A = AVERAGEX(Date, [Average Sales])
In the above table, the AVERAGEX(Date) A provides row context of the Date table and is also being used in a visualization that provides row context for that same Date table, so if we try our two methods for calculating average:
Method 1) AVERAGE(8, 15, 15.67, 11.67) = 50.34/4 = 12.585
Method 2) 120/9 = 13.3333
Method 1) works, but Method 2) doesn't. How do we make Method 2) work? You guessed it, provide the row context for calculating the 'total' of the averages instead of just the grand total sum:
SUMX(Date) = SUMX(Date, [Average Sales])
Note we will need to do the same thing with our Count, and calculate the total number of dates rather than the total number of sales:
Count of Dates = DISTINCTCOUNT(Orders[Date])
Adding these two new measures to our table with Date granularity gives:
That works - now if we look at the last three columns in the above table, we apply both methods successfully:
Method 1) AVERAGE(8, 15, 15.67, 11.67) = 50.34/4 = 12.585
Method 2) 50.33/4=12.5825
Aside from rounding errors, these two numbers are identical.
CONCLUSION
So what have we learned? Average is calculated by adding up all numbers in a set and dividing by the count of numbers in that set.
AVERAGEX changes the row context for that calculation and can help change how many numbers we have total, and also change the value of each of those numbers (which may or may not change the total).
If we use the AVERAGEX in a visualization with the same row context, our totals row will 'make sense' and we can use both methods to calculate the 'total' average, or overall average.
If we use the AVERAGEX in a visualization with a different row context, our totals row may not 'make sense' and we cannot use Method 1) for calculating the overall average. If you can wrap your head around this concept, you will be much more successful with your DAX formulas and calculations.
Bonus:
To see if you understand what's happening with the row context, let's change it one more time.
If I change the row context and try to put
AVERAGEX(Date) A = AVERAGEX(Date, [Average Sales])
in a visual with Customer row context, Method 1 won't work.
In the table above, we have AVERAGEX(Date) A = 7.67+15+20+12 = 54.67/4 = 13.6675
QUESTION:
Can you figure out how I calculated SUMX(Customer) to get Method 2) to work?
ANSWER:
SUMX(Customer) = SUMX(Customer, AVERAGEX(Date, [Average Sales]))
Or, written another way:
SUMX(Customer) = SUMX(Customer, [AVERAGEX(Date) A])
Note that in the context of the Customer table visualization, SUMX(Customer, [AVERAGEX(Date) A]) = AVERAGEX(Date) A for each customer, only the total row differs.
So what?
So what are we actually viewing when we use AVERAGEX(Date) A in the context of the Customer table? We're taking the Average Sales for each Date broken down by each Customer. We haven't changed the base measure - that's still calculating the AVERAGE(Orders[Sales Amount]). All we have changed is the context and the granularity at which it's calculated.
Let's build a hierarchy of Customer, Date, OrderID:
I want to focus on Customer C in the above table, because the AVERAGEX(Date) differs from the Average Sales for Customer C.
Why? Because Customer C had orders spread UNEVENLY across multiple dates. So when we average the sales for Customer C by date first, that converts the TWO values of 15 on 11 September (for OrderIDs 7 and 9) into ONE value, giving them each a lesser weighting when averaging with the ONE value of 25 on 10 September (for OrderID 6).
So we can see that the OVERALL average sale for Customer C is 18.33 = (25+15+15)/3. However, when we use AVERAGEX(Date) we are averaging the DAILY average sale for Customer C, which is 20 = (25+15)/2.
In the image below, you can see a compilation of AVERAGEX statements used in the context of Customer and Date visualizations.
There are a couple of things to note here in this compilation:
- AVERAGEX of Total Sales gives a different result than AVERAGEX of Average Sales.
- In the TOTAL row: The row context of the visualization DOES NOT MATTER in the TOTAL row. Only the row context provided by the DAX MEASURE matters in the TOTAL row. You can see this in the highlighted green totals - they are identical for both Customer and Date visualization row contexts, and this is not a coincidence.
- AVERAGEX of Average Sales only differs from the Average Sales when the individual transactions are NOT EVENLY distributed as discussed above. The examples of this are highlighted in purple and blue in the image below.