How do you calculate percentile using DAX?
Let's start by defining percentile. The Oxford Dictionary defines it as:
one of the 100 equal groups that a larger group of people can be divided into, according to their place on a scale measuring a particular value
In statistics we define percentile as the percentage of values in a given set which fall below the selected value. This can be represented mathematically using the below formula:
P = (N-R)/N
Where:
N = the total count of values in the given set
R = the rank of the selected value compared to the given set in DESCENDING order
P = the percentile of the selected value in the given set
I have emphasized some of the key words here that will help us translate this definition into DAX.
Step 1: N = the total count of values in the given set
We know we need to count the number of values in the set, for this we can use the COUNTROWS() function in DAX. However, in order to get total anything in DAX, we need to use a filter expression such as ALLSELECTED() to ignore any row context in the table or visuals we are using, but still honour the filter context of the slicers and filters we use.
For each and every value in the given set, I want to return the same count, but I want that count to update with filters.
So if we have a sample population as given below, the total count of values is 5:
Gender | Value | N |
Male | 80 | 5 |
Female | 78 | 5 |
Male | 80 | 5 |
Female | 74 | 5 |
Female | 90 | 5 |
But when we filter for Males only, the total count of values becomes 2:
Gender | Value | N |
Male | 80 | 2 |
Male | 80 | 2 |
We therefore define N as follows:
N= COUNTROWS(ALLSELECTED(GivenSet))
Step 2: R = the rank of the selected value compared to the given set in DESCENDING order
There are a few ways to calculate rankings in DAX, but we already discussed how I want this calculation to be dynamic and update based on slicer and filter selections.
So if we consider the same population as given from above, the rank for the total population would be as follows, putting a Female in the highest rank of 1:
Gender | Value | N | R |
Male | 80 | 5 | 2 |
Female | 78 | 5 | 3 |
Male | 78 | 5 | 3 |
Female | 74 | 5 | 5 |
Female | 90 | 5 | 1 |
But when we filter for Males only, the highest ranked value belongs to a male:
Gender | Value | N | R |
Male | 80 | 2 | 1 |
Male | 78 | 2 | 2 |
Therefore we need to use ALLSELECTED again, and also an RANKX function in DESCending order here:
R = RANKX(ALLSELECTED(GivenSet), CALCULATE(SUM(GivenSet[Value])), , ASC)
Step 3: P = the percentile of the selected value in the given set
That leaves us with the simple task of dividing our two values, so we will use the DIVIDE() function to avoid errors.
From above, we know that:
P = (N-R)/N
P = (N-R)/N
P = DIVIDE(N-R, N)
To save us having to copy the DAX for each, we can simply reuse the measures we already calculated for N and R. Because they are MEASURES, they will calculate within the Row and Filter context of our slicers and matrix.
P = DIVIDE ([N]-[R] ,
[N]
)
Sticking with our same sample set, we will get the below results for the entire population. Note our highest ranked male is in the 60th percentile (60% of the sample performed lower than he did):
Gender | Value | N | R | N-R | P |
Male | 80 | 5 | 2 | 3 | 60% |
Female | 78 | 5 | 3 | 2 | 40% |
Male | 78 | 5 | 3 | 2 | 40% |
Female | 74 | 5 | 5 | 0 | 0% |
Female | 90 | 5 | 1 | 4 | 80% |
But when we filter for Males only, the highest ranked male now drops to the 50th percentile (only 50% of the sample performed lower than he did):
Gender | Value | N | R | N-R | P |
Male | 80 | 2 | 1 | 1 | 50% |
Male | 78 | 2 | 2 | 0 | 0% |
Real World Example: Birth Weight Percentile
All right, so let's see this in practical use.
A common use of percentile is for comparing children's growth from birth. You have probably seen one of these growth charts in a doctor's office at some point in your life:
What does it actually mean? Looking at birth weight for boys, according to this chart, the following percentiles apply:
Approx. Birth Weight (g) | Percentile |
2500 | 5 |
2800 | 10 |
3200 | 25 |
3600 | 50 |
3900 | 75 |
4200 | 80 |
4300 | 95 |
So that means, on average 25% of babies born are less than 3200g at birth.
I have compiled birth statistics from 2018 in the US, and used the above method for calculating N, R and P according to birth weight.
N =COUNTROWS (ALLSELECTED ( '2018 Births US'[Sex of Infant], '2018 Births US'[Index] ))R =RANKX (ALLSELECTED ( '2018 Births US'[Index] ),CALCULATE ( AVERAGE ( '2018 Births US'[Birth Weight - Detail in Grams] ) ),,DESC)Birth Weight Percentile =DIVIDE ( [N] - [R], [N] )
Now we can see if these average percentiles apply to smaller sample sizes of the population, or if, for example, babies born in January to fathers with a Doctorate degree, fall above or below this average: