Search

Search this blog:

Search This Blog

Calculating Percentile


How do you calculate percentile using DAX?

Let's start by defining percentile. The Oxford Dictionary defines it as:

one of the 100 equal groups that a larger group of people can be divided into, according to their place on a scale measuring a particular value

In statistics we define percentile as the percentage of values in a given set which fall below the selected value. This can be represented mathematically using the below formula:

P = (N-R)/N

Where:
N = the total count of values in the given set
R = the rank of the selected value compared to the given set in DESCENDING order
P = the percentile of the selected value in the given set

I have emphasized some of the key words here that will help us translate this definition into DAX. 

Step 1: N = the total count of values in the given set

We know we need to count the number of values in the set, for this we can use the COUNTROWS() function in DAX. However, in order to get total anything in DAX, we need to use a filter expression such as ALLSELECTED() to ignore any row context in the table or visuals we are using, but still honour the filter context of the slicers and filters we use. 

For each and every value in the given set, I want to return the same count, but I want that count to update with filters.

So if we have a sample population as given below, the total count of values is 5:

 Gender Value N
 Male 80
 Female 78
 Male 80
 Female 74
 Female 90

But when we filter for Males only, the total count of values becomes 2:

Gender Value N
 Male 802
 Male 80

We therefore define N as follows:
N= COUNTROWS(ALLSELECTED(GivenSet))

Step 2: R = the rank of the selected value compared to the given set in DESCENDING order

There are a few ways to calculate rankings in DAX, but we already discussed how I want this calculation to be dynamic and update based on slicer and filter selections. 

So if we consider the same population as given from above, the rank for the total population would be as follows, putting a Female in the highest rank of 1:

 Gender Value N R
 Male 80 52
 Female 78 53
 Male 78 53
 Female 74 55
 Female 90 51

But when we filter for Males only, the highest ranked value belongs to a male:

Gender Value N R
 Male 802 1
 Male 782 2


Therefore we need to use ALLSELECTED again, and also an RANKX function in DESCending order here:

R = RANKX(ALLSELECTED(GivenSet), CALCULATE(SUM(GivenSet[Value])), , ASC)

Step 3: P = the percentile of the selected value in the given set

That leaves us with the simple task of dividing our two values, so we will use the DIVIDE() function to avoid errors. 

From above, we know that: 

P = (N-R)/N

P = DIVIDE(N-R, N)

To save us having to copy the DAX for each, we can simply reuse the measures we already calculated for N and R. Because they are MEASURES, they will calculate within the Row and Filter context of our slicers and matrix. 

P = DIVIDE ([N]-[R] ,
    [N]
)

Sticking with our same sample set, we will get the below results for the entire population. Note our highest ranked male is in the 60th percentile (60% of the sample performed lower than he did): 

 Gender Value N R N-R P
 Male 80 52 3 60%
 Female 78 53 2 40%
 Male 78 53 2 40%
 Female 74 55 0 0%
 Female 90 51 4 80%

But when we filter for Males only, the highest ranked male now drops to the 50th percentile (only 50% of the sample performed lower than he did):

Gender Value N R N-R P
 Male 802 1 1 50%
 Male 782 2 0 0%


Real World Example: Birth Weight Percentile


All right, so let's see this in practical use. 

A common use of percentile is for comparing children's growth from birth. You have probably seen one of these growth charts in a doctor's office at some point in your life: 

What does it actually mean? Looking at birth weight for boys, according to this chart, the following percentiles apply: 
 Approx. Birth Weight (g)Percentile 
 25005
 280010
 320025
 360050 
 390075
 420080
 430095

So that means, on average 25% of babies born are less than 3200g at birth. 

I have compiled birth statistics from 2018 in the US, and used the above method for calculating N, R and P according to birth weight. 

N =
COUNTROWS (
    ALLSELECTED ( '2018 Births US'[Sex of Infant], '2018 Births US'[Index] )
)


R =
RANKX (
    ALLSELECTED ( '2018 Births US'[Index] ),
    CALCULATE ( AVERAGE ( '2018 Births US'[Birth Weight - Detail in Grams] ) ),
    ,
    DESC
)


Birth Weight Percentile =
DIVIDE ( [N] - [R], [N] )


Now we can see if these average percentiles apply to smaller sample sizes of the population, or if, for example, babies born in January to fathers with a Doctorate degree, fall above or below this average: 





Custom Visual Review: Charticulator

This is not your ordinary custom visual - this is EVERY custom visual. Charticulator puts the power to design and develop custom visuals to ...