Excel with Allison: Calculating Percentile

How do you calculate percentile using DAX?

Let's start by defining percentile. The Oxford Dictionary defines it as:

one of the 100 equal groups that a larger group of people can be divided into, according to their place on a scale measuring a particular value

In statistics we define percentile as the percentage of values in a given set which fall below the selected value. This can be represented mathematically using the below formula:

P = (N-R)/N

Where:

N = the total count of values in the given set

R = the rank of the selected value compared to the given set in DESCENDING order

P = the percentile of the selected value in the given set

I have emphasized some of the key words here that will help us translate this definition into DAX.

Step 1: N = the total count of values in the given set

We know we need to count the number of values in the set, for this we can use the COUNTROWS() function in DAX. However, in order to get total anything in DAX, we need to use a filter expression such as ALLSELECTED() to ignore any row context in the table or visuals we are using, but still honour the filter context of the slicers and filters we use.

For each and every value in the given set, I want to return the same count, but I want that count to update with filters.

So if we have a sample population as given below, the total count of values is 5:

Gender	Value	N
Male	80	5
Female	78	5
Male	80	5
Female	74	5
Female	90	5

But when we filter for Males only, the total count of values becomes 2:

Gender	Value	N
Male	80	2
Male	80	2

We therefore define N as follows:

N= COUNTROWS(ALLSELECTED(GivenSet))

Step 2: R = the rank of the selected value compared to the given set in DESCENDING order

There are a few ways to calculate rankings in DAX, but we already discussed how I want this calculation to be dynamic and update based on slicer and filter selections.

So if we consider the same population as given from above, the rank for the total population would be as follows, putting a Female in the highest rank of 1:

Gender	Value	N	R
Male	80	5	2
Female	78	5	3
Male	78	5	3
Female	74	5	5
Female	90	5	1

But when we filter for Males only, the highest ranked value belongs to a male:

Gender	Value	N	R
Male	80	2	1
Male	78	2	2

Therefore we need to use ALLSELECTED again, and also an RANKX function in DESCending order here:

R = RANKX(ALLSELECTED(GivenSet), CALCULATE(SUM(GivenSet[Value])), , ASC)

Step 3: P = the percentile of the selected value in the given set

That leaves us with the simple task of dividing our two values, so we will use the DIVIDE() function to avoid errors.

From above, we know that:

P = (N-R)/N

P = DIVIDE(N-R, N)

To save us having to copy the DAX for each, we can simply reuse the measures we already calculated for N and R. Because they are MEASURES, they will calculate within the Row and Filter context of our slicers and matrix.

P = DIVIDE ([N]-[R] ,

[N]

)

Sticking with our same sample set, we will get the below results for the entire population. Note our highest ranked male is in the 60th percentile (60% of the sample performed lower than he did):

Gender	Value	N	R	N-R	P
Male	80	5	2	3	60%
Female	78	5	3	2	40%
Male	78	5	3	2	40%
Female	74	5	5	0	0%
Female	90	5	1	4	80%

But when we filter for Males only, the highest ranked male now drops to the 50th percentile (only 50% of the sample performed lower than he did):

Gender	Value	N	R	N-R	P
Male	80	2	1	1	50%
Male	78	2	2	0	0%

Real World Example: Birth Weight Percentile

All right, so let's see this in practical use.

A common use of percentile is for comparing children's growth from birth. You have probably seen one of these growth charts in a doctor's office at some point in your life:

https://www.cdc.gov/growthcharts/data/set1clinical/cj41c017.pdf

CDC Growth Chart - Boys

What does it actually mean? Looking at birth weight for boys, according to this chart, the following percentiles apply:

Approx. Birth Weight (g)	Percentile
2500	5
2800	10
3200	25
3600	50
3900	75
4200	80
4300	95

So that means, on average 25% of babies born are less than 3200g at birth.

I have compiled birth statistics from 2018 in the US, and used the above method for calculating N, R and P according to birth weight.

N =
COUNTROWS (
ALLSELECTED ( '2018 Births US'[Sex of Infant], '2018 Births US'[Index] )
)

R =
RANKX (
ALLSELECTED ( '2018 Births US'[Index] ),
CALCULATE ( AVERAGE ( '2018 Births US'[Birth Weight - Detail in Grams] ) ),
,
DESC
)

Birth Weight Percentile =
DIVIDE ( [N] - [R], [N] )

Now we can see if these average percentiles apply to smaller sample sizes of the population, or if, for example, babies born in January to fathers with a Doctorate degree, fall above or below this average:

Excel with Allison

Search

Search This Blog

Calculating Percentile

Step 1: N = the total count of values in the given set

Step 2: R = the rank of the selected value compared to the given set in DESCENDING order

Step 3: P = the percentile of the selected value in the given set

Real World Example: Birth Weight Percentile

Custom Visual Review: Charticulator