In statistics, quantiles are values that divide a ranked dataset into equal-sized groups. This tutorial shows how to calculate quantiles by group in R.
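To make the definition concrete, here is a minimal sketch using base R’s quantile() on a single ungrouped vector. The values in x are made up for illustration and are not part of the example data used later.

```r
# Illustrative values only (hypothetical, not the tutorial's data)
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

# The 25th, 50th and 75th percentiles (the quartiles) cut the
# sorted values into four parts of roughly equal size
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(quartiles)
```

The grouped calculations in the rest of this tutorial apply the same function once per group.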

In R, we can use the following functions from the dplyr package to calculate quantiles grouped by a certain variable.

```r
library(dplyr)
```

First, identify the quantiles that you’re interested in:

```r
q <- c(0.25, 0.5, 0.80)
```

Then calculate the quantiles by the grouping variable:

```r
df %>%
  group_by(grouping_variable) %>%
  summarize(
    quant25 = quantile(numeric_variable, probs = q[1]),
    quant50 = quantile(numeric_variable, probs = q[2]),
    quant80 = quantile(numeric_variable, probs = q[3])
  )
```

The following example shows how to use this syntax in practice.

## Quantiles by Group calculation in R

The following code demonstrates how to calculate quantiles of the number of wins in a dataset, grouped by team.

```r
library(dplyr)
```

First, create the data frame:

```r
df <- data.frame(
  team = c('X', 'X', 'X', 'X', 'X', 'X', 'X', 'X',
           'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y',
           'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'),
  wins = c(12, 14, 24, 15, 8, 5, 13, 13,
           13, 15, 12, 13, 10, 19, 19, 8,
           12, 16, 15, 21, 20, 10, 15, 11)
)
```

Let’s look at the first six rows of the data frame:

```r
head(df)
```

```
  team wins
1    X   12
2    X   14
3    X   24
4    X   15
5    X    8
6    X    5
```

Next, identify the quantiles that you’re interested in:

```r
q <- c(0.25, 0.5, 0.80)
```

Now calculate the quantiles by the grouping variable:

```r
df %>%
  group_by(team) %>%
  summarize(
    quant25 = quantile(wins, probs = q[1]),
    quant50 = quantile(wins, probs = q[2]),
    quant80 = quantile(wins, probs = q[3])
  )
```

```
  team  quant25 quant50 quant80
  <chr>   <dbl>   <dbl>   <dbl>
1 C        11.8      15    18.4
2 X        11        13    14.6
3 Y        11.5      13    17.4
```
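The fractional values in this output come from interpolation. By default, quantile() uses its type = 7 method, which interpolates linearly between the two nearest sorted values. The sketch below reproduces team X’s 80th percentile by hand as a sanity check; the variable names are illustrative.

```r
# Team X's wins, copied from the example data frame
wins_x <- c(12, 14, 24, 15, 8, 5, 13, 13)
p <- 0.80

sorted <- sort(wins_x)   # 5 8 12 13 13 14 15 24
n <- length(sorted)

# Position of the quantile in the sorted vector under the default type 7 rule
h <- (n - 1) * p + 1     # 6.6, i.e. between the 6th and 7th sorted values
lower <- floor(h)
frac <- h - lower

# Linear interpolation between the two neighbouring sorted values
by_hand <- sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])

by_hand
unname(quantile(wins_x, probs = p))
```

Both lines return 14.6, matching the quant80 value for team X in the table. Note that this hand calculation assumes p is below 1, so that a value exists at position lower + 1.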

It’s worth noting that we can specify any number of quantiles we want. First, define the quantiles of interest:

```r
q <- c(0.2, 0.4, 0.6, 0.8)
```

Then calculate the quantiles by the grouping variable:

```r
df %>%
  group_by(team) %>%
  summarize(
    quant20 = quantile(wins, probs = q[1]),
    quant40 = quantile(wins, probs = q[2]),
    quant60 = quantile(wins, probs = q[3]),
    quant80 = quantile(wins, probs = q[4])
  )
```

```
  team  quant20 quant40 quant60 quant80
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 C        11.4    14.4    15.2    18.4
2 X         9.6    12.8    13.2    14.6
3 Y        10.8    12.8    13.4    17.4
```
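Writing one quantile() call per column gets repetitive as q grows. One possible alternative is sketched here, assuming dplyr 1.1.0 or later is installed (the version that introduced reframe()). It returns one row per team and quantile instead of one column per quantile. The column names prob and value are illustrative choices, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  team = rep(c("X", "Y", "C"), each = 8),
  wins = c(12, 14, 24, 15, 8, 5, 13, 13,
           13, 15, 12, 13, 10, 19, 19, 8,
           12, 16, 15, 21, 20, 10, 15, 11)
)

q <- c(0.2, 0.4, 0.6, 0.8)

# reframe() lets each group return several rows (here, one per quantile)
long_quantiles <- df %>%
  group_by(team) %>%
  reframe(prob = q, value = quantile(wins, probs = q))

long_quantiles
```

The result has 12 rows (3 teams times 4 quantiles), so changing q requires no change to the pipeline itself.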

We can also calculate just one quantile per group. For example, here’s how to find the 95th percentile of each team’s wins:

```r
df %>%
  group_by(team) %>%
  summarize(quant95 = quantile(wins, probs = 0.95))
```

```
  team  quant95
  <chr>   <dbl>
1 C        20.6
2 X        20.8
3 Y        19
```
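As a quick cross-check that does not depend on dplyr, the same per-team 95th percentiles can be computed with base R’s tapply(). This is a sketch for verification rather than a replacement for the pipeline above; the data frame is rebuilt so the snippet runs on its own, and p95 is an illustrative name.

```r
# Same example data as above
df <- data.frame(
  team = rep(c("X", "Y", "C"), each = 8),
  wins = c(12, 14, 24, 15, 8, 5, 13, 13,
           13, 15, 12, 13, 10, 19, 19, 8,
           12, 16, 15, 21, 20, 10, 15, 11)
)

# Apply quantile() to wins within each team; extra arguments are passed through
p95 <- tapply(df$wins, df$team, quantile, probs = 0.95)
p95
```

The unrounded values are 20.65 for C, 20.85 for X and 19 for Y. The tibble output shows them to three significant digits, which is why they appear there as 20.6 and 20.8.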

Cool, it’s working well.