Quantiles by Group calculation in R
In statistics, quantiles are values that divide a ranked dataset into equal-sized groups.
In R, we can combine the group_by() and summarize() functions from the dplyr package with the base quantile() function to calculate quantiles grouped by a certain variable.
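To make the definition concrete, here is a minimal sketch using base R's quantile() on a small made-up vector. The values 1 to 9 are purely illustrative and are not part of the dataset used later. The three quartiles split the ranked values into four equal-sized parts.

```r
# Illustrative vector (an assumption for this sketch, not the article's data)
x <- 1:9

# Quartiles: the 25th, 50th and 75th percentiles of x
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(quartiles)
```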
library(dplyr)
First, define the quantiles that you’re interested in.
q <- c(0.25, 0.5, 0.80)
Then calculate the quantiles by the grouping variable.
df %>%
  group_by(grouping_variable) %>%
  summarize(quant25 = quantile(numeric_variable, probs = q[1]),
            quant50 = quantile(numeric_variable, probs = q[2]),
            quant80 = quantile(numeric_variable, probs = q[3]))
The following example shows how to use this syntax in practice.
Example: Quantiles by Group in R
The following code demonstrates how to calculate the quantiles of the number of wins in a dataset, grouped by team.
library(dplyr)
Now we can create a data frame.
df <- data.frame(team = rep(c('X', 'Y', 'C'), each = 8),
                 wins = c(12, 14, 24, 15, 8, 5, 13, 13,
                          13, 15, 12, 13, 10, 19, 19, 8,
                          12, 16, 15, 21, 20, 10, 15, 11))
Let’s see the first six rows of the data frame.
head(df)
  team wins
1    X   12
2    X   14
3    X   24
4    X   15
5    X    8
6    X    5
Define the quantiles that you’re interested in.
q <- c(0.25, 0.5, 0.80)
Let’s calculate the quantiles by the grouping variable.
df %>%
  group_by(team) %>%
  summarize(quant25 = quantile(wins, probs = q[1]),
            quant50 = quantile(wins, probs = q[2]),
            quant80 = quantile(wins, probs = q[3]))
  team  quant25 quant50 quant80
  <chr>   <dbl>   <dbl>   <dbl>
1 C        11.8      15    18.4
2 X        11        13    14.6
3 Y        11.5      13    17.4
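To see where a value such as 11.8 for team C comes from, the sketch below reproduces the 25th percentile by hand. It assumes quantile() is called with its default interpolation method (type = 7), which is what the code above does since no type argument is passed. The variable names (wins_c, manual_q25, and so on) are made up for this illustration.

```r
# Team C's wins, copied from the data frame above
wins_c <- c(12, 16, 15, 21, 20, 10, 15, 11)
p <- 0.25

x  <- sort(wins_c)               # 10 11 12 15 15 16 20 21
h  <- (length(x) - 1) * p + 1    # fractional position in the sorted data: 2.75
lo <- floor(h)                   # 2
hi <- ceiling(h)                 # 3

# Linear interpolation between the 2nd and 3rd sorted values:
# 11 + 0.75 * (12 - 11) = 11.75
manual_q25 <- x[lo] + (h - lo) * (x[hi] - x[lo])

# Compare with the built-in result (default type = 7)
builtin_q25 <- unname(quantile(wins_c, probs = p))
print(c(manual = manual_q25, builtin = builtin_q25))
```

Both values are 11.75. The summary table shows 11.8 only because tibbles print three significant digits by default.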
It’s worth noting that we can specify any number of quantiles we want:
Define the quantiles of interest.
q <- c(0.2, 0.4, 0.6, 0.8)
Now we can calculate the quantiles by the grouping variable.
df %>%
  group_by(team) %>%
  summarize(quant20 = quantile(wins, probs = q[1]),
            quant40 = quantile(wins, probs = q[2]),
            quant60 = quantile(wins, probs = q[3]),
            quant80 = quantile(wins, probs = q[4]))
  team  quant20 quant40 quant60 quant80
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 C        11.4    14.4    15.2    18.4
2 X         9.6    12.8    13.2    14.6
3 Y        10.8    12.8    13.4    17.4
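As the list of quantiles grows, writing one summarize() argument per quantile becomes repetitive. One possible shortcut, shown here as a base R sketch rather than the dplyr approach used in this article, is to split the wins by team and pass the whole probability vector to quantile() in a single call. The name quant_by_team is made up for this illustration.

```r
# Same data and quantiles as in the example above
df <- data.frame(team = rep(c('X', 'Y', 'C'), each = 8),
                 wins = c(12, 14, 24, 15, 8, 5, 13, 13,
                          13, 15, 12, 13, 10, 19, 19, 8,
                          12, 16, 15, 21, 20, 10, 15, 11))
q <- c(0.2, 0.4, 0.6, 0.8)

# split() builds one vector of wins per team; sapply() applies quantile()
# to each vector; t() turns the result into one row per team
quant_by_team <- t(sapply(split(df$wins, df$team), quantile, probs = q))
print(quant_by_team)
```

The result is a numeric matrix with one row per team and one column per requested quantile, so changing q is the only edit needed to request a different set of quantiles.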
We also have the option of calculating only one quantile per group. For example, here’s how to figure out what the 95th percentile of each team’s victories is:
Calculate each team’s 95th percentile of victories.
df %>% group_by(team) %>% summarize(quant95 = quantile(wins, probs = 0.95))
  team  quant95
  <chr>   <dbl>
1 C        20.6
2 X        20.8
3 Y        19
Cool, it works: each team gets a single value for its 95th percentile.