How to Calculate Relative Frequencies in R? You will often need to calculate the relative frequencies (proportions) of the values in one or more columns of a data frame in R.


Fortunately, the functions in the dplyr package make this task simple. This tutorial shows how to apply them to the following data frame to get relative frequencies.

Let’s create a data frame:

df <- data.frame(
  team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2'),
  position = c('R2', 'R1', 'R1', 'R2', 'R2', 'R1', 'R2'),
  points = c(102, 115, 119, 202, 132, 134, 212)
)

Now we can view the data frame:

df

  team position points
1   P1       R2    102
2   P1       R1    115
3   P1       R1    119
4   P2       R2    202
5   P2       R2    132
6   P2       R1    134
7   P2       R2    212

## Example 1: Relative Frequency of One Variable

The relative frequency of each team in the data frame can be calculated using the code below.

library(dplyr)

df %>%
  group_by(team) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

  team      n  freq
  <chr> <int> <dbl>
1 P1        3 0.429
2 P2        4 0.571

This shows that team P1 accounts for 42.9 percent of the rows in the data frame, while team P2 accounts for the remaining 57.1 percent. Note that the two values add up to 100 percent.
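As a quick cross-check, the same proportions can be reproduced without dplyr. The sketch below is an alternative in base R, not part of the dplyr approach above; the name `team_freq` is only chosen for illustration.

```r
# Same data frame as in the tutorial
df <- data.frame(
  team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2'),
  position = c('R2', 'R1', 'R1', 'R2', 'R2', 'R1', 'R2'),
  points = c(102, 115, 119, 202, 132, 134, 212)
)

# table() counts the rows per team; prop.table() divides each count by the total
team_freq <- prop.table(table(df$team))

# Rounded to three decimals for comparison with the dplyr result
round(team_freq, 3)
```

The rounded values should match the freq column above (0.429 for P1 and 0.571 for P2).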


## Example 2: Relative Frequency of Multiple Variables

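The relative frequency of positions within each team can be calculated using the code below. After summarise(), the result is still grouped by team, so sum(n) is the total for each team rather than for the whole data frame: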

library(dplyr)

df %>%
  group_by(team, position) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

  team  position     n  freq
  <chr> <chr>    <int> <dbl>
1 P1    R1           2 0.667
2 P1    R2           1 0.333
3 P2    R1           1 0.25
4 P2    R2           3 0.75

This tells us that:

Team P1 has 66.7 percent of its players in position R1.

Team P1 has 33.3 percent of its players in position R2.

Team P2 has 25.0 percent of its players in position R1.

Team P2 has 75.0 percent of its players in position R2.
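The percentages above are shares within each team. If you want the share of each team/position combination across the whole data frame instead, the grouping has to be handled differently. The following is a minimal sketch that uses dplyr's count() as a shortcut for group_by() plus summarise(); the names `within_team` and `overall` are only illustrative.

```r
library(dplyr)

# Same data frame as in the tutorial
df <- data.frame(
  team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2'),
  position = c('R2', 'R1', 'R1', 'R2', 'R2', 'R1', 'R2'),
  points = c(102, 115, 119, 202, 132, 134, 212)
)

# Share of each combination within its team (same result as Example 2)
within_team <- df %>%
  count(team, position) %>%      # one row per combination, counts in column n
  group_by(team) %>%             # make the per-team grouping explicit
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Share of each combination across all rows of the data frame
overall <- df %>%
  count(team, position) %>%      # count() returns an ungrouped result here
  mutate(freq = n / sum(n))      # so sum(n) is the total number of rows

within_team
overall
```

Making the grouping explicit avoids relying on what summarise() leaves behind, which is an easy source of confusion when more than one grouping variable is involved.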


## Example 3: Display Relative Frequencies as Percentages

The following code calculates the relative frequency of positions by team and displays the relative frequencies as percentages:

library(dplyr)

df %>%
  group_by(team, position) %>%
  summarise(n = n()) %>%
  mutate(freq = paste0(round(100 * n / sum(n), 0), '%'))

  team  position     n freq
  <chr> <chr>    <int> <chr>
1 P1    R1           2 67%
2 P1    R2           1 33%
3 P2    R1           1 25%
4 P2    R2           3 75%
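Note that freq is now a character column, so it can no longer be sorted or summed as a number. One possible variation, sketched below, keeps the numeric proportion and adds the formatted text in a separate column. The column name `freq_label` and the one-decimal format are assumptions made for illustration.

```r
library(dplyr)

# Same data frame as in the tutorial
df <- data.frame(
  team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2'),
  position = c('R2', 'R1', 'R1', 'R2', 'R2', 'R1', 'R2'),
  points = c(102, 115, 119, 202, 132, 134, 212)
)

pct <- df %>%
  group_by(team, position) %>%
  # keep the grouping by team and state that choice explicitly
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(
    freq = n / sum(n),                          # numeric, usable in calculations
    freq_label = sprintf("%.1f%%", 100 * freq)  # text, for display only
  ) %>%
  ungroup()

pct
```

Here sprintf() with "%.1f%%" formats each value with one decimal place and a literal percent sign, so 0.25 is shown as 25.0%.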