How to apply a transformation to multiple columns in R?
To apply a transformation to many columns at once, use the across() function from the dplyr package.
across() has many applications; the following examples highlight some typical ones:
First Approach: Apply Function to Several Columns
# multiply values in col1 and col2 by 2
df %>% mutate(across(c(col1, col2), function(x) x*2))
Second Approach: Calculate One Summary Statistic for Multiple Columns
# calculate the mean of col1 and col2
df %>% summarise(across(c(col1, col2), function(x) mean(x, na.rm=TRUE)))
Third Approach: Calculate Multiple Summary Statistics for Multiple Columns
# calculate the mean and standard deviation of col1 and col2
df %>% summarise(across(c(col1, col2), list(mean=~mean(.x, na.rm=TRUE), sd=~sd(.x, na.rm=TRUE))))
The examples below demonstrate each technique using the data frame created next.
Let’s create a data frame
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points=c(26, 22, 28, 15, 32, 28),
                 rebounds=c(16, 15, 16, 12, 13, 10))
Now we can view the data frame
df
  team points rebounds
1   P1     26       16
2   P1     22       15
3   P1     28       16
4   P2     15       12
5   P2     32       13
6   P2     28       10
Example 1: Apply Function to Multiple Columns
The following code uses the across() function to multiply the values in the points and rebounds columns by 2.
library(dplyr)
# multiply the values in the points and rebounds columns by 2
df %>% mutate(across(c(points, rebounds), function(x) x*2))
  team points rebounds
1   P1     52       32
2   P1     44       30
3   P1     56       32
4   P2     30       24
5   P2     64       26
6   P2     56       20
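To make clear what across() is doing here, the call can be compared with spelling out each column by hand. The sketch below assumes the same df as above; the names via_across and by_hand are only illustrative.

```r
library(dplyr)

df <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points = c(26, 22, 28, 15, 32, 28),
                 rebounds = c(16, 15, 16, 12, 13, 10))

# one across() call covering both columns
via_across <- df %>% mutate(across(c(points, rebounds), function(x) x * 2))

# the same transformation written out column by column
by_hand <- df %>% mutate(points = points * 2, rebounds = rebounds * 2)

# all.equal() reports whether the two results match
all.equal(via_across, by_hand)
```

With two columns the saving is small, but the across() form stays the same length however many columns are selected.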
Example 2: Calculate One Summary Statistic for Multiple Columns
The following code uses the across() function to calculate the mean of both the points and rebounds columns.
# calculate the mean of the points and rebounds columns
df %>% summarise(across(c(points, rebounds), function(x) mean(x, na.rm=TRUE)))
    points rebounds
1 25.16667 13.66667
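The example data has no missing values, so the effect of na.rm is not visible in the output above. The sketch below uses a hypothetical copy of the data, df_na, with one points value replaced by NA. Since dplyr 1.1.0, passing extra arguments such as na.rm through across() itself is deprecated, so the sketch wraps mean() in an anonymous function.

```r
library(dplyr)

# hypothetical copy of the example data with one missing points value
df_na <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                    points = c(26, NA, 28, 15, 32, 28),
                    rebounds = c(16, 15, 16, 12, 13, 10))

# without na.rm, the missing value propagates to the mean of points
with_na <- df_na %>% summarise(across(c(points, rebounds), mean))

# with na.rm = TRUE, the NA is dropped before averaging
without_na <- df_na %>%
  summarise(across(c(points, rebounds), function(x) mean(x, na.rm = TRUE)))

with_na
without_na
```

In the first result the points mean is NA; in the second it is the mean of the five remaining values, 25.8.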
Note that we can also combine where() with is.numeric to calculate a summary statistic for every numeric column in the data frame automatically.
# calculate the mean of every numeric column in the data frame
df %>% summarise(across(where(is.numeric), function(x) mean(x, na.rm=TRUE)))
    points rebounds
1 25.16667 13.66667
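across() also respects grouping, so the same where(is.numeric) selection can return one row of means per team. This is a minimal sketch on the same df; group_by() is a standard dplyr verb that this post does not otherwise cover, and team_means is just an illustrative name.

```r
library(dplyr)

df <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points = c(26, 22, 28, 15, 32, 28),
                 rebounds = c(16, 15, 16, 12, 13, 10))

# mean of every numeric column, computed separately for each team
team_means <- df %>%
  group_by(team) %>%
  summarise(across(where(is.numeric), mean))

team_means
```

The P1 row averages about 25.33 points and 15.67 rebounds, and the P2 row 25 points and about 11.67 rebounds. The grouping column team is left out of the across() selection automatically.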
Example 3: Calculate Multiple Summary Statistics for Multiple Columns
The across() function may be used to determine the mean and standard deviation of the points and rebounds columns using the following code.
# calculate the mean and standard deviation of the points and rebounds columns
df %>% summarise(across(c(points, rebounds), list(mean=~mean(.x, na.rm=TRUE), sd=~sd(.x, na.rm=TRUE))))
  points_mean points_sd rebounds_mean rebounds_sd
1    25.16667  5.946988      13.66667     2.42212
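The result columns are named by joining the column name and the function name with an underscore, which is the default in across() when a named list of functions is supplied. The sketch below changes that through the .names argument; the "{.fn}_{.col}" pattern and the name stats are only illustrative choices.

```r
library(dplyr)

df <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points = c(26, 22, 28, 15, 32, 28),
                 rebounds = c(16, 15, 16, 12, 13, 10))

# put the statistic first in each output column name
stats <- df %>%
  summarise(across(c(points, rebounds),
                   list(mean = mean, sd = sd),
                   .names = "{.fn}_{.col}"))

names(stats)
```

The columns come out as mean_points, sd_points, mean_rebounds and sd_rebounds, with the same values as in the output above.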
That nearly completes our tour of dplyr techniques. We will discuss the transmute() function in an upcoming post.