How to Standardize Data in R? Standardizing a dataset means rescaling it so that the mean is 0 and the standard deviation is 1.
The most common way to do this is z-score standardization, which scales each value using the following formula:
(xi – xbar) / s
where:
xi: The ith value in the dataset
xbar: The sample mean
s: The sample standard deviation
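As a quick sanity check on the formula, the following sketch computes z-scores by hand in base R and compares them with scale(). The vector x and the names z_manual and z_scale are made up for illustration and are not part of the tutorial's data.

```r
# hypothetical example values (not from the tutorial's data frame)
x <- c(2, 4, 6, 8, 10)

# apply (xi - xbar) / s directly; sd() uses the sample (n - 1) formula
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so convert it to a plain vector
z_scale <- as.vector(scale(x))

# the two approaches should agree up to floating-point error
all.equal(z_manual, z_scale)
```

Both routes apply the same centering and the same sample standard deviation, so either can be used; scale() is simply more convenient inside a data frame pipeline.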
The examples below show how to apply z-score standardization to one or more variables in a data frame using the scale() function and the dplyr package.
Standardize Just One Variable
In a data frame containing three variables, the following code demonstrates how to scale just one of the variables.
library(dplyr)
First, make the example reproducible:
set.seed(123)
Now let’s create the original data frame:
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))
Now we can view the original data frame:
df
        var1      var2      var3
1  14.378876 19.223000 27.238483
2  39.415257 10.160015 22.320085
3  20.448846 14.196271 21.012670
4  44.150870 12.307401 29.856744
5  47.023364  3.852644 21.392645
6   2.277825 18.196849 22.713262
7  26.405274  6.429579 18.601651
8  44.620952  2.757072 19.853551
9  27.571751  7.902573 12.228993
10 22.830737 19.181066  8.677841
# scale var1 to have mean = 0 and standard deviation = 1
df2 <- df %>% mutate_at(c('var1'), ~(scale(.) %>% as.vector))
df2
          var1      var2      var3
1  -0.98619132 19.223000 27.238483
2   0.71268801 10.160015 22.320085
3  -0.57430484 14.196271 21.012670
4   1.03402981 12.307401 29.856744
5   1.22894699  3.852644 21.392645
6  -1.80732540 18.196849 22.713262
7  -0.17012290  6.429579 18.601651
8   1.06592790  2.757072 19.853551
9  -0.09096999  7.902573 12.228993
10 -0.41267825 19.181066  8.677841
You’ll notice that the other two variables didn’t change; only the first variable was scaled.
We can quickly confirm that the scaled variable has a mean of 0 and a standard deviation of 1.
# compute the scaled variable's mean
mean(df2$var1)
[1] 2.638406e-17 # basically zero
# calculate the scaled variable's standard deviation
sd(df2$var1)
[1] 1
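Because the mean comes back as a tiny nonzero number rather than exactly 0, a tolerance-based comparison is a safer check than ==. The sketch below is one way to do that in base R. It rebuilds var1 with the same seed so it runs on its own; the names var1 and z are introduced here for illustration.

```r
# recreate the first column of the example data (same seed, same first draw)
set.seed(123)
var1 <- runif(10, 0, 50)

# standardize it the same way as in the example above
z <- as.vector(scale(var1))

# all.equal() compares within a small numerical tolerance
isTRUE(all.equal(mean(z), 0))
isTRUE(all.equal(sd(z), 1))
```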
Standardize Multiple Variables
Multiple variables in a data frame can be scaled simultaneously using the code provided below:
# scale var1 and var2 to have mean = 0 and standard deviation = 1
df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))
df3
          var1       var2      var3
1  -0.98619132  1.2570692 27.238483
2   0.71268801 -0.2031057 22.320085
3  -0.57430484  0.4471923 21.012670
4   1.03402981  0.1428686 29.856744
5   1.22894699 -1.2193121 21.392645
6  -1.80732540  1.0917418 22.713262
7  -0.17012290 -0.8041315 18.601651
8   1.06592790 -1.3958243 19.853551
9  -0.09096999 -0.5668114 12.228993
10 -0.41267825  1.2503130  8.677841
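In dplyr 1.0.0 and later, scoped verbs such as mutate_at() are superseded by across(). The following is a minimal sketch of the same two-column scaling in that style, assuming a recent dplyr is installed; the name df3_across is introduced here for illustration.

```r
library(dplyr)

# rebuild the example data so this block runs on its own
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# scale var1 and var2 with across() instead of mutate_at()
df3_across <- df %>%
  mutate(across(c(var1, var2), ~ as.vector(scale(.x))))

df3_across
```

The result should match the mutate_at() version: var1 and var2 are standardized and var3 is left untouched.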
Standardize All Variables
Using the mutate_all function, the following code demonstrates how to scale each variable in a data frame.
# scale all variables to have mean = 0 and standard deviation = 1
df4 <- df %>% mutate_all(~(scale(.) %>% as.vector))
df4
          var1       var2        var3
1  -0.98619132  1.2570692  1.09158171
2   0.71268801 -0.2031057  0.30768348
3  -0.57430484  0.4471923  0.09930665
4   1.03402981  0.1428686  1.50888235
5   1.22894699 -1.2193121  0.15986731
6  -1.80732540  1.0917418  0.37034828
7  -0.17012290 -0.8041315 -0.28496363
8   1.06592790 -1.3958243 -0.08543481
9  -0.09096999 -0.5668114 -1.30064291
10 -0.41267825  1.2503130 -1.86662844
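To confirm the result for every column at once, a sketch like the following could be used. It also shows a base R route that calls scale() on the whole data frame, which assumes every column is numeric. The name df4_base is introduced here for illustration.

```r
# rebuild the example data so this block runs on its own
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# scale() standardizes each column and returns a matrix,
# so convert the result back to a data frame
df4_base <- as.data.frame(scale(df))

# every column mean should be approximately 0
colMeans(df4_base)

# every column standard deviation should be 1
sapply(df4_base, sd)
```

If the data frame contains non-numeric columns, this base R shortcut will fail, and selecting columns explicitly with dplyr is the safer choice.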