How to Standardize Data in R?

Posted on July 28 / July 27 by Jim

Standardization rescales a dataset so that its mean is 0 and its standard deviation is 1.

The most common way to do this is z-score standardization, which scales each value using the following formula:

(xi – xbar) / s

where:

xi: The ith value in the dataset

xbar: The sample mean

s: The sample standard deviation
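
As a quick check of the formula, the sketch below computes z-scores by hand in base R and compares them with scale(). The vector x is made-up illustration data, not taken from this article.

```r
# Hypothetical example vector (an assumption for illustration only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# z-score by hand: subtract the sample mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with extra attributes;
# as.vector() turns it back into a plain numeric vector
z_scale <- as.vector(scale(x))

# Compare the two results within numeric tolerance
all.equal(z_manual, z_scale)
```

This is also why the examples in this post wrap scale() in as.vector: without it, the new column would be stored as a matrix rather than a plain numeric vector.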

The examples below show how to apply z-score standardization to one or more variables in a data frame, using the scale() function together with the dplyr package.

Standardize Just One Variable

In a data frame containing three variables, the following code demonstrates how to scale just one of the variables.

library(dplyr)

First, make the example reproducible:

set.seed(123)

Now let’s create the original data frame:

df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

Now we can view the original data frame

df
        var1      var2      var3
1  14.378876 19.223000 27.238483
2  39.415257 10.160015 22.320085
3  20.448846 14.196271 21.012670
4  44.150870 12.307401 29.856744
5  47.023364  3.852644 21.392645
6   2.277825 18.196849 22.713262
7  26.405274  6.429579 18.601651
8  44.620952  2.757072 19.853551
9  27.571751  7.902573 12.228993
10 22.830737 19.181066  8.677841

Scale var1 to have mean = 0 and standard deviation = 1:

df2 <- df %>% mutate_at(c('var1'), ~(scale(.) %>% as.vector))
df2
          var1      var2      var3
1  -0.98619132 19.223000 27.238483
2   0.71268801 10.160015 22.320085
3  -0.57430484 14.196271 21.012670
4   1.03402981 12.307401 29.856744
5   1.22894699  3.852644 21.392645
6  -1.80732540 18.196849 22.713262
7  -0.17012290  6.429579 18.601651
8   1.06592790  2.757072 19.853551
9  -0.09096999  7.902573 12.228993
10 -0.41267825 19.181066  8.677841

You’ll notice that the other two variables didn’t change; only the first variable was scaled.

The scaled variable now has a mean of 0 and a standard deviation of 1, which we can quickly confirm.

Compute the scaled variable’s mean:

mean(df2$var1)
[1] 2.638406e-17  # basically zero

Calculate the scaled variable’s standard deviation:

sd(df2$var1)
[1] 1
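
The mean is not printed as exactly 0 because of floating-point rounding. A minimal sketch of a tolerance-based check follows, assuming the same seed and data frame as above. It uses base R only, so the scaled column is rebuilt with scale() directly instead of through dplyr, and the names scaled_var1, mean_is_zero, and sd_is_one are introduced here for illustration.

```r
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# Rebuild the scaled column directly with scale()
scaled_var1 <- as.vector(scale(df$var1))

# all.equal() compares within a small numeric tolerance,
# so a mean on the order of 1e-17 is treated as equal to 0
mean_is_zero <- isTRUE(all.equal(mean(scaled_var1), 0))
sd_is_one    <- isTRUE(all.equal(sd(scaled_var1), 1))

mean_is_zero
sd_is_one
```

Both checks should come back TRUE. Comparing with == instead would be fragile, since the mean is only zero up to rounding error.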

Standardize Multiple Variables

Multiple variables in a data frame can be scaled simultaneously using the code provided below:

Scale var1 and var2 to have mean = 0 and standard deviation = 1:

df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))
df3
          var1       var2      var3
1  -0.98619132  1.2570692 27.238483
2   0.71268801 -0.2031057 22.320085
3  -0.57430484  0.4471923 21.012670
4   1.03402981  0.1428686 29.856744
5   1.22894699 -1.2193121 21.392645
6  -1.80732540  1.0917418 22.713262
7  -0.17012290 -0.8041315 18.601651
8   1.06592790 -1.3958243 19.853551
9  -0.09096999 -0.5668114 12.228993
10 -0.41267825  1.2503130  8.677841
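
In dplyr 1.0.0 and later, mutate_at() is superseded by across(), although it still works. The sketch below writes the same two-column scaling with across() and compares it with the mutate_at() result. It assumes a recent dplyr is installed, and the name df3_across is made up for this example.

```r
library(dplyr)

set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# Original approach from this post
df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))

# across() version: scale only var1 and var2, leave var3 untouched
df3_across <- df %>%
  mutate(across(c(var1, var2), ~ as.vector(scale(.x))))

# The two data frames should be identical up to numeric tolerance
all.equal(df3, df3_across)
```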

Standardize All Variables

Using the mutate_all function, the following code demonstrates how to scale each variable in a data frame.

Scale all variables to have mean = 0 and standard deviation = 1:

df4 <- df %>% mutate_all(~(scale(.) %>% as.vector))
df4
          var1       var2        var3
1  -0.98619132  1.2570692  1.09158171
2   0.71268801 -0.2031057  0.30768348
3  -0.57430484  0.4471923  0.09930665
4   1.03402981  0.1428686  1.50888235
5   1.22894699 -1.2193121  0.15986731
6  -1.80732540  1.0917418  0.37034828
7  -0.17012290 -0.8041315 -0.28496363
8   1.06592790 -1.3958243 -0.08543481
9  -0.09096999 -0.5668114 -1.30064291
10 -0.41267825  1.2503130 -1.86662844
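
When every column is numeric, scale() can also be applied to the whole data frame at once, without dplyr. The sketch below assumes the same seed and data frame as above. scale() returns a matrix, so the result is converted back with as.data.frame(), and df4_base is a name introduced here for illustration.

```r
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# scale() standardizes each column; as.data.frame() restores the data frame class
df4_base <- as.data.frame(scale(df))

# Every column mean should be zero up to rounding error
colMeans(df4_base)

# Every column standard deviation should be 1
sapply(df4_base, sd)
```

This shortcut fails on data frames that contain non-numeric columns, which is one reason to prefer the column-wise dplyr approach for mixed data.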

Copyright © 2023 Data Science Tutorials.