Data Science Tutorials

Box Cox transformation in R


Posted on October 23 / October 24 by Jim

The Box-Cox transformation is a power transformation that can reduce nonlinearity between variables, unequal variances, and skewness in a variable.

It is therefore very helpful when you need to turn a variable into a new variable whose distribution is closer to normal.

Box-Cox family

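The Box-Cox family of transformations is given by the following expression for different values of lambda:

\[
y(\lambda) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log(x), & \lambda = 0
\end{cases}
\]

where y is the transformed variable, x is the original variable, and lambda (λ) is the transformation parameter. The following table lists the most typical transformations in their simplified form (for λ ≠ 0, each is equivalent to the expression above up to a linear rescaling):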

λ       Transformation
-2      1/x^2
-1      1/x
-0.5    1/sqrt(x)
0       log(x)
0.5     sqrt(x)
1       x
2       x^2
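
As a minimal sketch, the family can be written as a small R function. The name box_cox is hypothetical (it is not part of MASS or base R), and the function assumes strictly positive input, since the transformation is not defined otherwise:

```r
# Hypothetical helper, not part of any package: Box-Cox transform of a positive vector
box_cox <- function(x, lambda) {
  stopifnot(is.numeric(x), all(x > 0))
  if (abs(lambda) < 1e-8) {
    log(x)                      # limiting case lambda = 0
  } else {
    (x^lambda - 1) / lambda     # general case
  }
}

v <- c(1, 4, 9)
box_cox(v, 0)    # identical to log(v)
box_cox(v, 0.5)  # 2 * (sqrt(v) - 1), i.e. sqrt(v) up to shift and scale
box_cox(v, 1)    # v - 1, i.e. the data only shifted
```

The calls at the end illustrate why the table can list log(x), sqrt(x) and x directly: the extra shift and scale do not change the shape of the distribution.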

In practice, if the estimated transformation parameter is close to one of the values in the table above, it is advisable to use the table value rather than the precise estimate, because the resulting transformation is simpler to interpret.
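
One possible way to automate this rounding is sketched below. Both the helper name snap_lambda and the tolerance of 0.1 are assumptions made for illustration; the post does not define how close is "close", and checking whether the table value lies inside the confidence interval is usually a better criterion:

```r
# Lambda values from the table above
table_lambdas <- c(-2, -1, -0.5, 0, 0.5, 1, 2)

# Hypothetical helper: return the nearest table value if it is within `tol`
# of the estimate, otherwise keep the estimate itself.
# The default tol = 0.1 is an arbitrary choice for this sketch.
snap_lambda <- function(lambda_hat, tol = 0.1) {
  nearest <- table_lambdas[which.min(abs(table_lambdas - lambda_hat))]
  if (abs(nearest - lambda_hat) <= tol) nearest else lambda_hat
}

snap_lambda(0.02)   # close to 0, so the log transformation would be used
snap_lambda(0.3)    # not within the tolerance of any table value, kept as is
```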

The boxcox function in R

The boxcox function from the MASS package in R can be used to estimate the transformation parameter using maximum likelihood estimation.

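The plot produced by this function also shows the parameter’s 95% confidence interval. The following are the arguments for the function:

boxcox(object,                      # a formula or a fitted model (lm or aov)
       lambda = seq(-2, 2, 1/10),   # lambda values at which to evaluate the log-likelihood
       plotit = TRUE,               # whether to draw the log-likelihood plot
       interp,                      # whether to use spline interpolation for the plot
       eps = 1/50,                  # tolerance for treating lambda as 0
       xlab = expression(lambda),   # x-axis label
       ylab = "log-Likelihood",     # y-axis label
       ...)                         # additional arguments used in the model fitting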
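
As a brief illustration of these arguments, the sketch below evaluates the log-likelihood on a finer grid and suppresses the plot. The data are simulated only for this example (they are not from the post), and obs and bc are hypothetical names:

```r
library(MASS)

# Simulated positive data, used only to illustrate the arguments
set.seed(1)
obs <- rexp(50)

# y = TRUE and qr = TRUE keep the response and QR decomposition in the fit,
# so boxcox() does not need to refit the model internally
fit <- lm(obs ~ 1, y = TRUE, qr = TRUE)

# Finer lambda grid, no plot
bc <- boxcox(fit, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

# The result is a list: x holds the lambda values, y the log-likelihoods
str(bc)
```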

Example

Consider the sample vector x below, which does not follow a normal distribution:

x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Histogram of the raw data
hist(x)
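
Besides looking at the histogram, the asymmetry can be quantified. The following sketch computes one common moment-based version of the sample skewness by hand; skew is a hypothetical variable name, and other definitions of skewness differ slightly in their scaling:

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Moment-based sample skewness: third central moment over the cubed
# (population-style) standard deviation
m <- mean(x)
skew <- mean((x - m)^3) / mean((x - m)^2)^(3 / 2)

skew  # a positive value indicates a longer right tail
```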

To determine an appropriate lambda, fit a linear model with the lm function and pass it to the boxcox function:

library(MASS)
boxcox(lm(x ~ 1))

In the resulting plot, the dashed vertical line in the middle represents the estimated parameter (lambda hat), and the two outer dashed lines mark the limits of its 95% confidence interval.

Because the plot shows that 0 lies inside the confidence interval of the optimal lambda, and because the estimate in this example is very close to 0, the best choice is to apply the logarithmic transformation to the data (see the table in the first section).
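
The interval can also be read off numerically rather than from the plot. The sketch below assumes the usual likelihood-ratio construction, in which the interval contains every lambda whose log-likelihood lies within qchisq(0.95, 1) / 2 of the maximum; to my understanding this matches what the plot draws, but treat it as an assumption. The names b, cutoff, ci and lambda_hat are hypothetical:

```r
library(MASS)

x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Evaluate the log-likelihood on a fine grid without plotting
b <- boxcox(lm(x ~ 1, y = TRUE, qr = TRUE),
            lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Likelihood-ratio cutoff for an approximate 95% confidence interval
cutoff <- max(b$y) - qchisq(0.95, df = 1) / 2
ci <- range(b$x[b$y >= cutoff])

lambda_hat <- b$x[which.max(b$y)]
lambda_hat
ci

# Is the log transformation (lambda = 0) compatible with the data?
ci[1] <= 0 && 0 <= ci[2]
```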

# Transformed data
new <- log(x)
# Histogram
hist(new)

The data now appear to follow a normal distribution more closely, but you can also run a statistical test such as the Shapiro-Wilk test to check:

shapiro.test(new)
Shapiro-Wilk normality test
data:  new
W = 0.94531, p-value = 0.4538

Because the p-value is higher than the typical significance levels (1%, 5%, and 10%), we lack evidence to reject the null hypothesis of normality.
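
To see the effect of the transformation, the same test can be run on the raw and the transformed data and the p-values compared side by side. This is only a sketch; p_raw and p_log are hypothetical names, and with 15 observations the test has limited power either way:

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# shapiro.test() returns an object whose p.value element can be extracted
p_raw <- shapiro.test(x)$p.value
p_log <- shapiro.test(log(x))$p.value

c(raw = p_raw, log = p_log)
```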

Extracting the exact lambda

If the confidence interval of the estimated parameter does not contain any of the table’s values, you can extract the estimated lambda (the value on the evaluated grid with the highest log-likelihood) using the following code:

library(MASS)
b <- boxcox(lm(x ~ 1))
# Lambda with the highest log-likelihood
lambda <- b$x[which.max(b$y)]
lambda
# [1] 0.02020202

Using the expression from the first part, you can now transform the variable:

new_x_exact <- (x ^ lambda - 1) / lambda
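
If results later need to be reported on the original scale, the transformation can be inverted. The sketch below uses a hypothetical helper, inv_box_cox, obtained by solving the Box-Cox expression for x; it is not part of MASS. The lambda value is the one reported above:

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

lambda <- 0.02020202
new_x_exact <- (x^lambda - 1) / lambda

# Hypothetical helper: inverse of the Box-Cox transformation
inv_box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) {
    exp(y)                          # inverse of log(x)
  } else {
    (lambda * y + 1)^(1 / lambda)   # inverse of (x^lambda - 1) / lambda
  }
}

# Round trip: transforming and inverting should recover the data
x_back <- inv_box_cox(new_x_exact, lambda)
all.equal(x_back, x)
```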

Copyright © 2023 Data Science Tutorials.

Powered by PressBook News WordPress theme