Data Science Tutorials

For Data Science Learners

Box Cox transformation in R

Posted on October 23 / October 24 by Admin

The Box-Cox transformation is a power transformation that helps correct nonlinearity between variables, unequal variances, and skewed distributions.

It is therefore useful for turning a variable into a new one whose distribution is closer to normal.

The Box-Cox family

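The Box-Cox family of transformations is defined, for each value of lambda, by the following expression:

$$
y(\lambda) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log(x), & \lambda = 0
\end{cases}
$$

where y is the transformed variable, x is the original (strictly positive) variable, and lambda (λ) is the transformation parameter. The following table lists the most common transformations, each of which matches the expression above up to a linear rescaling: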

λ       Transformation
-2      1/x^2
-1      1/x
-0.5    1/sqrt(x)
0       log(x)
0.5     sqrt(x)
1       x
2       x^2

In practice, if the estimated transformation parameter is close to one of the values in the table above, it is advisable to use the table value rather than the exact estimate, because the resulting transformation is easier to interpret.
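To make the family concrete, here is a minimal sketch of the usual Box-Cox form, (x^λ - 1)/λ for λ ≠ 0 and log(x) for λ = 0, written as a plain R function. The name box_cox and the tolerance argument tol are illustrative assumptions, not part of any package.

```r
# Box-Cox transformation of a strictly positive numeric vector.
# `box_cox` and `tol` are hypothetical names used only for this sketch.
box_cox <- function(x, lambda, tol = 1e-8) {
  stopifnot(is.numeric(x), all(x > 0))
  if (abs(lambda) < tol) {
    log(x)                     # limiting case for lambda = 0
  } else {
    (x^lambda - 1) / lambda    # general power form
  }
}

x <- c(0.5, 1, 2, 4)
box_cox(x, 0)     # same as log(x)
box_cox(x, 0.5)   # equals 2 * (sqrt(x) - 1), a linear rescaling of sqrt(x)
box_cox(x, 1)     # equals x - 1
```

The last two calls show why the table can list sqrt(x) and x directly: the Box-Cox form only shifts and rescales them, which does not change the shape of the distribution.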

The boxcox function in R

The boxcox function from the MASS package in R can be used to estimate the transformation parameter using maximum likelihood estimation.

The function also displays the 95% confidence interval for the parameter on its plot. Its arguments are:

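boxcox(object,    # lm or aov object, or a formula
       lambda = seq(-2, 2, 1/10), # Vector of lambda values to evaluate
       plotit = TRUE,  # Whether to draw the plot
       interp,         # Logical: whether spline interpolation is used
       eps = 1/50,     # Tolerance for lambda = 0
       xlab = expression(lambda), # X-axis title
       ylab = "log-Likelihood",   # Y-axis title
       ...) # Additional arguments for the model fitting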
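As a rough sketch of how these arguments fit together, the call below evaluates the log-likelihood on a finer lambda grid and suppresses the plot. The vector y is made-up illustrative data, and the formula interface (y ~ 1) is used here in place of a fitted lm object.

```r
library(MASS)

# Made-up positive data, used only for illustration
y <- c(1.2, 0.4, 3.1, 0.8, 2.5, 0.3, 1.9, 0.6)

# Finer grid from -1 to 1 in steps of 0.01, without drawing the plot
bc <- boxcox(y ~ 1, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

names(bc)      # "x" holds the lambda values, "y" the log-likelihoods
length(bc$x)   # one entry per lambda value in the grid
```

With plotit = FALSE the result is returned visibly as a list, which is convenient when only the numbers are needed.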

Example

Consider the sample vector x below, which does not follow a normal distribution:

x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Histogram of the original data
hist(x)

To find a suitable lambda, fit a linear model with the lm function and pass it to the boxcox function:

library(MASS)
boxcox(lm(x ~ 1))

In the resulting plot, the dashed vertical line in the middle marks the estimated parameter (lambda hat), and the two outer dashed lines mark the limits of its 95% confidence interval.

Because the plot shows that 0 lies inside the confidence interval for lambda, and the estimate itself is very close to 0, the logarithmic transformation is a sensible choice here (see the table in the first section).
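The same check can be done numerically instead of reading it off the plot. The sketch below assumes the interval is the set of lambda values whose log-likelihood lies within qchisq(0.95, 1) / 2 of the maximum, which is the cutoff drawn on the boxcox plot; the names lambda_hat, cutoff and ci are illustrative.

```r
library(MASS)

x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Log-likelihood on a fine grid, without drawing the plot
b <- boxcox(x ~ 1, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Point estimate: grid value with the highest log-likelihood
lambda_hat <- b$x[which.max(b$y)]

# Approximate 95% interval: grid values above the likelihood cutoff
# (assumes the interval lies inside the grid and is a single range)
cutoff <- max(b$y) - qchisq(0.95, df = 1) / 2
ci <- range(b$x[b$y > cutoff])

lambda_hat
ci
ci[1] <= 0 && 0 <= ci[2]   # TRUE when lambda = 0 (log) is inside the interval
```

The resolution of both the estimate and the interval is limited by the step of the lambda grid, so a finer grid gives tighter answers at a small computational cost.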

# Transformed data
new <- log(x)
# Histogram
hist(new)

The data now appear closer to a normal distribution; a statistical test such as the Shapiro-Wilk test can be used to check:

shapiro.test(new)
Shapiro-Wilk normality test
data:  new
W = 0.94531, p-value = 0.4538

Because the p-value is higher than the usual significance levels (1%, 5%, and 10%), there is no evidence to reject the null hypothesis of normality.
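To see what the transformation changed, it can help to run the same test on the original and the transformed data side by side. This is a small sketch; the names p_raw and p_log are illustrative.

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Shapiro-Wilk p-values before and after the log transformation
p_raw <- shapiro.test(x)$p.value
p_log <- shapiro.test(log(x))$p.value

c(raw = p_raw, log = p_log)
```

A small p-value for the raw data together with a large one for the transformed data is consistent with the histograms, although with only 15 observations the test has limited power.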

Extracting the exact lambda

If the confidence interval of the estimated parameter does not contain any of the values in the table, you can extract the estimated lambda, that is, the evaluated value with the highest log-likelihood, with the following code:

library(MASS)
b <- boxcox(lm(x ~ 1))
# Exact lambda
lambda <- b$x[which.max(b$y)]
lambda
# 0.02020202

Using the expression from the first part, you can now transform the variable:

new_x_exact <- (x ^ lambda - 1) / lambda
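If results later need to be reported on the original scale, the transformation can be inverted by solving y = (x^lambda - 1) / lambda for x. The sketch below assumes lambda is not 0 (for lambda = 0 the inverse is simply exp) and uses the illustrative name x_back.

```r
lambda <- 0.02020202
x <- c(0.2, 0.528, 0.11, 0.260, 0.091)

# Forward transformation
new_x_exact <- (x^lambda - 1) / lambda

# Inverse transformation, valid for lambda != 0
x_back <- (lambda * new_x_exact + 1)^(1 / lambda)

all.equal(x, x_back)   # TRUE up to floating-point error
```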

Copyright © 2025 Data Science Tutorials.