Data Science Tutorials

Test for Normal Distribution in R-Quick Guide

Posted on May 6, updated May 12, by Jim

Many statistical tests, such as correlation, regression, the t-test, and analysis of variance (ANOVA), assume that the data have particular properties.

In particular, they require that the data follow a normal (Gaussian) distribution. These tests are known as parametric tests because their validity depends on the distribution of the data.

Normality and the other assumptions made by these tests should be checked carefully in order to obtain meaningful results and interpretations.

Before using a parametric test, we should run some preliminary checks to confirm that its assumptions are met.

Non-parametric tests are recommended when those assumptions are violated.

We’ll go over how to check data for normality using visual inspection and significance tests.

Let’s install the dplyr package, which is used for data manipulation.

install.packages("dplyr")

ggpubr is a ggplot2-based package for simple data visualization.

install.packages("ggpubr")

Load required R packages

library("dplyr")
library("ggpubr")

Now we can load the built-in ToothGrowth data set into R

data <- ToothGrowth
head(data)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

We wish to see if the tooth length variable, len, is normally distributed.

dim(data)
[1] 60 3

The sample size here is reasonably large (n = 60). As a rule of thumb, when the sample size is large enough (n > 30), parametric tests are often applied even if the data are not exactly normal.

The central limit theorem states that, when the sample size is large enough (n > 30 is a common rule of thumb), the sampling distribution of the mean tends toward a normal distribution regardless of how the individual observations are distributed.
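The effect described by the central limit theorem can be seen in a small simulation. The sketch below is illustrative only: the exponential distribution, the seed, the number of replications, and the helper name skewness_est are assumptions chosen for the example, not part of the original tutorial.

```r
# Illustrative simulation (assumed setup): means of n = 30 draws
# from a strongly right-skewed exponential distribution.
set.seed(123)

n    <- 30    # size of each sample
reps <- 5000  # number of simulated samples

raw_values   <- rexp(reps, rate = 1)                       # individual observations
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))   # one mean per sample

# Hypothetical helper: a simple moment-based skewness estimate
skewness_est <- function(x) mean((x - mean(x))^3) / sd(x)^3

# The sample means should be far less skewed than the raw values
skewness_est(raw_values)
skewness_est(sample_means)
```

A skewness close to zero for the sample means, compared with a clearly positive value for the raw draws, is what the theorem leads us to expect.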

Normality can be assessed visually, with a histogram or density plot and a Q-Q (quantile-quantile) plot, or with a significance test. Using both approaches helps confirm that the conclusions are consistent.

Visual techniques

Visual checks for normality include the density plot and the Q-Q plot.

The density plot is used to judge whether the distribution is bell-shaped.

library("ggpubr")
ggdensity(data$len,
          main = "Density plot",
          xlab = "Tooth length")
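If ggpubr is not available, a similar picture can be drawn with base R graphics. This is a minimal sketch rather than an equivalent of ggdensity(): overlaying a normal curve fitted to the sample mean and standard deviation is an addition made here for comparison.

```r
# Base R alternative (assumed, not from the original post):
# kernel density of tooth length with a fitted normal curve on top.
len <- ToothGrowth$len

plot(density(len), main = "Density plot", xlab = "Tooth length")

# Normal curve using the sample mean and standard deviation
curve(dnorm(x, mean = mean(len), sd = sd(len)), add = TRUE, lty = 2)

legend("topright",
       legend = c("Kernel density", "Fitted normal"),
       lty = c(1, 2))
```

The closer the solid curve follows the dashed one, the more plausible the bell shape.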

The Q-Q plot (quantile-quantile plot) plots the quantiles of the sample against the theoretical quantiles of a normal distribution. A reference line is also drawn; if the data are normal, the points should fall close to it.

library(ggpubr)
ggqqplot(data$len)

The qqPlot() function from the car package can also be used (run install.packages("car") first if it is not installed).

library("car")
qqPlot(data$len)

Because the points fall approximately along the reference line, normality is a reasonable assumption.
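The same check can be sketched with base R alone, which also makes it easy to put a number on how straight the plot is. The correlation between sample and theoretical quantiles computed below is an informal summary added for illustration, not a formal test.

```r
# Base R Q-Q plot (assumed alternative to ggqqplot() and qqPlot())
len <- ToothGrowth$len

qqnorm(len, main = "Normal Q-Q plot of tooth length")
qqline(len)  # reference line through the first and third quartiles

# Informal summary: correlation between theoretical (x) and sample (y) quantiles
qq <- qqnorm(len, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

A correlation close to 1 means the points lie nearly on a straight line.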

Test for normality

Visual inspection, as described in the previous section, is often unreliable.

A significance test can be used to determine whether the data deviate significantly from normality by comparing the sample distribution to a normal distribution.

The Kolmogorov-Smirnov (K-S) test and the Shapiro-Wilk test are two examples of normality tests.

The null hypothesis in these tests is “the sample distribution is normal.” If the test is significant, we reject the null hypothesis and conclude that the distribution is non-normal.

For normality testing, the Shapiro-Wilk method is frequently preferred because it has more power than K-S. It is based on the correlation between the data and the corresponding normal scores.
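For comparison, both tests can be run on the same variable. The sketch below is illustrative: passing the sample mean and standard deviation to ks.test() is a common shortcut, but estimating the parameters from the data makes the K-S p-value only approximate, and the name p_values is chosen here for the example.

```r
# Compare K-S and Shapiro-Wilk on tooth length (illustrative sketch)
len <- ToothGrowth$len

# K-S against a normal distribution with parameters estimated from the data.
# The data contain tied values, which triggers a warning because K-S assumes
# a continuous distribution; it is suppressed here to keep the output clean.
ks_res <- suppressWarnings(
  ks.test(len, "pnorm", mean = mean(len), sd = sd(len))
)

sw_res <- shapiro.test(len)

# Collect the two p-values side by side
p_values <- c(KS = ks_res$p.value, ShapiroWilk = sw_res$p.value)
p_values
```

Both results are standard htest objects, so the statistic and p-value can be extracted in the same way from either.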

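It’s worth noting that normality tests are sensitive to sample size. Small samples often pass a normality test simply because the test has little power to detect a deviation, while with very large samples even minor deviations can become significant.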
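The influence of sample size can be illustrated with a simulation. In the sketch below the data are deliberately drawn from a non-normal (exponential) distribution; the sample sizes, seed, number of replications, and the helper name reject_rate are all assumptions made for the example.

```r
# How often does Shapiro-Wilk reject normality for clearly non-normal data?
set.seed(42)

# Hypothetical helper: share of simulated exponential samples of size n
# for which the Shapiro-Wilk p-value falls below 0.05
reject_rate <- function(n, reps = 1000) {
  mean(replicate(reps, shapiro.test(rexp(n))$p.value < 0.05))
}

rate_small <- reject_rate(10)   # small samples
rate_large <- reject_rate(100)  # larger samples

c(n10 = rate_small, n100 = rate_large)
```

The rejection rate for the small samples is noticeably lower, even though every sample comes from the same skewed distribution.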

To reach a sound conclusion, it is important to combine visual assessment with significance testing.

The Shapiro-Wilk test of normality for one variable (univariate) can be performed with the R function shapiro.test().

shapiro.test(data$len)
Shapiro-Wilk normality test
data:  data$len
W = 0.96743, p-value = 0.1091

Conclusion

The p-value in the output is greater than 0.05, so the test gives no evidence that the distribution of the data differs significantly from a normal distribution. In other words, we can assume normality.
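The decision can also be made programmatically, because shapiro.test() returns an object whose statistic and p-value can be extracted. This is a minimal sketch; the 0.05 threshold and the variable names are assumptions carried over from the discussion above.

```r
# Extract the Shapiro-Wilk result and apply the 0.05 decision rule
res   <- shapiro.test(ToothGrowth$len)
alpha <- 0.05  # assumed significance level

w_stat  <- unname(res$statistic)  # W statistic
p_value <- res$p.value

if (p_value > alpha) {
  message("No significant deviation from normality (p = ", round(p_value, 4), ")")
} else {
  message("Significant deviation from normality (p = ", round(p_value, 4), ")")
}
```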

Copyright © 2022 Data Science Tutorials.