Hypothesis Testing in R
A hypothesis test is a formal statistical test used to reject or fail to reject a statistical hypothesis.
The following hypothesis tests in R are demonstrated here:
- One-sample t-test
- Two-sample t-test
- Paired samples t-test
Each type of test can be run using the R function t.test().
The t.test() function uses the following syntax:
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
where:
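x, y: The two samples of data (y is omitted for a one-sample test).
alternative: The alternative hypothesis of the test: "two.sided" (the default), "less", or "greater".
mu: The hypothesized value of the mean (or of the difference in means for a two-sample test).
paired: Whether to run a paired t-test.
var.equal: Whether to assume that the variances of the two samples are equal.
conf.level: The confidence level to use for the confidence interval.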
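As a rough sketch of how these arguments combine, the call below runs a one-sided test at a 90% confidence level and then reads individual values out of the returned object. The vector `scores` and the hypothesized mean of 70 are made-up values for illustration only; they do not come from this tutorial.

```r
# Hypothetical data for illustration only (not from the tutorial)
scores <- c(72, 68, 75, 71, 69, 74, 73, 70)

# One-sided test: is the population mean greater than 70?
result <- t.test(x = scores, mu = 70,
                 alternative = "greater", conf.level = 0.90)

# t.test() returns a list of class "htest"; its pieces can be extracted by name
result$statistic   # t statistic
result$parameter   # degrees of freedom (n - 1)
result$p.value     # p-value
result$conf.int    # one-sided interval, so the upper bound is Inf
```

Storing the result in a variable like this is handy when the p-value or interval is needed later in a script rather than only printed to the console.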
The following examples show how to use this function in practice.
Example 1: One-Sample t-test in R
A one-sample t-test is used to determine whether the population’s mean is equal to a given value.
Suppose we wish to determine whether the mean weight of a particular species of turtle is 310 pounds. We go out and collect a simple random sample of turtles with the weights listed below.
Weights: 301, 305, 312, 315, 318, 319, 310, 318, 305, 313, 305, 305, 305
The following code shows how to perform this one sample t-test in R:
# specify a turtle weights vector
weights <- c(301, 305, 312, 315, 318, 319, 310, 318, 305, 313, 305, 305, 305)

# perform a one-sample t-test
t.test(x = weights, mu = 310)

    One Sample t-test

data:  weights
t = 0.045145, df = 12, p-value = 0.9647
alternative hypothesis: true mean is not equal to 310
95 percent confidence interval:
 306.3644 313.7895
sample estimates:
mean of x
 310.0769
From the output we can see:
t-test statistic: 0.045145
degrees of freedom: 12
p-value: 0.9647
95% confidence interval for true mean: [306.3644, 313.7895]
mean of turtle weights: 310.0769
We fail to reject the null hypothesis since the test’s p-value of 0.9647 is greater than 0.05.
This means that we lack sufficient evidence to conclude that the mean weight of this species of turtle is different from 310 pounds.
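To see where these numbers come from, the statistic can be reproduced by hand with the standard one-sample formula, t = (sample mean - hypothesized mean) / (sample sd / sqrt(n)). The following is a minimal sketch using the turtle weights from this example; the variable names `mu0`, `t_stat`, and `p_value` are chosen here for illustration.

```r
weights <- c(301, 305, 312, 315, 318, 319, 310, 318, 305, 313, 305, 305, 305)
mu0 <- 310                       # hypothesized mean under the null hypothesis

n      <- length(weights)
t_stat <- (mean(weights) - mu0) / (sd(weights) / sqrt(n))
df     <- n - 1

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df)

c(t = t_stat, df = df, p = p_value)
```

The values should agree with the t, df, and p-value lines reported by t.test().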
Example 2: Two Sample t-test in R
To determine whether the means of two populations are equal, a two-sample t-test is employed.
Suppose we want to determine whether the mean weights of two different species of turtles are equal. To test this, we collect a simple random sample of turtles from each species with the following weights.
Sample 1: 310, 311, 310, 315, 311, 319, 310, 318, 315, 313, 315, 311, 313
Sample 2: 335, 339, 332, 331, 334, 339, 334, 318, 315, 331, 317, 330, 325
The following code shows how to perform this two-sample t-test in R:
# create a vector of turtle weights for each sample
sample1 <- c(310, 311, 310, 315, 311, 319, 310, 318, 315, 313, 315, 311, 313)
sample2 <- c(335, 339, 332, 331, 334, 339, 334, 318, 315, 331, 317, 330, 325)

# perform a two-sample t-test
t.test(x = sample1, y = sample2)

    Welch Two Sample t-test

data:  sample1 and sample2
t = -6.7233, df = 15.366, p-value = 6.029e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -21.16313 -10.99071
sample estimates:
mean of x mean of y
 313.1538  329.2308
We reject the null hypothesis because the test’s p-value (6.029e-06) is smaller than 0.05.
Accordingly, we have sufficient evidence to conclude that the mean weights of the two species are not equal.
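By default t.test() does not assume equal variances (var.equal = FALSE), which is why the output is labeled a Welch test and reports fractional degrees of freedom. As a sketch, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be reproduced by hand from the two samples; the helper names `v1`, `v2`, `t_stat`, and `pooled` are illustrative choices, not part of the original example.

```r
sample1 <- c(310, 311, 310, 315, 311, 319, 310, 318, 315, 313, 315, 311, 313)
sample2 <- c(335, 339, 332, 331, 334, 339, 334, 318, 315, 331, 317, 330, 325)

# Estimated variance of each sample mean
v1 <- var(sample1) / length(sample1)
v2 <- var(sample2) / length(sample2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
df <- (v1 + v2)^2 /
  (v1^2 / (length(sample1) - 1) + v2^2 / (length(sample2) - 1))
p_value <- 2 * pt(-abs(t_stat), df = df)

# For comparison: the pooled-variance version assumes equal variances
pooled <- t.test(x = sample1, y = sample2, var.equal = TRUE)
pooled$parameter   # degrees of freedom are n1 + n2 - 2 here
```

Setting var.equal = TRUE is only appropriate when the equal-variance assumption is reasonable; with these samples the spreads look quite different, so the default Welch version is the safer choice.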
Example 3: Paired Samples t-test in R
When each observation in one sample can be paired with an observation in the other sample, a paired samples t-test is used to compare the means of the two samples.
For instance, let’s say we want to determine if a particular training program may help basketball players raise their maximum vertical jump (in inches).
To test this, we gather a simple random sample of 12 college basketball players and measure each player’s maximum vertical jump. Then, after each player has followed the training program for a month, we measure their maximum vertical jump again.
The following data show the maximum jump height (in inches) for each player before and after the training program.
Before: 122, 124, 120, 119, 119, 120, 122, 125, 124, 123, 122, 121
After: 123, 125, 120, 124, 118, 122, 123, 128, 124, 125, 124, 120
The following code shows how to perform this paired samples t-test in R:
# define before and after max jump heights
before <- c(122, 124, 120, 119, 119, 120, 122, 125, 124, 123, 122, 121)
after <- c(123, 125, 120, 124, 118, 122, 123, 128, 124, 125, 124, 120)

# perform a paired samples t-test
t.test(x = before, y = after, paired = TRUE)

    Paired t-test

data:  before and after
t = -2.5289, df = 11, p-value = 0.02803
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.3379151 -0.1620849
sample estimates:
mean of the differences
                  -1.25
We reject the null hypothesis since the test’s p-value (0.02803) is smaller than 0.05.
Thus, we have sufficient evidence to conclude that the mean jump height before and after the training program is not equal.
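A paired samples t-test is equivalent to a one-sample t-test on the within-pair differences, tested against a mean of 0. The sketch below checks this with the jump data from this example; the names `diffs`, `paired_result`, and `diff_result` are introduced here for illustration.

```r
before <- c(122, 124, 120, 119, 119, 120, 122, 125, 124, 123, 122, 121)
after  <- c(123, 125, 120, 124, 118, 122, 123, 128, 124, 125, 124, 120)

# Within-player differences (before minus after)
diffs <- before - after

paired_result <- t.test(x = before, y = after, paired = TRUE)
diff_result   <- t.test(x = diffs, mu = 0)

# Both approaches give the same statistic and p-value
c(paired = unname(paired_result$statistic), diffs = unname(diff_result$statistic))
c(paired = paired_result$p.value, diffs = diff_result$p.value)
```

This also explains why the degrees of freedom are 11 rather than 22: the test works with 12 differences, not 24 independent observations.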