Hypothesis Testing in R

Hypothesis Testing in R, A formal statistical test called a hypothesis test is used to confirm or disprove a statistical hypothesis.

The following R hypothesis tests are demonstrated in this course.

T-test with one sample
T-Test of two samples
T-test for paired samples

Each type of test can be run using the R function t.test().

How to Create an Interaction Plot in R? – Data Science Tutorials

one sample t-test

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, …)

where:

x, y: The two samples of data.

alternative: The alternative hypothesis of the test.

mu: The true value of the mean.

paired: whether or not to run a paired t-test.

var.equal: Whether to assume that the variances between the samples are equal.

conf.level: The confidence level to use.

The following examples show how to use this function in practice.

Example 1: One-Sample t-test in R

A one-sample t-test is used to determine whether the population’s mean is equal to a given value.

Consider the situation where we wish to determine whether the mean weight of a particular species of turtle is 310 pounds or not. We go out and gather a straightforward random sample of turtles with the weights listed below.

How to Find Unmatched Records in R – Data Science Tutorials

Weights: 301, 305, 312, 315, 318, 319, 310, 318, 305, 313, 305, 305, 305

The following code shows how to perform this one sample t-test in R:

specify a turtle weights vector

weights <- c(301, 305, 312, 315, 318, 319, 310, 318, 305, 313, 305, 305, 305)

Now we can perform a one-sample t-test

t.test(x = weights, mu = 310)

               One Sample t-test
data:  weights
t = 0.045145, df = 12, p-value = 0.9647
alternative hypothesis: true mean is not equal to 310
95 percent confidence interval:
 306.3644 313.7895
sample estimates:
mean of x
 310.0769

From the output we can see:

t-test statistic: 045145

degrees of freedom: 12

p-value: 0. 9647

95% confidence interval for true mean: [306.3644, 313.7895]

mean of turtle weights: 310.0769We are unable to reject the null hypothesis since the test’s p-value of 0. 9647 is greater than or equal to.05.

This means that we lack adequate evidence to conclude that this species of turtle’s mean weight is different from 310 pounds.

Example 2: Two Sample t-test in R

To determine whether the means of two populations are equal, a two-sample t-test is employed.

Consider the situation where we want to determine whether the mean weight of two different species of turtles is equal. We gather a straightforward random sample of turtles from each species with the following weights to test this.

ggpairs in R – Data Science Tutorials

Sample 1: 310, 311, 310, 315, 311, 319, 310, 318, 315, 313, 315, 311, 313

Sample 2: 335, 339, 332, 331, 334, 339, 334, 318, 315, 331, 317, 330, 325

The following code shows how to perform this two-sample t-test in R:

Now we can create a vector of turtle weights for each sample

sample1 <- c(310, 311, 310, 315, 311, 319, 310, 318, 315, 313, 315, 311, 313)
sample2 <- c(335, 339, 332, 331, 334, 339, 334, 318, 315, 331, 317, 330, 325)

Let’s perform two sample t-tests

Welch Two Sample t-test
data:  sample1 and sample2
t = -6.7233, df = 15.366, p-value = 6.029e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -21.16313 -10.99071
sample estimates:
mean of x mean of y
 313.1538  329.2308

We reject the null hypothesis because the test’s p-value (6.029e-06) is smaller than.05.

Accordingly, we have enough data to conclude that the mean weight of the two species is not identical.

Example 3: Paired Samples t-test in R

When each observation in one sample can be paired with an observation in the other sample, a paired samples t-test is used to compare the means of the two samples.

For instance, let’s say we want to determine if a particular training program may help basketball players raise their maximum vertical jump (in inches).

How to create Anatogram plot in R – Data Science Tutorials

We may gather a small, random sample of 12 college basketball players to test this by measuring each player’s maximum vertical jump. Then, after each athlete has used the training regimen for a month, we might take another look at their max vertical leap.

The following information illustrates the maximum jump height (in inches) for each athlete before and after using the training program.

Before: 122, 124, 120, 119, 119, 120, 122, 125, 124, 123, 122, 121

After: 123, 125, 120, 124, 118, 122, 123, 128, 124, 125, 124, 120

The following code shows how to perform this paired samples t-test in R:

Let’s define before and after max jump heights

before <- c(122, 124, 120, 119, 119, 120, 122, 125, 124, 123, 122, 121)
after <- c(123, 125, 120, 124, 118, 122, 123, 128, 124, 125, 124, 120)

We can perform paired samples t-test

t.test(x = before, y = after, paired = TRUE)
               Paired t-test
data:  before and after
t = -2.5289, df = 11, p-value = 0.02803
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.3379151 -0.1620849
sample estimates:
mean of the differences
                  -1.25

We reject the null hypothesis since the test’s p-value (0. 02803) is smaller than.05.

Autocorrelation and Partial Autocorrelation in Time Series (datasciencetut.com)

The mean jump height before and after implementing the training program is not equal, thus we have enough data to conclude so.