5 Two-sample t-test

This test compares the means of two treatments or groups when the samples are independent rather than paired. Each treatment is applied to a separate sample, and no observation in one sample is matched with an observation in the other.

Here is an example of how to conduct a two-sample t-test using a dataset on the body temperatures of active and inactive beavers.
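The section uses a data frame called `beaver` without showing how it is built. One plausible setup is sketched below. It assumes that `beaver` is `beaver2` from R's built-in datasets package, with the numeric 0/1 `activ` column recoded as a factor labelled Active/Inactive. Treat this as an assumption, not the author's confirmed preprocessing.

```r
# Assumption: `beaver` is beaver2 (built-in datasets package) with the
# 0/1 `activ` indicator recoded as a labelled factor.
library(datasets)

beaver <- beaver2
# Listing level 1 first makes "Active" the first factor level, so
# t.test(temp ~ activ) estimates mean(Active) - mean(Inactive).
beaver$activ <- factor(beaver$activ, levels = c(1, 0),
                       labels = c("Active", "Inactive"))

str(beaver)
table(beaver$activ)
```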

5.1 Visualizing data

A boxplot is a natural way to visualize the data here, because we are comparing a numerical variable (body temperature) across the levels of a categorical variable (activity).

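library(dplyr)    # provides the %>% pipe
library(ggplot2)

# One box per activity level, showing the spread of body temperature
beaver %>%
  ggplot(aes(x = activ, y = temp)) +
  geom_boxplot(width = 0.1) +
  xlab('Activity') +
  ylab('Body temperature (°C)')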

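The boxplot suggests that active beavers have a higher body temperature than inactive beavers. A boxplot shows medians rather than means, though, so we use the two-sample t-test to check whether the difference in means is statistically significant.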
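A numerical summary is a useful complement to the boxplot, because it reports each group's sample size, mean and standard deviation directly. The sketch below uses base R only. It assumes that `beaver` is `beaver2` with `activ` recoded to Active/Inactive, since the section does not show how `beaver` is constructed.

```r
# Assumption: `beaver` is beaver2 with `activ` recoded (not shown in the text).
beaver <- beaver2
beaver$activ <- factor(beaver$activ, levels = c(1, 0),
                       labels = c("Active", "Inactive"))

# Split temperatures by activity level, then summarise each group
groups <- split(beaver$temp, beaver$activ)
group_summary <- data.frame(
  n    = sapply(groups, length),
  mean = sapply(groups, mean),
  sd   = sapply(groups, sd)
)
group_summary
```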

5.2 Hypotheses

We can state the hypotheses for the test as follows:

\(H_0\): There is no difference between the mean body temperature of active and inactive beavers.
\(H_A\): The mean body temperature of active beavers is greater than that of inactive beavers.

Let \(\mu_1\) denote the mean body temperature of active beavers and \(\mu_2\) the mean body temperature of inactive beavers. The hypotheses can then be written in symbolic form:

\(H_0: \mu_1 - \mu_2 = 0\)
\(H_A: \mu_1 - \mu_2 > 0\)
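The pooled-variance statistic behind this test (the one t.test computes with var.equal = TRUE) is given below for reference. Here \(\bar{x}_i\), \(s_i^2\) and \(n_i\) are the sample mean, sample variance and size of group \(i\). Group 1 is the active beavers and group 2 is the inactive beavers.

\[
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
df = n_1 + n_2 - 2
\]

Under \(H_0\), \(t\) follows a t distribution with \(n_1 + n_2 - 2\) degrees of freedom. Because \(H_A\) is one-sided, the P-value is the upper-tail probability \(P(T_{df} \ge t)\). For this dataset the t.test output reports \(df = 98\), which corresponds to \(n_1 + n_2 = 100\) observations in total.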

5.3 Checking assumptions

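The two-sample t-test makes the three assumptions listed below. The first two are the usual random-sampling and normality conditions. The third, equal variances, is the additional assumption of the pooled version of the test used here.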

Both samples represent random samples obtained from their populations.
The variable in question is normally distributed in each population.
The variance of the variable is the same in each population.

For our purposes, the normality assumption can be checked visually with a normal quantile (QQ) plot for each group:

# One normal QQ plot per activity level
beaver %>%
  ggplot(aes(sample = temp)) +
  stat_qq(shape = 1) +
  stat_qq_line() +
  facet_grid(~ activ) +
  xlab('Normal quantile') +
  ylab('Body temperature (°C)')

In both panels of the QQ plot the points lie close to the reference line, so it is reasonable to treat both groups as drawn from normal populations. Next, we check the equal-variance assumption with an F-test:

var.test(temp ~ activ, data = beaver)


    F test to compare two variances

data:  temp by activ
F = 1.0841, num df = 61, denom df = 37, p-value = 0.8045
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5912487 1.9053969
sample estimates:
ratio of variances 
          1.084081

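The P-value of the F-test (0.8045) is greater than 0.05, so we fail to reject the null hypothesis that the two variances are equal. This does not prove that the variances are equal, but the data give no evidence against the assumption. We therefore proceed with the pooled-variance test (var.equal = TRUE).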
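The sketch below shows where the F statistic comes from. It recomputes the statistic as the ratio of the two sample variances (Active over Inactive, following the factor-level order). It then doubles the smaller tail probability of the F distribution, which mirrors the two-sided P-value that var.test reports. As above, building `beaver` from `beaver2` is an assumption.

```r
# Assumption: `beaver` is beaver2 with `activ` recoded (not shown in the text).
beaver <- beaver2
beaver$activ <- factor(beaver$activ, levels = c(1, 0),
                       labels = c("Active", "Inactive"))

groups <- split(beaver$temp, beaver$activ)   # list in level order
v <- sapply(groups, var)                     # sample variances
n <- sapply(groups, length)                  # group sizes

F_stat <- v[["Active"]] / v[["Inactive"]]    # first level over second
df1 <- n[["Active"]] - 1                     # numerator df
df2 <- n[["Inactive"]] - 1                   # denominator df

# Two-sided P-value: twice the smaller tail area of the F distribution
p_two <- 2 * min(pf(F_stat, df1, df2),
                 pf(F_stat, df1, df2, lower.tail = FALSE))

c(F = F_stat, num_df = df1, denom_df = df2, p_value = p_two)
```

The F-test is itself sensitive to departures from normality, so it is best read alongside the QQ plots. If it had rejected equal variances, calling t.test with var.equal = FALSE (its default) would run Welch's t-test, which does not pool the variances.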

5.4 Performing a two-sample t-test

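We can use R's built-in t.test function to perform this test. Run ?t.test to view its documentation. Setting paired = FALSE reflects the independent samples, and setting var.equal = TRUE reflects the equal-variance assumption. Because Active is the first factor level, alternative = 'greater' tests whether the mean of the Active group minus the mean of the Inactive group is greater than zero.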

t.test(temp ~ activ, data = beaver,
  paired = FALSE, var.equal = TRUE, alternative = 'greater',
  conf.level = 0.95)


    Two Sample t-test

data:  temp by activ
t = 18.367, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Active and group Inactive is greater than 0
95 percent confidence interval:
 0.7333337       Inf
sample estimates:
  mean in group Active mean in group Inactive 
              37.90306               37.09684

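The P-value (< 2.2e-16) is far below \(\alpha = 0.05\). We therefore reject the null hypothesis and conclude that the mean body temperature of active beavers is greater than that of inactive beavers. The estimated difference in means is 37.90306 − 37.09684 ≈ 0.806 °C. The one-sided 95% confidence interval indicates that the true difference is at least about 0.733 °C.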
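To check the output above, the sketch below recomputes three quantities directly from the pooled-variance formulas using base R's pt() and qt(): the t statistic, the one-sided P-value, and the lower bound of the one-sided 95% confidence interval. It again assumes that `beaver` is `beaver2` with `activ` recoded.

```r
# Assumption: `beaver` is beaver2 with `activ` recoded (not shown in the text).
beaver <- beaver2
beaver$activ <- factor(beaver$activ, levels = c(1, 0),
                       labels = c("Active", "Inactive"))

x1 <- beaver$temp[beaver$activ == "Active"]
x2 <- beaver$temp[beaver$activ == "Inactive"]
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the mean difference
dof <- n1 + n2 - 2

diff_means <- mean(x1) - mean(x2)
t_stat <- diff_means / se
p_val  <- pt(t_stat, dof, lower.tail = FALSE)   # one-sided: H_A is mu1 - mu2 > 0
ci_lower <- diff_means - qt(0.95, dof) * se     # one-sided 95% CI lower bound

c(t = t_stat, df = dof, p_value = p_val, ci_lower = ci_lower)
```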