5 Two-sample t-test
The two-sample t-test is used to analyze the difference between the means of two treatments/groups where the samples are not paired. The two treatments are applied to separate, independent samples.
Here is an example of how to conduct a two-sample t-test using a dataset on the body temperatures of active and inactive beavers.
5.1 Visualizing data
We can visualize our data using a boxplot, since we want to see the association between a categorical variable and a numerical variable.
beaver %>%
  ggplot(aes(x = activ, y = temp)) +
  geom_boxplot(width = 0.1) +
  xlab('Activity') +
  ylab('Body temperature (°C)')
It seems likely that active beavers have a higher mean body temperature than inactive beavers, but we can verify this using the two-sample t-test.
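A quick numeric summary can back up the visual impression. The sketch below is not part of the original workflow: it assumes that beaver can be rebuilt from R's built-in beaver2 dataset, with activ recoded to Active/Inactive, and group_summary is a name chosen here for illustration.

```r
# Assumption: `beaver` is rebuilt from the built-in `beaver2` dataset;
# the data-preparation step used for this section is not shown in the text.
beaver <- data.frame(
  temp  = beaver2$temp,
  activ = factor(beaver2$activ, levels = c(1, 0),
                 labels = c("Active", "Inactive"))
)

# Split body temperatures by activity group
temp_by_group <- split(beaver$temp, beaver$activ)

# Sample size, mean, and standard deviation of body temperature per group
group_summary <- data.frame(
  n    = sapply(temp_by_group, length),
  mean = sapply(temp_by_group, mean),
  sd   = sapply(temp_by_group, sd)
)
print(group_summary)
```

If the assumption about the data holds, the two group means should differ in the direction suggested by the boxplot.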
5.2 Hypotheses
We can state the hypotheses for the test as follows:
- \(H_0\): There is no difference between the mean body temperature of active and inactive beavers.
- \(H_A\): The mean body temperature of active beavers is greater than that of inactive beavers.
We can let \(\mu_1\) represent the mean body temperature of active beavers and \(\mu_2\) represent the mean body temperature of inactive beavers, and rewrite our hypotheses in symbolic format:
- \(H_0: \mu_1 - \mu_2 = 0\)
- \(H_A: \mu_1 - \mu_2 > 0\)
5.3 Checking assumptions
The first two assumptions of this test are the same as those of the one-sample t-test. The third assumption listed below is additional.
- Both samples represent random samples obtained from their populations.
- The variable in question is normally distributed in each population.
- The variance of the variable is the same in each population.
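The equal-variance assumption matters because it allows the two sample variances to be pooled into a single estimate. As a sketch, using notation introduced here for illustration (\(\bar{Y}_i\), \(s_i^2\), and \(n_i\) for the sample mean, sample variance, and sample size of group \(i\)), the test statistic under \(H_0\) can be written as:

\[
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\]

This statistic is compared against a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.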
For our purposes, the normality assumption can be verified using a normal quantile plot:
beaver %>%
  ggplot(aes(sample = temp)) +
  stat_qq(shape = 1) +
  stat_qq_line() +
  facet_grid(~ activ) +
  xlab('Normal quantile') +
  ylab('Body temperature (°C)')
From the Q-Q plot above, both groups appear to be drawn from normally distributed populations. Next, we can check the equal-variance assumption using the F-test:
var.test(temp ~ activ, data = beaver)
F test to compare two variances
data: temp by activ
F = 1.0841, num df = 61, denom df = 37, p-value = 0.8045
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5912487 1.9053969
sample estimates:
ratio of variances
1.084081
Since the P-value for the F-test (0.8045) is greater than 0.05, we fail to reject the null hypothesis of equal variances, so we proceed as though the equal-variance assumption holds.
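To see where the F statistic comes from, it can be computed by hand as the ratio of the two sample variances. This is a minimal sketch, assuming beaver can be rebuilt from the built-in beaver2 dataset as shown in the code; the names f_stat, df1, df2, and p_value are chosen here for illustration.

```r
# Assumption: `beaver` is rebuilt from the built-in `beaver2` dataset;
# the data-preparation step used for this section is not shown in the text.
beaver <- data.frame(
  temp  = beaver2$temp,
  activ = factor(beaver2$activ, levels = c(1, 0),
                 labels = c("Active", "Inactive"))
)

temp_by_group <- split(beaver$temp, beaver$activ)
vars <- sapply(temp_by_group, var)     # sample variance per group
ns   <- sapply(temp_by_group, length)  # sample size per group

# F statistic: variance of the first group over variance of the second
f_stat <- vars[["Active"]] / vars[["Inactive"]]
df1 <- ns[["Active"]] - 1    # numerator degrees of freedom
df2 <- ns[["Inactive"]] - 1  # denominator degrees of freedom

# Two-sided P-value: twice the smaller tail area of the F distribution
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

print(c(F = f_stat, df1 = df1, df2 = df2, p = p_value))
```

If the assumption about the data holds, these values should agree with the var.test output above.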
5.4 Performing a two-sample t-test
We can use the built-in t.test function in R to perform this test. For help with this function, run ?t.test to view its documentation.
t.test(temp ~ activ, data = beaver,
paired = FALSE, var.equal = TRUE, alternative = 'greater',
conf.level = 0.95)
Two Sample t-test
data: temp by activ
t = 18.367, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Active and group Inactive is greater than 0
95 percent confidence interval:
0.7333337 Inf
sample estimates:
mean in group Active mean in group Inactive
37.90306 37.09684
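Since the test produced a P-value less than our significance level (\(\alpha = 0.05\)), we can reject the null hypothesis and conclude that the mean body temperature of active beavers is greater than that of inactive beavers.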
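The numbers reported by t.test can be reproduced by hand from the pooled variance, which may help clarify what the function does when var.equal = TRUE. This is a sketch under the assumption that beaver can be rebuilt from the built-in beaver2 dataset; all variable names below are chosen for illustration.

```r
# Assumption: `beaver` is rebuilt from the built-in `beaver2` dataset;
# the data-preparation step used for this section is not shown in the text.
beaver <- data.frame(
  temp  = beaver2$temp,
  activ = factor(beaver2$activ, levels = c(1, 0),
                 labels = c("Active", "Inactive"))
)

active   <- beaver$temp[beaver$activ == "Active"]
inactive <- beaver$temp[beaver$activ == "Inactive"]
n1 <- length(active)
n2 <- length(inactive)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(active) + (n2 - 1) * var(inactive)) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Test statistic and degrees of freedom
mean_diff <- mean(active) - mean(inactive)
t_stat <- mean_diff / se
df <- n1 + n2 - 2

# One-sided P-value for H_A: mu_1 - mu_2 > 0 (upper tail)
p_value <- pt(t_stat, df, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval
ci_lower <- mean_diff - qt(0.95, df) * se

print(c(t = t_stat, df = df, p = p_value, ci_lower = ci_lower))
```

If the assumption about the data holds, the statistic, degrees of freedom, and lower confidence bound should agree with the t.test output above.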