STATS 250

Sampling Distribution and Confidence Intervals for the Difference between Two Population Means

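To recap, if independent random samples are drawn from two normally distributed populations, then the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) is normal (and approximately normal for large samples from other populations):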

$$N\left(\mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$
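A quick simulation can make this result concrete. The sketch below uses only the Python standard library; the population values (means 10 and 7, standard deviations 2 and 3, sample sizes 25 and 36), the seed, and the function names are illustrative assumptions, not values from these notes.

```python
import math
import random
import statistics

def simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=10_000, seed=250):
    """Draw `reps` pairs of independent normal samples and return x-bar1 - x-bar2 for each pair."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        xbar1 = sum(rng.gauss(mu1, sigma1) for _ in range(n1)) / n1
        xbar2 = sum(rng.gauss(mu2, sigma2) for _ in range(n2)) / n2
        diffs.append(xbar1 - xbar2)
    return diffs

def theoretical_sd(sigma1, n1, sigma2, n2):
    """Standard deviation of x-bar1 - x-bar2: sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical populations chosen only for illustration.
diffs = simulate_differences(mu1=10, sigma1=2, n1=25, mu2=7, sigma2=3, n2=36)
sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)
theory_sd = theoretical_sd(sigma1=2, n1=25, sigma2=3, n2=36)

print(f"simulated mean of differences: {sim_mean:.3f} (theory: 3)")
print(f"simulated sd of differences:   {sim_sd:.3f} (theory: {theory_sd:.3f})")
```

The simulated mean and standard deviation should land close to \(\mu_1 - \mu_2 = 3\) and \(\sqrt{0.41} \approx 0.640\), though the exact digits depend on the seed.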

So, what's the confidence interval for a difference in population means?

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \, \text{s.e.}(\bar{x}_1 - \bar{x}_2)$$

When testing hypotheses about this difference, the null value is most commonly 0, so an interval that excludes 0 suggests the population means differ.

  • Parameter: \(\mu_1 - \mu_2\)
  • Estimate: \(\bar{x}_1 - \bar{x}_2\)
  • Standard error: \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)

The interval requires independent random samples from normal populations. If both sample sizes are large (each greater than 30), the normality assumption is less crucial and the interval is approximate.
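The estimate, standard error, and interval above can be computed in a few lines. This is a minimal sketch using the Python standard library; the function name `unpooled_interval` and the two small samples are made up for illustration. The multiplier `t_star` is supplied by the caller from a t table: the value 2.776 used here is the 95% multiplier for 4 degrees of freedom, a conservative choice (the smaller sample size minus 1) that is assumed here and not taken from these notes.

```python
import math
import statistics

def unpooled_interval(sample1, sample2, t_star):
    """Return (estimate, standard error, (lower, upper)) for mu1 - mu2.

    Uses the unpooled standard error sqrt(s1^2/n1 + s2^2/n2).
    t_star must be looked up by the caller for the chosen confidence level and df.
    """
    n1, n2 = len(sample1), len(sample2)
    estimate = statistics.fmean(sample1) - statistics.fmean(sample2)
    # statistics.variance is the sample variance (divides by n - 1).
    se = math.sqrt(statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2)
    margin = t_star * se
    return estimate, se, (estimate - margin, estimate + margin)

# Illustrative data only.
group1 = [12, 15, 11, 14, 13]
group2 = [9, 10, 8, 11, 12]

estimate, se, (lower, upper) = unpooled_interval(group1, group2, t_star=2.776)
print(f"estimate = {estimate:.3f}, s.e. = {se:.3f}")
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

With these illustrative samples the estimate is 3, the standard error is 1, and the interval is roughly (0.22, 5.78), which excludes the usual null value of 0.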

Variances

If the population variances are equal, use the pooled procedure. Equality of the variances is the null hypothesis to check:

$$H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2$$

Levene's test can be used to assess this. A small p-value (below 0.10, using the 10% level) is evidence against equal variances, so in that case the pooled procedure should NOT be used.

Pooled or Unpooled?

  • If the sample standard deviations are similar, it seems reasonable to pool. If the population standard deviations really are the same, the pooled version is easier to use because its df is simpler (\(n_1 + n_2 - 2\)). How similar is similar enough? That's where we rely on Levene's test.
  • Otherwise, use the unpooled test.
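To see what the choice changes in practice, the sketch below compares the two standard errors and their degrees of freedom from summary statistics. The formulas are the standard textbook ones (the pooled variance, and the Welch-Satterthwaite approximation for the unpooled df); these notes do not state them, so treat them, the function names, and the example numbers as assumptions. The sketch does not run Levene's test itself.

```python
import math

def pooled_se_df(s1, n1, s2, n2):
    """Pooled standard error and df, assuming equal population variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled sample variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return se, df

def unpooled_se_df(s1, n1, s2, n2):
    """Unpooled standard error with the Welch-Satterthwaite approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Hypothetical summary statistics with clearly different spreads.
p_se, p_df = pooled_se_df(s1=2, n1=10, s2=6, n2=20)
u_se, u_df = unpooled_se_df(s1=2, n1=10, s2=6, n2=20)
print(f"pooled:   s.e. = {p_se:.3f}, df = {p_df}")
print(f"unpooled: s.e. = {u_se:.3f}, df = {u_df:.2f}")

# Equal standard deviations and equal sample sizes: the two versions agree.
same_pooled = pooled_se_df(s1=3, n1=12, s2=3, n2=12)
same_unpooled = unpooled_se_df(s1=3, n1=12, s2=3, n2=12)
print(same_pooled, same_unpooled)
```

When the sample standard deviations differ as much as in the first example, the two procedures give noticeably different standard errors, which is why the equal-variance check matters before pooling.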