STATS 250

Sampling Distribution and Confidence Intervals for the Difference between Two Population Proportions

Definition

Independent samples are samples in which the measurements in one sample are unrelated to the measurements in the other.

Ways that independent samples can occur:

  • Random samples are taken separately from two populations, and the same response variable is recorded for each individual.
  • One random sample is taken and a variable is recorded for each individual, but then units are categorized as belonging to one population or another (e.g. M/F).
  • Participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded for each unit.

Differences

If the response variable is categorical, a researcher might look at the difference between the two population proportions:

$$p_1 - p_2$$

We want to do two things:

  • Estimate this difference using a confidence interval.
  • Test the hypothesis that the two population proportions are the same.

Example

Have you ever driven a car when you probably had too much alcohol to drive safely?

We want to compare men vs women:

  • \(p_1\): population proportion of men who respond yes
  • \(p_2\): population proportion of women who respond yes
  • We want to estimate \(p_1 - p_2\), using the sample difference \(\hat{p_1} - \hat{p_2}\)

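Remember that the standard deviation of a sample proportion is:

$$\sqrt{\frac{p(1-p)}{n}}$$

Since \(p\) is unknown in practice, we estimate this with the standard error, which replaces \(p\) with \(\hat{p}\):

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$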

Sampling distribution: If the two sample proportions are based on independent random samples from the two populations, and if \(n_1 p_1\), \(n_1(1-p_1)\), \(n_2 p_2\), and \(n_2(1-p_2)\) are all at least 10, then:

$$\text{mean}(\hat{p_1} - \hat{p_2}) = p_1 - p_2$$

$$\text{s.d.}(\hat{p_1} - \hat{p_2}) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
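
A short derivation, sketched here under the independence assumption, shows where the square-root formula comes from. The variances of independent quantities add, even when the quantities themselves are subtracted:

$$\text{Var}(\hat{p_1} - \hat{p_2}) = \text{Var}(\hat{p_1}) + \text{Var}(\hat{p_2}) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$

Taking the square root gives the expression above.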

$$\hat{p_1} - \hat{p_2} \sim N \left( p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \right)$$
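
One way to see these results in action is a small simulation. The sketch below is illustrative only: the population proportions, sample sizes, number of repetitions, and seed are all hypothetical values chosen for the example, not figures from the survey. It repeatedly draws independent samples, records \(\hat{p_1} - \hat{p_2}\), and compares the simulated mean and spread with the formulas above.

```python
import math
import random
import statistics

# Hypothetical population values, chosen only for illustration.
P1, N1 = 0.30, 200
P2, N2 = 0.20, 250
REPS = 2000

rng = random.Random(250)  # fixed seed so the run is repeatable


def sample_proportion(p, n):
    """Proportion of 'yes' answers in one simulated random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n


# Draw independent samples from each population and record the difference.
diffs = [sample_proportion(P1, N1) - sample_proportion(P2, N2) for _ in range(REPS)]

sim_mean = statistics.mean(diffs)
sim_sd = statistics.stdev(diffs)
theory_sd = math.sqrt(P1 * (1 - P1) / N1 + P2 * (1 - P2) / N2)

print(f"simulated mean {sim_mean:.4f} vs p1 - p2 = {P1 - P2:.4f}")
print(f"simulated s.d. {sim_sd:.4f} vs formula  = {theory_sd:.4f}")
```

With sample sizes this large, the simulated values should land close to the theoretical ones, and a histogram of `diffs` would look roughly bell-shaped.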

CI for the Difference in Population Proportions

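Use the sample estimate, the z-multiplier, and the standard error. The standard error is computed by substituting the sample proportions \(\hat{p_1}\) and \(\hat{p_2}\) for \(p_1\) and \(p_2\) in the square-root formula above.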

$$(\hat{p_1} - \hat{p_2}) \pm z^* \text{s.e.}(\hat{p_1} - \hat{p_2})$$

Note that this requires independent random samples from the two populations that are large enough (\(np \geq 10\) and \(n(1-p) \geq 10\) for each sample, checked in practice using \(\hat{p}\) in place of \(p\)).

If the confidence interval does not contain 0, the data provide evidence, at the confidence level you picked for the z-multiplier (for example, 95%), that the two population proportions differ.
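
The interval and the check for 0 described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `two_proportion_ci` and the counts passed to it are made up for the example and are not real survey results.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from two independent samples.

    x1, x2 are the counts of 'yes' responses; n1, n2 are the sample sizes.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2

    # Large-sample check, using the sample proportions in place of the
    # unknown p's: n * p_hat = x and n * (1 - p_hat) = n - x.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("need at least 10 successes and 10 failures per sample")

    # z-multiplier for the chosen confidence level (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Made-up counts for illustration only.
low, high = two_proportion_ci(x1=60, n1=200, x2=50, n2=250)
contains_zero = low <= 0 <= high
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f}); contains 0: {contains_zero}")
```

Passing a different `confidence` value (for example, 0.90 or 0.99) changes only the z-multiplier; the estimate and standard error stay the same.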