Sampling Distribution and Confidence Intervals for the Difference between Two Population Proportions


Independent samples are samples in which the observations in one sample are unrelated to the observations in the other.

Ways that independent samples can occur:

  • Random samples are taken separately from two populations, and the same response variable is recorded for each individual.
  • One random sample is taken and a variable is recorded for each individual, but then units are categorized as belonging to one population or another (e.g. M/F).
  • Participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded for each unit.


If the response variable is categorical, a researcher might look at the difference between the two population proportions:

$$p_1 - p_2$$

We want to do two things:

  • Estimate this difference using a confidence interval.
  • Test the hypothesis that the two population proportions are the same.


Example survey question: Have you ever driven a car when you probably had too much alcohol to drive safely?

We want to compare men vs women:

  • \(p_1\): population proportion of men who respond yes
  • \(p_2\): population proportion of women who respond yes
  • We want to estimate \(p_1 - p_2\). Our point estimate is the difference in sample proportions, \(\hat{p_1} - \hat{p_2}\).

Remember that the standard deviation of a sample proportion is:

$$\text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$

Sampling distribution: If the two sample proportions are based on independent random samples from the two populations, and if \(n_1 p_1\), \(n_1(1-p_1)\), \(n_2 p_2\), and \(n_2(1-p_2)\) are all at least 10, then:

$$\text{mean}(\hat{p_1} - \hat{p_2}) \approx p_1 - p_2$$

$$\text{s.e.}(\hat{p_1} - \hat{p_2}) \approx \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

$$\hat{p_1} - \hat{p_2} \sim N \left( p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \right)$$
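The mean and spread stated above can be checked with a small simulation. The sketch below is only an illustration: the population proportions, sample sizes, number of repetitions, and the function names `simulate_differences` and `theoretical_sd` are all assumptions made for this example, not values from the notes.

```python
import math
import random
import statistics


def simulate_differences(p1, n1, p2, n2, reps=4000, seed=0):
    """Draw `reps` pairs of independent samples and return each p1_hat - p2_hat."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Each sample proportion is the fraction of "yes" responses in one sample.
        p1_hat = sum(rng.random() < p1 for _ in range(n1)) / n1
        p2_hat = sum(rng.random() < p2 for _ in range(n2)) / n2
        diffs.append(p1_hat - p2_hat)
    return diffs


def theoretical_sd(p1, n1, p2, n2):
    """Spread of p1_hat - p2_hat according to the formula in the notes."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population values, chosen only for illustration.
P1, N1, P2, N2 = 0.30, 200, 0.20, 250
diffs = simulate_differences(P1, N1, P2, N2)

print("simulated mean:", round(statistics.fmean(diffs), 4))
print("theoretical mean:", round(P1 - P2, 4))
print("simulated s.d.:", round(statistics.stdev(diffs), 4))
print("theoretical s.d.:", round(theoretical_sd(P1, N1, P2, N2), 4))
```

With samples this large, the simulated mean and spread should land close to the theoretical values, and a histogram of `diffs` would look roughly bell-shaped.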

CI for the Difference in Population Proportions

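The interval is built from the sample estimate, a z-multiplier \(z^*\), and the standard error, where the standard error is computed with \(\hat{p_1}\) and \(\hat{p_2}\) in place of the unknown \(p_1\) and \(p_2\):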

$$(\hat{p_1} - \hat{p_2}) \pm z^* \text{s.e.}(\hat{p_1} - \hat{p_2})$$

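Note that this requires independent random samples from the two populations, each large enough that \(np \geq 10\) and \(n(1-p) \geq 10\). In practice, check these conditions with the sample proportions, since \(p_1\) and \(p_2\) are unknown.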

If the value 0 is not contained in the confidence interval, then the data give evidence, at the confidence level you picked for the z-multiplier (e.g. 95%), that the two population proportions differ. If the interval does contain 0, a difference of zero remains plausible.
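As a rough sketch of how this interval could be computed, the Python function below takes the number of "yes" responses and the sample size for each group. The function name `two_proportion_ci` and the counts in the usage lines are hypothetical, invented for illustration and not taken from any survey. The large-sample check is applied to the observed counts of successes and failures, which is one common way to check the condition when \(p_1\) and \(p_2\) are unknown.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for p1 - p2 from success counts x1, x2 and sample sizes n1, n2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Large-sample check on the observed successes and failures in each sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("each sample needs at least 10 successes and 10 failures")
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error: the s.d. formula with sample proportions plugged in.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    estimate = p1_hat - p2_hat
    return estimate - z_star * se, estimate + z_star * se


# Hypothetical counts: 60 of 200 men and 50 of 250 women answer yes.
lower, upper = two_proportion_ci(60, 200, 50, 250)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
print("interval excludes 0:", not (lower <= 0 <= upper))
```

Passing a different `confidence` value (for example `0.90`) changes only the z-multiplier; the estimate and standard error stay the same.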