Sampling Distribution and Confidence Intervals for a Population Mean Difference

Paired Data Scenario or Matched Pairs

Paired data arise when you have two measurements on the same object or scenario. This can occur if:

  • Each person/unit is measured twice: two measurements of the same characteristic are made under different conditions. Example: measuring a quantitative response both before and after treatment.
  • Similar individuals/units are paired prior to the experiment. Each member of a pair receives a different treatment, and the same (quantitative) response variable is measured for all individuals.

Consider one population of all possible differences. The parameter is \(\mu_d\), the mean of the population of differences. Each difference can be taken as after minus before or as before minus after, as long as the same direction is used for every pair. You estimate this mean from a set of sample differences, \(d_1, d_2, \dots, d_n\).

  • \(\mu_d\) = the mean of the population of differences
  • \(\sigma_d\) = the standard deviation of the population of differences
  • \(\bar{d}\) = the sample mean of the differences for a random sample of size \(n\)

Under the conditions below, this sample mean \(\bar{d}\) has a normal (or approximately normal) distribution.

If the population of differences has a normal distribution and a random sample of any size is obtained, then the sample mean difference \(\bar{d}\) has a normal distribution. If the population of differences does NOT have a normal distribution but a large random sample of size \(n\) is obtained, then the distribution of \(\bar{d}\) is approximately normal. In both cases \(\bar{d}\) has a mean of \(\mu_d\) and a standard deviation of:

$$\text{s.d.}(\bar{d}) = \frac{\sigma_d}{\sqrt{n}}$$
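The large-sample result can be checked by simulation. The sketch below is illustrative only: it assumes a hypothetical skewed population of differences (an exponential distribution shifted so that \(\mu_d = 0\) and \(\sigma_d = 1\)). The function name `simulate_mean_differences`, the sample size, and the number of repetitions are arbitrary choices, not part of the notes.

```python
import math
import random
import statistics


def simulate_mean_differences(n, reps, seed=0):
    """Draw `reps` samples of `n` differences and return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Hypothetical skewed population: Exponential(1) shifted to mean 0, SD 1.
        sample = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        means.append(sum(sample) / n)
    return means


n = 40          # a "large" sample size, assumed for illustration
sigma_d = 1.0   # population SD of the assumed differences
means = simulate_mean_differences(n=n, reps=5000)

theoretical_sd = sigma_d / math.sqrt(n)   # sigma_d / sqrt(n)
observed_sd = statistics.stdev(means)     # SD of the simulated sample means
mean_of_means = sum(means) / len(means)   # should be close to mu_d = 0

print(f"theoretical s.d. of d-bar: {theoretical_sd:.4f}")
print(f"simulated s.d. of d-bar:   {observed_sd:.4f}")
print(f"simulated mean of d-bar:   {mean_of_means:.4f}")
```

If the result holds, the two standard deviations should be close, and a histogram of `means` should look roughly bell-shaped even though the individual differences are skewed.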

Notes about Distribution and Sample Mean Difference

  • A common, somewhat arbitrary, cutoff for a "large" sample size is 30.
  • The standard deviation of \(\bar{d}\) measures the accuracy of using the sample mean difference to estimate the population mean difference.
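  • In practice, \(\sigma_d\) is rarely known, so we use the sample standard deviation of the differences, \(s_d\), in its place. This gives the standard error \(\text{s.e.}(\bar{d}) = s_d / \sqrt{n}\).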
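As a rough sketch of how the standard error is computed when \(\sigma_d\) is unknown, the example below works through a small data set. The `before` and `after` values are made up for illustration, and the variable names are assumptions rather than anything defined in these notes.

```python
import math
import statistics

# Hypothetical before/after measurements for 10 units (made-up data).
before = [140, 152, 138, 147, 160, 155, 149, 143, 158, 150]
after = [135, 150, 136, 140, 155, 151, 148, 139, 150, 146]

# One difference per pair, always in the same direction (after minus before).
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
d_bar = statistics.mean(differences)   # sample mean difference
s_d = statistics.stdev(differences)    # sample SD of the differences (n - 1 divisor)
se_d_bar = s_d / math.sqrt(n)          # standard error of d-bar

print(f"d-bar = {d_bar:.2f}, s_d = {s_d:.3f}, s.e.(d-bar) = {se_d_bar:.3f}")
```

Once the differences are formed, the two original columns are no longer needed: everything that follows is a one-sample calculation on `differences`.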

Confidence Intervals for Population Mean Difference

Use the sample mean difference and standard error to produce a range of reasonable values for the population mean difference:

$$\bar{d} \pm t^* \text{s.e.}(\bar{d})$$

The multiplier \(t^*\) comes from a \(t\) distribution because you are estimating a mean and you don't know the population standard deviation. It has \(n-1\) degrees of freedom, where \(n\) is your sample size (the number of differences).
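The interval formula can be sketched in a few lines. This is a minimal example under stated assumptions: the differences are made-up values, the helper name `paired_t_interval` is hypothetical, and the multiplier \(t^* = 2.262\) is the standard table value for 95% confidence with 9 degrees of freedom, typed in by hand rather than computed.

```python
import math
import statistics


def paired_t_interval(differences, t_star):
    """Return (lower, upper) for d-bar +/- t* x s.e.(d-bar)."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)   # s_d / sqrt(n)
    margin = t_star * se
    return d_bar - margin, d_bar + margin


# Hypothetical sample differences (after minus before) for n = 10 pairs.
differences = [-5, -2, -2, -7, -5, -4, -1, -4, -8, -4]

# Table value for 95% confidence with n - 1 = 9 degrees of freedom.
t_star = 2.262

lower, upper = paired_t_interval(differences, t_star)
print(f"95% CI for mu_d: ({lower:.3f}, {upper:.3f})")
```

A different confidence level or sample size needs a different \(t^*\), looked up with \(n-1\) degrees of freedom.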

The same sample mean difference and standard error give a standardized test statistic for testing hypotheses about \(\mu_d\):

$$\frac{\text{Sample statistic} - \text{Null value}}{\text{Null standard error}}$$

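Here the sample statistic is \(\bar{d}\), the null value is the hypothesized value of \(\mu_d\), and the null standard error is \(\text{s.e.}(\bar{d})\). This test statistic is a \(t\) statistic for the same reason as above, with \(n-1\) degrees of freedom.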
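A minimal sketch of the test statistic follows. It assumes a null value of 0 (no mean difference), which is a common choice but not one stated in these notes. The data are made up and the helper name `paired_t_statistic` is hypothetical.

```python
import math
import statistics


def paired_t_statistic(differences, null_value=0.0):
    """Return (t, df) for testing H0: mu_d = null_value."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)   # s.e.(d-bar)
    t = (d_bar - null_value) / se
    return t, n - 1


# Hypothetical sample differences (after minus before) for n = 10 pairs.
differences = [-5, -2, -2, -7, -5, -4, -1, -4, -8, -4]

t_stat, df = paired_t_statistic(differences, null_value=0.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic is then compared with a \(t\) distribution with `df` degrees of freedom. That step is left out here because the Python standard library does not provide \(t\) distribution probabilities.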