STATS 250

Sampling Distribution and Confidence Intervals for a Population Mean Difference

Paired Data Scenario or Matched Pairs

Paired data arise when you have two measurements on the same object or scenario. This can occur if:

Each person/unit is measured twice: two measurements of the same characteristic are made under different conditions. Example: measuring a quantitative response both before and after treatment.
Similar individuals/units are paired prior to the experiment. Each member of a pair receives a different treatment, and the same (quantitative) response variable is measured for all individuals.

Treat the data as one population of all possible differences. The parameter is $\mu_d$, the mean of the population of differences. Each difference can be computed as after minus before or as before minus after, as long as the same order is used for every pair. You estimate $\mu_d$ from a set of sample differences, $d_1, d_2, \dots, d_n$.

$\mu_d$ = the mean of the population of differences
$\sigma_d$ = the standard deviation of the population of differences
$\bar{d}$ = the sample mean of the differences for a random sample of size $n$
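
As a small illustration of the notation above, the sketch below turns paired measurements into differences and computes $\bar{d}$. The before/after values are made-up numbers chosen only for illustration, and the variable names are assumptions, not part of the course material.

```python
from statistics import mean

# Hypothetical before/after measurements on the same 10 units (made-up data).
before = [140, 152, 138, 147, 160, 155, 149, 143, 151, 158]
after = [135, 150, 136, 140, 158, 150, 148, 139, 147, 151]

# One difference per pair. The order (after minus before) is the same for every pair.
differences = [a - b for a, b in zip(after, before)]

# d_bar is the sample mean difference, the point estimate of mu_d.
d_bar = mean(differences)

print("differences:", differences)
print("sample mean difference:", d_bar)
```

Once the differences are formed, the original two columns are no longer needed: everything that follows works with the single list of differences.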

Under either of the conditions below, the sample mean $\bar{d}$ has a normal (or approximately normal) distribution.

If the population of differences has a normal distribution and a random sample of any size is obtained, then the sample mean difference $\bar{d}$ has a normal distribution. If the population of differences does NOT have a normal distribution but a large random sample of size $n$ is obtained, then the distribution of $\bar{d}$ is approximately normal. In both cases the mean is $\mu_d$ and the standard deviation is:

$$\text{s.d.}(\bar{d}) = \frac{\sigma_d}{\sqrt{n}}$$
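
The formula can be checked informally by simulation. The sketch below assumes, purely for illustration, a skewed population of differences (exponential with mean 1 and standard deviation 1), draws many samples of size 40, and compares the standard deviation of the simulated $\bar{d}$ values with $\sigma_d / \sqrt{n}$. The population, sample size, and seed are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(250)  # fixed seed so the run is repeatable

n = 40             # sample size, "large" by the usual guideline
replicates = 5000  # number of simulated samples
sigma_d = 1.0      # SD of an exponential population with rate 1 (its mean is also 1)

# Each replicate: draw n differences from the skewed population and record d_bar.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

empirical_sd = stdev(sample_means)   # spread of the simulated d_bar values
theoretical_sd = sigma_d / sqrt(n)   # sigma_d / sqrt(n) from the formula

print("mean of d_bar values:", round(mean(sample_means), 3))
print("empirical s.d. of d_bar:", round(empirical_sd, 3))
print("theoretical s.d. of d_bar:", round(theoretical_sd, 3))
```

The two standard deviations should agree closely, and a histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.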

Notes about Distribution and Sample Mean Difference

A common, somewhat arbitrary cutoff for a "large" sample size is $n \geq 30$.
The standard deviation of $\bar{d}$ measures the accuracy of using the sample mean difference to estimate the population mean difference.
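In practice, $\sigma_d$ is rarely known, so we estimate it with the sample standard deviation of the differences, $s_d$. This gives the standard error $\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$.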
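
A minimal sketch of that substitution: the sample standard deviation of the differences (called `s_d` here, an assumed name) stands in for $\sigma_d$, and dividing by $\sqrt{n}$ gives the standard error of $\bar{d}$. The differences are hypothetical values used only for illustration.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample differences (made-up data).
differences = [-5, -2, -2, -7, -2, -5, -1, -4, -4, -7]

n = len(differences)
s_d = stdev(differences)   # sample SD of the differences (divisor n - 1)
se_d_bar = s_d / sqrt(n)   # standard error of the sample mean difference

print("s_d:", round(s_d, 4))
print("s.e.(d_bar):", round(se_d_bar, 4))
```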

Confidence Intervals for Population Mean Difference

Use the sample mean difference and standard error to produce a range of reasonable values for the population mean difference:

$$\bar{d} \pm t^* \text{s.e.}(\bar{d})$$

The multiplier $t^*$ comes from a $t$ distribution because you are working with a mean and do not know the population standard deviation. Use $n-1$ degrees of freedom, where $n$ is the number of differences (pairs) in the sample.
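
The interval can be sketched in a few lines. The function name `paired_t_interval` and the data are hypothetical. The multiplier $t^* = 2.262$ is the standard table value for 95% confidence with 9 degrees of freedom, and it is passed in by hand rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_interval(differences, t_star):
    """Return (lower, upper) for d_bar +/- t_star * s.e.(d_bar)."""
    n = len(differences)
    d_bar = mean(differences)
    se = stdev(differences) / sqrt(n)   # s_d / sqrt(n)
    margin = t_star * se
    return d_bar - margin, d_bar + margin

# Hypothetical sample differences (made-up data), n = 10 so df = 9.
differences = [-5, -2, -2, -7, -2, -5, -1, -4, -4, -7]
t_star = 2.262  # t table value: 95% confidence, 9 degrees of freedom

lower, upper = paired_t_interval(differences, t_star)
print("95% CI for mu_d:", (round(lower, 2), round(upper, 2)))
```

With these made-up values the whole interval lies below 0, so 0 would not be among the reasonable values for $\mu_d$ at this confidence level.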

The same quantities give a standardized test statistic for testing hypotheses about $\mu_d$:

$$\frac{\text{Sample statistic} - \text{Null value}}{\text{Null standard error}}$$

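For paired data, the sample statistic is $\bar{d}$ and the null standard error is $\text{s.e.}(\bar{d})$. This test statistic is a $t$ statistic for the same reason as above, and it has $n-1$ degrees of freedom.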
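
As a rough sketch, the statistic for paired data can be computed as below. The function name `paired_t_statistic`, the null value of 0, and the data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(differences, null_value=0.0):
    """Return (t, df) where t = (d_bar - null_value) / s.e.(d_bar)."""
    n = len(differences)
    d_bar = mean(differences)
    se = stdev(differences) / sqrt(n)   # s_d / sqrt(n)
    return (d_bar - null_value) / se, n - 1

# Hypothetical sample differences (made-up data).
differences = [-5, -2, -2, -7, -2, -5, -1, -4, -4, -7]

t_stat, df = paired_t_statistic(differences, null_value=0.0)
print("t statistic:", round(t_stat, 3), "with df =", df)
```

The resulting value would then be compared against a $t$ distribution with `df` degrees of freedom to obtain a p-value.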