STATS 250

Introduction to the Analysis of Variance

Analysis of variance is a way to compare the means of two or more normal populations, based on independent random samples, when the population variances are assumed to be equal.

This is called ANOVA.

This is funny: we want to know about means, but we have to look at variance. ANOVA is an extension of the two independent samples pooled t-test:

$$H_0: \mu_1 = \mu_2$$

This gives you a t-statistic with $$n_1 + n_2 - 2$$ degrees of freedom.
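To make this concrete, here is a minimal sketch of the pooled t-test in Python using scipy.stats.ttest_ind with equal_var=True (which pools the two sample variances); the data values are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical samples from two populations (made-up data)
x1 = np.array([4.2, 5.1, 4.8, 5.5, 4.9])
x2 = np.array([5.8, 6.0, 5.4, 6.3, 5.9, 6.1])

# Pooled two-sample t-test: equal_var=True pools the sample variances
t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=True)

df = len(x1) + len(x2) - 2  # n1 + n2 - 2 degrees of freedom
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```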

ANOVA

One-way ANOVA expands that to $$k$$ populations. Say population 1 has a mean of $$\mu_1$$ and a standard deviation of $$\sigma_1$$, and we take a random sample of size $$n_1$$ from it. Population 2 has its own mean $$\mu_2$$ and sample size $$n_2$$, but, under our assumptions, the same standard deviation; the same goes for every population up through population $$k$$.

This assumes that each sample is a random sample, the samples are all independent of one another, every population has a normal distribution, and all populations have equal variances.
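In symbols, these assumptions amount to the usual one-way ANOVA model: the $$j$$th observation in group $$i$$ satisfies

$$X_{ij} \sim N(\mu_i, \sigma^2), \quad j = 1, \dots, n_i, \quad i = 1, \dots, k,$$

with all observations independent and a single common variance $$\sigma^2$$.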

Our hypotheses, now, are an extension of the two independent samples t-test:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$

$$H_\text{a}: \text{At least one mean is different}$$

ANOVA Hypotheses

It seems a little weird that you are testing for equality of means by analyzing variance... why is this?

The answer is that we're comparing two estimators for $$\sigma^2$$, the common population variance.

The mean square between groups (MS groups) is one of these estimators, but it estimates $$\sigma^2$$ correctly only if the null hypothesis is true. If the null hypothesis is not true, then MS groups tends to be too big.

There is also the MSE, the mean square error, which measures the natural variability within the groups. The MSE is an unbiased estimator of the common population variance $$\sigma^2$$ whether or not the null hypothesis is true, so you expect the MSE to estimate $$\sigma^2$$ well.
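A small simulation can make this concrete. The sketch below (Python with numpy; the group means, sizes, and $$\sigma$$ are made up for illustration) computes MS groups and MSE repeatedly: under $$H_0$$ both average out near $$\sigma^2$$, while under the alternative MS groups is inflated and MSE is not.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_squares(samples):
    """Compute (MS groups, MSE) for a list of 1-D sample arrays."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = np.concatenate(samples).mean()
    ss_groups = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * s.var(ddof=1) for s in samples)
    return ss_groups / (k - 1), sse / (N - k)

def simulate(means, sigma=2.0, n=20, reps=2000):
    """Average MS groups and MSE over many simulated data sets."""
    ms_g, mse = [], []
    for _ in range(reps):
        samples = [rng.normal(mu, sigma, n) for mu in means]
        g, e = mean_squares(samples)
        ms_g.append(g)
        mse.append(e)
    return np.mean(ms_g), np.mean(mse)

# Under H0 (all means equal): both estimates land near sigma^2 = 4
print(simulate(means=[10, 10, 10]))   # roughly (4.0, 4.0)

# Under Ha (means differ): MS groups is inflated, MSE stays near 4
print(simulate(means=[9, 10, 11]))    # MS groups well above 4, MSE near 4
```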

These two estimates are used to form the $$F$$ statistic:

$$F = \frac{\text{Variation among sample means}}{\text{Natural variation within groups}} = \frac{\text{MS groups}}{\text{MSE}}$$

If $$H_0$$ is true, then $$F$$ should be close to 1. If this $$F$$ ratio is too big, then you reject the null hypothesis.

How big is too big?

$$F$$ Statistic Qualities

If the group means are far apart, the $$F$$ statistic will be pretty big. If the variance within each population is large, then, other things being equal, $$F$$ will be small.

Computing the $$F$$ Test Statistic

Step 1

Calculate the mean and variance of each sample: $$\bar{x}_i, s_i^2$$

Step 2

Calculate the overall sample mean (using all $$N$$ observations): $$\bar{x}$$

Step 3

Calculate the sum of squares between groups:

$$\text{SS groups} = \sum_{\text{groups}} n_i(\bar{x}_i - \bar{x})^2$$

Step 4

Calculate the sum of squares within groups (due to error):

$$\text{SSE} = \sum_{\text{groups}} (n_i - 1) s_i^2$$

Step 5

Divide each sum of squares by its degrees of freedom to get the mean squares, and take their ratio:

$$\text{MS groups} = \frac{\text{SS groups}}{k - 1}, \qquad \text{MSE} = \frac{\text{SSE}}{N - k}, \qquad F = \frac{\text{MS groups}}{\text{MSE}}$$

As a check on the arithmetic, the sums of squares add up:

$$\text{SS Total} = \text{SS Groups} + \text{SSE}$$
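Putting the steps together, here is a minimal sketch in Python that follows Steps 1 through 5 on made-up data and checks the hand computation against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

# Hypothetical data: k = 3 groups (made-up values)
groups = [np.array([5.2, 4.8, 6.1, 5.5]),
          np.array([6.7, 7.1, 6.4, 7.0, 6.8]),
          np.array([5.9, 6.2, 5.7, 6.0])]

k = len(groups)
N = sum(len(g) for g in groups)

# Steps 1-2: per-group means/variances and the overall mean
means = [g.mean() for g in groups]
variances = [g.var(ddof=1) for g in groups]
grand_mean = np.concatenate(groups).mean()

# Step 3: sum of squares between groups
ss_groups = sum(len(g) * (m - grand_mean) ** 2
                for g, m in zip(groups, means))

# Step 4: sum of squares within groups (due to error)
sse = sum((len(g) - 1) * v for g, v in zip(groups, variances))

# Step 5: mean squares and the F statistic
ms_groups = ss_groups / (k - 1)
mse = sse / (N - k)
F = ms_groups / mse
print(f"F = {F:.3f} with df = ({k - 1}, {N - k})")

# Check against scipy's one-way ANOVA
F_scipy, p = stats.f_oneway(*groups)
print(f"scipy: F = {F_scipy:.3f}, p-value = {p:.4f}")
```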

$$F$$ Distribution

If $$H_0$$ is true, then $$F$$ has an $$F(k - 1, N - k)$$ distribution. The $$F$$ distribution has two degrees-of-freedom parameters: $$k - 1$$ for the numerator (groups) and $$N - k$$ for the denominator (error).
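This is what answers the earlier question of how big is too big: compare the observed $$F$$ to the $$F(k - 1, N - k)$$ distribution. A minimal sketch using scipy.stats.f (the numbers here are illustrative):

```python
from scipy import stats

k, N = 3, 13          # e.g., 3 groups, 13 total observations
df1, df2 = k - 1, N - k

# Critical value: reject H0 at the 5% level if F exceeds this
f_crit = stats.f.ppf(0.95, df1, df2)

# p-value for an observed F statistic (illustrative value)
F_obs = 4.7
p_value = stats.f.sf(F_obs, df1, df2)

print(f"critical value = {f_crit:.3f}, p-value = {p_value:.4f}")
```

Equivalently, reject $$H_0$$ whenever the p-value falls below your chosen significance level.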