In the past few lectures, we looked at the relationship between two quantitative variables and, in particular, the linear relationship between them.
Now we're looking at categorical variables. There are three basic tests:
All three tests are based on a $$\chi^2$$ test statistic. If $$H_0$$ is true and the assumptions hold, $$\chi^2$$ approximately follows a chi-square distribution.
A chi-square distribution with df degrees of freedom is right-skewed, with mean df and variance 2 · df.
The question we're asking is "does this seem to fit?" If we fit a certain model, we expect a certain count in each category. We also have observed counts. How do these observed counts compare to what we expected?
We take the difference between each observed count and its expected count (a perfect fit would make every difference 0), square it so that every term is non-negative, and rescale it by dividing by the expected count.
The goodness-of-fit test is used to assess whether a particular discrete model fits a discrete characteristic well, based on a random sample from the population.
Let's say you have four toll booths, and you see 100 cars come through in total. If drivers used all four booths equally, you would expect 25 cars at each booth. Let's say instead you observed:
$$(26, 20, 28, 26)$$
The chi-square value for this is:
$$\chi^2 = \sum_{\text{all cells}} \frac{\text{(Observed - Expected)}^2}{\text{Expected}} = \frac{(26 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(28 - 25)^2}{25} + \frac{(26 - 25)^2}{25} = 1.44$$
This means you have an observed test statistic of $$\chi^2 = 1.44$$, with df = k - 1 = 4 - 1 = 3, where k is the number of categories.
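
To check the arithmetic, here is a minimal Python sketch of the toll booth calculation. The function name `chi_square_statistic` and the variable names are just illustrative choices, and the expected counts assume the model where every booth is equally likely.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts at the four toll booths (from the example above).
observed = [26, 20, 28, 26]

# Assumed model: every booth is equally likely, so each expects total / k cars.
total = sum(observed)
k = len(observed)
expected = [total / k] * k

chi2 = chi_square_statistic(observed, expected)
df = k - 1  # degrees of freedom = number of categories - 1

print(f"chi-square = {chi2:.2f}, df = {df}")
```

This should print a chi-square value of 1.44 with 3 degrees of freedom, matching the hand calculation. If SciPy is available, `scipy.stats.chisquare(observed)` computes the same statistic (it assumes equal expected counts by default) and also returns a p-value.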