STATS 250

Inference for Linear Regression

Review

The correlation coefficient $$r$$ is a measure of the strength and the direction of the linear relationship between two quantitative variables.

Properties:

Ranges from -1 to 1
Sign indicates direction
Magnitude indicates strength
Only measures the strength of the linear relationship between two quantitative variables.

$$r^2$$ is the proportion of the total variability in the responses that is explained by the linear relationship with the explanatory variable $$x$$. Basically, it tells you how well the line fits the data.
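As a quick illustration, here is a minimal sketch of computing $$r$$ and $$r^2$$ by hand. The data values are made up for this example and do not come from the course; the variable names (`s_xx`, `s_xy`, and so on) are just convenient labels.

```python
import math

# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation coefficient and proportion of variability explained.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

For this toy data set, $$r$$ is about 0.77 and $$r^2$$ is about 0.60, so roughly 60% of the variability in $$y$$ is explained by the linear relationship with $$x$$.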

The population model has the form:

$$Y = \beta_0 + \beta_1 x + \epsilon$$

so the mean response is $$E(Y) = \beta_0 + \beta_1 x$$. Here $$\beta_0$$ is the intercept of the straight line in the population, and $$\beta_1$$ is the slope of the straight line in the population. Note that if the slope is equal to zero, there is no linear relationship in the population. $$\epsilon$$ is referred to as the error, and is just randomness or noise around the line. Note that we cannot see this true error. Instead, we see residuals, the differences between the observed responses and the values predicted by the estimated line.

For each $$x$$, the population of $$y$$ values is normally distributed with a mean that may depend on $$x$$ in a linear way, $$E(Y) = \beta_0 + \beta_1 x$$, and a standard deviation $$\sigma$$ that does not depend on $$x$$.
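To make the difference between true errors and residuals concrete, here is a small simulation sketch. The population values `beta_0`, `beta_1`, and `sigma` are hypothetical numbers chosen only for illustration. In real data the true errors are never visible; the simulation can show them only because it generated them.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population values, chosen only for illustration.
beta_0, beta_1, sigma = 2.0, 0.5, 1.0

x = [float(i) for i in range(1, 21)]
# True errors: normal with mean 0 and the same sigma for every x.
errors = [random.gauss(0.0, sigma) for _ in x]
y = [beta_0 + beta_1 * xi + e for xi, e in zip(x, errors)]

# Least-squares estimates b_0 and b_1 from the simulated sample.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_1 = s_xy / s_xx
b_0 = y_bar - b_1 * x_bar

# Residuals are measured from the fitted line, not the population line.
residuals = [yi - (b_0 + b_1 * xi) for xi, yi in zip(x, y)]

print(f"b_0 = {b_0:.3f}, b_1 = {b_1:.3f}")
print(f"sum of true errors = {sum(errors):.3f}")
print(f"sum of residuals   = {sum(residuals):.3f}")
```

The residuals from a least-squares line with an intercept always sum to zero (up to rounding), while the true errors generally do not. This is one way to see that residuals only approximate the errors.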

Some goals:

Estimate regression line based on data. Done that already.
Measure strength of the linear relationship. Done.
Make predictions. Done.

Now, new stuff:

See if the linear relationship is statistically significant.
Provide confidence intervals for our predictions.
Check the assumptions of our model.

Estimating Standard Deviation

You can estimate the population standard deviation $$\sigma$$ with $$s$$, which measures the typical size of the residuals (observed errors).

$$s = \sqrt{\frac{\text{sum of squared residuals}}{n-2}}$$
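The formula for $$s$$ can be sketched in a few lines. The data are made up for illustration, and the least-squares formulas for `b_0` and `b_1` are the standard ones rather than anything specific to these notes.

```python
import math

# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_1 = s_xy / s_xx
b_0 = y_bar - b_1 * x_bar

# Residual = observed y minus predicted y.
residuals = [yi - (b_0 + b_1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)  # sum of squared residuals

# Divide by n - 2 because two parameters (b_0 and b_1) were estimated.
s = math.sqrt(sse / (n - 2))

print(f"SSE = {sse:.3f}, s = {s:.4f}")
```

For this toy data set, the sum of squared residuals is 2.4, so $$s = \sqrt{2.4 / 3} \approx 0.894$$.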

Significance Testing

$$H_0: \beta_1 = 0$$

$$H_\text{a}: \beta_1 \ne 0$$

In other words, does $$x$$ tell you anything about $$y$$? One way to answer this is with a t-statistic:

$$t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error}} = \frac{b_1 - 0}{\text{s.e.}(b_1)}$$

where

$$\text{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$

and the degrees of freedom are $$n - 2$$. You really never have to do this by hand. Just use technology.
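For a sense of what the technology is doing, here is a rough sketch of the slope test using only the Python standard library. The data are made up. The helpers `t_pdf` and `two_sided_p_value` are hypothetical names written for this example: they approximate the p-value by numerically integrating the t density with Simpson's rule, which is one simple approach and not necessarily what statistical software does internally.

```python
import math

# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_1 = s_xy / s_xx
b_0 = y_bar - b_1 * x_bar

# Estimate of sigma from the residuals.
sse = sum((yi - (b_0 + b_1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope, test statistic, and degrees of freedom.
se_b1 = s / math.sqrt(s_xx)
t_stat = (b_1 - 0) / se_b1
df = n - 2


def t_pdf(u, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 1 - 2 * area


p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```

With this toy data set, $$t \approx 2.12$$ on 3 degrees of freedom and the two-sided p-value is about 0.124, so at the 5% level there is not enough evidence to reject $$H_0: \beta_1 = 0$$.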

Confidence Intervals

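A confidence interval for the population slope $$\beta_1$$ has the form below, where $$t^*$$ is the critical value from the t-distribution for the chosen confidence level: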

$$b_1 \pm t^* \text{s.e.}(b_1), \text{df} = n - 2$$
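Here is a minimal sketch of a 95% confidence interval for the slope. The data are made up, and the critical value `t_star = 3.182` is an assumption taken from a standard t table for 95% confidence with 3 degrees of freedom. With a different sample size or confidence level you would look up a different value.

```python
import math

# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_1 = s_xy / s_xx
b_0 = y_bar - b_1 * x_bar

# Estimate of sigma and the standard error of the slope.
sse = sum((yi - (b_0 + b_1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(s_xx)

# Critical value from a t table: 95% confidence, df = n - 2 = 3.
t_star = 3.182

margin = t_star * se_b1
lower = b_1 - margin
upper = b_1 + margin

print(f"b_1 = {b_1:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

For this toy data set the interval is roughly $$(-0.30, 1.50)$$. It contains 0, which is consistent with failing to reject $$H_0: \beta_1 = 0$$ at the 5% level.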