STATS 250

Inference for Linear Regression

Review

The correlation coefficient $$r$$ is a measure of the strength and the direction of the linear relationship between two quantitative variables.

Properties:

  • Ranges from -1 to 1
  • Sign indicates direction
  • Magnitude indicates strength
  • Measures only the strength of a linear relationship; two variables can be strongly related in a curved pattern and still have $$r$$ near 0.
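Here is a minimal sketch of computing $$r$$ from its definition, using only the Python standard library. The function name `correlation` and the data are hypothetical, chosen for illustration. The last line shows a perfectly curved (quadratic) pattern whose correlation is exactly 0.

```python
import math

def correlation(x, y):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))                    # 0.7746

# y = x^2 is a perfect (nonlinear) relationship, yet r is 0
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```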

$$r^2$$ is the proportion of the total variability in the responses $$y$$ that can be explained by the linear relationship with the explanatory variable $$x$$. In other words, it measures how well the linear model fits the data.
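One way to see "proportion of variability explained" is to compute $$r^2 = 1 - \text{SSE}/\text{SST}$$ directly, where SST is the total sum of squares of $$y$$ about its mean and SSE is the sum of squared residuals from the least-squares line. The sketch below assumes hypothetical data and a hypothetical helper `r_squared`; for simple linear regression the result matches the square of $$r$$ (here $$0.7746^2 \approx 0.6$$).

```python
def r_squared(x, y):
    """Proportion of variability in y explained by the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # least-squares slope
    b0 = y_bar - b1 * x_bar     # least-squares intercept
    # Total variability in y, and the part the line leaves unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / sst

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 4))  # 0.6
```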

The population regression model has the form:

$$Y = \beta_0 + \beta_1 x + \epsilon$$

so the mean response at a given $$x$$ is $$E(Y) = \beta_0 + \beta_1 x$$.

Here $$\beta_0$$ is the intercept and $$\beta_1$$ is the slope of the population regression line. If the slope is zero, there is no linear relationship between $$x$$ and $$y$$ in the population. The term $$\epsilon$$ is the random error (noise): the deviation of an individual $$y$$ value from the population line. We never observe the true errors; instead, we observe residuals, $$y - \hat{y}$$, the deviations from the fitted line.

For each value of $$x$$, the population of $$y$$ values is normally distributed with mean $$\beta_0 + \beta_1 x$$ (which depends on $$x$$ linearly) and a standard deviation $$\sigma$$ that does not depend on $$x$$.
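To make these assumptions concrete, the sketch below simulates responses from the population model at a few fixed $$x$$ values. The parameter values `BETA0`, `BETA1`, `SIGMA` and the helper `draw_y` are hypothetical. At each $$x$$, the simulated $$y$$ values should have a mean close to $$\beta_0 + \beta_1 x$$ and a standard deviation close to the same $$\sigma$$.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration
BETA0 = 2.0
BETA1 = 0.5
SIGMA = 1.0

def draw_y(x, rng):
    """Draw one response from Y = BETA0 + BETA1*x + eps, eps ~ N(0, SIGMA)."""
    eps = rng.gauss(0.0, SIGMA)   # normal error with the same SD at every x
    return BETA0 + BETA1 * x + eps

rng = random.Random(250)          # fixed seed so runs are reproducible
for x in [0, 4, 8]:
    ys = [draw_y(x, rng) for _ in range(10_000)]
    # Columns: x, true mean, sample mean, sample SD
    print(x, BETA0 + BETA1 * x,
          round(statistics.mean(ys), 2), round(statistics.stdev(ys), 2))
```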

Some goals:

  • Estimate regression line based on data. Done that already.
  • Measure strength of the linear relationship. Done.
  • Make predictions. Done.
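As a reminder of the first and third goals, here is a minimal sketch of the least-squares estimates $$b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$$ and $$b_0 = \bar{y} - b_1 \bar{x}$$, followed by a prediction. The helpers `fit_line` and `predict` and the data are hypothetical. The prediction uses $$x = 3.5$$, inside the range of the data, to avoid extrapolating.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0, b1

def predict(b0, b1, x_new):
    """Predicted mean response at x_new."""
    return b0 + b1 * x_new

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(round(b0, 2), round(b1, 2))     # 2.2 0.6
print(round(predict(b0, b1, 3.5), 2))  # 4.3
```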

Now, new stuff:

  • Test whether the linear relationship is statistically significant.
  • Provide confidence intervals for our predictions.
  • Check the assumptions of our model.

Estimating Standard Deviation

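We estimate $$\sigma$$, the standard deviation of the $$y$$ values about the population line, with $$s$$, which measures the typical size of the residuals (observed errors):

$$s = \sqrt{\frac{\text{sum of squared residuals}}{n-2}}$$

We divide by $$n - 2$$ because two parameters, $$\beta_0$$ and $$\beta_1$$, are estimated from the data.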
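A minimal sketch of computing $$s$$: fit the least-squares line, square and sum the residuals, divide by $$n - 2$$, and take the square root. The helper `residual_sd` and the data are hypothetical. For this data the residuals are $$-0.8, 0.6, 1.0, -0.6, -0.2$$, so the sum of squared residuals is 2.4 and $$s = \sqrt{2.4 / 3} \approx 0.8944$$.

```python
import math

def residual_sd(x, y):
    """s = sqrt(SSE / (n - 2)) for the least-squares line fit to (x, y)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate sigma")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals: observed minus fitted
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - 2))

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(residual_sd(x, y), 4))  # 0.8944
```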

Significance Testing

$$H_0: \beta_1 = 0$$

$$H_\text{a}: \beta_1 \ne 0$$

In other words, does $$x$$ tell you anything about $$y$$ through a linear relationship? The test uses a t-statistic:

$$t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error}} = \frac{b_1 - 0}{SE(b_1)}$$

and

$$SE(b_1) = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$

and the test has $$n - 2$$ degrees of freedom. You will rarely have to compute this by hand; just use technology, which reports $$b_1$$, $$SE(b_1)$$, $$t$$, and the p-value.
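To show what the software computes, here is a minimal sketch of the t-statistic for $$H_0: \beta_1 = 0$$ using the formulas above. The helper `slope_t_test` and the data are hypothetical. It stops at $$t$$ and the degrees of freedom; the p-value comes from the t-distribution with $$n - 2$$ degrees of freedom. If SciPy happens to be available, `scipy.stats.linregress` reports the slope, its standard error (`stderr`), and a two-sided `pvalue` for this same null hypothesis.

```python
import math

def slope_t_test(x, y):
    """Return (b1, SE(b1), t, df) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))       # estimate of sigma
    se_b1 = s / math.sqrt(sxx)         # SE(b1) = s / sqrt(sum (x - x_bar)^2)
    t = (b1 - 0) / se_b1               # (estimate - null value) / SE
    return b1, se_b1, t, n - 2

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t, df = slope_t_test(x, y)
print(round(b1, 4), round(se_b1, 4), round(t, 4), df)  # 0.6 0.2828 2.1213 3
```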

Confidence Intervals

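A confidence interval for the population slope $$\beta_1$$ is:

$$b_1 \pm t^* \cdot SE(b_1), \quad \text{df} = n - 2$$

where $$t^*$$ is the critical value from the t-distribution with $$n - 2$$ degrees of freedom for the chosen confidence level.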
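A minimal sketch of the interval for the slope. Because the Python standard library has no t-distribution quantile function, $$t^*$$ is passed in by hand; the value 3.182 used below is the 95% critical value for df = 3 read from a t table. The helper `slope_ci` and the data are hypothetical.

```python
import math

def slope_ci(x, y, t_star):
    """b1 +/- t_star * SE(b1); t_star must match df = n - 2 and the confidence level."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    margin = t_star * se_b1            # margin of error
    return b1 - margin, b1 + margin

# Hypothetical example data (n = 5, so df = 3)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
lo, hi = slope_ci(x, y, t_star=3.182)   # 95% t* for df = 3, from a t table
print(round(lo, 3), round(hi, 3))       # -0.3 1.5
```

For this data the interval contains 0, which agrees with the t-statistic of about 2.12 being smaller than $$t^* = 3.182$$: at the 5% level we would not reject $$H_0: \beta_1 = 0$$.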