STATS 250

More on Linear Regression

The correlation coefficient $$r$$ gives an objective measure of the strength and the direction of a linear relationship between $$x$$ and $$y$$.

$$r$$ ranges between -1 and 1, and $$r$$ is unitless. The sign of $$r$$ is what indicates the direction of the association.

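The magnitude of $$r$$ indicates the strength of the linear relationship: values near -1 or 1 mean the points fall close to a line, while values near 0 mean little or no linear association.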

Always use technology to find $$r$$.
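As one illustration of what "technology" might look like, here is a minimal Python sketch using NumPy. The data values and the helper name `correlation` are made up for illustration and are not from the course. The helper computes $$r$$ from standardized values, and the result is compared with NumPy's built-in `np.corrcoef`.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation r, computed as the sum of products of
    z-scores divided by n - 1 (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    zx = (x - x.mean()) / x.std(ddof=1)  # sample standard deviation
    zy = (y - y.mean()) / y.std(ddof=1)
    return float(np.sum(zx * zy) / (n - 1))

# Made-up illustrative data (assumption, not course data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r_manual = correlation(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])

# r is unitless: rescaling x (say, meters to centimeters) leaves it unchanged.
r_rescaled = correlation(100 * x, y)

print(r_manual, r_numpy, r_rescaled)
```

The three printed values should agree up to rounding, which reflects that $$r$$ does not depend on the units of $$x$$ or $$y$$.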

$$r^2$$ value

$$r^2$$ is the proportion of the total variability in the responses $$y$$ that can be explained by the linear relationship with the explanatory variable $$x$$. It can be shown that $$r^2 = \frac{SSM}{SST}$$, the model sum of squares divided by the total sum of squares. Here $$SST$$ measures all of the variation in the $$y$$ values, and $$SSM = SST - SSE$$ is what remains after subtracting the error sum of squares.
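The sum-of-squares decomposition can be checked numerically. The sketch below assumes made-up data and a hypothetical helper `sums_of_squares`. It fits a least-squares line with `np.polyfit`, computes the total, model, and error sums of squares, and compares their ratio with the square of the correlation.

```python
import numpy as np

def sums_of_squares(x, y):
    """Return (SST, SSM, SSE) for the least-squares line of y on x.
    Hypothetical helper, written for illustration only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)           # slope first, then intercept
    y_hat = b0 + b1 * x                    # fitted values
    sst = np.sum((y - y.mean()) ** 2)      # total variation in y
    ssm = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the line
    sse = np.sum((y - y_hat) ** 2)         # leftover (error) variation
    return float(sst), float(ssm), float(sse)

# Made-up illustrative data (assumption, not course data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.2, 8.3, 9.1, 12.4])

sst, ssm, sse = sums_of_squares(x, y)
r2_from_ss = ssm / sst
r2_from_r = float(np.corrcoef(x, y)[0, 1]) ** 2

print(sst, ssm + sse)         # these two should match
print(r2_from_ss, r2_from_r)  # and so should these
```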

Example interpretation: if $$r^2 = 0.791$$, then 79.1% of the variation in $$y$$ can be accounted for by its linear relationship with $$x$$.

Some cautions

  • $$r^2$$ does not detect nonlinear relationships. A strong curved pattern, such as a sinusoid, can produce an $$r^2$$ near 0 and look like noise if you only check the number. Always plot the data.
  • Watch for outliers and their influence on regression results. Outliers affect means and standard deviations, so both estimates, the intercept $$b_0$$ and the slope $$b_1$$, will be affected.
  • Dangers of extrapolation (watering a plant helps! let me pour 20 gallons on it!)
  • Dangers of combining groups inappropriately (Simpson's paradox)
  • Correlation does not prove causation.
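Two of these cautions are easy to see numerically. The sketch below uses made-up data (an assumption for illustration only). First, a cosine curve sampled on an interval symmetric about 0 has a perfectly deterministic relationship with $$x$$, yet its correlation is essentially 0. Second, adding a single outlier to points lying exactly on the line $$y = 1 + 2x$$ changes both the fitted slope and the fitted intercept.

```python
import numpy as np

# Caution: a strong but nonlinear relationship can give r near 0.
x_curve = np.linspace(-2 * np.pi, 2 * np.pi, 201)  # symmetric about 0
y_curve = np.cos(x_curve)                          # exact function of x
r_curve = float(np.corrcoef(x_curve, y_curve)[0, 1])

# Caution: one outlier can change both b0 and b1.
x_line = np.arange(1.0, 11.0)   # x = 1, 2, ..., 10
y_line = 1.0 + 2.0 * x_line     # points exactly on y = 1 + 2x
slope_clean, intercept_clean = np.polyfit(x_line, y_line, 1)

# Add a single made-up outlier far from the pattern.
x_out = np.append(x_line, 20.0)
y_out = np.append(y_line, 0.0)
slope_out, intercept_out = np.polyfit(x_out, y_out, 1)

print("r for cosine curve:", r_curve)
print("without outlier: slope, intercept =", slope_clean, intercept_clean)
print("with outlier:    slope, intercept =", slope_out, intercept_out)
```

In both cases a scatterplot would reveal the problem immediately, which is why plotting should come before trusting $$r$$, $$r^2$$, or the fitted line.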

The only way to show a cause-and-effect relationship is with an experiment.