STATS 250

Regression Analysis

Describing and assessing the significance of relationships between variables is central to research. We will first learn to do this in the case of two quantitative variables. We will study the material on descriptive and inferential regression together, merging them into one overall discussion of these ideas.

We wish to study the linear relationship between two quantitative variables. One is the response variable ($$y$$, also called the dependent variable), and the other is the explanatory variable ($$x$$, also called the independent variable).

First, we make a scatterplot to display the relationship. Then we look for an overall pattern and check for any departures from that pattern. If there appears to be a linear relationship, what is that relationship?

You are solving for two coefficients:

  • $$b_0$$, the y-intercept, i.e., the value of $$\hat{y}$$ at $$x = 0$$.
  • $$b_1$$, the slope, i.e., the predicted change in $$\hat{y}$$ when $$x$$ increases by 1.

To fit this line, you minimize an objective error function, typically the sum of squared errors. For each point on the plot, the residual is $$y - \hat{y}$$, the vertical distance between the observed and predicted values; the fitted line is the one that minimizes the sum of the squared residuals over all points. This is called the least squares regression line.
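As a sketch of this idea, the code below (using a small made-up data set, not from the course) defines the sum of squared errors as a function of a candidate intercept and slope, fits the least squares line with NumPy's `polyfit`, and checks that nearby lines have a larger SSE:

```python
import numpy as np

# Hypothetical illustration data (not from the course)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.8])

def sse(b0, b1):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = b0 + b1*x."""
    y_hat = b0 + b1 * x
    return np.sum((y - y_hat) ** 2)

# Least squares estimates from a degree-1 polynomial fit
b1, b0 = np.polyfit(x, y, 1)

# Perturbing either coefficient can only increase the SSE
print(sse(b0, b1) <= sse(b0 + 0.1, b1))  # True
print(sse(b0, b1) <= sse(b0, b1 - 0.1))  # True
```

Any other straight line through this cloud of points has an SSE at least as large, which is what "least squares" means.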

The equations for the estimated slope and intercept are:

$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} = r \cdot \frac{S_y}{S_x}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$
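These formulas translate directly into code. The sketch below (again with hypothetical data) computes $$b_1$$ and $$b_0$$ from the deviation sums and verifies the equivalent form $$b_1 = r \cdot S_y / S_x$$:

```python
import numpy as np

# Hypothetical illustration data (not from the course)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# S_xy and S_xx from the formulas above
S_xy = np.sum((x - x_bar) * (y - y_bar))
S_xx = np.sum((x - x_bar) ** 2)

b1 = S_xy / S_xx        # estimated slope
b0 = y_bar - b1 * x_bar  # estimated intercept

# Equivalent form: slope = correlation times ratio of sample SDs
r = np.corrcoef(x, y)[0, 1]
b1_alt = r * (y.std(ddof=1) / x.std(ddof=1))
```

Both routes give the same slope, since $$r = S_{xy} / \sqrt{S_{xx} S_{yy}}$$ and the sample standard deviations carry the same $$n - 1$$ factors.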