Abstract

In a previous note, “Introduction to Least-Squares Modeling” (UVA-QA-0500), we saw how least squares can be used to fit the simple linear model to historical data. The resulting model can then be used to forecast the next occurrence of Y, the dependent variable, for a given value of X, the independent variable. This use of least squares to fit a forecasting model requires no assumptions: it can be applied to almost any situation, and a reasonable forecast results. At this level of analysis, least-squares modeling amounts simply to fitting a straight line through a cloud of points and interpolating or extrapolating for a new value of Y at a given X using the fitted line.

Excerpt
UVA-QA-0271
ASSUMPTIONS BEHIND THE LINEAR REGRESSION MODEL

Although we need not make any assumptions to use this procedure, we leave an important question unanswered: how close can we expect the new Y to be to our forecast? Without some additional assumptions, we have no way of making a probability statement about the new Y. In many practical business situations, such a probability statement is an essential element of the decision-making process.
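The fitting-and-forecasting step described above can be sketched as a short least-squares calculation. This is a minimal illustration using the standard closed-form formulas, not the note's own worked example; the data points are made up.

```python
# Minimal least-squares fit of Y = a + b*X, using the closed-form
# formulas b = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
# and a = y_mean - b * x_mean.  The data are hypothetical.

def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def forecast(intercept, slope, x_new):
    # Interpolate or extrapolate along the fitted line.
    return intercept + slope * x_new

a, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
y_hat = forecast(a, b, 5)  # point forecast of the next Y at X = 5
```

Note that this produces only a point forecast, which is exactly the limitation the excerpt goes on to discuss: without further assumptions, nothing here says how far the actual new Y may fall from `y_hat`.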
There is a procedure for measuring the uncertainty associated with a least-squares forecast that will produce a complete probability distribution for a new Y. This procedure brings real value and legitimacy to the regression-modeling and forecasting process, changing it from a simple process—one step above graph paper and a ruler—to one that intelligently combines managerial judgment and statistical theory to produce believable point and interval forecasts. That's the good news. The inevitable bad news is that, in order to make probability statements about a new Y using a least-squares regression model, a variety of assumptions must be made. In other words, probability statements made using linear regression theory are true only if certain assumptions hold. You can thus see the importance of (1) understanding these assumptions, (2) knowing how to check their validity, (3) understanding the consequences of an incorrect assumption, and (4) knowing what can be done if the assumptions do not hold. This note addresses each of these four points for the four general assumptions behind linear regression. The model must be checked for (1) linearity, (2) independence, (3) homoskedasticity, and (4) normality. . . .
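The four checks named above are usually made by examining the residuals of the fitted model. As a rough numeric sketch of that idea, the helpers below compute one crude diagnostic per assumption; the function names and thresholds are illustrative assumptions on my part, not the note's procedure, and in practice these checks are done with residual plots and formal tests.

```python
# Crude numeric checks on least-squares residuals, one per assumption.
# Linearity is normally judged from a residual-versus-X plot, so no
# single number is given for it here.

def residuals(xs, ys, intercept, slope):
    # Residual e_i = y_i - (a + b * x_i), the vertical miss of the line.
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def durbin_watson(e):
    # Independence: the Durbin-Watson statistic.  Values near 2 suggest
    # uncorrelated residuals; near 0, positive autocorrelation; near 4,
    # negative autocorrelation.
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(v * v for v in e)

def variance_ratio(e):
    # Homoskedasticity: compare residual variance in the two halves of
    # the (x-ordered) sample; ratios far from 1 hint at changing spread.
    half = len(e) // 2
    var = lambda v: sum(u * u for u in v) / len(v)
    return var(e[half:]) / var(e[:half])

def within_two_sd(e):
    # Normality (rough): about 95% of residuals should lie within two
    # standard deviations of zero if the errors are normal.
    sd = (sum(v * v for v in e) / len(e)) ** 0.5
    return sum(abs(v) <= 2 * sd for v in e) / len(e)
```

For example, a run of residuals that alternates in sign pushes the Durbin-Watson statistic toward 4, while a long run of same-signed residuals pushes it toward 0; either pattern warns that the independence assumption may fail.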
