Abstract

The Statistical Analysis System (SAS) is one of the more widely used statistical software packages. Two of the procedures available in SAS are the REGression PROCedure and the General Linear Models PROCedure. These are executed by the statements PROC REG; and PROC GLM; respectively. Each procedure instructs SAS to estimate the regression equation specified in the MODEL statement following it. In addition to specifying the dependent and independent variables of the model, users may list certain options in a MODEL statement. Some of these options control the level of detail shown in the printouts; others are specific to the estimation procedure. In either case, such options override the SAS default conventions, namely, those that are operative when no options are specified in the MODEL statement. We contend that when SAS executes a regression using these procedures, it does not necessarily cross-check all of the options listed in the MODEL statement against either the equation specified or the data set provided for the regression. As a result, SAS computes some regression-related (aggregate) statistics independently of the specification of the actual equation, relying on the options alone to determine which methods or formulas to use. It is therefore possible to obtain differing regression summary statistics (such as the coefficient of determination, R², and the F statistic) for the same equation specified (by proper use of the SAS modeling options) in two equivalent ways. The problem for the unwary user is that the results may have misleading implications for the overall statistical significance of the model. Computer programs are not mind readers, and no comprehensive system such as SAS can foresee all eventualities. We therefore suggest that SAS users' manuals include a warning alerting users that these two procedures do not always cross-check all of the options listed in the MODEL statement.
The objective of this article is to demonstrate this problem and, in doing so, to establish where it arises. For this purpose, we run three regressions on the same data set and compare and analyze the estimation results. Since PROC REG; and PROC GLM; give the same results, the discussion is presented in terms of PROC REG;.
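
The abstract does not name the options involved, so the following is only an illustrative sketch in Python rather than a reproduction of the article's three regressions. It assumes, as one plausible mechanism, that an option switches the total sum of squares from the mean-corrected form to the uncorrected form while the fitted equation stays the same. The data values and the function names (`fit_with_intercept`, `summary_stats`) are hypothetical.

```python
# Illustrative sketch only. The data, the function names, and the assumed
# mechanism (corrected vs. uncorrected total sum of squares) are assumptions
# for illustration; they are not taken from the article.

def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x, using the closed-form solution."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def summary_stats(x, y, a, b, corrected_total):
    """Return (R^2, F) for the fitted line under one of two conventions.

    corrected_total=True  : total SS is taken about the mean of y, and the
                            model has 1 degree of freedom (slope only).
    corrected_total=False : total SS is the raw sum of y^2, and the model
                            has 2 degrees of freedom (intercept and slope).
    The fitted values and the error sum of squares are identical in both cases.
    """
    n = len(y)
    fitted = [a + b * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    if corrected_total:
        mean_y = sum(y) / n
        sst = sum((yi - mean_y) ** 2 for yi in y)
        df_model = 1
    else:
        sst = sum(yi ** 2 for yi in y)
        df_model = 2
    df_error = n - 2  # two estimated coefficients in both cases
    r2 = 1 - sse / sst
    f_stat = ((sst - sse) / df_model) / (sse / df_error)
    return r2, f_stat


# Hypothetical data: y varies little around a large mean.
x = [1, 2, 3, 4, 5]
y = [10.1, 9.8, 10.3, 10.0, 10.4]

a, b = fit_with_intercept(x, y)
r2_corrected, f_corrected = summary_stats(x, y, a, b, corrected_total=True)
r2_uncorrected, f_uncorrected = summary_stats(x, y, a, b, corrected_total=False)

print("corrected total SS:   R2 = %.4f, F = %.2f" % (r2_corrected, f_corrected))
print("uncorrected total SS: R2 = %.4f, F = %.2f" % (r2_uncorrected, f_uncorrected))
```

Under these assumptions the coefficients and residuals are the same in both calls; only the formula for the total sum of squares changes, and that alone is enough to move R² and F substantially.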
