Abstract
SUMMARY

Selection of explanatory variables for the regression equation has long been a central problem in constructing a prediction equation. This paper describes and illustrates a selection technique that exploits the orthogonality among factors extracted from the correlation matrix. Using the factors not as new variables but merely as a reference frame, we can identify a near-orthogonal subset of explanatory variables. This approach gives the model builder a flexibility that is not available in the conventional, purely mechanical selection methods.

Selection of explanatory variables in multiple regression analysis has been a central problem in the analysis of unplanned data. The interdependency among the explanatory variables makes it difficult to determine empirically the contribution of each independent variable to the observed variation of the dependent variable. Various alternative selection techniques have been proposed, but the criterion employed in each technique seems quite arbitrary, and the techniques are known to provide different solutions for the same problem (Draper and Smith, 1966, p. 163). Attempts to include more variables in the equation are often frustrated by near singularity of the normal equation system, which makes the estimates of the regression coefficients highly sensitive to small changes in the original data. The resulting equations often contain coefficients with theoretically incorrect signs, which restricts the use of the equation as a functional relationship explaining the system under study.

This paper describes an approach to the problem of selecting variables in regression analysis. The method aims to produce a prediction equation that follows the principle of parsimony, in the sense of minimum interdependency among variables. The technique can be viewed as an application of the principal components regression proposed by Kendall (1957).
In the procedure to be described, however, we make use of the orthogonality among components, using the components not as new variables but merely as a reference frame to identify a near-orthogonal subset of explanatory variables. Selecting such variables minimizes the overlap of information supplied by the explanatory variables in the regression.
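The idea of using components only as a reference frame can be illustrated with a minimal sketch. The text above does not state the exact selection rule, so the rule used here is an assumption: for each retained component of the correlation matrix, keep the original variable with the largest absolute loading. The function name `select_near_orthogonal`, the use of unrotated principal components, and the synthetic data are all hypothetical choices made for illustration, not the authors' procedure.

```python
import numpy as np

def select_near_orthogonal(X, n_components):
    """Pick one original variable per leading component (assumed rule)."""
    # Correlation matrix of the explanatory variables (columns of X).
    R = np.corrcoef(X, rowvar=False)
    # Eigen-decomposition of the symmetric matrix; eigh returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Loadings: correlations between variables and the retained components.
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    selected = []
    for k in range(n_components):
        # Rank variables by absolute loading on component k.
        ranked = np.argsort(-np.abs(loadings[:, k]))
        for j in ranked:
            if int(j) not in selected:
                selected.append(int(j))
                break
    return selected, R

# Synthetic example: variables 0-2 share one latent source, 3-4 another.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
X = np.column_stack([
    a + 0.3 * rng.normal(size=n),
    a + 0.3 * rng.normal(size=n),
    a + 0.3 * rng.normal(size=n),
    b + 0.3 * rng.normal(size=n),
    b + 0.3 * rng.normal(size=n),
])

selected, R = select_near_orthogonal(X, n_components=2)
# Correlation between the two chosen variables; small means near orthogonal.
cross_corr = abs(R[selected[0], selected[1]])
print("selected columns:", selected)
print("correlation between selected variables:", round(float(cross_corr), 3))
```

The selected columns, rather than the components themselves, would then enter an ordinary least-squares fit, so the coefficients still refer to the original variables. In practice the number of components to retain, and whether to rotate them before reading the loadings, remain judgment calls for the model builder.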