Abstract

Recent reviews have dealt with the question of which variables to select and which to discard in multiple regression problems. Lindley (1968) emphasized that the method employed in any analysis should be related to the use intended for the finally fitted regression. In the report by Beale et al. (1967), the emphasis is on selecting the best subset for any specified number of retained independent variables. Here we will be concerned with pointing out the advantages of the variable selection scheme in which independent variables are successively discarded, one at a time, from the original full set. While these advantages are not unknown to workers in this field, they are not generally appreciated by the statistical community. For the purposes of this demonstration it is assumed that we are in the nonsingular case, so that the number of observations exceeds the number of regressor variables.

Let us begin by considering economy of effort. Suppose we were using a step-up regression procedure, ignoring for the while its theoretical deficiencies (to be discussed later). We should then first fit k simple regressions, one for each of the k regressor variables considered, selecting the single most significant individual regressor variable. Having made this selection, we would proceed with k - 1 additional fits to determine which of the remaining variables, in conjunction with the first selected, yielded the greatest reduction in residual variation. This process is continued so as to provide a successive selection and ordering of variables. We may even require the ordering of all k variables, leaving for later decision what critical level is to be employed in determining which of the k variables to retain and which to reject; if we do so, we shall have made a total of k(k + 1)/2 fits, albeit fits differing greatly in their degree of complexity. A complete step-down regression procedure, however, requires but k fits, as will now be indicated.
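The fit count for the step-up procedure can be verified with a short sketch of forward selection that tallies the regressions performed; numpy and the function name forward_selection_order are our own illustrative choices, not part of the original exposition.

```python
import numpy as np

def forward_selection_order(X, y):
    """Order all k regressors by step-up (forward) selection,
    counting how many regressions are fitted along the way."""
    n, k = X.shape
    selected, remaining = [], list(range(k))
    fits = 0
    while remaining:
        best_var, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # least-squares fit on the candidate subset
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            fits += 1
            resid = y - X[:, cols] @ beta
            rss = resid @ resid
            if rss < best_rss:
                best_rss, best_var = rss, j
        selected.append(best_var)
        remaining.remove(best_var)
    # ordering all k variables costs k + (k-1) + ... + 1 = k(k+1)/2 fits
    return selected, fits
```

Running this on any data set with k regressors reports exactly k(k + 1)/2 fits, in agreement with the count given above.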
Suppose we have done a multiple regression on all k variables and wish to consider the k possible multiple regressions on all sets of k - 1 variables, that is, where one variable has been deleted. The results for these k regressions are implicit in the initial k-variable regression, provided we have secured the inverse matrix, or at least its diagonal, necessary for testing the significance of the fitted partial regression coefficients. The case
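The claim that the delete-one regressions are implicit in the full fit rests on a standard identity: deleting the jth variable increases the residual sum of squares by beta_j^2 / c_jj, where c_jj is the jth diagonal element of (X'X)^{-1}; this is the same quantity that enters the t-test of the partial regression coefficient. A minimal numpy sketch (simulated data; all names ours) checking the identity against explicit refits:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5
X = rng.standard_normal((n, k))
y = X @ rng.standard_normal(k) + rng.standard_normal(n)

# full k-variable fit and the inverse matrix from it
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
rss_full = np.sum((y - X @ beta) ** 2)

for j in range(k):
    # identity: dropping x_j raises the RSS by beta_j^2 / c_jj
    rss_implied = rss_full + beta[j] ** 2 / XtX_inv[j, j]
    # check against an explicit refit without column j
    Xm = np.delete(X, j, axis=1)
    bm, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    rss_direct = np.sum((y - Xm @ bm) ** 2)
    assert np.isclose(rss_implied, rss_direct)
```

Thus one inversion yields all k delete-one results, which is why a complete step-down pass costs only k fits.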
