Abstract

SUMMARY. A new variable selection criterion is presented. It is based on the Wald test statistic and is defined by T_p = W_p - K + 2p, where K and p are the numbers of parameters in the full model and the submodel respectively, and W_p is the Wald statistic for testing whether the coefficients of the variables not in the submodel are 0. 'Good' submodels will have T_p values that are close to or smaller than p, and, as with Mallows's C_p, they will be selected by graphical rather than stepwise methods. We first consider an application to the linear regression of the heat evolved in a cement mix on four explanatory variables; we use robust methods and obtain the same results as those from the more computer-intensive methods of Ronchetti and Staudte. Our later applications are to previously published data sets which use logistic regression to predict participation in the US federal food stamp program, myocardial infarction and prostatic cancer. The first of these data sets was shown in a previous analysis to contain an outlier and is considered for illustration. For the last two data sets, our criterion applied to the maximum likelihood estimates selects the same model as previously published stepwise analyses. However, for the food stamp data set, the application of our criterion using the robust logistic regression estimates of Carroll and Pederson suggests more parsimonious models than those arising from the likelihood analysis, and further suggests that interactions previously regarded as important may be due to outliers.
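
The definition above can be illustrated with a minimal sketch. It assumes the Wald statistic takes its usual quadratic form, W_p = b2' V22^{-1} b2, where b2 holds the full-model estimates of the coefficients excluded from the submodel and V22 is the matching block of their estimated covariance matrix. The names `wald_tp`, `ols_fit` and `keep` are hypothetical, the data are synthetic, and ordinary least squares is used only as a convenient stand-in for the robust and maximum likelihood estimators mentioned in the abstract.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares estimates and their estimated covariance (stand-in estimator)."""
    n, K = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - K)      # residual variance of the full model
    return beta, sigma2 * xtx_inv

def wald_tp(beta_full, cov_full, keep):
    """T_p = W_p - K + 2p for the submodel that keeps the coefficient indices in `keep`."""
    beta_full = np.asarray(beta_full, dtype=float)
    cov_full = np.asarray(cov_full, dtype=float)
    K = beta_full.size
    keep = sorted(set(keep))
    p = len(keep)
    drop = [j for j in range(K) if j not in keep]
    if not drop:
        w_p = 0.0                         # full model: nothing to test
    else:
        b2 = beta_full[drop]
        v22 = cov_full[np.ix_(drop, drop)]
        w_p = float(b2 @ np.linalg.solve(v22, b2))   # Wald quadratic form
    return w_p - K + 2 * p

# Synthetic illustration: intercept plus three regressors, only the first matters.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

beta, cov = ols_fit(X, y)
for keep in ([0, 1], [0, 1, 2], [0, 1, 2, 3]):
    print(keep, "p =", len(keep), "T_p =", round(wald_tp(beta, cov, keep), 3))
```

Only the full-model fit is needed, since every submodel's T_p is computed from the same estimates and covariance matrix. Plotting T_p against p for the candidate submodels then gives the graphical comparison described above. With least-squares estimates this quantity coincides algebraically with Mallows's C_p; substituting other estimates and their covariance matrix into `wald_tp` is where the two would differ.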

