Abstract

This study addresses the general problem of variable subset selection for a multiple regression model, applied to explaining the spatial variation of heart disease mortality in Northern Ohio, U.S.A. Two variable selection procedures are used to arrive at the “best” explanatory model of heart disease mortality for this region: the widely used stepwise technique and the recently developed optimal regression method. The two are compared according to their ability to select a “best” explanatory model, and the optimal regression method proved superior. The relationships uncovered by the selection process indicate that, in the study area, the percentage of the population that is Black is negatively associated with heart disease mortality rates, whereas poor housing, the percentage of the population ≥65 years of age, and the percentage of the population of foreign stock are positively related to those rates. A discussion of the underlying reasons for the relationships between the selected socio-economic variables and heart disease mortality rates provides a frame of reference for more detailed study of the possible socio-economic determinants of heart disease mortality. It is suggested that the poor housing variable is a reasonable surrogate for the “life stress” factor and that the percentage of the population of foreign stock may be viewed as a surrogate for dietary habits. Also illustrated are the confounding of further variable selection and the differences in interpretation that may arise if the “best” model is not selected.
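The contrast between the two selection procedures can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "optimal regression" means searching all subsets of a given size, it uses residual sum of squares as the selection criterion, and it runs on synthetic data invented for illustration (the columns `x1`, `x2`, `x3` and the helper names are hypothetical, not the study's variables).

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, k):
    """Greedy selection: add, one at a time, the variable that lowers RSS the most."""
    chosen = []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(min(remaining, key=lambda j: rss(X, y, chosen + [j])))
    return sorted(chosen)


def best_subset(X, y, k):
    """Exhaustive selection: evaluate every subset of size k and keep the best."""
    return list(min(combinations(range(X.shape[1]), k), key=lambda c: rss(X, y, c)))


# Synthetic data (an assumption for illustration only): y depends on x1 and x2,
# while x3 is a noisy copy of their sum and so looks best as a single predictor.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.01 * rng.normal(size=n)

step = forward_stepwise(X, y, k=2)
best = best_subset(X, y, k=2)
print("stepwise:", step, "RSS =", rss(X, y, step))
print("best subset:", best, "RSS =", rss(X, y, best))
```

Because the stepwise choice is itself one of the subsets the exhaustive search considers, the exhaustive result can never have a larger residual sum of squares for the same subset size. The cost is that the number of candidate subsets grows combinatorially with the number of variables.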
