ABSTRACT Previous studies have used regression models to investigate the impact of environmental factors on physical activity. However, such approaches are poorly suited to data-driven analysis that seeks to identify robust associations within the complex, multivariable interactions between physical activity and environmental factors. Building on the emerging concept of the exposome, which encompasses the totality of exposures, this paper explores machine learning models for predicting the percentage of physical inactivity in U.S. counties from 28 social, economic, and physical environmental factors. The aim of this study is to address this research gap and gain insight into the complex associations between environmental exposures and physical activity. This study used data from the Behavioral Risk Factor Surveillance System (BRFSS) of the Centers for Disease Control and Prevention. Five machine learning models were tested, and their performance was compared to select the best classifier for further investigation. The mean population across all counties was 102,841, and the mean percentage of the population under 18 years of age was 22.3%. Partial dependence plot analysis indicated that only one feature, bachelor’s degree, exhibited a close-to-linear relationship with physical inactivity. Motor-vehicle crash death rate and mean temperature showed nonlinear, non-monotonic relationships with the predicted percentage of physical inactivity.
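The abstract does not say which model or software produced the partial dependence plots. The sketch below is therefore only an illustration of the standard partial dependence computation, under stated assumptions: a single feature is fixed at each grid value for every county, and the model's predictions are then averaged. The data are synthetic and `toy_predict` is a hypothetical model, not the study's classifier. It shows how a close-to-linear curve (constant steps) differs from a non-monotonic one (a turning point within the grid). Libraries such as scikit-learn provide this computation as `sklearn.inspection.partial_dependence`.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is fixed at each grid value.

    predict: callable mapping an (n, p) array to n predictions.
    X: (n, p) array of observed features (one row per county).
    feature: column index of the feature of interest.
    grid: 1-D array of values at which to evaluate the dependence.
    """
    pd_values = np.empty(len(grid))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value              # overwrite the feature for every row
        pd_values[k] = predict(X_mod).mean()   # average over the other features
    return pd_values


# Synthetic data for illustration only (NOT BRFSS data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 40, size=(200, 2))


def toy_predict(X):
    # Hypothetical model: linear in column 0, U-shaped in column 1.
    return 30 - 0.4 * X[:, 0] + 0.05 * (X[:, 1] - 15) ** 2


grid = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
pd_linear = partial_dependence(toy_predict, X, 0, grid)
pd_curved = partial_dependence(toy_predict, X, 1, grid)

print(np.diff(pd_linear))          # equal steps: a close-to-linear relationship
print(grid[np.argmin(pd_curved)])  # expected: 15.0, a non-monotonic turning point
```

Because partial dependence averages over the other features, it can hide interactions between them. Per-observation curves (individual conditional expectation) are a common complement when this matters.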