The relationship between coronary heart disease (CHD) and complex urban built environments remains a subject of considerable uncertainty. Developing machine learning models that predict CHD and explore the mechanisms underlying this association, and that can inform intervention policies and planning strategies, has emerged as a pivotal area of research. A cross-sectional dataset of one year of CHD hospital admissions from a hospital in Dalian City, China, was assembled and matched with multi-source built environment data via residential addresses. This study evaluates five machine learning models, namely decision tree (DT), random forest (RF), eXtreme gradient boosting (XGBoost), multi-layer perceptron (MLP), and support vector machine (SVM), and compares them with multiple linear regression models. The results show that DT, RF, and XGBoost exhibit superior predictive capabilities, with all R² values exceeding 0.70. The DT model performed best, with an R² of 0.818 and the best results on error metrics such as mean absolute error (MAE) and mean squared error (MSE). Additionally, using explainable AI techniques, this study quantifies the contribution of different built environment factors to CHD and identifies the significant factors influencing CHD in cold regions, ranked as age, Digital Elevation Model (DEM), house price (HP), sky view factor (SVF), and interaction factors. Stratified analyses by age and gender show that the most influential factor varies between groups: for those under 60 years old, it is road density; for the 61–70 age group, house price; for the 71–80 age group, age; for those over 81 years old, building height; for males, GDP; and for females, age.
This study explores the feasibility and performance of machine learning in predicting CHD risk in the built environment of cold regions and provides a comprehensive methodology and workflow for predicting cardiovascular disease risk based on refined neighborhood-level built environment factors, offering scientific support for the construction of sustainable healthy cities.
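As a reading aid for the reported R², MAE, and MSE values, the following is a minimal sketch of how these three metrics are conventionally computed from observed and predicted values. It is not the study's code: the function name `regression_metrics` and the toy numbers are hypothetical, and the study's own evaluation pipeline is not described in this abstract.

```python
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and MSE for paired observed/predicted values.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Residuals: observed minus predicted.
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = fmean(abs(r) for r in residuals)   # mean absolute error
    mse = fmean(r * r for r in residuals)    # mean squared error

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean
    # of the observed values.
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "mse": mse}


# Toy values, made up purely to exercise the function.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0])
```

A higher R² and lower MAE and MSE indicate a better fit, which is the sense in which the abstract ranks the DT model first.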