Abstract
Introduction: Little is known about the performance benefit of combining routinely collected clinical data with publicly available data on environmental risk factors in cardiovascular disease (CVD) prediction models.

Hypothesis: We aimed to test whether routinely collected clinical data (age, sex, systolic blood pressure, total cholesterol, cigarette smoking), combined with data on environmental risk factors, could improve the performance of CVD prediction using an explainable machine learning (ML) technique.

Methods: We identified individuals without a previous history of CVD from the National Health Insurance Service, 2002-2015, in the Republic of Korea. Their records were linked to publicly available data on annual average fine particulate matter (PM10) exposure and urban green space (UGS) coverage, matched by the administrative code of each individual's residence. A random forest (RF) model was used to predict newly diagnosed CVD, and SHapley Additive exPlanations (SHAP) values were used to explain the contribution of each component to the model output. The performance of the prediction models was evaluated using the area under the curve (AUC).

Results: Among the 151,936 individuals included, there were 2,837 subsequent CVD events. The AUC was 0.731 for the RF model with clinical data only and 0.733 for the model that added environmental risk factors (high annual average PM10 and low UGS coverage) to the clinical data. The SHAP values showed that adding these environmental risk factors did not have a significant impact on the model output (SHAP summary plot). However, the clinical data, comprising traditional CVD risk factors, contributed notably to the performance of the prediction model.

Conclusions: This study showed that adding publicly available data on environmental risk factors to routinely collected clinical data yielded only a marginal improvement in the prediction of CVD outcomes.
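The AUC comparison described in the Results can be illustrated with a minimal sketch. It assumes that AUC here means the area under the receiver operating characteristic curve, computed through the rank-sum (Mann-Whitney) identity. The labels and risk scores below are hypothetical toy values, not study data, and the names `auc`, `scores_clinical`, and `scores_clinical_env` are introduced only for illustration; this is not the authors' code or pipeline.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    The result equals the probability that a randomly chosen positive case
    gets a higher score than a randomly chosen negative case, with tied
    scores counted as one half.
    """
    n = len(scores)
    if len(labels) != n:
        raise ValueError("labels and scores must have the same length")

    # Assign 1-based ranks in ascending score order, averaging ranks in ties.
    order = sorted(range(n), key=lambda idx: scores[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # U statistic for the positive class, normalised by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: 1 = incident CVD event, 0 = no event.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
# Hypothetical predicted risks from a clinical-only model ...
scores_clinical = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.80]
# ... and from a model that also uses environmental features.
scores_clinical_env = [0.12, 0.28, 0.45, 0.40, 0.62, 0.69, 0.22, 0.81]

auc_clinical = auc(labels, scores_clinical)
auc_clinical_env = auc(labels, scores_clinical_env)
print(f"clinical only:            AUC = {auc_clinical:.4f}")
print(f"clinical + environmental: AUC = {auc_clinical_env:.4f}")
print(f"difference:               {auc_clinical_env - auc_clinical:+.4f}")
```

The size of the gap in this toy example carries no meaning. The point is only the mechanics: both models are scored against the same outcomes, and the difference in AUC is what the abstract reports as 0.731 versus 0.733.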