Abstract

Introduction: Machine learning (ML) may enhance prediction of hard cardiovascular disease (CVD) events (e.g., myocardial infarction, stroke, cardiac arrest). Developing ML algorithms using routinely collected clinical data offers a practical method to improve CVD event prediction. This study aimed to train ML algorithms to predict 10-year risk of hard CVD events using common clinical variables and to compare their performance with that of the pooled cohort equations (PCEs).

Hypothesis: ML algorithms trained on data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Atherosclerosis Risk in Communities (ARIC) would outperform PCEs in predicting 10-year risk of hard CVD events when tested on a PCE-valid subset.

Methods: Data from MESA and ARIC were obtained, and pooled baseline data were used for algorithm training. Patients with prior CVD or lacking 10-year follow-up were excluded. The primary endpoint was a hard CVD event within 10 years of baseline. Variables of interest included patient history, anthropometry, and serum lipid levels. Cohort data were split into training (75%) and test (25%) sets. The models trained were a neural network, a gradient boosting machine, k-nearest neighbors, and an ensemble model.

Results: In total, 17,775 patients (5,703 MESA, 12,072 ARIC) with a mean age of 56.2 (± 8.2) years, 54.6% of them female, were included. During the 10-year follow-up period, 1,344 (7.6%) had a first-time hard CVD event. The ensemble model had the highest AUC on the full test set (0.783 [95% CI 0.76-0.81]), followed by the gradient boosting machine (0.781 [0.76-0.81]), neural network (0.780 [0.76-0.81]), and k-nearest neighbors (0.737 [0.71-0.76]). The gradient boosting machine demonstrated the highest sensitivity (76.0%), while the ensemble had the highest specificity (78.6%). In the PCE-valid subset, the ensemble model had the highest AUC (0.780 [0.75-0.81]) and outperformed the PCEs (AUC = 0.757 [0.73-0.78], p = 0.001), as did the gradient boosting machine (AUC = 0.779 [0.75-0.81], p = 0.003) and neural network (AUC = 0.776 [0.75-0.80], p = 0.007).

Conclusions: The ensemble, gradient boosting machine, and neural network improved prediction of hard CVD events over PCEs within a PCE-valid subset despite training on a broader cohort. Future studies should validate results on an external dataset.
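The metrics reported above (AUC with a 95% confidence interval, sensitivity, and specificity) can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic, the percentile bootstrap is an assumed way of obtaining the interval (the abstract does not state how intervals or p-values were computed), and the function names `auc`, `bootstrap_ci`, and `sensitivity_specificity`, along with the 0.5 threshold, are hypothetical choices made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random event case outranks a random
    non-event case (rank-based Mann-Whitney form; ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called events."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic stand-in for a held-out test set: roughly 7.6% event rate, with
# event cases scoring higher on average. These are NOT study data.
rng = random.Random(42)
y_test = [1 if rng.random() < 0.076 else 0 for _ in range(1000)]
risk = [rng.gauss(1.0 if y else 0.0, 1.0) for y in y_test]

point = auc(y_test, risk)
ci_low, ci_high = bootstrap_ci(y_test, risk)
sens, spec = sensitivity_specificity(y_test, risk, threshold=0.5)
print(f"AUC {point:.3f} [95% CI {ci_low:.2f}-{ci_high:.2f}]")
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Comparing two models on the same patients, as in the PCE-valid subset, would additionally call for a paired test on the AUC difference; that step is left out here because the abstract does not name the test used.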
