Introduction: We investigated the predictive power of a machine learning (ML) algorithm for 5-year mortality in patients with atherosclerotic cardiovascular disease (ASCVD). Candidate features included clinical variables from the electronic health record (EHR) as well as self-reported health and lifestyle data.

Methods: The study cohort included 4,689 patients (age 66±11 years, 62% male) with ASCVD enrolled in the Vascular Disease Biorepository, which recruits patients referred for noninvasive vascular evaluation at Mayo Clinic, Rochester, MN. Clinical variables, including comorbidities, laboratory measurements, and medication use, were obtained from the EHR using validated electronic phenotyping algorithms. Information on social habits, physical activity levels, and personal health was obtained from a survey completed by the participants. Mortality was ascertained from the National Death Index. Clinical variables (n=52) and self-reported health and lifestyle variables (n=48) were entered as individual features into reinforcement learning trees (RLT), a sparse decision tree algorithm, to construct the prediction model. We assessed the impact of each feature on prediction accuracy by applying feature selection and computing variable importance (VI).

Results: Features with trivial VI that made no contribution to prediction were excluded. A model with 15 features had an area under the ROC curve (AUC) of 0.82 [95% CI 0.79-0.83] in the test set. Among the top 10 predictors of 5-year mortality, 6 were self-reported health variables (Table), including outlook on personal health, limitations on routine daily activities, and physical activity levels.

Conclusions: An ML approach showed robust discrimination (AUC 0.82) in predicting 5-year mortality in patients with ASCVD. Self-reported health variables were among the top predictive features in the final model.
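
Illustrative sketch: the abstract reports a test-set AUC with a 95% confidence interval but does not say how the interval was computed. The Python snippet below shows one common way such figures can be obtained: the AUC as a Mann-Whitney statistic, and a percentile bootstrap for the interval. The bootstrap choice, the function names `auc` and `bootstrap_auc_ci`, and the toy labels and scores are all assumptions made for illustration. They are not the study's method or data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case (label 1)
    receives a higher score than a random negative case (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.
    Patients are resampled with replacement; resamples that contain
    only one class are skipped because the AUC is undefined for them."""
    auc(labels, scores)  # raises early if a class is missing entirely
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, max(0, int((1 - alpha / 2) * n_boot) - 1))
    return stats[lo_idx], stats[hi_idx]


# Synthetic toy data (1 = died within 5 years, 0 = survived); not study data.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs ranked correctly -> 0.75
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise AUC above is quadratic in cohort size. For a cohort of several thousand patients, a rank-based implementation or an established statistics library would be a more practical choice.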