Abstract

Background
Patient (pt) risk stratification for cardiovascular (CV) events is key to determining the appropriate approach to secondary prevention. Several risk scores have been developed (SCORE2, EUROASPIRE, Thrombolysis in Myocardial Infarction [TIMI] risk score for secondary prevention [TRS 2°P]); these are based on few predictors and have low discriminatory power. Machine learning methods that use real-world data (RWD) from electronic health record (EHR) databases, and that can handle a multitude of commonly available variables, can serve as powerful tools to improve discriminatory power and generate efficient, clinically relevant risk scores.

Purpose
This study used a non-linear machine learning model to build an improved risk score for secondary CV events from a large Israeli EHR database.

Methods
Records of pts with a first record of atherosclerotic CV disease (ASCVD) admission or coronary procedure and available follow-up data (mean: 1052 days) from the period 01/2005–11/2020 were considered (n=77432 [67% coronary artery disease, 21% carotid artery disease, 12% peripheral artery disease]). A 1:1 randomization stratified by cause of admission yielded a training set (n=38716) and a test set (n=38716). Twenty-six covariates commonly used in clinical routine and widely available in EHRs were selected, and a gradient boosting trees algorithm was applied to pt data at baseline after the original event to predict recurrent myocardial infarction (MI). The model's performance was assessed by comparison with the TRS 2°P score on the test set, which was independent of the training set. The model's ability to discriminate between pts was evaluated using the concordance index (c-index), which expresses how well a predicted risk score describes the observed events (1 implies perfect concordance, 0.5 implies random concordance).
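The abstract does not state which c-index variant or software was used. As an illustration of what the metric measures, here is a minimal Python sketch of Harrell's c-index for right-censored follow-up data. The function name is hypothetical, and pairs with tied follow-up times are simply skipped (a simplification). Among comparable pairs, the score is the fraction in which the pt who had the event earlier was also assigned the higher predicted risk.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored time-to-event data (sketch).

    A pair is comparable when the subject with the shorter follow-up time
    actually had the event. It is concordant when that subject also has the
    higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            # Tied times (simplification) or censored earlier subject: not comparable.
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model that ranks every comparable pair correctly scores 1.0. Random ordering tends toward 0.5, which matches the interpretation given in the abstract.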
Results
For future MI, our model achieved a c-index of 0.74 (95% confidence interval [CI]: 0.72, 0.76), significantly higher than the 0.57 (95% CI: 0.56, 0.58) obtained with TRS 2°P. Our 2-tier risk score discriminated a low-risk group (74% of the cohort) with a 3-year MI prevalence of 5% from a high-risk group (26% of the cohort) with a 5-fold higher prevalence (26.6%) (Figure 1). By comparison, the 3-tier TRS 2°P yielded an unbalanced distribution of group sizes, with a disproportionately large intermediate-risk group (22% low, 67% intermediate, 11% high risk) and limited clinical discriminatory power (observed event rates of 7.1%, 10.2%, and 15.6%, respectively) (Figure 2).

Conclusions
Our machine learning-based risk score model, built on a large set of RWD with long follow-up, appears to offer a substantial performance gain over TRS 2°P. The present 2-tier model shows significant gains in accuracy and clinical discriminatory power, paving the way for more individualized prevention strategies.
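The abstract does not report the cut-off used to define its two tiers. The following minimal Python sketch shows how a 2-tier summary of this kind can be computed from predicted risks and observed events. The threshold, the names, and the toy data are hypothetical and are not taken from the study. For reference, the reported high-risk versus low-risk prevalences (26.6% vs 5%) correspond to a ratio of about 5.3.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TierSummary:
    name: str
    share_of_cohort: float   # fraction of all patients assigned to this tier
    event_proportion: float  # fraction of this tier's patients with an observed event


def two_tier_summary(risk: Sequence[float], had_event: Sequence[bool], threshold: float) -> List[TierSummary]:
    """Split patients at a (hypothetical) risk cut-off and summarize each tier."""
    n = len(risk)
    tiers = {"low": [], "high": []}  # insertion order: low tier first
    for r, e in zip(risk, had_event):
        tiers["high" if r >= threshold else "low"].append(e)
    summaries = []
    for name, events in tiers.items():
        share = len(events) / n
        # Guard against an empty tier when the threshold lies outside the risk range.
        proportion = sum(events) / len(events) if events else float("nan")
        summaries.append(TierSummary(name, share, proportion))
    return summaries


# Toy data, for illustration only.
low, high = two_tier_summary([0.1, 0.2, 0.3, 0.8], [False, False, True, True], threshold=0.5)
print(low, high, high.event_proportion / low.event_proportion)
```

With these toy inputs, the low tier holds 75% of patients with an event proportion of 1/3, and the high tier holds 25% with an event proportion of 1.0.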
