Abstract

Background
Coronary artery disease (CAD) and peripheral arterial disease (PAD) represent a significant burden in patients with cardiovascular disease (CVD). However, current antithrombotic therapies are used inconsistently because of clinical concerns about identifying the patients with the most favourable balance of benefit and risk.

Purpose
We aimed to determine whether complex data engineering, multi-layered machine learning and artificial intelligence (AI) could be used to derive and test an improved risk stratification tool to support clinical decision making in CAD/PAD patients at risk of thrombotic events.

Methods
We conducted a retrospective cohort study using claims data from the US Optum Clinformatics dataset covering Jan 1, 2010 to Dec 31, 2018. Predictive modeling approaches, including multi-label logistic regression, Cox regression and machine learning techniques (Random Forests and Gradient Boosting Machines), were used to calculate individual patient risks of Major Adverse Cardiac Events (MACE), Major Adverse Limb Events (MALE) and Major Bleeding (MB) at 1 and 5 years. To translate the output of black-box models into interpretable results, explainable AI techniques based on Shapley additive explanations (SHAP) were applied.

Results
The study cohort consisted of 1.842 million patients with a diagnosis of CAD (mean age 71±11 years) and 1.557 million patients with a diagnosis of PAD (mean age 74±10 years); 1.017 million patients had both diagnoses. Annual event rates for MACE, MALE and MB were 3.5%, 0.7% and 1.6%, respectively. A total of 865 variables were extracted from the patient history, including demographics, diagnoses, procedures and prescriptions. For all outcomes, gradient boosting machines outperformed all other models, including established risk scores such as CHA2DS2-VASc and REACH. For 1-year prediction, the AUCs were 0.75 for MACE, 0.88 for MALE and 0.76 for MB.
For the standard risk score-based approach, AUCs were 0.67 for both MACE and MB; no risk score was available for MALE. Linear regression models using all 865 variables performed slightly worse than the machine learning methods (AUCs 0.74 for MACE, 0.84 for MALE and 0.75 for MB).

Conclusion
Our study showed that, in a highly heterogeneous patient population, machine learning techniques can outperform existing risk scores as well as linear models when assessing individual patient risk. Machine learning and AI can be used to shape and evolve evidence generation for personalized healthcare and digital health solutions, and to help healthcare professionals (HCPs) and payers understand the prognostic factors most strongly associated with patient outcomes.

Figure 1

Funding Acknowledgement
Type of funding source: Private company. Main funding source(s): Bayer AG
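The results compare models by the area under the ROC curve (AUC). As a minimal illustration of what that number measures, the sketch below computes AUC directly as the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one, with ties counted as one half. The labels and scores are hypothetical toy values, not study data; the integer "points" only mimic a coarse additive risk score, which tends to produce many ties. The pairwise loop is O(n²) and is meant for clarity; for large cohorts a rank-based implementation such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of an event patient > score of a non-event patient),
    counting ties as 1/2 (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event patient correctly ranked higher
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = event within follow-up, 0 = no event).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
risk_points = [1, 2, 2, 1, 3, 2, 2, 1]                          # coarse integer score
model_prob = [0.05, 0.20, 0.35, 0.10, 0.60, 0.30, 0.25, 0.15]   # continuous model output

print(roc_auc(labels, risk_points))  # 0.6875
print(roc_auc(labels, model_prob))   # 0.875
```

On this toy data the continuous scores separate events from non-events better than the tied integer score; the example only illustrates the metric and says nothing about the study's actual models.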

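The methods state that SHAP was used to make the black-box models interpretable. SHAP attributes a single patient's prediction to individual features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating all feature subsets, under a simplifying assumption: features "absent" from a subset are set to one reference patient, whereas SHAP implementations such as the `shap` package's explainers average over background data and use faster algorithms for tree ensembles. The logistic `toy_risk` function, its coefficients and its three features are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take the patient's values; the rest the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical 1-year event probability from a logistic model.
    z[0] = age in decades, z[1] = prior-event flag, z[2] = an unused feature."""
    logit = -4.0 + 0.5 * z[0] + 1.2 * z[1] + 0.0 * z[2]
    return 1.0 / (1.0 + math.exp(-logit))


patient = [7.4, 1.0, 5.0]
reference = [6.0, 0.0, 1.0]
phi = shapley_values(toy_risk, patient, reference)
print([round(v, 4) for v in phi])
# Efficiency property: the attributions sum to f(patient) - f(reference).
print(round(sum(phi), 6), round(toy_risk(patient) - toy_risk(reference), 6))
```

Because `toy_risk` ignores z[2], that feature's attribution is exactly zero (the Shapley "dummy" property). Exact enumeration costs O(2^n) model evaluations, so it is only practical for a handful of features, not for the 865 variables used in the study.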