Abstract

Background
Patients with cancer have up to a 3-fold higher risk of cardiovascular disease (CVD) than the general population. Consequently, traditional risk scores developed to predict CVD in the general population (e.g., the Pooled Cohort Equations [PCE] and the Predicting Risk of cardiovascular disease EVENTs [PREVENT] equations) may predict risk less accurately in this population.

Purpose
To compare a cancer-specific machine learning (ML)-based score with the PCE and PREVENT for predicting 10-year CVD risk in patients with breast cancer (BC), colorectal cancer (CRC), lung cancer (LC), or prostate cancer (PC).

Methods
Patients aged ≥18 years who were diagnosed with BC, CRC, LC, or PC between 2005 and 2012 at a large hybrid academic-community practice in Northeast Ohio, United States, were included. An Extreme Gradient Boosting (XGBoost) ML algorithm adapted for survival modeling was developed on a development subset comprising 75% of the cohort (55% training, 20% testing). For each cancer type, SHAP (SHapley Additive exPlanations) values were used to rank 40 to 50 candidate covariates, including social determinants of health and cancer treatment, by their contribution to predicted 10-year CVD risk. The 10 top-ranked predictors were then used to build a predictive equation with logistic regression. This equation was tested in the remaining 25% of the cohort (validation subset) and compared with the PCE and PREVENT (both simple and enhanced versions) using the area under the curve (AUC) of the time-dependent receiver operating characteristic curve. CVD events were identified from ICD codes for acute coronary syndrome, ischemic stroke, and heart failure, and observed CVD risk was estimated with the Kaplan-Meier method.

Results
We included 10,240 patients (Table 1). Observed 10-year CVD risks were 21% (95% CI 19-23) for BC, 10% (95% CI 8-12) for CRC, 27% (95% CI 24-30) for LC, and 20% (95% CI 17-23) for PC.
The ACC/AHA PCE predicted mean 10-year CVD risks of 16% (95% CI 15-18) for BC, 19% (95% CI 17-21) for CRC, 28% (95% CI 25-32) for LC, and 19% (95% CI 19-20) for PC, with corresponding AUCs of 0.75, 0.65, 0.76, and 0.61. The simple PREVENT equation predicted mean 10-year CVD risks of 13% (95% CI 13-14) for BC, 13% (95% CI 12-13) for CRC, 15% (95% CI 14-15) for LC, and 16% (95% CI 12-19) for PC, with AUCs of 0.55, 0.55, 0.55, and 0.53. The enhanced PREVENT equation predicted mean 10-year CVD risks of 17% (95% CI 17-18) for BC, 15% (95% CI 14-15) for CRC, 15% (95% CI 15-16) for LC, and 17% (95% CI 16-19) for PC, with AUCs of 0.75, 0.62, 0.79, and 0.63. The predictive equation derived from the 10 top-ranked ML predictors (Table 2) achieved AUCs of 0.84 for BC, 0.76 for CRC, 0.83 for LC, and 0.71 for PC.

Conclusion(s)
Conventional CVD risk equations assess risk inadequately in patients with BC, CRC, LC, or PC, often underestimating or overestimating observed risk. Cancer-specific ML-derived equations showed good discrimination (AUC 0.71-0.84) and underscore the importance of incorporating cancer-related covariates for accurate risk prediction.

Table 1
Table 2
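For orientation, a logistic regression equation built from the 10 top-ranked predictors has the general form below. Here x_1, ..., x_10 stand for the predictors selected for one cancer type (listed in Table 2) and β_0, ..., β_10 for the fitted coefficients; these symbols are ours, and the fitted coefficient values are not given in the abstract text. The output is the predicted probability of a CVD event within 10 years.

```latex
\hat{p}_{10\,\mathrm{y}} \;=\; \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{k=1}^{10} \beta_k x_k\right)\right)}
```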
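Observed risk was estimated with Kaplan-Meier and discrimination with a time-dependent AUC. The plain-Python sketch below illustrates both on toy data; it is not the authors' code, and the function names are hypothetical. The AUC uses a simplified cumulative/dynamic definition (cases: event at or before the horizon; controls: still event-free after it) that simply drops patients censored before the horizon instead of applying inverse-probability-of-censoring weights, so it can be biased under heavy censoring. The abstract does not say which AUC estimator was used.

```python
from typing import Sequence


def km_cumulative_risk(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of cumulative risk 1 - S(horizon).

    times:  follow-up time per patient (e.g. years)
    events: 1 if a CVD event was observed at that time, 0 if censored
    """
    # Distinct event times up to the horizon, in increasing order
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    survival = 1.0
    for t in event_times:
        # Patients still under observation at t (those censored at t count as at risk)
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


def naive_time_dependent_auc(times: Sequence[float], events: Sequence[int],
                             scores: Sequence[float], horizon: float) -> float:
    """Unweighted cumulative/dynamic AUC at `horizon` (simplification: no IPCW)."""
    cases = [s for t, e, s in zip(times, events, scores) if e == 1 and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0   # case ranked above control
            elif c == k:
                concordant += 0.5   # ties count half
    return concordant / (len(cases) * len(controls))


# Toy data (hypothetical): follow-up in years, event indicator, model risk score
times = [2, 3, 5, 8, 12, 12]
events = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.45, 0.2, 0.5]

print(round(km_cumulative_risk(times, events, 10), 3))                 # 0.583 (= 7/12)
print(round(naive_time_dependent_auc(times, events, scores, 10), 3))   # 0.833 (= 5/6)
```

In the toy data the patient censored at year 3 is excluded from the AUC but still contributes to the Kaplan-Meier risk set until censoring, which is exactly the information a weighted (IPCW) AUC estimator would recover.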
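The conclusion that conventional equations under- or overestimate risk can be made concrete with the ratio of mean predicted to observed 10-year risk (E/O): values above 1 indicate overestimation and values below 1 underestimation. The sketch below computes E/O from the point estimates reported in the Results. It is our illustration, not an analysis from the abstract: it ignores the confidence intervals and is not a formal calibration test.

```python
# Point estimates (%) copied from the Results; observed risks are Kaplan-Meier estimates
observed = {"BC": 21, "CRC": 10, "LC": 27, "PC": 20}
predicted = {
    "PCE": {"BC": 16, "CRC": 19, "LC": 28, "PC": 19},
    "PREVENT simple": {"BC": 13, "CRC": 13, "LC": 15, "PC": 16},
    "PREVENT enhanced": {"BC": 17, "CRC": 15, "LC": 15, "PC": 17},
}


def expected_observed_ratios(predicted: dict, observed: dict) -> dict:
    """Predicted/observed risk ratio per model and cancer type, rounded to 2 decimals."""
    return {
        model: {cancer: round(risks[cancer] / observed[cancer], 2) for cancer in observed}
        for model, risks in predicted.items()
    }


ratios = expected_observed_ratios(predicted, observed)
for model, by_cancer in ratios.items():
    print(f"{model}: {by_cancer}")
```

On these point estimates, every equation overestimates CRC risk (E/O 1.3 to 1.9), both PREVENT versions underestimate risk in BC, LC, and PC, and the PCE is close to calibrated only for LC (E/O about 1.04).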