Introduction: Patients with type 2 diabetes mellitus (T2DM) have a higher risk of coronary artery disease (CAD) than patients without T2DM. Ventricular arrhythmias (VA), such as ventricular fibrillation and ventricular tachycardia, are the major causes of mortality among patients with CAD. T2DM patients with CAD, especially older adults, therefore have a higher risk of VA than patients without T2DM. Time-to-event prediction can help reduce VA-related mortality in older adults with T2DM and CAD by estimating both the time to diagnosis and the probability of diagnosis at specific time points. However, the small number of older T2DM patients with CAD who are diagnosed with VA lowers the performance of traditional time-to-event models.
Hypothesis: We hypothesize that machine learning can improve the performance of time-to-event prediction.
Methods: This study included 3975 participants aged >65 years who were diagnosed with T2DM and CAD in the All of Us database. Baseline was the earliest date on which a patient was older than 65 years and had been diagnosed with both T2DM and CAD. Of all participants, 379 were diagnosed with VA within 5 years of baseline. We compared machine learning-based time-to-event models with the Cox Proportional Hazards model to explore the potential of machine learning in time-to-event analysis. The Synthetic Minority Oversampling Technique was applied to address the imbalance between participants with and without a VA diagnosis. Model performance was evaluated with the concordance index, which measures how well the ordering of predicted risk scores agrees with the ordering of observed event times, using 5-fold cross-validation.
Results: With balanced data, the Gradient Boosting Machines model had the highest average concordance index among the five models, which included the Cox Proportional Hazards model (Table).
Conclusion: We found that machine learning could improve the performance of prediction models in time-to-event analysis. The Synthetic Minority Oversampling Technique could mitigate the issue of imbalanced data. In addition, machine learning-based survival models showed higher concordance index values than the Cox Proportional Hazards model.
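To make the evaluation metric concrete, the following is a minimal pure-Python sketch of a concordance index for right-censored data, in the style of Harrell's C. It is an illustration only, not the study's implementation: the abstract does not state which variant or library was used, and the function name `concordance_index`, the simplified handling of tied times, and the four toy patients are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are ordered
    consistently with their observed times (ties in risk count as 0.5).

    times       -- follow-up time for each patient (event or censoring time)
    events      -- True if the event was observed, False if censored
    risk_scores -- model output; a higher score means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplifying assumption: pairs with tied times are skipped.
        # A pair is also skipped when the shorter time is censored, because
        # we cannot know which patient had the event first.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients followed for up to 5 years.
times = [1.2, 2.0, 3.5, 5.0]           # years from baseline
events = [True, True, False, False]    # last two patients are censored
risks = [0.8, 0.3, 0.5, 0.1]           # hypothetical model risk scores

print(concordance_index(times, events, risks))  # 0.8
```

In this toy example five pairs are comparable and four of them are ranked correctly, giving 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risks against observed times.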