Preventing (re)hemorrhage in patients with sporadic cerebral cavernous malformations (CCM) is the primary aim of CCM management. However, accurately identifying in advance which sporadic CCM patients will experience (re)hemorrhage remains a challenge. This study aimed to develop machine learning models to predict (re)hemorrhage in patients with sporadic CCM. The study was based on a dataset of 731 patients with sporadic CCM from the open data platform Dryad. Patients were followed up for 5 years, over the period from January 2003 to December 2018. Support vector machine (SVM), stacked generalization, and extreme gradient boosting (XGBoost) models were constructed, and their performance was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (PR-AUC), and other metrics. A total of 517 patients with sporadic CCM were included (330 female [63.8%]; mean [SD] age at diagnosis, 42.1 [15.5] years), and 76 (re)hemorrhages (14.7%) occurred during follow-up. Among the 3 machine learning models, the XGBoost model yielded the highest mean (SD) AUROC in cross-validation (0.87 [0.06]). The top 4 features of the XGBoost model were ranked using SHAP (SHapley Additive exPlanations). In the testing set, the All-Elements XGBoost model achieved an AUROC of 0.84 and a PR-AUC of 0.49, with a sensitivity of 0.86 and a specificity of 0.76. Importantly, the 4-Elements XGBoost model, developed using only the top 4 features, achieved an AUROC of 0.83 and a PR-AUC of 0.40, with a sensitivity of 0.79 and a specificity of 0.72 in the testing set. Both XGBoost models performed well in identifying potential (re)hemorrhages within 5 years in patients with sporadic CCM. These models may provide insights for clinical decision-making.
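To make the reported metrics concrete, the following minimal sketch computes sensitivity, specificity, AUROC, and a PR-AUC estimate from binary outcome labels and model risk scores, using only the Python standard library. It is an illustration under stated assumptions, not the study's implementation: the function names (`sensitivity_specificity`, `auroc`, `average_precision`), the 0.5 decision threshold, and the toy data are all hypothetical, and the abstract does not state which PR-AUC estimator was used (average precision is one common choice). For context, a classifier with no skill has a PR-AUC roughly equal to the positive rate, which in this cohort is 76/517 ≈ 0.147.

```python
# Hypothetical sketch of binary-classification metrics; not the study's code.

def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity (TP rate) and specificity (TN rate) at a chosen threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, y_score):
    """AUROC as the probability that a random positive scores above a random
    negative (ties count as half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise PR-AUC estimate: mean precision at the rank of each positive,
    ranking by decreasing score (assumes no tied scores)."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical): 1 = (re)hemorrhage during follow-up, 0 = none.
y = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(sensitivity_specificity(y, scores))  # (1.0, 0.8)
print(round(auroc(y, scores), 4))          # 0.9333
print(round(average_precision(y, scores), 4))  # 0.9167
```

Reporting PR-AUC alongside AUROC is informative when outcomes are imbalanced, as here: AUROC is insensitive to the positive rate, whereas PR-AUC reflects how many flagged patients actually go on to bleed.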