Development of a machine learning-based risk prediction model for cerebral infarction and comparison with nomogram model

Xuewen Li, Yiting Wang, Jiancheng Xu

doi:10.1016/j.jad.2022.07.045

Abstract

Background: To develop a cerebral infarction (CI) risk prediction model by mining routine test big data with machine learning algorithms.

Methods: Cohort 1 included CI patients and health-checkup subjects from 2017; the best-performing algorithm among Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF) was selected to mine all routine test data of the enrolled subjects and screen features for the CI model. Cohort 2 included CI and non-CI patients from 2018 to 2020; it was used to develop an early warning model for CI and was analyzed in subgroups with an age cutoff of 50 years. Cohort 3 included CI and non-CI patients from 2021, and a nomogram model was developed for comparison with the machine learning model.

Results: The optimal algorithm, XGBoost, was used to develop a CI risk prediction model, CI-Lab8, containing eight features: fibrinogen (FIB), age, glucose, mean erythrocyte hemoglobin concentration, albumin, neutrophil absolute value, activated partial thromboplastin time, and triglycerides. The model had an AUC of 0.823 in cohort 2, significantly higher than that of FIB alone (AUC = 0.737), the feature that ranked first in importance. CI-Lab8 also had high diagnostic accuracy in CI patients <50 years of age (AUC = 0.800), slightly lower than in CI patients ≥50 years of age (AUC = 0.856). Receiver operating characteristic curve, calibration curve, and decision curve analyses in cohort 3 showed CI-Lab8 to be superior to the nomogram.

Conclusion: In this study, the CI risk prediction model developed with the XGBoost algorithm outperformed the nomogram model and had high diagnostic accuracy for CI patients both <50 and ≥50 years old, which may assist clinical assessment of CI.
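For readers less familiar with the AUC values reported above, the following minimal Python sketch shows how an AUC can be computed and used to compare a single marker with a combined model score. It is an illustration only, not the authors' pipeline: the function name `roc_auc`, the six toy subjects, and all marker and score values are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CI, 0 = non-CI (not from the study).
labels = [1, 1, 1, 0, 0, 0]
# Made-up single-marker values (e.g. a fibrinogen-like measurement).
single_marker = [4.1, 3.2, 3.9, 3.0, 3.5, 2.8]
# Made-up predicted probabilities from a multi-feature model.
model_score = [0.9, 0.6, 0.8, 0.2, 0.4, 0.1]

auc_marker = roc_auc(labels, single_marker)
auc_model = roc_auc(labels, model_score)
print(f"single marker AUC = {auc_marker:.3f}")
print(f"combined model AUC = {auc_model:.3f}")
```

In this toy example the combined score separates the two groups better than the single marker, which is the kind of comparison the abstract reports between CI-Lab8 and FIB alone. Testing whether such a difference is statistically significant requires a separate procedure that this sketch does not cover.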

Full Text