Objective: To construct and compare machine learning (ML) models for predicting the short-term prognosis of acute ischemic stroke (AIS).
Methods: This was a retrospective study. Group W (mRS≤3) was clustered and then combined with group P (mRS>3) to form the post-clustering dataset for modeling. The “glmnet”, “rpart”, “xgboost”, “randomForest”, and “neuralnet” packages were used to construct the ML models. The accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the models were compared. Four external clinical datasets were used for external clinical validation. The optimal prediction model was determined by variable screening ability, model visualization, and external clinical validation performance.
Results: The post-clustering dataset contained 139 patients in group W and 122 patients in group P. The product of neutrophils and D-dimer (NDM) had predictive value in all ML prediction models in this study. In the decision tree model, NDMQ occupied the first tree node. When NDM≤5.62 and age<74.5, the probability of a poor AIS prognosis was less than 20 %. When NDM>5.62 and pneumonia was present, the probability of a poor AIS prognosis was about 90 %. In the Random Forest (RF) model, NDMQ had the highest Gini index. The variable combination screened by the RF model performed best in the neural network, with an external-validation accuracy, sensitivity, specificity, PPV, and NPV of 0.800, 0.774, 0.833, 0.857, and 0.741, respectively. The RF model performed best on the external clinical validation datasets, with accuracies of 0.646, 0.697, 0.695, and 0.713, respectively.
Conclusions: NDM showed predictive value for short-term AIS prognosis in all ML models in this study. The RF model was optimal both in screening characteristic variables and in performance on the external clinical datasets. In the analysis of medical data with a small sample size and a categorical outcome, RF could be used as the main algorithm to build a model.
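The five metrics used to compare the models all follow from a 2×2 confusion matrix. The following is a minimal Python sketch of those definitions, not the study's own code (the study used R packages). The function name `classification_metrics` is hypothetical, and treating poor prognosis as the positive class is an assumption. The counts in the example are also an assumption: they are not taken from the study, but were chosen because they are one set of counts that reproduces the external-validation values reported for the neural network.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute five standard metrics from confusion-matrix counts.

    tp/fn: positive cases predicted correctly / missed
    tn/fp: negative cases predicted correctly / flagged wrongly
    (Assumption: 'positive' means poor prognosis.)
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts (not from the study), chosen to be consistent
# with the reported external-validation metrics.
metrics = classification_metrics(tp=24, fn=7, tn=20, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, the rounded values match the reported 0.800, 0.774, 0.833, 0.857, and 0.741. Other confusion matrices with the same ratios would give the same metrics, so the counts should not be read as the actual size of the validation set.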