This study aimed to apply machine learning (ML) techniques to develop and validate a risk prediction model for post-stroke lower extremity deep vein thrombosis (DVT) based on patients' limb function, activities of daily living (ADL), clinical laboratory indicators, and DVT preventive measures. We retrospectively analyzed 620 stroke patients. Eight ML algorithms were used to build the models: logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), neural network (NN), extreme gradient boosting (XGBoost), naive Bayes (NB), and K-nearest neighbor (KNN). The models were evaluated using receiver operating characteristic (ROC) curves, area under the ROC curve (AUC), precision-recall (PR) curves, area under the PR curve (PRAUC), accuracy, sensitivity, specificity, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) were used to determine feature importance. Finally, based on the best-performing ML algorithm, models built on different functional feature sets were compared with the Padua scale to select the best feature set. The RF algorithm performed best across several evaluation metrics, including AUC (0.74/0.73), PRAUC (0.58/0.58), accuracy (0.75/0.77), and sensitivity (0.78/0.80) in the training and test sets, respectively. DCA showed that the RF model had the highest clinical net benefit. SHAP analysis showed that D-dimer had the greatest influence on DVT prediction, followed by age, Brunnstrom stage (lower limb), prothrombin time (PT), and mobility. The RF algorithm can predict post-stroke DVT and help guide clinical practice.
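The threshold-based metrics named above (accuracy, sensitivity, specificity) and the net benefit that underlies DCA can be illustrated with a minimal sketch. This is not the authors' code: the function names `confusion_counts` and `threshold_metrics` and the toy labels and probabilities are hypothetical, and net benefit is computed with the standard decision-curve formula, TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the risk threshold.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN when patients with predicted risk >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and net benefit at one risk threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Net benefit weighs false positives by the odds of the threshold.
        "net_benefit": tp / n - (fp / n) * threshold / (1.0 - threshold),
    }


if __name__ == "__main__":
    # Hypothetical toy data (1 = DVT, 0 = no DVT); not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    risks = [0.9, 0.7, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.4, 0.05]
    for name, value in threshold_metrics(labels, risks, threshold=0.5).items():
        print(f"{name}: {value:.3f}")
```

Evaluating `threshold_metrics` over a range of thresholds and plotting `net_benefit` against the threshold gives a decision curve; a model is typically judged against the "treat all" and "treat none" strategies on the same plot.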