This study uses patient features and catheterization technique features to train machine learning (ML) models that predict peripherally inserted central catheter-related deep vein thrombosis (PICCs-DVT), and analyzes the importance of the two feature types to PICCs-DVT from the perspective of "input-output" correlation. To summarize comprehensively and systematically the variables that describe patient features and catheterization technique features, this study reviewed 18 published studies that used the two feature types to predict PICCs-DVT. A total of 21 variables describing the two feature types were identified, and their values were extracted from the data of 1,065 PICCs patients from January 1, 2021 to August 31, 2022 to construct the sample set. Of this sample set, 70% was used for model training and hyperparameter optimization, and the remaining 30% was used as a test set for PICCs-DVT prediction and feature importance analysis with three common ML classification models: support vector classifier (SVC), random forest (RF), and artificial neural network (ANN). Prediction performance was evaluated with four metrics: precision (P), recall (R), accuracy (ACC), and area under the curve (AUC). Feature importance was analyzed with permutation importance, a single-feature analysis method based on the "input-output" sensitivity principle. On the test set, the mean performance of the three ML models was P = 0.92, R = 0.95, ACC = 0.88, and AUC = 0.81. Specifically, the RF model achieved P = 0.95, R = 0.96, ACC = 0.92, and AUC = 0.86; the ANN model achieved P = 0.92, R = 0.95, ACC = 0.88, and AUC = 0.81; and the SVC model achieved P = 0.88, R = 0.94, ACC = 0.85, and AUC = 0.77.
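As an illustration of how the four reported metrics relate to a model's test-set outputs, the following is a minimal sketch in plain Python. It is not the study's code: the labels, scores, and the 0.5 decision threshold are toy assumptions, the label 1 is assumed to mark the positive class, and AUC is assumed to be the area under the ROC curve, computed here through its pairwise-ranking interpretation.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy test-set outputs (hypothetical values, not data from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("P   =", precision(y_true, y_pred))
print("R   =", recall(y_true, y_pred))
print("ACC =", accuracy(y_true, y_pred))
print("AUC =", roc_auc(y_true, scores))
```

P, R, and ACC depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is one reason the two kinds of metric are commonly reported together.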
In the feature importance analysis, catheter-to-vein rate (RF: 91.55%, ANN: 82.25%, SVC: 87.71%), Zubrod-ECOG-WHO score (RF: 66.35%, ANN: 82.25%, SVC: 44.35%), and insertion attempts (RF: 44.35%, ANN: 37.65%, SVC: 65.80%) ranked among the top three features in all three ML models for the PICCs-DVT prediction task. The ML models performed well in predicting PICCs-DVT and revealed a relatively consistent ranking of feature importance from the data. These important features may help clinical staff better understand and analyze the formation mechanism of PICCs-DVT from a data-driven perspective.
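The idea behind permutation importance can be sketched as follows: shuffle one feature column at a time on the test set and measure how much a performance score drops. This is a minimal plain-Python sketch under stated assumptions, not the study's implementation. The toy model `toy_predict`, the two-column data, the use of accuracy as the score, and `n_repeats=10` are all hypothetical, and the abstract does not say how the reported percentages were scaled, so the sketch returns raw mean score drops.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn.

    predict: function mapping a list of feature rows to a list of labels.
    X: list of rows, each row a list of feature values.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(X):
    # Hypothetical fitted model: it looks only at feature 0 and ignores feature 1.
    return [1 if row[0] > 0.5 else 0 for row in X]


# Toy test set (hypothetical values): labels follow feature 0 exactly.
X_test = [[0.9, 3.0], [0.8, 1.0], [0.7, 2.0], [0.6, 5.0],
          [0.4, 4.0], [0.3, 2.0], [0.2, 1.0], [0.1, 3.0]]
y_test = [1, 1, 1, 1, 0, 0, 0, 0]

importances = permutation_importance(toy_predict, X_test, y_test)
print(importances)
```

In this toy case, shuffling feature 1 never changes the predictions, so its importance is exactly zero, while shuffling feature 0 degrades accuracy. Because the method needs only a prediction function and a score, the same procedure applies unchanged to SVC, RF, and ANN models, which makes rankings comparable across them.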