To construct and validate the best predictive model for 28-day death risk in patients with septic shock based on different supervised machine learning algorithms. The patients with septic shock meeting the Sepsis-3 criteria were selected from Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0). According to the principle of random allocation, 70% of these patients were used as the training set, and 30% as the validation set. Relevant predictive variables were extracted from three aspects: demographic characteristics and basic vital signs, serum indicators within 24 hours of intensive care unit (ICU) admission and complications possibly affecting indicators, functional scoring and advanced life support. The predictive efficacy of models constructed using five mainstream machine learning algorithms including decision tree classification and regression tree (CART), random forest (RF), support vector machine (SVM), linear regression (LR), and super learner [SL; combined CART, RF and extreme gradient boosting (XGBoost)] for 28-day death in patients with septic shock was compared, and the best algorithm model was selected. The optimal predictive variables were determined by intersecting the results from LASSO regression, RF, and XGBoost algorithms, and a predictive model was constructed. The predictive efficacy of the model was validated by drawing receiver operator characteristic curve (ROC curve), the accuracy of the model was assessed using calibration curves, and the practicality of the model was verified through decision curve analysis (DCA). A total of 3 295 patients with septic shock were included, with 2 164 surviving and 1 131 dying within 28 days, resulting in a mortality of 34.32%. Of these, 2 307 were in the training set (with 792 deaths within 28 days, a mortality of 34.33%), and 988 in the validation set (with 339 deaths within 28 days, a mortality of 34.31%). Five machine learning models were established based on the training set data. After including variables at three aspects, the area under the ROC curve (AUC) of RF, SVM, and LR machine learning algorithm models for predicting 28-day death in septic shock patients in the validation set was 0.823 [95% confidence interval (95%CI) was 0.795-0.849], 0.823 (95%CI was 0.796-0.849), and 0.810 (95%CI was 0.782-0.838), respectively, which were higher than that of the CART algorithm model (AUC = 0.750, 95%CI was 0.717-0.782) and SL algorithm model (AUC = 0.756, 95%CI was 0.724-0.789). Thus above three algorithm models were determined to be the best algorithm models. After integrating variables from three aspects, 16 optimal predictive variables were identified through intersection by LASSO regression, RF, and XGBoost algorithms, including the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca2+, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na+, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT) within 24 hours of ICU admission. ROC curve analysis showed that the Logistic regression model constructed with above 16 optimal predictive variables was the best predictive model, with an AUC of 0.806 (95%CI was 0.778-0.835) in the validation set. The calibration curve and DCA curve showed that this model had high accuracy and the highest net benefit could reach 0.3, which was significantly outperforming traditional models based on single functional score [APS III score, SAPS III score, and sequential organ failure assessment (SOFA) score] with AUC (95%CI) of 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively. The Logistic regression model, constructed using 16 optimal predictive variables including pH value, Alb, body temperature, Lac, SCr, Ca2+, Hb, WBC, SAPS III score, APS III score, Na+, BMI, and APTT, is identified as the best predictive model for the 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.
Read full abstract