Objective: To develop a machine learning model that predicts hospital mortality and identifies risk factors in patients with cancer-related sepsis.

Method: We obtained data from the Medical Information Mart for Intensive Care (MIMIC)-IV critical care data set, including patients who were diagnosed with cancer and fulfilled the definition of sepsis between 2008 and 2019. The data set was randomly split into a training set and a validation set. Missing values were imputed using the K-Nearest Neighbor (KNN) imputation model. A gradient-boosting model, CatBoost, was trained and then interpreted using SHAP values.

Results: A total of 5081 patients were included in the final analysis. Cancer-related sepsis patients had lower hospital survival (13.8% vs. 25.3%, P < 0.001) than non-cancer-related patients. For cancer-related sepsis patients, ensemble learning algorithms outperformed the other algorithms, with better accuracy and larger AUC: CatBoost (AUC: 0.828), LightGBM (AUC: 0.818), and Random Forest Classifier (AUC: 0.803). The CatBoost model showed the strongest discrimination for predicting hospital mortality and outperformed the other models, with a sensitivity of 76% and a specificity of 74%. The best cutoff for the CatBoost model was 0.223. CatBoost also outperformed the severity scores SAPS-II (AUC: 0.725) and SOFA (AUC: 0.682). Urine output and the minimum BUN level on admission were the most important features for predicting hospital mortality in cancer-related sepsis, whereas age and urine output on admission were the most important for non-cancer-related patients.

Conclusion: For cancer-related sepsis patients, the CatBoost model was the better prediction model. It relies on common clinical vital signs and laboratory parameters, which makes it easy for clinicians to use when evaluating a patient's condition and planning follow-up treatment.
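The reported AUC, sensitivity, specificity, and best cutoff can be illustrated with a minimal sketch in plain Python. The abstract does not say how the cutoff of 0.223 was chosen; maximizing Youden's J (sensitivity + specificity - 1) is assumed here as one common approach. The function names and the toy labels and probabilities are hypothetical, made up for illustration, and are not taken from MIMIC-IV or from the study.

```python
from typing import List, Tuple


def auc_score(y_true: List[int], y_prob: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(y_true: List[int], y_prob: List[float], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting death for probability >= cutoff."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff_youden(y_true: List[int], y_prob: List[float]) -> float:
    """Assumed selection rule: the observed probability that maximizes Youden's J."""
    best_c, best_j = y_prob[0], -1.0
    for c in sorted(set(y_prob)):
        se, sp = sens_spec(y_true, y_prob, c)
        j = se + sp - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c


# Toy validation set (hypothetical): 1 = died in hospital, 0 = survived.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.18, 0.20, 0.25, 0.22, 0.30, 0.40, 0.70]

auc = auc_score(y_true, y_prob)
cutoff = best_cutoff_youden(y_true, y_prob)
sensitivity, specificity = sens_spec(y_true, y_prob, cutoff)
print(f"AUC={auc:.3f} cutoff={cutoff} sens={sensitivity:.2f} spec={specificity:.2f}")
```

A cutoff well below 0.5, as in the abstract, is typical when the positive class (in-hospital death) is the minority, because the threshold that balances sensitivity and specificity shifts toward the base rate.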