Abstract
Background: Sepsis is a significant cause of in-hospital mortality, especially in ICU patients. Early prediction of sepsis is essential, as prompt and appropriate treatment can improve survival outcomes. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression and scoring systems. The aims of this study were to develop a machine learning approach using XGBoost to predict 30-day mortality for MIMIC-III patients with sepsis-3 and to determine whether such a model performs better than traditional prediction models.
Methods: Using MIMIC-III v1.4, we identified patients with sepsis-3. The data were split into two groups based on death or survival within 30 days. Variables, selected by stepwise analysis on the basis of clinical significance and availability, were displayed and compared between groups. Three predictive models were constructed in R: a conventional logistic regression model, a SAPS-II score prediction model and an XGBoost algorithm model. The performance of the three models was then tested and compared using the areas under the receiver operating characteristic curves (AUCs) and decision curve analysis. Finally, a nomogram and a clinical impact curve were used to validate the model.
Results: A total of 4559 sepsis-3 patients were included in the study, of whom 889 died and 3670 survived within 30 days. According to the AUCs (0.819 [95% CI 0.800–0.838], 0.797 [95% CI 0.781–0.813] and 0.857 [95% CI 0.839–0.876]) and the decision curve analysis for the three models, the XGBoost model performed best. The risk nomogram and clinical impact curve verified that the XGBoost model possesses significant predictive value.
Conclusions: Using the XGBoost machine learning technique, a prediction model that performs better than the traditional models can be built. This XGBoost model may prove clinically useful and assist clinicians in tailoring precise management and therapy for patients with sepsis-3.
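The two comparison metrics named in the Methods can be illustrated with a short sketch. The study itself built and evaluated its models in R; the Python below is not the authors' code. It only shows, under stated assumptions, how an AUC and a decision-curve net benefit are commonly computed from predicted risks. The function names and the eight-patient toy data are hypothetical, and the net benefit formula used here (true positives per patient minus false positives per patient weighted by the odds of the threshold probability) is the standard one from the decision curve analysis literature, assumed rather than taken from the paper.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is at
    or above `threshold`: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Reference strategy in a decision curve: treat every patient."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = died within 30 days), NOT from the study.
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7, 0.1, 0.05]

toy_auc = auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)

# 30-day mortality in the reported cohort: 889 deaths among 4559 patients.
cohort_mortality = 889 / (889 + 3670)

if __name__ == "__main__":
    print(f"toy AUC: {toy_auc:.3f}")
    print(f"toy net benefit at 0.5: {toy_nb:.3f}")
    print(f"cohort 30-day mortality: {cohort_mortality:.3f}")
```

A decision curve plots the net benefit over a range of threshold probabilities for each model, alongside the treat-all and treat-none (net benefit 0) strategies. A model is preferred where its curve lies above the others.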
Highlights
Sepsis is a significant cause of in-hospital mortality, especially in intensive care unit (ICU) patients.
Traditional prediction models based on small-sample data, such as logistic regression analysis and scoring systems including the Acute Physiology and Chronic Health Evaluation II (APACHE-II) and the Simplified Acute Physiology Score II (SAPS-II) [6–8], still provide clinically important help in identifying patients at risk of unfavourable prognostic outcomes, but these methods and scores either require the statistical assumption of independent, linear relationships between explanatory and outcome variables or preclude the analysis of a large number of valuable variables.
MIMIC-III, a publicly available single-center critical care database which was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (BIDMC, Boston, MA, USA) and the Massachusetts Institute of Technology (MIT, Cambridge, MA, USA), includes information on 46,520 patients who were admitted to various ICUs of BIDMC in Boston, Massachusetts from 2001 to 2012 [11–13]
Summary
Sepsis is a significant cause of in-hospital mortality, especially in ICU patients. Early prediction of sepsis is essential, as prompt and appropriate treatment can improve survival outcomes (Hou et al. J Transl Med (2020) 18:462). Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response [3]. Unlike previous diagnostic criteria for sepsis, sepsis-3, set out in the Third International Consensus Definitions for Sepsis and Septic Shock in February 2016 [2], highlights the strong association between infection and organ failure. Early identification and diagnosis of sepsis are essential, because they give clinicians meaningful information for assessing a patient's condition and can improve survival outcomes through prompt and appropriate treatment. Traditional prediction models such as logistic regression analysis and scoring systems [6–8] still provide clinically important help in identifying patients at risk of unfavourable prognostic outcomes, but these methods and scores either require the statistical assumption of independent, linear relationships between explanatory and outcome variables or preclude the analysis of a large number of valuable variables. Insufficient prognostic strength, wide fluctuation ranges, poor stability and operability, tedious processes and other shortcomings affect these predictive serum markers, models and scores to a certain extent.