Aiding clinical assessment of neonatal sepsis using hematological analyzer data with machine learning techniques.

Brian Huang, Aaron J Masino, Robin Wang, Amrom E Obstfeld

doi:10.1111/ijlh.13549

Abstract

Early diagnosis and antibiotic administration are essential for reducing sepsis morbidity and mortality; however, diagnosis remains difficult due to complex pathogenesis and presentation. We created a machine learning model for bacterial sepsis identification in the neonatal intensive care unit (NICU) using hematological analyzer data. Hematological analyzer data were gathered from NICU patients up to 48 hours prior to clinical evaluation for bacterial sepsis. Five models, Support Vector Machine, K-Nearest Neighbors, Logistic Regression, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained on 60 hematological and nine clinical variables for 2357 cases (1692 control, 665 septic). Models using only the nine clinical features were also trained and compared with models that included hematological variables. Feature importance was used to assess the relative contributions of parameters to performance. The three best-performing models were RF, Logistic Regression, and XGBoost. RF achieved an average accuracy of 0.74, AUC-ROC of 0.73, sensitivity of 0.38, and specificity of 0.88. Logistic Regression achieved an average accuracy of 0.70, AUC-ROC of 0.74, sensitivity of 0.62, and specificity of 0.73. XGBoost achieved an average accuracy of 0.72, AUC-ROC of 0.71, sensitivity of 0.40, and specificity of 0.85. All models with hematological variables performed significantly better than models trained on clinical features only. Neutrophil parameters had the highest average feature importance. Machine learning models using hematological analyzer data can classify NICU patients as sepsis positive or negative with stronger performance than clinical-feature-only models. Hematological analyzer variables could augment current machine learning algorithms for sepsis classification.
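The reported metrics are linked by simple arithmetic: accuracy is the average of sensitivity and specificity, weighted by the number of septic and control cases. The sketch below is illustrative only and is not the authors' code. It assumes the averaged metrics in the abstract can be treated as if they came from a single evaluation on the full cohort of 665 septic and 1692 control cases; the function name `implied_accuracy` is hypothetical.

```python
# Class counts reported in the abstract.
N_SEPTIC = 665
N_CONTROL = 1692

# (sensitivity, specificity, reported accuracy) as stated in the abstract.
REPORTED = {
    "RF": (0.38, 0.88, 0.74),
    "Logistic Regression": (0.62, 0.73, 0.70),
    "XGBoost": (0.40, 0.85, 0.72),
}


def implied_accuracy(sensitivity, specificity, n_pos=N_SEPTIC, n_neg=N_CONTROL):
    """Accuracy implied by sensitivity and specificity at a given class balance.

    Assumption: the averaged metrics behave like metrics from one evaluation
    on the whole cohort, which is only approximately true for fold averages.
    """
    true_pos = sensitivity * n_pos   # septic cases correctly flagged
    true_neg = specificity * n_neg   # control cases correctly cleared
    return (true_pos + true_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    for name, (sens, spec, acc) in REPORTED.items():
        implied = implied_accuracy(sens, spec)
        print(f"{name}: implied accuracy {implied:.2f}, reported {acc:.2f}")
```

Under that assumption, the implied accuracy for each of the three models rounds to the value reported in the abstract. Because controls outnumber septic cases by roughly 2.5 to 1, specificity dominates accuracy, which is why RF can post the highest accuracy while having the lowest sensitivity.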

Full Text