Predicting mortality risk in neonatal intensive care units (NICUs) is challenging because the clinical and physiological data are complex and variable. Machine learning (ML) offers potential for more accurate risk stratification. We aimed to compare the performance of various ML models in predicting NICU mortality using a team-based modeling competition. Five neonatologist-led teams applied ML techniques (logistic regression, CatBoost, neural networks, random forest, and XGBoost) to a shared dataset from over 6,000 NICU admissions. The dataset included static demographic and clinical variables, alongside daily samples of heart rate and oxygen saturation. Each team developed models to predict mortality risk at baseline and within 7 days. Models were evaluated using the area under the receiver operating characteristic curve (AUC). Results were presented at a national meeting, where an audience poll ranked the models before the AUC results were revealed. The audience favored the most complex model (CNN) for real-world application, although logistic regression achieved the highest AUC on test data. Teams employed varied feature selection, tuning, and evaluation strategies. That logistic regression outperformed more complex models highlights the importance of selecting modeling methods based on data characteristics, interpretability, and expertise rather than model complexity alone. By demonstrating that greater model complexity does not necessarily yield better predictive performance, this research encourages the careful selection of modeling approaches.
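As an illustration of the evaluation metric, the AUC can be read as the probability that a randomly chosen infant who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it from that pairwise definition. It is not the code used in the competition; the function name `auc`, the two model names, and all labels and scores are hypothetical toy values chosen only to show the calculation.

```python
def auc(labels, scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    labels: 1 = died, 0 = survived; scores: predicted mortality risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
labels = [0, 0, 1, 0, 1, 0, 1, 0]
simple_model = [0.10, 0.20, 0.80, 0.30, 0.70, 0.15, 0.60, 0.65]
complex_model = [0.05, 0.55, 0.90, 0.20, 0.50, 0.10, 0.95, 0.65]

for name, scores in [("simple", simple_model), ("complex", complex_model)]:
    print(f"{name}: AUC = {auc(labels, scores):.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small illustration; on a dataset of several thousand admissions a rank-based implementation from an established library would be the more practical choice.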