Background
Despite declines in infant death rates in the United States in recent decades, the national goal for reducing infant death has not been reached. This study aimed to predict infant death using machine-learning approaches.
Methods
A population-based retrospective study of live births in the United States between 2016 and 2021 was conducted. Thirty-three factors related to the birth facility, prenatal care and pregnancy history, labor and delivery, and newborn characteristics were used to predict infant death.
Results
XGBoost outperformed the four other machine-learning models compared. The original imbalanced dataset yielded better results than the balanced datasets created through oversampling. In cross-validation, the XGBoost-based model consistently achieved high performance during both the pre-pandemic (2016–2019) and pandemic (2020–2021) periods. The model performed especially well in predicting neonatal death (AUC: 0.98). The key predictors of infant death were gestational age, birth weight, 5-min APGAR score, and prenatal visits. A simplified model based on these four predictors performed slightly worse than, yet comparably to, the all-predictor model (AUC: 0.91 vs. 0.93). Furthermore, the four-factor risk classification system effectively identified infant deaths in 2020 and 2021 for the high-risk (88.7%–89.0%), medium-risk (4.6%–5.4%), and low-risk (0.1%) groups, outperforming the risk screening tool based on accumulated risk factors.
Conclusions
XGBoost-based models excel at predicting infant death and provide valuable prognostic information for perinatal care education and counselling. The simplified four-predictor classification system could serve as a practical alternative for predicting infant death risk.
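To make the reported AUC values concrete, the following is a minimal sketch of how the area under the ROC curve can be computed from predicted risk scores. It uses the rank interpretation: AUC is the probability that a randomly chosen case (for example, an infant who died) receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Hypothetical helper: AUC as P(score of a random positive > score of a random negative).

    labels: iterable of 0/1 outcomes (1 = event, e.g. infant death)
    scores: predicted risk for each observation, same order as labels
    Ties between a positive and a negative count as half a "win".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    # Compare every positive with every negative (O(n_pos * n_neg); fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on how cases are ranked relative to non-cases, it is not inflated by class imbalance the way accuracy is: a model that assigns every infant the same risk scores 0.5 regardless of how rare infant death is.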