Background
Child mortality is a reliable and significant indicator of a nation's health. Although the child mortality rate in Bangladesh has been declining over time, it must fall further to meet the Sustainable Development Goals (SDGs). Machine learning models are among the best tools for making accurate and efficient forecasts and for gaining in-depth knowledge of the underlying factors. A deeper understanding of these factors is crucial for significantly reducing child mortality, and accurate predictions can help authorities implement timely interventions and raise awareness. This study therefore aimed to explore the factors related to child mortality and to assess the efficacy of various machine learning models in predicting child mortality in Bangladesh.

Methods and materials
About 42,000 observations, excluding those with missing values, were extracted from the 2017-18 Bangladesh Demographic and Health Survey (BDHS). The survey used a two-stage stratified sampling design that selected 675 enumeration areas (250 urban and 425 rural), resulting in effective data collection from 672 clusters and 20,160 households. The Chi-square test and recursive feature elimination (RFE) were used to identify the relevant risk factors for child mortality among the candidate factors. Six machine learning algorithms were implemented to predict child mortality: Naïve Bayes, Classification and Regression Trees, Random Forest, C5.0 Classification, Gradient Boosting Machine, and Logistic Regression. Model performance was evaluated using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and area under the curve (AUC), together with k-fold cross-validation.

Results and discussion
The child mortality rate in the data is 8.2%.
The bivariate analysis showed that the child mortality rate was higher among children whose mothers had no education, were poor, were underweight, were aged 35-49, or had given birth before age 20. A family's water source and religious affiliation had no statistically significant association with child mortality. The prediction of child mortality using machine learning models is the main objective of this study. On the original data, none of the machine learning models correctly classified deaths. Therefore, this study conducted over-sampling and under-sampling analyses. Approximately 76,727 and 6,910 observations were used for the over-sampling and under-sampling analyses, respectively. On the over-sampled data, the Random Forest outperformed all the other models in overall performance on the training and testing sets, with an accuracy of 70%. The k-fold cross-validation confirmed the Random Forest model's superior performance, and the model achieved the highest AUC (0.701). In the under-sampling analysis, by contrast, the Gradient Boosting Machine performed best at predicting child mortality, and k-fold cross-validation likewise showed its better performance.

Conclusion
The Gradient Boosting Machine and Random Forest provide the best predictive power for classifying child mortality and may help to improve policy decision-making in this regard.
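
The class imbalance described above (8.2% deaths) explains why a model can look accurate while detecting no deaths at all. The following is a minimal illustrative sketch in plain Python, not the authors' code. The abstract does not state which software or which resampling method was used, so the toy data (1,000 children, 82 deaths), the function names, and the use of simple random over-sampling and under-sampling are all assumptions made for illustration.

```python
import random
from collections import Counter


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics, treating `positive` as a death."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),  # deaths correctly detected
        "specificity": safe_div(tn, tn + fp),  # survivors correctly detected
        "ppv": safe_div(tp, tp + fp),          # positive predictive value
        "npv": safe_div(tn, tn + fn),          # negative predictive value
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def random_oversample(labels, seed=0):
    """Row indices in which every class is resampled, with replacement,
    up to the size of the largest class (assumed method)."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    indices = []
    for cls in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == cls]
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        indices.extend(rows + extra)
    return indices


def random_undersample(labels, seed=0):
    """Row indices in which every class is reduced, without replacement,
    to the size of the smallest class (assumed method)."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    indices = []
    for cls in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == cls]
        indices.extend(rng.sample(rows, target))
    return indices


# Hypothetical toy data: 82 deaths (label 1) among 1,000 children, i.e. 8.2%.
y_true = [1] * 82 + [0] * 918
# A degenerate classifier that predicts "alive" for every child.
y_pred = [0] * 1000

baseline = metrics(y_true, y_pred)
over_idx = random_oversample(y_true)
under_idx = random_undersample(y_true)

print(baseline["accuracy"], baseline["sensitivity"])
print(Counter(y_true[i] for i in over_idx))
print(Counter(y_true[i] for i in under_idx))
```

In this toy setting the all-"alive" classifier reaches high accuracy with zero sensitivity, which is why sensitivity, F1 score, and AUC are more informative than accuracy here, and why the classes are balanced before the models are refit. Resampling should be applied to the training portion only, so that test-set metrics still reflect the original class distribution.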