Prediction and feature selection of low birth weight using machine learning algorithms

Tasneem Binte Reza, Nahid Salma

doi:10.1186/s41043-024-00647-8

Abstract

Background and aims: The birth weight of a newborn is a crucial factor that affects their overall health and future well-being. Low birth weight (LBW), which the World Health Organization defines as a birth weight of less than 2,500 g, is a widespread global issue. LBW can have severe negative consequences for an individual’s health, including neonatal mortality and various health concerns throughout life. To address this problem, this study used Bangladesh Demographic and Health Survey (BDHS) 2017–2018 data to uncover important aspects of LBW using a variety of machine learning (ML) approaches and to determine the best feature selection technique and the best predictive ML model.

Methods: The Boruta algorithm and the wrapper method were used to select the key features. Logistic Regression (LR) was used as the traditional method, and several machine learning classifiers were then applied to determine the best model for predicting LBW: Decision Tree (DT), Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost, AB). Model performance was evaluated based on specificity, sensitivity, accuracy, F1 score, and AUC value.

Results: The Boruta algorithm identified eleven significant features: respondent’s age, highest education level, educational attainment, wealth index, age at first birth, weight, height, BMI, age at first sexual intercourse, birth order number, and whether the child is a twin. Using these features, the performance of traditional LR and of the ML methods (DT, SVM, NB, RF, XGBoost, and AB) was evaluated. LR had a specificity, sensitivity, accuracy, and F1 score of 0.85, 0.5, 85.15%, and 0.915, respectively. For the ML methods DT, SVM, NB, RF, XGBoost, and AB, the reported accuracy values were 85.35%, 85.15%, 84.54%, 81.18%, and 84.41%.
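As a reading aid for the metrics quoted above, the following minimal sketch computes sensitivity, specificity, accuracy, and F1 score from hard class predictions. It is not the authors' code: the function name `binary_metrics`, the `positive` parameter, and the ten toy labels are assumptions made for illustration. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classifier (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),       # true positive rate
        "specificity": ratio(tn, tn + fp),       # true negative rate
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


# Toy labels invented for this example: 1 = low birth weight, 0 = normal weight.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

lbw_positive = binary_metrics(y_true, y_pred, positive=1)
normal_positive = binary_metrics(y_true, y_pred, positive=0)
print(lbw_positive)
print(normal_positive)
```

F1 depends on which class is treated as positive, and the abstract does not state the convention used, so `positive` is left as a parameter here. On the toy labels, F1 is 0.5 with LBW as the positive class and 0.875 with normal weight as the positive class.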
Based on the specificity, sensitivity, accuracy, F1 score, and AUC, RF (specificity = 0.99, sensitivity = 0.58, accuracy = 85.86%, F1 score = 0.9243, AUC = 0.549) outperformed the other methods. The performance of both the classical (LR) and the ML models improved dramatically when important features were extracted using the wrapper method. With the wrapper method, LR identified five significant features, with a specificity, sensitivity, accuracy, and F1 score of 0.87, 0.33, 87.12%, and 0.9309, respectively. The DT and RF models implemented with the wrapper technique identified three key features: region, whether the infant is a twin, and cesarean delivery. All three models had an identical F1 score of 0.9318. The SVM, NB, and AB models, however, recognized “child is twin” as a significant feature, with an F1 score of 0.9315. Finally, the XGBoost model recognized “child is twin” and “age at first sex” as relevant features, also with an F1 score of 0.9315. Random Forest again outperformed the other approaches in this instance.

Conclusions: The study identifies the wrapper method as the optimal feature selection technique. The ML methods outperform the traditional method, with Random Forest (RF) being the most effective model for predicting low birth weight. The study suggests that policymakers in Bangladesh can reduce the incidence of low birth weight by addressing the identified risk factors.
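A wrapper method scores candidate feature subsets by training and evaluating the model itself on each subset. The abstract does not specify the search strategy or scoring used, so the sketch below assumes greedy forward selection scored by holdout accuracy, with a small nearest-centroid classifier standing in for the paper's models. The function names, feature names, and the ten toy rows are all hypothetical.

```python
def centroid(rows, cols):
    """Mean of each selected column over the given rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]


def holdout_accuracy(train, valid, cols):
    """Fit a nearest-centroid classifier on `cols`, then score it on `valid`."""
    X_tr, y_tr = train
    X_va, y_va = valid
    classes = sorted(set(y_tr))
    centroids = {
        k: centroid([x for x, y in zip(X_tr, y_tr) if y == k], cols)
        for k in classes
    }

    def predict(x):
        def sq_dist(k):
            return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[k]))
        return min(classes, key=sq_dist)  # ties go to the smallest label

    hits = sum(predict(x) == y for x, y in zip(X_va, y_va))
    return hits / len(y_va)


def forward_select(train, valid, n_features):
    """Greedy forward selection: add the best feature until the score stops improving."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        scored = [(holdout_accuracy(train, valid, selected + [c]), c)
                  for c in candidates]
        # Highest score wins; the lowest column index breaks ties.
        score, col = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(col)
        best = score
    return selected, best


# Toy data invented for this example; columns are hypothetical features.
names = ["child_is_twin", "noise_a", "noise_b"]
train = ([[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0]],
         [1, 1, 0, 0, 0, 0])
valid = ([[1, 1, 1], [1, 0, 0], [0, 1, 1], [0, 0, 0]],
         [1, 1, 0, 0])

selected, score = forward_select(train, valid, n_features=len(names))
print([names[c] for c in selected], score)
```

On this toy data the search keeps only `child_is_twin`, because adding either noise column does not raise the holdout accuracy. In practice, cross-validation would usually replace the single holdout split to reduce the risk of overfitting the selection to one validation set.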

Journal: Journal of Health, Population and Nutrition
Publication Date: Oct 12, 2024
License type: CC BY-NC-ND 4.0
