Factors underlying the development of childhood underweight, overweight, and obesity are not fully understood. Traditional models have drawbacks in handling large-scale, high-dimensional, and nonlinear data. In this study, we aimed to use machine learning methods to identify factors responsible for underweight, overweight, and obesity among Chinese children. Our study participants were children aged 3-14 years from 30 kindergartens and 26 schools in Beijing and Tangshan. Weight status was defined per the World Health Organization criteria. We implemented three ensemble learning algorithms, compared their performance, ranked the contributing factors by importance, and identified an optimal set of factors for each weight status. A user-friendly web application was developed to calculate the predicted probability of childhood underweight, overweight, and obesity. We analysed data from 18 503 children, including 1798 underweight, 10 579 of normal weight, 3257 overweight, and 2869 with obesity. Of all algorithms, random forest performed the best, with the area under the receiver operating characteristic curve reaching 0.759 for underweight, 0.806 for overweight, and 0.849 for obesity; other performance metrics also favoured this algorithm. Further cumulative analyses showed that, for underweight, the optimal set comprised six factors: maternal body mass index (BMI), age, paternal BMI, maternal reproductive age, paternal reproductive age, and birth weight. The optimal set for overweight comprised five factors: age, fast food intake, maternal BMI, paternal BMI, and sedentary time. For obesity, the optimal set comprised six factors: age, fast food intake, maternal BMI, paternal BMI, sedentary time, and maternal reproductive age. Further logistic regression analyses confirmed the predictive capability of individual top factors. Our findings indicate that, of the three ensemble learning algorithms compared, random forest performed best for predicting underweight, overweight, and obesity in children aged 3-14 years. We identified the optimal set of significant factors for each malnutrition status and incorporated them into a web application to support the application of this study's findings.
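For readers unfamiliar with the headline metric, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen child with the weight status of interest receives a higher predicted probability than a randomly chosen child in the comparison group. The sketch below is a minimal illustration of that definition, not the study's code. The function name `auroc`, the binary framing (1 = status of interest, 0 = comparison group), and the toy labels and scores are all assumptions made for the example.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the weight status of interest, 0 for the comparison group
            (this binary framing is an assumption of the sketch).
    scores: predicted probabilities from any classifier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration; not taken from the study.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auroc(labels, scores)  # 8 of the 9 positive-negative pairs are ordered correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for the reported values of 0.759, 0.806, and 0.849.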