Early identification of high-risk individuals for weight problems in children and adolescents is crucial for implementing timely preventive measures. While machine learning (ML) techniques have shown promise in addressing this complex challenge with high-dimensional data, feature selection is vital for identifying the key predictors that can facilitate effective and targeted interventions. This study aims to utilize feature selection process to identify a robust and minimal set of predictors that can aid in the early prediction of short- and long-term weight problems in children and adolescents. We utilized demographic, physical, and psychological wellbeing predictors to model weight status (normal, underweight, overweight, and obese) for 1-, 3-, and 5-year periods. To select the most influential features, we employed four feature selection methods: (1) Chi-Square test; (2) Information Gain; (3) Random Forest; (4) eXtreme Gradient Boosting (XGBoost) with six ML approaches. The stability of the feature selection methods was assessed by Jaccard's index, Spearman's rank correlation and Pearson's correlation. Model evaluation was performed by various accuracy metrics. With 3,862,820 million student-visits were included in this population-based study, the mean age of 11.6 (SD = 3.64) for the training set and 10.8 years (SD = 3.50) for the temporal test set. From the initial set of 38 predictors, we identified 6, 9, and 13 features for 1-, 3-, and 5-year predictions, respectively, by the best performed feature selection method of Chi-Square test in XGBoost models. These feature sets demonstrated excellent stability and achieved prediction accuracies of 0.82, 0.73, and 0.70; macro-AUCs of 0.94, 0.86, and 0.83; micro-AUCs of 0.96, 0.93, and 0.92 for different prediction windows, respectively. Weight, height, sex, total score of self-esteem, and age were consistently the most influential predictors across all prediction windows. Additionally, several psychological and social wellbeing predictors showed relatively high importance in long-term weight status prediction. We demonstrate the potential of ML in identifying key predictors of weight status in children and adolescents. While traditional anthropometric measures remain important, psychological and social wellbeing factors also emerge as crucial predictors, potentially informing targeted interventions to address childhood and adolescence weight problems.
Read full abstract