Estimating Obesity Levels Using Decision Trees and K-Fold Cross-Validation: A Study on Eating Habits and Physical Conditions

Fadhila Tangguh Admojo, Nurul Rismayanti

doi:10.56705/ijodas.v5i1.126

Abstract

This study uses machine learning to examine the determinants of obesity in populations from Mexico, Peru, and Colombia, applying a Decision Tree algorithm evaluated with 5-fold cross-validation. The analysis covers lifestyle and physical condition data from 2111 individuals; accuracy, precision, recall, and F1-score were highest in the third and fifth folds. The findings indicate that dietary habits and physical activity are substantial predictors of obesity levels. Model performance varied across the folds, which underscores the importance of cross-validation for judging how well the model generalizes. This research contributes to the application of data science in public health by providing a workable model for obesity prediction and a basis for targeted health interventions. The results are relevant to public health officials and policymakers as a step towards more sophisticated, data-driven approaches to combating obesity. The study recognizes the limitations of self-reported data and the need for broader datasets that cover more diverse variables. Future research directions include the analysis of longitudinal data to establish causal relationships and the comparison of several machine learning models to optimize predictive performance.
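The evaluation procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, replaces the survey data with a synthetic dataset of the same size (2111 rows), and picks the number of features and classes, the shuffled `KFold` split, the macro averaging, and the helper name `cross_validate_tree` purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption: the real features
# and obesity labels are not reproduced here; only the row count matches).
X, y = make_classification(
    n_samples=2111,
    n_features=16,
    n_informative=8,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)


def cross_validate_tree(X, y, n_splits=5, seed=0):
    """Train a decision tree on each fold and return per-fold metrics."""
    # Shuffled, unstratified folds are an assumption of this sketch.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for fold, (train_idx, test_idx) in enumerate(folds.split(X), start=1):
        model = DecisionTreeClassifier(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # Macro averaging weights every class equally (an assumption).
        precision, recall, f1, _ = precision_recall_fscore_support(
            y[test_idx], pred, average="macro", zero_division=0
        )
        rows.append(
            {
                "fold": fold,
                "n_test": len(test_idx),
                "accuracy": accuracy_score(y[test_idx], pred),
                "precision": precision,
                "recall": recall,
                "f1": f1,
            }
        )
    return rows


scores = cross_validate_tree(X, y)
for row in scores:
    print(
        f"fold {row['fold']}: acc={row['accuracy']:.3f} "
        f"prec={row['precision']:.3f} rec={row['recall']:.3f} f1={row['f1']:.3f}"
    )
```

Printing the metrics fold by fold, instead of only their mean, makes the kind of fold-to-fold variation the abstract reports directly visible.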

Full Text