Abstract

Obesity is strongly associated with multiple risk factors and contributes significantly to the risk of chronic disease morbidity and mortality worldwide. Understanding the association between risk factors and the occurrence of obesity remains challenging. The traditional regression approach limits analysis to a small number of predictors and imposes assumptions of independence and linearity. Machine Learning (ML) methods offer an alternative approach to analyzing data on obesity. This study aims to assess the ability of three ML methods, namely Logistic Regression, Classification and Regression Trees (CART), and Naïve Bayes, to identify the presence of obesity using publicly available health data, in an attempt to go beyond traditional prediction models, and to compare the performance of the three methods. The main objective is to establish a set of risk factors for obesity in adults among the available study variables. We address data imbalance using the Synthetic Minority Oversampling Technique (SMOTE) when predicting obesity status from the risk factors available in the dataset. Logistic Regression shows the highest performance of the three methods. Nevertheless, kappa coefficients show only moderate concordance between predicted and measured obesity. Location, marital status, age group, education, sweet drinks, fatty/oily foods, grilled foods, preserved foods, seasoning powders, soft/carbonated drinks, alcoholic drinks, mental emotional disorders, diagnosed hypertension, physical activity, smoking, and fruit and vegetable consumption are significant predictors of obesity status in adults.
Identifying these risk factors could help health authorities design new policies or modify existing ones to better control chronic diseases, especially in relation to risk factors associated with obesity. Moreover, applying ML methods to publicly available health data, such as the Indonesian Basic Health Research (RISKESDAS), is a promising strategy for building a more robust understanding of how multiple risk factors are associated in predicting health outcomes.
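Two techniques named in the abstract, SMOTE and the kappa coefficient, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote_oversample` and `cohen_kappa`, the parameter defaults, and the toy data are all assumptions made for illustration, and the source does not say which software was used. The first function generates synthetic minority-class points by interpolating between a minority point and one of its nearest minority neighbours; the second computes Cohen's kappa, i.e. agreement between measured and predicted labels corrected for chance agreement.

```python
import math
import random
from collections import Counter


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built from minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    (Hypothetical helper; a simplified sketch of the SMOTE idea.)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def cohen_kappa(measured, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected).

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(measured)
    observed = sum(m == p for m, p in zip(measured, predicted)) / n
    m_counts = Counter(measured)
    p_counts = Counter(predicted)
    expected = sum(m_counts[c] * p_counts[c] for c in m_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy data, invented for illustration only (1 = obese, 0 = not obese).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_oversample(minority_points, n_new=5, k=2)

measured = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(measured, predicted)
print(len(new_points), round(kappa, 3))
```

In the toy labels, raw agreement is 0.70 but chance agreement is 0.54, so kappa is considerably lower than accuracy. This is why a classifier can look accurate on imbalanced data while showing only modest concordance with measured status.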

Highlights

  • Obesity is a major health problem strongly associated with many chronic illnesses, with negative effects and long-term consequences for patients and their families

  • Risk factors for obesity have been studied extensively, and in general, they are divided into several categories: demographic and socio-economic factors [4,5,6]; lifestyle factors [6, 7]; and genetic factors [4, 5]

  • The obesity status description can be seen in detail in Supplementary Table 2

Summary

Introduction

Obesity is a major health problem strongly associated with many chronic illnesses, with negative effects and long-term consequences for patients and their families. Risk factors for obesity have been studied extensively and are generally divided into several categories: demographic and socio-economic factors (gender, age, education, income, marital status, and urban areas) [4, 5, 6]; lifestyle factors (consumption of fast food, stress, smoking, alcoholic drinks, and low level of physical activity) [6, 7]; and genetic factors (obese parents) [4, 5]. Some of these risk factors can be modified, while others cannot. Machine Learning (ML), currently one of the most popular topics in the scientific community for large-scale datasets, has recently been introduced as a novel method to answer this question.
