Abstract

Background: The increasing prevalence of childhood obesity makes it essential to study its risk factors in a population-representative sample that covers a broad range of health topics, in order to inform better preventive policies and interventions. We aimed to develop an ensemble feature selection framework for large-scale data that identifies risk factors of childhood obesity with good interpretability and clinical relevance.

Methods: We analyzed data collected from 426,813 children under 18 between 2000 and 2019. A BMI above the 90th percentile for children of the same age and gender was defined as overweight. We proposed an ensemble feature selection framework, the Bagging-based Feature Selection framework integrating MapReduce (BFSMR), to identify risk factors. The framework comprises 5 models (filter with mutual information, SVM-RFE, Lasso, Ridge, and Random Forest) drawn from filter, wrapper, and embedded feature selection methods. Each feature selection model identified 10 variables based on variable importance. Based on accuracy, F-score, and model characteristics, the models were classified into 3 levels with different weights: Lasso/Ridge, Filter/SVM-RFE, and Random Forest. A voting strategy aggregated the selected features, taking both feature weights and model weights into account. We compared our voting strategy with two others for selecting top-ranked features across 6 dimensions of interpretability.

Results: Our method performed best at selecting features with good interpretability and clinical relevance. The top 10 features selected by BFSMR are age, sex, birth year, breastfeeding type, smoking habit and diet-related knowledge of both children and mothers, exercise, and mother’s systolic blood pressure.

Conclusion: Our framework provides a solution for identifying a diverse and interpretable feature set without model bias from large-scale data. It can help identify risk factors of childhood obesity, and potentially of other diseases, to inform future interventions or policies.
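The weighted voting step described in the abstract can be illustrated with a minimal sketch. The abstract states that both feature weights and model weights enter the vote, but it gives neither the numeric weights nor the ordering of the three model levels, so everything concrete here is an assumption made for illustration: the `MODEL_WEIGHTS` values, the linear `rank_weight` function, the alphabetical tie-break, and the toy rankings in `example_rankings`.

```python
from collections import defaultdict

# Hypothetical model weights. The abstract names three levels
# (Lasso/Ridge, Filter/SVM-RFE, Random Forest) but not their values
# or which level is weighted highest; these numbers are placeholders.
MODEL_WEIGHTS = {
    "lasso": 3.0,
    "ridge": 3.0,
    "filter_mi": 2.0,
    "svm_rfe": 2.0,
    "random_forest": 1.0,
}


def rank_weight(rank, k):
    """Assumed feature weight: k for the top-ranked feature, down to 1."""
    return k - rank


def weighted_vote(rankings, model_weights, top_n=10):
    """Aggregate per-model ranked feature lists into one ranked list.

    rankings: dict mapping model name -> list of features, best first.
    Each feature's score is the sum over models of
    (model weight) * (rank-based feature weight).
    """
    scores = defaultdict(float)
    for model, features in rankings.items():
        k = len(features)
        for rank, feature in enumerate(features):
            scores[feature] += model_weights[model] * rank_weight(rank, k)
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ordered[:top_n]]


# Toy rankings (invented, 3 features per model instead of 10).
example_rankings = {
    "lasso": ["age", "exercise", "smoking"],
    "ridge": ["age", "smoking", "exercise"],
    "filter_mi": ["sex", "age", "birth_year"],
    "svm_rfe": ["age", "sex", "exercise"],
    "random_forest": ["birth_year", "age", "sex"],
}

if __name__ == "__main__":
    print(weighted_vote(example_rankings, MODEL_WEIGHTS, top_n=3))
```

In this sketch, a feature chosen by several highly weighted models outranks one chosen by a single model, which is one way an aggregate could reduce the bias of any individual selector. The scoring rule actually used in BFSMR may differ.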

Highlights

  • The increasing prevalence of childhood obesity makes it essential to study the risk factors with a sample representative of the population covering more health topics for better preventive policies and interventions

  • A recent study of almost one million participants used electronic health record (EHR) data to predict the risk of childhood obesity [8]; because of the nature of the data source, all features were clinical variables, and environmental factors related to family and school were not included

  • Lasso and Ridge are specializations of linear regression with different regularization methods; they selected similar features, including smoking habits, exercise habits, and diet knowledge


Summary

Introduction

The increasing prevalence of childhood obesity makes it essential to study its risk factors in a population-representative sample that covers a broad range of health topics, in order to inform better preventive policies and interventions. A 2018 review explored obesity studies that used big data collected from different sources [7], such as social media, smartphones, healthcare wearable devices, and transportation. These data samples had their own limitations, for example sample bias, ethical issues, or lack of linkage with nutrition information. A recent study of almost one million participants used electronic health record (EHR) data to predict the risk of childhood obesity [8]; because of the nature of the data source, all features were clinical variables, and environmental factors related to family and school were not included. The Osakidetza database in the Basque region can be of great value because it is large-scale data covering millions of participants and, at the same time, includes specific information on different aspects of the environmental factors of childhood obesity.

