Clustering analysis and machine learning algorithms in the prediction of dietary patterns: Cross-sectional results of the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil).

Vanderlei Carneiro Silva, Tânia Aparecida De Araujo, Paulo Andrade Lotufo, Bartira Gorgulho, Isabela Martins Benseñor, Itamar De Souza Santos, Dirce Maria Marchioni

doi:10.1111/jhn.12992

Abstract

Machine learning investigates how computers can learn automatically. The present study aimed to predict dietary patterns and to compare the performance of algorithms in making those predictions. We analysed data from public employees (n = 12,667) participating in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). The K-means clustering algorithm was used to identify dietary patterns, and six classifiers (support vector machines, naïve Bayes, K-nearest neighbours, decision tree, random forest and XGBoost) were used to predict them. K-means clustering identified two dietary patterns. Cluster 1, labelled the Western pattern, was characterised by a higher energy intake and consumption of refined cereals, beans and other legumes, tubers, pasta, processed and red meats, high-fat milk and dairy products, and sugary beverages; Cluster 2, labelled the Prudent pattern, was characterised by higher intakes of fruit, vegetables, whole cereals, white meats, and milk and reduced-fat milk derivatives. The most important predictors were age, sex, per capita income, education level and physical activity. Model accuracy ranged from moderate to good (69%-72%). The algorithms performed similarly in predicting dietary patterns, and the models presented may support screening tasks and guide health professionals in the analysis of dietary data.
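The two-stage design described in the abstract (cluster the dietary data, then train a classifier to predict cluster membership from participant characteristics) can be illustrated with a minimal sketch. This is not the authors' code or data: the participants are synthetic, the two predictors and two food groups, their effect sizes, the train/test split, and the hand-written K-means and K-nearest-neighbours routines are all assumptions chosen only to keep the example self-contained.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster as is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(x, train_x[i]))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


# Synthetic participants (assumption): scaled age and an activity flag
# loosely drive fruit/vegetable versus red-meat intake.
rng = random.Random(42)
predictors, intakes = [], []
for _ in range(400):
    age = rng.random()                 # age rescaled to [0, 1]
    active = float(rng.random() < 0.5)
    lean = age + 0.5 * active + rng.gauss(0, 0.3)
    fruit_veg = max(0.0, 2 + 3 * lean + rng.gauss(0, 0.5))
    red_meat = max(0.0, 4 - 2 * lean + rng.gauss(0, 0.5))
    predictors.append((age, active))
    intakes.append((fruit_veg, red_meat))

# Stage 1: derive two dietary patterns from intake data alone.
labels, centroids = kmeans(intakes, k=2)

# Stage 2: predict pattern membership from the predictors on a hold-out set.
split = 300
train_x, test_x = predictors[:split], predictors[split:]
train_y, test_y = labels[:split], labels[split:]
predicted = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)

print("centroids (fruit_veg, red_meat):", centroids)
print(f"hold-out accuracy: {accuracy:.2f}")
```

In this sketch the cluster labels play the role of the outcome variable, so the classifier's accuracy measures how well participant characteristics recover the clustering, not how well they predict any externally validated diet category.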
