Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods

Guanlan Hu,Mavra Ahmed,Mary R L'Abbé

doi:10.1016/j.ajcnut.2022.11.022

Guanlan Hu, Mavra Ahmed + Show 1 more

Open Access

https://doi.org/10.1016/j.ajcnut.2022.11.022

Copy DOI

Journal: The American Journal of Clinical Nutrition	Publication Date: Dec 23, 2022
Citations: 19	License type: publisher-specific-oa

Affiliation: University of Toronto

Abstract

BackgroundFood categorization and nutrient profiling are labor intensive, time consuming, and costly tasks, given the number of products and labels in large food composition databases and the dynamic food supply.Objectives: This study used a pretrained language model and supervised machine learning to automate food category classification and nutrition quality score prediction based on manually coded and validated data, and compared prediction results with models using bag-of-words and structured nutrition facts as inputs for predictions. MethodsFood product information from University of Toronto Food Label Information and Price Database 2017 (n = 17,448) and University of Toronto Food Label Information and Price Database 2020 (n = 74,445) databases were used. Health Canada’s Table of Reference Amounts (TRA) (24 categories and 172 subcategories) was used for food categorization and the Food Standards of Australia and New Zealand (FSANZ) nutrient profiling system was used for nutrition quality score evaluation. TRA categories and FSANZ scores were manually coded and validated by trained nutrition researchers. A modified pretrained sentence-Bidirectional Encoder Representations from Transformers model was used to encode unstructured text from food labels into lower-dimensional vector representations, followed by supervised machine learning algorithms (i.e., elastic net, k-Nearest Neighbors, and XGBoost) for multiclass classification and regression tasks. ResultsPretrained language model representations utilized by the XGBoost multiclass classification algorithm reached overall accuracy scores of 0.98 and 0.96 in predicting food TRA major and subcategories, outperforming bag-of-words methods. For FSANZ score prediction, our proposed method reached a similar prediction accuracy (R2: 0.87 and MSE: 14.4) compared with bag-of-words methods (R2: 0.72–0.84; MSE: 30.3–17.6), whereas structured nutrition facts machine learning model performed the best (R2: 0.98; MSE: 2.5). The pretrained language model had a higher generalizable ability on the external test datasets than bag-of-words methods. ConclusionsOur automation achieved high accuracy in classifying food categories and predicting nutrition quality scores using text information found on food labels. This approach is effective and generalizable in a dynamic food environment, where large amounts of food label data can be obtained from websites.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods

Abstract

Talk to us

Similar Papers

More From: The American Journal of Clinical Nutrition

Lead the way for us

Similar Papers

Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias
Anoop K ... Lajish V L
-
Anoop K, et. al. Anoop K ... Lajish V L
01 Jan 2021
01 Jan 2021

On the Power of Pre-Trained Text Representations
Yu Meng ... Jiawei Han
-
Yu Meng, et. al.Yu Meng ... Jiawei Han
14 Aug 2021
14 Aug 2021

A Study of Vietnamese Sentiment Classification with Ensemble Pre-Trained Language Models
Dang Van Thin ... Duong Ngoc Hao
Vietnam Journal of Computer Science | VOL. 11
Dang Van Thin, et. al.Dang Van Thin ... Duong Ngoc Hao
07 Dec 2023
Vietnam Journal of Computer Science | VOL. 11

The Solution of Huawei Cloud & Noah’s Ark Lab to the NLPCC-2020 Challenge: Light Pre-Training Chinese Language Model for NLP Task
Yuyang Zhang ... Cheng Chen
-
Yuyang Zhang, et. al.Yuyang Zhang ... Cheng Chen
01 Jan 2020
01 Jan 2020

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods

Abstract

Talk to us

Similar Papers

More From: The American Journal of Clinical Nutrition