A Machine Learning Approach to Predict the Added-Sugar Content of Packaged Foods

Tazman Davies, Jimmy Chun Yu Louie, Rhoda Ndanuko, Sebastiano Barbieri, Oscar Perez-Concha, Jason H Y Wu

doi:10.1093/jn/nxab341

Open Access

https://doi.org/10.1093/jn/nxab341

Journal: The Journal of Nutrition
Publication Date: Jan 1, 2022
Citations: 12
License type: publisher-specific-oa

Affiliation: UNSW Sydney, University of Hong Kong

Abstract

Background: Dietary guidelines recommend limiting the intake of added sugars. However, despite the public health importance, most countries have not mandated the labeling of added-sugar content on packaged foods and beverages. This makes it difficult for consumers to avoid products with added sugar and limits the ability of policymakers to identify priority products for intervention.

Objective: The aim was to develop a machine learning approach for the prediction of added-sugar content in packaged products using available nutrient, ingredient, and food category information.

Methods: The added-sugar prediction algorithm was developed using k-nearest neighbors (KNN) and packaged food information from the US Label Insight dataset (n = 70,522). A synthetic dataset of Australian packaged products (n = 500) was used to assess validity and generalization. Performance metrics included the coefficient of determination (R²), mean absolute error (MAE), and Spearman rank correlation (ρ). As a benchmark, the KNN approach was compared with an existing added-sugar prediction approach that relies on a series of manual steps.

Results: Compared with the existing added-sugar prediction approach, the KNN approach was similarly apt at explaining variation in added-sugar content (R² = 0.96 vs. 0.97, respectively) and at ranking products from highest to lowest in added-sugar content (ρ = 0.91 vs. 0.93, respectively), but was less apt at minimizing absolute deviations between predicted and true values (MAE = 1.68 g vs. 1.26 g per 100 g or 100 mL, respectively).

Conclusions: KNN can be used to predict added-sugar content in packaged products with a high degree of validity. Being automated, KNN can easily be applied to large datasets. Such predicted added-sugar levels can be used to monitor the food supply and inform interventions aimed at reducing added-sugar intake.
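To make the evaluation concrete, the following is a minimal sketch of KNN regression scored with the three metrics named in the abstract (R², MAE, and Spearman ρ). It is not the authors' implementation. The two features (total sugars per 100 g and a flag for a sugar ingredient), the toy numbers, the choice of k = 2, and all function names are assumptions made purely for illustration, and a real pipeline would normally scale the features before computing distances.

```python
import math

def knn_predict(train_X, train_y, query, k=2):
    """Predict the mean target of the k nearest training rows (Euclidean distance)."""
    nearest = sorted((math.dist(row, query), y) for row, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between predicted and true values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman_rho(y_true, y_pred):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rt, rp = ranks(y_true), ranks(y_pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / math.sqrt(var_t * var_p)

# Invented toy data: (total sugars g/100 g, sugar-ingredient flag) -> added sugar g/100 g.
train_X = [(10.0, 1.0), (12.0, 1.0), (50.0, 1.0), (5.0, 0.0), (8.0, 0.0), (45.0, 1.0)]
train_y = [8.0, 10.0, 48.0, 0.0, 0.0, 40.0]
test_X = [(11.0, 1.0), (6.0, 0.0), (48.0, 1.0)]
test_y = [9.0, 0.0, 45.0]

preds = [knn_predict(train_X, train_y, q, k=2) for q in test_X]
r2 = r_squared(test_y, preds)
mae = mean_absolute_error(test_y, preds)
rho = spearman_rho(test_y, preds)
print(f"R2={r2:.3f}  MAE={mae:.3f}  rho={rho:.3f}")
```

The three metrics answer different questions, which is why the abstract reports all of them: R² measures explained variation, MAE measures the typical size of the error in grams, and ρ measures only whether products are ranked in the right order.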
