Abstract

Natural language processing research has made major advances with the concept of representing words, sentences, paragraphs, and even documents by embedded vector representations. We apply this idea to the problem of relating foods, as expressed in natural language meal descriptions, to corresponding database entries. We generate fixed-length embeddings for U.S. Department of Agriculture (USDA) food database entries, as well as vector-based representations of natural language meal descriptions, through a convolutional neural network (CNN) architecture that predicts whether or not a USDA food item is present in the meal description. We compute dot products between each token in a meal description and a USDA food entry. By ranking the network's predicted average dot product between each possible database food entry and a meal description, we show it is possible to directly predict the USDA foods mentioned in a meal without requiring intermediate steps that would be used in a conventional database access application. We report the performance of this model on a binary verification task of over 48k meal descriptions, and show that this approach, when integrated with a Markov model, substantially outperforms our previous best multistage approach involving a conditional random field tagger, probabilistic segmentation, and database lookup.
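
The ranking step described above can be illustrated with a minimal sketch. It assumes token and food embeddings are already available as plain vectors. The CNN that learns them is omitted, and the function names, food labels, and toy vectors below are hypothetical, chosen only for illustration and not taken from the authors' model or data.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_foods(
    token_vecs: List[Vector],
    food_vecs: Dict[str, Vector],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank food entries by the average dot product with the meal's tokens.

    token_vecs: one embedding per token of the meal description.
    food_vecs:  one fixed-length embedding per database food entry.
    """
    if not token_vecs:
        raise ValueError("meal description has no tokens")

    ranked = []
    for name, food_vec in food_vecs.items():
        # One dot product per token, then the mean over all tokens.
        scores = [dot(tok, food_vec) for tok in token_vecs]
        ranked.append((name, sum(scores) / len(scores)))

    # Highest average score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Toy 2-dimensional embeddings (assumed values, for illustration only).
meal_tokens = [[1.0, 0.0], [0.5, 0.5]]
foods = {
    "oatmeal": [1.0, 0.0],
    "banana": [0.0, 1.0],
    "bacon": [-1.0, 0.0],
}

for name, score in rank_foods(meal_tokens, foods, top_k=2):
    print(name, score)
```

In this sketch the mean is taken over every token, so words unrelated to a food pull its score toward zero. How the actual model weights or filters tokens is not specified in the abstract.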
