Abstract
Named entity recognition (NER) in food computing does more than tag words in recipes: it underpins intelligent recipe recommendation, health analysis, and personalization. Existing food NER models, however, are hampered by inconsistent recipe input standards, limited annotations, and uneven dataset quality. This article addresses ingredient NER and introduces two models: SINERA, an efficient and robust model, and SINERAS, a semi-supervised variant that uses a Gaussian Mixture Model (GMM) to learn from untagged ingredient list entries. To mitigate the data quality and availability problems in food computing, we introduce the ARTI dataset, a diverse and comprehensive collection of ingredient lines. We also identify a pervasive problem, spurious correlations between entity positions and predictions, and propose a set of data augmentation rules tailored to food NER to counter it. Extensive evaluations on the ARTI dataset and a revised TASTEset dataset show that our models outperform several state-of-the-art baselines and rival BERT while having fewer parameters and shorter training times.
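To make the position-correlation problem concrete, the following is a minimal sketch of one possible augmentation: reordering the entity spans of an ingredient line so that a tagger cannot rely on, say, the quantity always coming first. It is an illustration only, not the rule set proposed in the article. The tag names (`QUANTITY`, `UNIT`, `STATE`, `NAME`) and the helpers `group_spans` and `shuffle_spans` are hypothetical.

```python
import random
from typing import List, Tuple

# A token is a (word, tag) pair. The tag names used below are assumed
# for illustration and are not taken from ARTI or TASTEset.
Token = Tuple[str, str]


def group_spans(tokens: List[Token]) -> List[List[Token]]:
    """Group consecutive tokens that share a tag into entity spans."""
    spans: List[List[Token]] = []
    for word, tag in tokens:
        if spans and spans[-1][0][1] == tag:
            spans[-1].append((word, tag))
        else:
            spans.append([(word, tag)])
    return spans


def shuffle_spans(tokens: List[Token], rng: random.Random) -> List[Token]:
    """Return a copy of the line with its entity spans in random order.

    Words inside a span keep their relative order, so multi-word
    entities such as "red onion" stay intact.
    """
    spans = group_spans(tokens)
    rng.shuffle(spans)
    return [token for span in spans for token in span]


line = [
    ("2", "QUANTITY"),
    ("cups", "UNIT"),
    ("chopped", "STATE"),
    ("red", "NAME"),
    ("onion", "NAME"),
]
augmented = shuffle_spans(line, random.Random(0))
```

Each augmented line keeps the same tokens and labels as the original and changes only where the entities appear. A real rule set would presumably restrict which reorderings are allowed so that the result still reads like a plausible ingredient line.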