A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

Riste Stojanov, Gjorgjina Cenikj, Barbara Koroušić Seljak, Tome Eftimov, Gorjan Popovski

doi:10.2196/28229

Abstract

Background: Food science has recently been garnering considerable attention. Many research questions remain open about the interactions between food, one of the main environmental factors, and other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which draws attention to the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, and these methods often depend on external resources. In 2019, however, an annotated corpus of food entities, normalized by using several food semantic resources, was published.

Objective: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.

Methods: We introduce FoodNER, a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.

Results: All BERT models provided very promising results, with macro F1 scores of 93.30% to 94.31% in the task of distinguishing food versus nonfood entities, which represents a new state of the art in food information extraction. In the tasks where semantic tags are predicted, all BERT models again obtained very promising results, with macro F1 scores ranging from 73.39% to 78.96%.

Conclusions: FoodNER can be used to extract and annotate food entities in 5 different tasks: distinguishing food from nonfood entities, and distinguishing food entities at the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
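The results above are reported as macro F1 scores, that is, the unweighted mean of the per-label F1 scores, so rare tags count as much as frequent ones. The sketch below illustrates that definition on flat label sequences. Whether FoodNER scores tokens or whole entities, and whether an outside tag such as "O" is included in the average, is not stated here; the `exclude` parameter and the example labels are assumptions for illustration only.

```python
from collections import Counter


def macro_f1(gold, pred, exclude=()):
    """Macro F1: the unweighted mean of per-label F1 scores.

    gold and pred are equal-length sequences of labels (e.g. one per token).
    Labels listed in `exclude` (for instance an outside tag "O") are skipped.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong here
            fn[g] += 1  # the gold label g was missed
    labels = sorted((set(gold) | set(pred)) - set(exclude))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["FOOD", "FOOD", "O", "O"]
pred = ["FOOD", "O", "O", "O"]
print(round(macro_f1(gold, pred), 4))                  # 0.7333
print(round(macro_f1(gold, pred, exclude={"O"}), 4))   # 0.6667
```

Here the FOOD label scores F1 = 2/3 and the O label scores 0.8, so their unweighted mean is about 0.733. On flat label lists, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.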

Highlights

Food is one of the most important environmental factors that affect human health [1]
Enabled by the availability of several food resources published toward the end of 2019, we introduce FoodNER, a fine-tuned bidirectional encoder representations from transformers (BERT) model for food information extraction
The experimental setups for fine-tuning the BERT and BioBERT models in each classification task are explained, followed by the evaluation results
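The 15 FoodNER models come from crossing 3 pretrained BERT models with 5 groups of semantic tags, one fine-tuning run per pair. The sketch below only enumerates that experimental grid. The checkpoint identifiers are hypothetical placeholders (the source names BERT and BioBERT but does not list all 3 checkpoints here), and the short tag-set keys are illustrative abbreviations of the 5 groups named in the abstract.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical identifiers for the 3 pretrained checkpoints (not from the source).
PRETRAINED_MODELS = ("bert_model_1", "bert_model_2", "biobert_model")

# Illustrative keys for the 5 groups of semantic tags named in the abstract.
TAG_SETS = (
    "food_vs_nonfood",
    "hansard_closest",
    "hansard_parent",
    "foodon",
    "snomed_ct",
)


@dataclass(frozen=True)
class FineTuneRun:
    """One fine-tuning configuration: a pretrained model plus a target tag set."""
    pretrained: str
    tag_set: str

    @property
    def name(self) -> str:
        return f"{self.pretrained}__{self.tag_set}"


def build_runs(models=PRETRAINED_MODELS, tag_sets=TAG_SETS):
    """Return one run per (pretrained model, tag set) pair."""
    return [FineTuneRun(m, t) for m, t in product(models, tag_sets)]


runs = build_runs()
print(len(runs))     # 15
print(runs[0].name)  # bert_model_1__food_vs_nonfood
```

Keeping each run as an immutable record makes it easy to attach per-run settings (learning rate, number of epochs, and so on) later without changing how the grid is built; those settings are not specified in this section.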

Summary

Introduction

Food is one of the most important environmental factors that affect human health [1]. Data on food and health span entities from different domains, such as food and nutrition, medicine, pharmacy, ecology, and agriculture. Extracting this information enables the creation of knowledge graphs [2], which represent a collection of interlinked descriptions of entities (objects, events, or concepts) by using semantic metadata and provide a framework for data integration, unification, analytics, and sharing.
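To make the link between entity extraction and knowledge graphs concrete, the sketch below turns token-level NER predictions into entity mentions and then into simple (subject, predicate, object) triples. It assumes a BIO tagging scheme with a single FOOD label, as in the food versus nonfood task; the tagging scheme, the lenient handling of a stray "I-" tag, and the predicate names `mentions` and `tagged_as` are all assumptions for illustration, not FoodNER's actual output format.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (label, start, end_exclusive, text) spans (assumed scheme)."""
    spans = []
    label, start = None, None

    def close(end):
        # Emit the currently open span, if any, and reset the state.
        nonlocal label, start
        if label is not None:
            spans.append((label, start, end, " ".join(tokens[start:end])))
        label, start = None, None

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
        elif tag.startswith("B-"):
            close(i)
            label, start = tag[2:], i
        elif tag.startswith("I-"):
            if tag[2:] != label:  # lenient: a stray I- opens a new span
                close(i)
                label, start = tag[2:], i
        else:
            raise ValueError(f"not a BIO tag: {tag!r}")
    close(len(tags))
    return spans


def spans_to_triples(doc_id, spans):
    """Hypothetical export: link the document to each mention and the mention to its tag."""
    triples = []
    for label, _start, _end, text in spans:
        triples.append((doc_id, "mentions", text))
        triples.append((text, "tagged_as", label))
    return triples


tokens = ["Add", "olive", "oil", "and", "garlic", "."]
tags = ["O", "B-FOOD", "I-FOOD", "O", "B-FOOD", "O"]
spans = bio_to_spans(tokens, tags)
triples = spans_to_triples("recipe_1", spans)
print(spans)    # [('FOOD', 1, 3, 'olive oil'), ('FOOD', 4, 5, 'garlic')]
print(triples)
```

In a fuller pipeline, the object of `tagged_as` would more likely be a normalized identifier from a resource such as FoodOn or the Systematized Nomenclature of Medicine Clinical Terms rather than a bare label string.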

Objectives

Methods

Results

Discussion

Conclusion