On the Use of Parsing for Named Entity Recognition

Miguel A. Alonso, Jesús Vilares, Carlos Gómez-Rodríguez

doi:10.3390/app11031090

Open Access

Journal: Applied Sciences
Publication Date: Jan 25, 2021
Citations: 8
License type: CC BY 4.0

Affiliation: University of A Coruña

Abstract

Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.

Highlights

Named entity recognition (NER) is a task originally defined at the 6th Message Understanding Conference in 1996 [1], and it consists in finding relevant named entities in the text belonging to a set of predefined categories.
To evaluate a NER system against an annotated ground truth, we must consider the numbers of true positives (TP), false positives (FP), and false negatives (FN) with respect to that ground truth, where:
A true positive is counted for each entity that is returned by the NER system and appears in the ground truth.
A false positive is counted for each entity that is returned by the NER system but does not appear in the ground truth.
A false negative is counted for each entity that is not returned by the NER system but does appear in the ground truth.
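The counting rules above can be illustrated with a minimal Python sketch. It assumes that each entity is represented as a hypothetical (start token, end token, category) triple and that an entity "appears in the ground truth" only on an exact match of both span and category; neither the representation nor the matching criterion is specified in the text. The precision, recall, and F1 formulas are the standard ones derived from TP, FP, and FN.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical entity representation (an assumption for this sketch):
# (start_token_index, end_token_index, category)
Entity = Tuple[int, int, str]


class Counts(NamedTuple):
    tp: int  # entities returned by the system and present in the ground truth
    fp: int  # entities returned by the system but absent from the ground truth
    fn: int  # ground-truth entities that the system did not return


def count_matches(predicted: Set[Entity], gold: Set[Entity]) -> Counts:
    """Count TP, FP and FN under exact span-and-category matching (assumed)."""
    return Counts(
        tp=len(predicted & gold),
        fp=len(predicted - gold),
        fn=len(gold - predicted),
    )


def precision_recall_f1(c: Counts) -> Tuple[float, float, float]:
    """Standard metrics: P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: the system finds the right span at tokens 5-5 but assigns
# the wrong category, which yields one false positive and one false negative.
gold = {(0, 1, "PER"), (5, 5, "LOC"), (8, 9, "ORG")}
predicted = {(0, 1, "PER"), (5, 5, "ORG"), (8, 9, "ORG")}

counts = count_matches(predicted, gold)
precision, recall, f1 = precision_recall_f1(counts)
print(counts, precision, recall, f1)
```

Under exact matching, a single mislabeled entity is penalized twice, once as a false positive and once as a false negative. Relaxed criteria, such as partial span overlap, would count it differently.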

Summary

Introduction

Named entity recognition (NER) is a task originally defined at the 6th Message Understanding Conference in 1996 [1], and it consists in finding relevant named entities in the text belonging to a set of predefined categories. NER is a challenging problem that requires advanced natural language processing (NLP) techniques, as entities tend to have numerous synonyms and variations that include long phrases and abbreviations [6]. NER is essential to any information extraction task, while being the basis of other related or dependent tasks, from relation and event extraction to knowledge discovery and management [7], semantic indexing or question answering [8], with their performance being conditioned by the effectiveness of the entity recognition process.
