Abstract

Named entity recognition is the task of extracting named entities and assigning them predefined entity types. Span classification is a popular approach to this task: it can handle nested structures and makes full use of the token features within a span. However, exhaustively enumerating and verifying all candidate spans suffers from high computational complexity and data imbalance. Furthermore, spans with a high overlap ratio share the same contextual features in a sentence, which easily leads to false positives caused by inaccurate entity boundaries. In this paper, we present a model that detects entity boundaries and predicts entity candidates jointly. Instead of labeling tokens, our model makes predictions based on representations of the gaps between words, which avoids the ambiguity that arises when a token has several labels. We also propose a neighborhood span proposal strategy to generate reasonable negative samples for training, which effectively reduces the data imbalance. We evaluate our model on the ACE2005 and GENIA corpora, where it achieves F1 scores of 88.55% and 79.81%, respectively, close to the state of the art.
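To make the contrast between exhaustive span enumeration and neighborhood-based negative sampling concrete, the following is a minimal sketch and not the model described above. It assumes that a span is written as a pair of gap indices (start, end) covering tokens[start:end], so a sentence of n tokens has gaps 0 through n. The function names `enumerate_spans` and `neighborhood_negatives`, and the `radius` parameter, are hypothetical and chosen only for illustration; the abstract does not specify how neighborhoods are defined.

```python
def enumerate_spans(n_tokens, max_len):
    """Exhaustively list every span as a (start_gap, end_gap) pair.

    The count grows roughly with n_tokens * max_len, which is the cost
    that exhaustive span classification has to pay.
    """
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start + 1, min(start + max_len, n_tokens) + 1)
    ]


def neighborhood_negatives(gold_spans, n_tokens, radius=1):
    """Propose negative spans whose boundaries lie near a gold span.

    Assumption: a "neighbor" shifts each boundary by at most `radius`
    gaps. Gold spans themselves are excluded from the result.
    """
    gold = set(gold_spans)
    negatives = set()
    for start, end in gold:
        for d_start in range(-radius, radius + 1):
            for d_end in range(-radius, radius + 1):
                new_start, new_end = start + d_start, end + d_end
                is_valid = 0 <= new_start < new_end <= n_tokens
                if is_valid and (new_start, new_end) not in gold:
                    negatives.add((new_start, new_end))
    return sorted(negatives)


# Toy sentence of 5 tokens with one gold entity covering tokens[1:3].
all_spans = enumerate_spans(n_tokens=5, max_len=5)
near_spans = neighborhood_negatives([(1, 3)], n_tokens=5, radius=1)
print(len(all_spans), len(near_spans))
```

Under these assumptions, the negatives are the spans that overlap a gold entity most heavily, which are also the candidates most likely to cause boundary errors, and their number depends on the number of gold entities rather than on the square of the sentence length.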
