Abstract

Fine-grained named entity recognition (NER) is crucial to natural language processing (NLP) applications such as relation extraction and knowledge graph construction. Most existing fine-grained NER systems are inefficient because they rely on manually annotated training datasets. To address this issue, our NER system automatically generates datasets from Wikipedia under the distant supervision paradigm by mapping hyperlinks in Wikipedia documents to Freebase. In addition, previous NER models cannot effectively process fine-grained label sets with more than 100 types. We therefore introduce a 'BIO' tagging strategy that identifies the position and type attributes of an entity simultaneously. This tagging scheme turns the NER problem into a sequence-to-sequence (seq2seq) problem. We propose a seq2seq framework that captures the input sentence comprehensively. Specifically, we adopt a Bi-LSTM as the encoder so that the past and future information of the input is processed equally. We then add a self-attention mechanism to handle long-term dependencies in long sequences. To classify the entity tags, we choose a CRF model because it adds constraints that prevent logically invalid position tags. Experiments are performed on large-scale datasets for fine-grained NER tasks. The results verify the effectiveness of the proposed framework, FSeqC, which outperforms other state-of-the-art alternatives consistently and significantly.
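The BIO tagging idea described in the abstract can be illustrated with a minimal sketch. It assumes that distant supervision has already produced token-level spans from hyperlink anchors, each paired with a type; the span format, the helper names (`spans_to_bio`, `is_valid_sequence`), and the type strings are hypothetical and are not taken from the paper. Each tag encodes position (B, I, or O) and type together, so a label set of N types yields 2N + 1 tags. The validity check shows the kind of position constraint a CRF layer can learn to respect; it is a hard-coded rule here, not the paper's CRF.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert entity spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, etype in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        # Assumption: if two spans overlap, keep the one seen first.
        if any(tag != "O" for tag in tags[start:end]):
            continue
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def is_valid_transition(prev: str, curr: str) -> bool:
    """An I-X tag may only follow B-X or I-X of the same type X."""
    if curr.startswith("I-"):
        etype = curr[2:]
        return prev in (f"B-{etype}", f"I-{etype}")
    return True


def is_valid_sequence(tags: List[str]) -> bool:
    """Check every adjacent pair, treating the sentence start as 'O'."""
    prev = "O"
    for tag in tags:
        if not is_valid_transition(prev, tag):
            return False
        prev = tag
    return True


# Illustrative sentence and types (not from the paper's datasets).
tokens = ["Barack", "Obama", "was", "born", "in", "Honolulu", "."]
spans = [(0, 2, "/person/politician"), (5, 6, "/location/city")]
tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence such as `["O", "I-/location/city"]` fails `is_valid_sequence`, which is the sort of position error that independent per-token classification can produce and that sequence-level decoding is meant to avoid.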

Highlights

  • Named entity recognition (NER) identifies named entities in specific regions of a text and classifies them into predefined types, such as persons, locations, and organizations. These named entity types play a vital part in natural language processing (NLP) tasks such as event extraction [1] and relation extraction [2], and in downstream applications such as entity recommendation [3], question answering [4], and knowledge graph construction [5].

  • Existing fine-grained NER models cannot effectively classify entities because they cannot handle more than 100 types.

  • Much research has focused on coarse-grained NER [17], [18], but these systems distinguish only three classes: person, location, and organization (or miscellaneous). This is far from enough for downstream tasks such as relation extraction and knowledge graph construction.

Summary

INTRODUCTION

Named entity recognition (NER) identifies named entities in specific regions of a text and classifies them into predefined types. Existing fine-grained NER models cannot effectively classify entities because they cannot handle more than 100 types. To address this issue, we first introduce a tagging strategy called the sec-part tagging scheme.

NAMED ENTITY RECOGNITION

Much research has focused on coarse-grained NER [17], [18], but these systems distinguish only three classes: person, location, and organization (or miscellaneous). This is far from enough for downstream tasks such as relation extraction and knowledge graph construction. Hyena is a representative supervised model that uses a hierarchical classifier [19]. It conducted co-reference resolution and used a binary feature to tag entity mentions, which are annotated with their corresponding types in YAGO. In this paper, we limit the number of layers to fewer than 3 for simplicity, which is sufficient for the downstream tasks.
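The limit of fewer than 3 layers can be sketched as follows, under a loudly stated assumption: that "layers" refers to levels of the entity type hierarchy, which this summary does not confirm. The slash-separated path format, the function name `truncate_type`, and the example types are hypothetical.

```python
def truncate_type(type_path: str, max_depth: int = 2) -> str:
    """Keep at most max_depth levels of a slash-separated type path.

    Assumption: 'fewer than 3 layers' means a type keeps at most 2 levels.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    parts = [part for part in type_path.split("/") if part]
    if not parts:
        raise ValueError("type_path contains no type levels")
    return "/" + "/".join(parts[:max_depth])


# Illustrative type paths (not taken from the paper's type set).
print(truncate_type("/person/artist/singer"))
print(truncate_type("/location"))
```

Under this reading, deeper types collapse onto their two-level ancestor, which keeps the label set smaller while preserving the coarse and first fine-grained level.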

PROPOSED MODEL
SELF-ATTENTION LAYER
DECODER LAYER
CRF LAYER
EXPERIMENTS AND RESULTS
EXPERIMENTAL SETUP
