Abstract

Named Entity Recognition (NER), whose outputs are often consumed as linguistic features alongside Part-Of-Speech (POS) tagging, is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding that considers three aspects, namely character-embedding, word-embedding, and position-embedding, concatenated in the given order so that their dependencies are captured. Based on it, we propose a new Character–Word–Position Combined BiLSTM-Attention (CWPC_BiAtt) model for the Chinese NER task. The comprehensive-embedding is passed through a Bidirectional Long Short-Term Memory (BiLSTM) layer to capture the connection between historical and future information, and an attention mechanism is then employed to capture the connection between the content of the sentence at the current position and that at any other position. Finally, we utilize a Conditional Random Field (CRF) to decode the entire tagging sequence. Experiments show that the proposed CWPC_BiAtt model is well qualified for the NER task on the Microsoft Research Asia (MSRA) dataset and the Weibo NER corpus. High precision and recall were obtained, which verifies the stability of the model. The position-embedding in the comprehensive-embedding compensates for the attention mechanism's lack of position information for unordered sequences, which shows that the comprehensive-embedding is complete. Taken as a whole, the proposed CWPC_BiAtt model has three distinct characteristics: completeness, simplicity, and stability. It achieved the highest F-score, reaching state-of-the-art performance on the MSRA dataset and the Weibo NER corpus.
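To make the pipeline described above concrete, here is a minimal sketch in PyTorch. It is not the authors' code: the class and parameter names are ours, the dimensions are illustrative, the word feature is assumed to be aligned per character position, and the CRF is represented only by its linear emission layer (a CRF layer would consume these emissions to decode the best tag sequence).

    import torch
    import torch.nn as nn

    class CWPC_BiAtt(nn.Module):
        """Sketch: comprehensive-embedding -> BiLSTM -> self-attention -> tag emissions."""
        def __init__(self, n_chars, n_words, max_len, n_tags,
                     char_dim=64, word_dim=100, pos_dim=32, hidden=128):
            super().__init__()
            # Comprehensive-embedding: character, word, and position embeddings,
            # concatenated in that fixed order.
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.word_emb = nn.Embedding(n_words, word_dim)
            self.pos_emb = nn.Embedding(max_len, pos_dim)
            emb_dim = char_dim + word_dim + pos_dim
            # BiLSTM links historical (forward) and future (backward) context.
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            # Self-attention relates the current position to every other position.
            self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
            # Per-tag emission scores; a CRF layer would decode the final sequence.
            self.emit = nn.Linear(2 * hidden, n_tags)

        def forward(self, chars, words):
            # chars, words: (batch, seq_len) index tensors, aligned per character.
            positions = torch.arange(chars.size(1), device=chars.device).unsqueeze(0)
            x = torch.cat([self.char_emb(chars),
                           self.word_emb(words),
                           self.pos_emb(positions).expand(chars.size(0), -1, -1)], dim=-1)
            h, _ = self.bilstm(x)
            h, _ = self.attn(h, h, h)   # global self-attention over the sentence
            return self.emit(h)          # emissions to be decoded by a CRF

Note that the explicit position-embedding is what supplies order information to the (otherwise order-agnostic) attention layer, matching the completeness claim in the abstract.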

Highlights

  • Named Entity Recognition (NER) plays an important role in the field of natural language processing

  • If a positive instance is wrongly predicted as negative, it is counted as a false negative (FN); the recall rate is then defined as Recall = TP / (TP + FN), where TP denotes true positives (see the sketch after this list)

  • We found that [22,23] used traditional machine learning for NER tasks, which requires complex feature engineering, and that training results degrade when fewer features are available
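To make the metric definition in the highlight above concrete, here is a minimal sketch (plain Python; the example counts are made up for illustration) computing precision, recall, and F-score from TP/FP/FN counts:

    def precision_recall_f1(tp, fp, fn):
        """Standard metrics from true-positive, false-positive, false-negative counts."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Hypothetical counts: 90 entities found correctly, 10 spurious, 15 missed.
    print(precision_recall_f1(90, 10, 15))  # -> (0.9, ~0.857, ~0.878)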


Summary

Introduction

Named Entity Recognition (NER) plays an important role in the field of natural language processing. In recent years, it has gradually become an essential component of information extraction technologies [1]. Long Short-Term Memory (LSTM), proposed in 1997 by [18], has achieved unprecedented performance in NLP in recent years. In the NER task, the Bidirectional Long Short-Term Memory-Convolutional Neural Networks (BiLSTM-CNNs) model of [19] achieved a strong F-score of 91.23% on the CoNLL-2003 dataset. Taking the above considerations into account, we propose a new model, Character-Word-Position Combined BiLSTM-Attention (CWPC_BiAtt), for Chinese NER.

Related Work
Attention Mechanism
Comprehensive-Embedding
BiLSTM Layer
Attention Layer
Global Self-Attention
Local Self-Attention
CRF Layer
Evaluation Metrics
MSRA Dataset
Weibo NER Corpus
Settings
Experiment and Analysis on MSRA Dataset
Experiment and Analysis on Weibo NER Corpus
Findings
Conclusions