Abstract

Named entity recognition (NER) is a subtask of natural language processing, and its accuracy greatly affects the effectiveness of downstream tasks. To address the insufficient expression of latent Chinese features in named entity recognition tasks, this paper proposes a multifeature adaptive fusion Chinese named entity recognition (MAF-CNER) model. The model uses a bidirectional long short-term memory (BiLSTM) neural network to extract stroke and radical features and adopts a weighted concatenation method to fuse the two feature sets adaptively. This fusion better integrates the two sets of features and thereby improves the model's entity recognition ability. To fully evaluate the entity recognition performance of this model, we compared it with baseline and other mainstream models on the Microsoft Research Asia (MSRA) dataset and the "China People's Daily" dataset from January to June 1998. Experimental results show that this model outperforms the others, with F1 scores of 97.01% and 96.78%, respectively.

Highlights

  • Word representation learning has received wide attention as a fundamental problem in the field of natural language processing

  • Comparing bidirectional long short-term memory (BiLSTM) with BiLSTM-Conditional Random Field (CRF) shows that adding the CRF module improves the model on all measures

  • BiLSTM-CRF outperforms BiLSTM mainly because CRF considers the global label information of the sequence during decoding, which improves the performance of the model


Summary

Introduction

Word representation learning has received wide attention as a fundamental problem in the field of natural language processing. The standard model for solving NER problems in the English domain is the BiLSTM-CRF model proposed by Huang et al. [17], which is more robust and less dependent on word embeddings. Based on this structure, Lample et al. proposed using a BiLSTM to extract word representations from character-level embeddings. Cho et al. proposed a deep learning NER model that effectively represents biomedical word tokens through the design of a combinatorial feature embedding, enhanced by integrating two different character-level representations extracted from a CNN and a BiLSTM [18]. Xu et al. proposed a simple and effective neural network framework, ME-CNER (Multiple Embeddings for Chinese Named Entity Recognition), which embeds rich semantic information at multiple levels, from radicals and characters to words [23]. A training corpus of about 171M is obtained. The pretraining of character embeddings is implemented with the Python version of Word2Vec in Gensim, and the dimension of the feature vectors is set to 100.
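The weighted-concatenation fusion described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the authors' implementation: the function name `adaptive_fusion`, the scalar fusion scores, and the softmax weighting are illustrative assumptions standing in for the learned weights in the full BiLSTM model.

```python
import numpy as np

def adaptive_fusion(stroke_feat, radical_feat, w_stroke, w_radical):
    """Fuse stroke and radical feature vectors by weighted concatenation.

    w_stroke and w_radical are unnormalized scalar scores (in the full
    model they would be learned parameters); a softmax turns them into
    fusion weights that scale each feature set before concatenation.
    """
    scores = np.array([w_stroke, w_radical], dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over the two scores
    return np.concatenate([weights[0] * stroke_feat,
                           weights[1] * radical_feat])

stroke = np.ones(4)        # toy 4-dim stroke feature vector
radical = np.full(4, 2.0)  # toy 4-dim radical feature vector
fused = adaptive_fusion(stroke, radical, w_stroke=0.0, w_radical=0.0)
print(fused.shape)  # (8,) — each half scaled by its fusion weight (0.5 here)
```

With equal scores the softmax assigns each feature set a weight of 0.5; in training, the scores would shift so that the more informative feature set dominates the fused representation.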

Adaptive Fusion Representation of Strokes and Radical Features
Experiments and Results
Conclusion
