Enhanced character embedding for Chinese named entity recognition

Bingjing Jia, Yutong Liu, Zhongli Wu, Pengpeng Zhou, Bin Wu

doi:10.1177/0020294020952456


Open Access

https://doi.org/10.1177/0020294020952456


Abstract

Traditional named entity recognition methods rely mainly on hand-crafted features. With the rise of deep learning, neural networks have been introduced to capture deep features for named entity recognition. However, most existing methods target only modern corpora. Named entity recognition in ancient literature is challenging because the names in it have evolved over time. In this paper, we attempt to recognise entities by exploring the characteristics of characters and strokes. We propose an enhanced character embedding model, named ECEM, built on bidirectional encoder representations from transformers (BERT) and strokes. First, ECEM generates semantic vectors dynamically according to the context of the words. Second, the proposed algorithm introduces morphological-level information of Chinese words. Finally, the enhanced character embedding is fed into a bidirectional long short-term memory-conditional random field (BiLSTM-CRF) model for training. To evaluate the proposed algorithm, experiments are carried out on both ancient literature and modern corpora. The results indicate that our algorithm is effective compared with traditional ones.

Highlights

Because of the popularity of the web, a great many unstructured texts have emerged to represent web contents.
Numerous machine learning approaches have been carefully studied for the named entity recognition (NER) task, including Conditional Random Fields (CRFs), Support Vector Machines (SVMs) and Hidden Markov Models (HMMs).[9]
ECEM is applied to named entity recognition in ancient literature for the first time. It captures context information and abundant knowledge by fine-tuning Bidirectional Encoder Representations from Transformers (BERT), and flexibly acquires morphological information generated through strokes.

Summary

Introduction

Because of the popularity of the web, a great many unstructured texts have emerged to represent web contents. Although word-based algorithms have achieved some success in Chinese NER,[11,15] many challenges remain. This is because the names of people, places and organisations keep increasing without a uniform naming rule, and the ambiguity of the Chinese language is inherent. The precision is high, but recall is low. In response to these challenges, an enhanced character embedding algorithm, named ECEM, is proposed, in which BERT and strokes are integrated to learn the character representation, and its performance in the Chinese NER domain is explored. ECEM is applied to named entity recognition in ancient literature for the first time. It captures context information and abundant knowledge by fine-tuning BERT, and flexibly acquires morphological information generated through strokes.
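
The idea of combining a contextual character vector with stroke-derived morphological information can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy stroke table and its digit coding, the n-gram range, averaging the n-gram vectors, and joining the two vectors by concatenation. The pseudo-random vectors stand in for learned stroke n-gram embeddings, and `context_vec` stands in for a BERT output vector.

```python
import random
from typing import Dict, List

# Toy stroke table (illustrative only): each character maps to a string of
# stroke-class digits, 1=horizontal, 2=vertical, 3=left-falling,
# 4=right-falling, 5=turning.
STROKES: Dict[str, str] = {"大": "134", "人": "34", "木": "1234"}


def stroke_ngrams(code: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return all contiguous stroke n-grams with n_min <= n <= n_max."""
    return [code[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(code) - n + 1)]


def ngram_vector(ngram: str, dim: int) -> List[float]:
    """Stand-in for a learned n-gram embedding: a fixed pseudo-random vector."""
    rng = random.Random(ngram)  # seeding with the n-gram keeps it repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def stroke_embedding(char: str, dim: int = 4) -> List[float]:
    """Average the n-gram vectors of a character; zeros if it has none."""
    grams = stroke_ngrams(STROKES.get(char, ""))
    if not grams:
        return [0.0] * dim
    vectors = [ngram_vector(g, dim) for g in grams]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def enhanced_embedding(context_vec: List[float], char: str,
                       dim: int = 4) -> List[float]:
    """Concatenate a contextual vector with the stroke-based vector."""
    return list(context_vec) + stroke_embedding(char, dim)


print(stroke_ngrams(STROKES["大"]))  # → ['13', '34', '134']
print(len(enhanced_embedding([0.1, 0.2, 0.3], "大")))  # → 7
```

In a full pipeline of the kind the abstract describes, one such enhanced vector per character would form the input sequence to the BiLSTM-CRF tagger. Concatenation is used here only because it keeps the two information sources separable; other fusion schemes are equally plausible.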

Related works

Experimental results

Conclusion


Journal: Measurement and Control
Publication Date: Sep 21, 2020
Citations: 4
License type: CC BY 4.0

Similar Papers

Chinese agricultural diseases and pests named entity recognition with multi-scale local context features and self-attention mechanism
Xuchao Guo ... Lin Li
Computers and Electronics in Agriculture | VOL. 179
07 Nov 2020

A Joint Learning Model to Extract Entities and Relations for Chinese Literature Based on Self-Attention
Li-Xin Liang ... Lin Lin
Mathematics | VOL. 10
24 Jun 2022

Named entity recognition of local adverse drug reactions in Xinjiang based on transfer learning
Keming Kang ... Shengwei Tian
Journal of Intelligent & Fuzzy Systems | VOL. 40
01 Jan 2020

Chinese named entity recognition in power domain based on Bi-LSTM-CRF
Zhenqiang Zhao ... Zhenyu Chen
16 Aug 2019
