Improving Distantly-Supervised Named Entity Recognition for Traditional Chinese Medicine Text via a Novel Back-Labeling Approach

Dezheng Zhang, Shibing Yang, Qi Jia, Xiong Luo, Chao Xia, Cong Xu, Yonghong Xie

doi:10.1109/access.2020.3015056

Abstract

Recent advances in deep neural networks (DNNs) have made it possible to build reliable named entity recognition (NER) models without handcrafted features. However, these machine learning methods impose an obstacle of their own: they need a large amount of manually labeled data. To avoid this limitation, human annotation can be replaced with distant supervision. A technical challenge remains, however: entities that are not included in the vocabulary are ignored, which produces erroneous labels, and this issue must be addressed to obtain an effective NER model. We therefore propose a novel back-labeling approach and integrate it into a tagging scheme, and we apply this scheme to the NER task in the traditional Chinese medicine (TCM) field. In addition, we discuss how to use distant supervision methods to improve the performance of the NER model. Our experiments verify that the scheme effectively improves entity recognition on the basis of distant supervision.

Highlights

The task of recognizing traditional Chinese medicine (TCM) entities has recently received attention from the research community
Because entity recognition is the basis for constructing a Knowledge Graph of ancient Chinese medicine, it plays a key role in downstream tasks such as relation extraction, Knowledge Graph construction, and auxiliary diagnosis based on the Knowledge Graph
We compare the entity recognition method proposed in this paper with the distantly supervised entity recognition method Distant-LSTM-Conditional random fields (CRF) [3] and with a dictionary matching method on TCM texts, and the results verify the effectiveness of the proposed method
The experiments also verify the feasibility of the method: it needs only about 3000 distantly supervised sentences to achieve better results, so it can be applied to TCM texts with few sentences

Summary

INTRODUCTION

The task of recognizing TCM entities has recently received attention from the research community. Existing distantly supervised NER models usually solve entity span detection through heuristic matching rules, such as regular expressions over POS tags and exact string matching. We propose a new vocabulary-based back-labeling approach that decomposes and handles the entity phrase conflicts that can arise in TCM texts, and we combine it with the new "Tie or Break" tagging scheme for entity recognition. Together, the back-labeling approach, the "Tie or Break" tagging scheme, and the neural architecture constitute the TCM NER method of this paper, which proves effective in our experiments. The combination addresses three problems: uncertainty in the boundaries of entity phrases, uncertainty in their types, and conflicts between entities. We compare the proposed method with the distantly supervised entity recognition method Distant-LSTM-CRF [3] and with a dictionary matching method on TCM texts, and the results verify its effectiveness. The experiments also verify its feasibility: only about 3000 distantly supervised sentences are needed to achieve better results, so the method can be applied to TCM texts with few sentences.
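To make the distant supervision setting concrete, the following is a minimal sketch of dictionary matching followed by a "Tie or Break" style labeling of the gaps between adjacent characters. It is an illustration under stated assumptions, not the paper's back-labeling approach, whose details are not given in this summary. The toy vocabulary, the entity type names, and the function names `match_entities` and `tie_or_break` are hypothetical, and the greedy forward maximum matching rule is a common generic choice that is assumed here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary (entity string -> entity type); not from the paper.
VOCAB: Dict[str, str] = {
    "人参": "HERB",
    "甘草": "HERB",
    "头痛": "SYMPTOM",
}


def match_entities(text: str, vocab: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Greedy forward maximum matching.

    Returns (start, end, type) spans with an exclusive end index.
    Characters that match no vocabulary entry produce no span.
    """
    max_len = max((len(key) for key in vocab), default=0)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in vocab:
                spans.append((i, i + n, vocab[candidate]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move on one character
    return spans


def tie_or_break(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Label each gap between adjacent characters.

    "Tie" if both characters lie inside the same matched span, otherwise
    "Break". Simplification (assumption): every gap outside a matched span
    is labeled "Break", with no separate label for uncertain regions.
    """
    span_id = [-1] * len(text)
    for k, (start, end, _type) in enumerate(spans):
        for j in range(start, end):
            span_id[j] = k
    labels: List[str] = []
    for j in range(len(text) - 1):
        same_span = span_id[j] != -1 and span_id[j] == span_id[j + 1]
        labels.append("Tie" if same_span else "Break")
    return labels


if __name__ == "__main__":
    sentence = "人参甘草治头痛"
    found = match_entities(sentence, VOCAB)
    print(found)
    print(tie_or_break(sentence, found))
    # An entity missing from the vocabulary yields no span at all.
    print(match_entities("黄芪", VOCAB))
```

The last call illustrates the error label issue named in the abstract: an entity that is absent from the vocabulary is silently left unlabeled, so a model trained on such output sees it as a non-entity.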

RELATED WORK

TAGGING SCHEME AND BACK-LABELING APPROACH

EFFECT OF DISTANT SUPERVISION CORPUS SIZE ON MODEL

Findings

CONCLUSION


Journal: IEEE access : practical innovations, open solutions	Publication Date: Jan 1, 2020
Citations: 16	License type: CC BY 4.0


