Abstract

Recent advances in deep neural networks (DNNs) have made it possible to build reliable named entity recognition (NER) models without handcrafted features. However, these machine learning methods require a large amount of manually labeled data. Distant supervision can replace human annotation, but it introduces label errors because entities that are not included in the vocabulary are ignored, and this problem must be addressed to obtain an effective NER model. We therefore propose a novel back-labeling approach and integrate it into a tagging scheme, and we apply this scheme to the NER task in the traditional Chinese medicine (TCM) field. In addition, we discuss how to use distant supervision methods to improve the performance of the NER model. Our experiments verify that the scheme effectively improves entity recognition based on distant supervision.

Highlights

  • The task of recognizing entities in traditional Chinese medicine (TCM) texts has recently received attention from the research community

  • Entity recognition is the basis for constructing a knowledge graph of ancient Chinese medicine, so it plays a key role in downstream tasks such as relation extraction, knowledge graph construction, and auxiliary diagnosis based on the knowledge graph

  • We compare the proposed entity recognition method with the distant supervision method Distant-LSTM-CRF (conditional random fields) [3] and with a dictionary matching method on TCM texts, and the results verify the effectiveness of the proposed method. The experiments also show that the method is feasible: it needs only about 3000 distantly supervised sentences to achieve better results, so it can be applied to TCM texts with few sentences

Summary

INTRODUCTION

The task of recognizing entities in TCM texts has recently received attention from the research community. Existing distantly supervised NER models usually solve entity span detection through heuristic matching rules, such as regular expressions over POS tags and exact string matching. We propose a new vocabulary-based back-labeling approach that decomposes and handles the possible cases of entity phrase conflicts in TCM texts, and we combine it with the new "Tie or Break" tagging scheme for entity recognition. Together, the back-labeling approach, the "Tie or Break" tagging scheme, and the neural architecture constitute the TCM NER method of this paper, which proves effective in our experiments. The combination addresses the boundary uncertainty of entity phrases, the type uncertainty of entity phrases, and conflicts between entities. We compare the proposed entity recognition method with the distant supervision method Distant-LSTM-CRF [3] and with a dictionary matching method on TCM texts, and the results verify the effectiveness of the proposed method. The experiments also show that the method is feasible: it needs only about 3000 distantly supervised sentences to achieve better results, so it can be applied to TCM texts with few sentences.
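
To make the dictionary matching and "Tie or Break" ideas concrete, the following is a minimal sketch and not the method of this paper. It assumes a simplified two-label variant in which the gap between two adjacent characters is "Tie" when both characters fall inside the same vocabulary match and "Break" otherwise. The function names, the `max_len` parameter, the toy vocabulary, and the `HERB` type are all hypothetical, and the back-labeling step is not shown because this summary does not describe it.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def match_entities(chars: List[str], vocab: Dict[str, str],
                   max_len: int = 6) -> List[Span]:
    """Greedy forward longest-match of vocabulary entries (exact string match)."""
    spans: List[Span] = []
    i = 0
    while i < len(chars):
        matched = False
        # Try the longest candidate first so longer entries win over shorter ones.
        for length in range(min(max_len, len(chars) - i), 0, -1):
            candidate = "".join(chars[i:i + length])
            if candidate in vocab:
                spans.append((i, i + length, vocab[candidate]))
                i += length
                matched = True
                break
        if not matched:
            i += 1  # no entry starts here; move on one character
    return spans


def tie_or_break(n_chars: int, spans: List[Span]) -> List[str]:
    """Label each of the n_chars - 1 gaps between adjacent characters."""
    labels = ["Break"] * max(n_chars - 1, 0)
    for start, end, _type in spans:
        # Gaps strictly inside a matched span connect characters of one entity.
        for gap in range(start, end - 1):
            labels[gap] = "Tie"
    return labels


# Hypothetical one-entry vocabulary, for illustration only.
toy_vocab = {"人参": "HERB"}
sentence = list("人参补气")
toy_spans = match_entities(sentence, toy_vocab)
toy_labels = tie_or_break(len(sentence), toy_spans)
```

In this toy sentence only the first two characters match a vocabulary entry, so only the first gap is labeled "Tie". Entities missing from the vocabulary would be labeled "Break" throughout, which is the kind of label error the text attributes to plain distant supervision.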

RELATED WORK
TAGGING SCHEME AND BACK-LABELING APPROACH
EFFECT OF DISTANT SUPERVISION CORPUS SIZE ON MODEL
Findings
CONCLUSION