Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine

Tingting Zhang, Ying Ye, Xiaofeng Wang, Yaqiang Wang, Yafei Yang

doi:10.1186/s12911-020-1079-2

Abstract

Background: In this study, we focus on building a fine-grained entity annotation corpus, with a corresponding annotation guideline, for traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in the future.

Methods: We developed a four-step approach suited to constructing a corpus from TCM medical records. First, we determined the entity types included in this study through sample annotation. Then, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to some existing guidelines. We iteratively updated the guidelines until the inter-annotator agreement (IAA) exceeded a Cohen’s kappa value of 0.9. Finally, comprehensive annotations were performed while keeping the IAA value above 0.9.

Results: We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1104 entities and 67,799 tokens. The final IAAs are 0.936 on average (for three annotators), indicating that the fine-grained entity recognition corpus is of high quality.

Conclusions: These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain.
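The agreement check described in the Methods can be illustrated with a minimal sketch. It assumes that agreement is measured per token, and that the reported average over three annotators is the mean of the pairwise Cohen's kappa values; the abstract does not state either detail. The function names and the label strings (`O`, `SYMPTOM`, `HERB`) are hypothetical and are not taken from the paper's guideline.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators (an assumption)."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical token-level labels from three annotators for one short record.
annotations = {
    "annotator_1": ["O", "SYMPTOM", "SYMPTOM", "O", "HERB", "O"],
    "annotator_2": ["O", "SYMPTOM", "SYMPTOM", "O", "HERB", "O"],
    "annotator_3": ["O", "SYMPTOM", "O", "O", "HERB", "O"],
}

THRESHOLD = 0.9  # the agreement level the abstract says the guideline had to reach
score = mean_pairwise_kappa(annotations)
print(f"mean pairwise kappa = {score:.3f}, passes = {score > THRESHOLD}")
```

Under this reading, a guideline revision round would repeat until the score on a shared sample exceeds the threshold. Other multi-annotator statistics, such as Fleiss' kappa, would be an equally plausible reading of "on average".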

Highlights

In this study, we focus on building a fine-grained entity annotation corpus, with a corresponding annotation guideline, for traditional Chinese medicine (TCM) clinical records.
The inter-annotator agreement (IAA) values exceeded 0.9, indicating that the three annotators understood the labels and the TCM records in a highly consistent way and were able to complete the annotation tasks with satisfactory consistency.
We presented a method of building a fine-grained annotated entity corpus based on TCM case records.

Summary

Introduction

We focus on building a fine-grained entity annotation corpus, with a corresponding annotation guideline, for traditional Chinese medicine (TCM) clinical records. TCM clinical datasets are scarce for two reasons. First, the records are private: sharing them raises concerns about patients’ privacy and about revealing unfavorable institutional practices [14]. Second, Chinese clinical text is highly complex to analyze. This type of text has sublanguage features [15], so raw TCM free-text clinical records differ markedly from general Chinese text. Constructing a corpus of TCM clinical records therefore remains difficult, the electronic capture and retrieval of TCM clinical text data is still a challenge, and research into NLP tasks on TCM clinical free text is at a preliminary stage.



Journal: BMC Medical Informatics and Decision Making
Publication Date: Apr 6, 2020
Citations: 11
License type: open-access
