Abstract

The medical literature contains valuable knowledge, such as the clinical symptoms, diagnosis, and treatments of a particular disease. Named Entity Recognition (NER) is the initial step in extracting this knowledge from unstructured text and presenting it as a Knowledge Graph (KG). However, previous approaches to NER have often suffered from the scarcity of human-labelled training data. Furthermore, extracting knowledge from Chinese medical literature is a more complex task because Chinese text has no explicit word boundaries. Recently, pretraining models, which learn representations with prior semantic knowledge from large-scale unlabelled corpora, have achieved state-of-the-art results on a wide variety of Natural Language Processing (NLP) tasks. However, the capabilities of pretraining models have not been fully exploited, and the application of pretraining models other than BERT to specific domains, such as NER in Chinese medical literature, is also of interest. In this paper, we enhance the performance of NER in Chinese medical literature using pretraining models. First, we propose a data augmentation method that replaces words in the training set with synonyms through the Masked Language Model (MLM), which is a pretraining task. Then, we treat NER as the downstream task of the pretraining model and transfer the prior semantic knowledge obtained during pretraining to it. Finally, we conduct experiments comparing the performance of six pretraining models (BERT, BERT-WWM, BERT-WWM-EXT, ERNIE, ERNIE-tiny, and RoBERTa) in recognizing named entities from Chinese medical literature. The effects of feature extraction versus fine-tuning, as well as of different downstream model structures, are also explored. Experimental results demonstrate that our proposed data augmentation method yields meaningful improvements in recognition performance.
Moreover, RoBERTa-CRF achieves the highest F1-score compared with previous methods and the other pretraining models.
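The MLM-based synonym replacement described above can be sketched as follows. This is a minimal, illustrative sketch, not the paper's implementation: the tiny lookup table stands in for a real pretrained model's top prediction at a masked position (a real system would run, e.g., BERT over the sequence with the target character replaced by `[MASK]`), and only tokens labelled `O` are replaced so the BIO entity labels stay aligned.

```python
import random

# Toy stand-in for an MLM's top prediction at a masked position.
# A real implementation would score candidates with a pretrained
# masked language model instead of this hard-coded table.
TOY_MLM = {
    "疗法": "治疗",   # "therapy" -> "treatment"
    "显著": "明显",   # "significant" -> "obvious"
}

def mlm_fill(tokens, i):
    """Return the MLM's suggested replacement for tokens[i] (toy version)."""
    return TOY_MLM.get(tokens[i], tokens[i])

def augment(tokens, labels, mask_prob=0.15, seed=0):
    """Return an augmented copy of a labelled sequence.

    Only tokens tagged 'O' (outside any entity) are candidates for
    replacement, so the entity labels remain aligned with the new tokens.
    """
    rng = random.Random(seed)
    new_tokens = list(tokens)
    for i, lab in enumerate(labels):
        if lab == "O" and rng.random() < mask_prob:
            new_tokens[i] = mlm_fill(tokens, i)
    return new_tokens, list(labels)
```

For example, augmenting `["该", "疗法", "对", "糖尿病", "显著"]` with labels `["O", "O", "O", "B-DIS", "O"]` leaves the disease entity "糖尿病" untouched while context words may be swapped for synonyms, producing a new training sentence with identical labels.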

Highlights

  • In recent decades, the rapid growth of information technology has resulted in huge amounts of information being generated and shared in the field of medicine, where the number of published documents, such as articles, books, and technical reports, is increasing exponentially [1]

  • We enhance the performance of Named Entity Recognition (NER) in Chinese medical literature using pretraining models. The dataset we used is "A Labelled Chinese Dataset for Diabetes (LCDD)," which contains authoritative Chinese medical literature from the past seven years. The main contributions of this paper can be summarized as follows: (1) Firstly, we propose a method of data augmentation based on the Masked Language Model (MLM)

  • We will introduce the dataset for the NER task and show the results. The experiments were performed with PaddlePaddle, which is a deep learning framework


Introduction

It has been generally known that the rapid growth of information technology has resulted in huge amounts of information being generated and shared in the field of medicine, where the number of published documents, such as articles, books, and technical reports, is increasing exponentially [1]. The medical literature contains valuable knowledge, such as the clinical symptoms, diagnosis, and treatments of a particular disease. It is time-consuming and laborious for medical researchers to obtain knowledge from these documents. Thus, it is critical to extract information and knowledge from unstructured medical literature using novel information extraction techniques and to present the findings in a visually intuitive Knowledge Graph, which supports machine-understandable information about medicine [2, 3]. Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP). It is the initial step in extracting valuable knowledge from unstructured text and building a medical Knowledge Graph (KG).

