Developing a linguistically annotated corpus of Chinese electronic medical record

Zhipeng Jiang, Fangfang Zhao, Yi Guan

doi:10.1109/bibm.2014.6999174

Abstract

Electronic medical records (EMRs) are the material basis of smart healthcare, and their automatic analysis depends on natural language processing (NLP) technologies. Syntactic analysis, a basic NLP technology, can be used to convert the free text of EMRs into structured text. However, research on syntactic analysis of Chinese electronic medical records (CEMRs), and even on Chinese word segmentation and part-of-speech (POS) tagging for them, has barely begun because annotated CEMR corpora are lacking. To resolve this problem, we propose an annotation scheme that spans Chinese word segmentation through syntactic analysis, and we build the first syntactically annotated corpus of CEMRs. By analyzing the annotated CEMRs, we find that they have stronger grammatical regularity and a distinctive statistical distribution. We use these findings to improve the Stanford parser and to develop a state-of-the-art Chinese word segmentation and POS tagging system for CEMRs. The evaluation results show that statistical machine learning models benefit substantially from the annotated CEMRs.

Similar Papers

Intelligent diagnosis with Chinese electronic medical records based on convolutional neural networks
Xiaozheng Li ... Jian Chen
BMC Bioinformatics | VOL. 20
01 Feb 2019

Combining External Medical Knowledge for Improving Obstetric Intelligent Diagnosis: Model Development and Validation
Kunli Zhang ... Tao Liu
JMIR Medical Informatics | VOL. 9
10 May 2021

Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records
An Fang ... Ming Feng
BMC Medical Informatics and Decision Making | VOL. 22
23 Mar 2022

Entity relationship extraction from Chinese electronic medical records based on feature augmentation and cascade binary tagging framework
Xiaoqing Lu ... Shudong Xia
Mathematical Biosciences and Engineering | VOL. 21
01 Jan 2023
