A unified framework of medical information annotation and extraction for Chinese clinical text

Enwei Zhu,Qilin Sheng,Huanwan Yang,Yiyang Liu,Ting Cai,Jinpeng Li

doi:10.1016/j.artmed.2023.102573

Enwei Zhu, Qilin Sheng + Show 4 more

Open Access

PDF Available

https://doi.org/10.1016/j.artmed.2023.102573

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Medical information extraction consists of a group of natural language processing (NLP) tasks, which collaboratively convert clinical text to pre-defined structured formats. This is a critical step to exploit electronic medical records (EMRs). Given the recent thriving NLP technologies, model implementation and performance seem no longer an obstacle, whereas the bottleneck locates on a high-quality annotated corpus and the whole engineering workflow. This study presents an engineering framework consisting of three tasks, i.e., medical entity recognition, relation extraction and attribute extraction. Within this framework, the whole workflow is demonstrated from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible between the multiple tasks. With the EMRs from a general hospital in Ningbo, China, and the manual annotation by experienced physicians, our corpus is of large scale and high quality. Built upon this Chinese clinical corpus, the medical information extraction system show performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released, to facilitate further research.

Full Text