Abstract

Electronic health records (EHRs) are widely used and are gradually replacing paper records, so extracting valuable information from EHRs has become a focus of current research. Clinical named entity recognition (CNER) is an important task in information extraction. Most existing methods fine-tune pre-trained language models (PLMs) with standard supervised learning, which requires a large amount of annotated data for model training. In realistic clinical scenarios, however, annotated data are scarce, because annotating data in real clinical settings is time-consuming and labour-intensive. In this paper, a language inference-based learning method (LANGIL) is proposed to study clinical NER with limited annotated samples, i.e., in low-resource clinical scenarios. A prompt learning-based method is designed to reformulate the entity recognition task as a language inference task. Unlike standard fine-tuning, the approach introduced in this paper does not add extra network layers that must be trained from scratch. This narrows the gap between pre-training tasks and downstream tasks, allowing the comprehension capabilities of PLMs to be leveraged with limited training samples. Experiments on four Chinese clinical named entity recognition datasets show that LANGIL achieves significant improvements in F1-score over previous methods.
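The sketch below is only an illustration of the general idea described above (enumerating candidate text spans and scoring templated hypotheses against the sentence with a PLM), not the paper's actual implementation. The entity-type label set, the prompt template wording, and the `score_entailment` function are assumptions introduced here for illustration; in practice the score would come from a pre-trained language model used in an inference-style setup.

```python
# Illustrative sketch of prompt-based NER as language inference (not LANGIL itself).
# Candidate spans are paired with templated hypotheses such as
# "'X' is a disease entity." and scored against the sentence.

from typing import List, Tuple

# Hypothetical label set for illustration only.
ENTITY_TYPES = ["disease", "symptom", "drug", "operation"]


def candidate_spans(tokens: List[str], max_len: int = 6) -> List[Tuple[int, int]]:
    """Enumerate all contiguous token spans up to max_len tokens."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1)):
            spans.append((i, j))
    return spans


def build_prompt(sentence: str, span_text: str, entity_type: str) -> Tuple[str, str]:
    """Pair the sentence (premise) with a templated hypothesis about the span."""
    hypothesis = f"'{span_text}' is a {entity_type} entity."
    return sentence, hypothesis


def score_entailment(premise: str, hypothesis: str) -> float:
    """Placeholder: in practice, a PLM would return an entailment score in [0, 1]."""
    return 0.0


def recognize(sentence: str, tokens: List[str], threshold: float = 0.5):
    """Assign each candidate span the best-scoring entity type, if above threshold."""
    results = []
    for start, end in candidate_spans(tokens):
        span_text = "".join(tokens[start:end])
        scored = [
            (etype, score_entailment(*build_prompt(sentence, span_text, etype)))
            for etype in ENTITY_TYPES
        ]
        best_type, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            results.append((span_text, best_type, best_score))
    return results
```

Because the PLM only scores natural-language hypotheses, no new classification head is trained from scratch, which is the property the abstract highlights for low-resource settings.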
