Abstract

Unsupervised pretraining is an integral part of many natural language processing systems, and transfer learning with language models has achieved remarkable results in downstream tasks. In the clinical application of medical code assignment, diagnosis and procedure codes are inferred from lengthy clinical notes such as hospital discharge summaries. However, it is not clear whether pretrained models are useful for medical code prediction without further architecture engineering. This paper presents a comprehensive quantitative analysis of the performance of contextualized language models pretrained in different domains on medical code assignment from clinical notes. We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information. Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes. Our empirical findings suggest directions for building robust medical code assignment models.
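
The label-wise attention mentioned in the abstract can be pictured with a minimal sketch: each code (label) attends over the token representations of a note and is scored from its own attended context. The PyTorch module below is illustrative only; the class name, parameterization, and shapes are assumptions and do not come from the paper's released code.

```python
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    """Minimal sketch of label-wise attention over token representations."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One attention query vector per label.
        self.label_queries = nn.Linear(hidden_size, num_labels, bias=False)
        # One scoring vector (plus bias) per label.
        self.label_scorers = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) token representations.
        # attn: (batch, num_labels, seq_len), normalized over tokens.
        attn = torch.softmax(self.label_queries(hidden_states).transpose(1, 2), dim=-1)
        # label_contexts: (batch, num_labels, hidden), one context vector per label.
        label_contexts = attn @ hidden_states
        # Per-label logits via a label-specific dot product plus bias.
        logits = (label_contexts * self.label_scorers.weight).sum(-1) + self.label_scorers.bias
        return logits  # apply sigmoid + binary cross-entropy for multi-label training
```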

Highlights

  • Clinical notes generated by healthcare professionals are part of electronic health records and provide an essential source for intelligent healthcare applications (Zhang et al., 2020)

  • We evaluate precision at k, with k = 5 for the MIMIC-III subset of the 50 most frequent codes and k = 8 and 15 for the full MIMIC-III set, given the observation that most medical documents are assigned no more than 20 codes (see the sketch after this list)

  • This paper presents a comprehensive quantitative analysis of medical code assignment from clinical notes using various BERT-based pretrained language models
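
As a concrete reference for the evaluation metric in the second highlight, here is a minimal NumPy sketch of precision at k for multi-label code prediction; the function name and array shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def precision_at_k(y_true: np.ndarray, y_scores: np.ndarray, k: int) -> float:
    """Mean precision@k: fraction of the top-k predicted codes per note that are correct.

    y_true:   (num_notes, num_codes) binary ground-truth label matrix.
    y_scores: (num_notes, num_codes) predicted scores (e.g. sigmoid outputs).
    """
    # Indices of the k highest-scoring codes for each note.
    topk = np.argsort(-y_scores, axis=1)[:, :k]
    # 1 where a top-k prediction is a true code, 0 otherwise.
    hits = np.take_along_axis(y_true, topk, axis=1)
    # Averaging over notes and over the k slots gives mean precision@k.
    return float(hits.mean())

# Example usage: k = 5 for the MIMIC-III top-50 subset, k = 8 or 15 for the full set.
```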


Summary

Introduction

Clinical notes generated by healthcare professionals are part of electronic health records and provide an essential source for intelligent healthcare applications (Zhang et al., 2020). Practical medical code assignment requires capturing semantic concepts (Falis et al., 2019) and tackling the challenges of encoding lengthy notes and handling large-dimensional code schemes. Pretrained language models (PTMs) such as BERT (Devlin et al., 2019) learn contextualized text representations and have started a new era in NLP. NLP applications benefit from large-scale pretraining on massive corpora, and universal language representations from PTMs have been successfully utilized in downstream tasks via transfer learning. In the field of clinical NLP, however, incorporating pretrained contextualized language models to encode lengthy clinical notes for large-scale medical code prediction has not been well studied. Li and Yu (2020), ?, and Dong et al. (2020) performed preliminary experiments with pretrained models; these three pilot studies failed to achieve satisfactory results or provide in-depth analysis.
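
To make the lengthy-note-encoding challenge concrete, the sketch below shows one common way to feed notes longer than BERT's 512-token limit into a pretrained encoder: split the note into chunks, encode each chunk, and let a second-level encoder model interactions between distant chunks. This illustrates the general hierarchical idea rather than the paper's exact architecture; the model name, chunking scheme, and second-level Transformer are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HierarchicalNoteEncoder(nn.Module):
    """Sketch: encode a long clinical note chunk by chunk, then combine chunks."""

    def __init__(self, model_name: str = "bert-base-uncased", num_chunk_layers: int = 2):
        super().__init__()
        # First level: a pretrained Transformer encoder applied to each chunk.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Second level: a small Transformer over chunk-level [CLS] vectors,
        # letting distant parts of the note interact.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.chunk_encoder = nn.TransformerEncoder(layer, num_layers=num_chunk_layers)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # input_ids, attention_mask: (num_chunks, chunk_len) for a single note.
        chunk_cls = self.encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state[:, 0]
        # (1, num_chunks, hidden): treat the chunks as a sequence and re-encode them.
        return self.chunk_encoder(chunk_cls.unsqueeze(0))
```

The chunk-level outputs could then be scored by a label-wise attention layer such as the one sketched after the abstract.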

