Abstract
Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state-of-the-art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures - a word-level convolutional neural network and a hierarchical self-attention network - and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT - pretraining and WordPiece tokenization - may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases rather than understanding the contextual meaning of sequences of text.
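The paper's four specific scaling methods are not detailed in this abstract; as context for BERT's roughly 512-subword (about 400-word) input limit, the following is a minimal, hypothetical sketch of one common workaround, splitting a long clinical note into overlapping chunks that each fit the default limit. The tokenizer name and parameters are illustrative, not the authors' configuration.

```python
# Hypothetical sketch (not necessarily one of the paper's four methods):
# split a long document into overlapping windows of at most 512 subword tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint

def chunk_document(text, max_len=512, stride=128):
    """Split a long document into overlapping chunks of at most max_len tokens."""
    # Tokenize without special tokens; reserve 2 positions for [CLS] and [SEP].
    ids = tokenizer.encode(text, add_special_tokens=False)
    body_len = max_len - 2
    chunks = []
    for start in range(0, max(len(ids), 1), body_len - stride):
        window = ids[start:start + body_len]
        chunks.append(tokenizer.build_inputs_with_special_tokens(window))
        if start + body_len >= len(ids):
            break
    return chunks
```

Each chunk can then be encoded by BERT separately and the chunk-level predictions or [CLS] embeddings aggregated (for example by mean- or max-pooling) to produce a document-level label.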
Highlights
Document classification is an essential task in clinical natural language processing (NLP)
Our experiments show that Bidirectional Encoder Representations from Transformers (BERT) generally does not achieve the best performance on our clinical text classification tasks compared to the much simpler convolutional neural network (CNN) and hierarchical self-attention network (HiSAN) models
Using our fine-tuned BlueBERT model, we take the attention weights associated with the [CLS] token in the final self-attention layer and multiply them back through all 12 self-attention layers of the model (a computation sketched below); the resulting weights identify the most important subword tokens while accounting for the inter-word relationships captured during pretraining and fine-tuning
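The propagation of [CLS] attention through all layers resembles an attention-rollout computation. Below is a minimal sketch under that assumption, using a generic Hugging Face BERT-family model as a stand-in for the fine-tuned BlueBERT checkpoint; the model name and function are illustrative, not the authors' exact implementation.

```python
# Hypothetical attention-rollout-style sketch: propagate [CLS] attention
# from the final layer back through every self-attention layer.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in for the fine-tuned BlueBERT model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

def cls_token_importance(text):
    """Return per-subword importance scores for the [CLS] position."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
    # Average over heads to get a single (seq, seq) matrix per layer.
    layers = [att[0].mean(dim=0) for att in outputs.attentions]
    # Multiply the per-layer attention matrices together across all 12 layers.
    rollout = layers[0]
    for att in layers[1:]:
        rollout = att @ rollout
    # Row 0 is the [CLS] token's accumulated attention over all subword tokens.
    scores = rollout[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return list(zip(tokens, scores.tolist()))
```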
Summary
Document classification is an essential task in clinical natural language processing (NLP). Labels are often available only at the document level rather than at the individual word level, such as when unstructured clinical notes are linked to structured data from electronic health records (EHRs), and document classification is an essential tool in the practical automation of clinical workflows. In the clinical setting, human annotation of EHRs can be extremely time-consuming and expensive due to the technical nature of the content and the expert knowledge required to parse it; effective automated classification of clinical text such as cancer pathology reports and patient notes can therefore save substantial time and cost. To limit the vocabulary size and generalize better to new words outside the training vocabulary, BERT utilizes subword-level WordPiece tokens rather than word-level tokens as input
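As a brief illustration of WordPiece tokenization, the sketch below shows how a rare clinical term absent from the vocabulary is broken into smaller known subword pieces rather than being mapped to a single unknown token. The checkpoint and the exact subword split shown in the comment are illustrative assumptions, not taken from the paper.

```python
# Hypothetical example of WordPiece subword tokenization on clinical text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
print(tokenizer.tokenize("patient diagnosed with adenocarcinoma"))
# Rare words are split into known subwords, e.g. something like:
# ['patient', 'diagnosed', 'with', 'aden', '##oca', '##rcin', '##oma']
```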