Abstract

Background: Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). With the advent of natural language processing (NLP) and deep learning (DL)-based techniques, many studies have focused on their applicability for various clinical applications. However, no prior work has utilized DL to extract relations between bleeding events and relevant entities.

Objective: In this study, we aimed to evaluate multiple DL systems on a novel EHR data set for bleeding event–related relation classification.

Methods: We first expert annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art DL architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT). We used three BERT-based models, namely, BERT pretrained on biomedical data (BioBERT), BioBERT pretrained on clinical text (Bio+Clinical BERT), and BioBERT pretrained on EHR notes (EhrBERT).

Results: Our experiments showed that the BERT-based models significantly outperformed the CNN and AGGCN models. Specifically, BioBERT achieved a macro F1 score of 0.842, outperforming both the AGGCN (macro F1 score, 0.828) and CNN models (macro F1 score, 0.763) by 1.4% (P<.001) and 7.9% (P<.001), respectively.

Conclusions: In this comprehensive study, we explored and compared different DL systems to classify relations between bleeding events and other medical concepts. On our corpus, BERT-based models outperformed other DL models for identifying the relations of bleeding-related entities. In addition to pretrained contextualized word representation, BERT-based models benefited from the use of target entity representation over traditional sequence representation.
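As a rough illustration of what "target entity representation" means in practice (the marker tokens and function below are our own assumptions for exposition, not the paper's exact preprocessing), the two candidate entities in a sentence are typically wrapped in special marker tokens before the sentence is fed to BERT, so the model can pool the entity positions rather than relying only on a single whole-sequence vector:

```python
def mark_entities(tokens, e1_span, e2_span,
                  e1_markers=("[E1]", "[/E1]"),
                  e2_markers=("[E2]", "[/E2]")):
    """Wrap two entity spans in marker tokens for relation classification.

    Spans are half-open (start, end) token-index ranges. The marker
    strings are hypothetical special tokens that would be added to the
    BERT vocabulary before fine-tuning.
    """
    out = list(tokens)
    # Insert markers into the rightmost span first so that the indices
    # of the earlier span remain valid after insertion.
    for (start, end), (open_m, close_m) in sorted(
            [(e1_span, e1_markers), (e2_span, e2_markers)],
            key=lambda pair: pair[0][0], reverse=True):
        out.insert(end, close_m)
        out.insert(start, open_m)
    return out

# Example: a bleeding event ("melena") and a related clinical entity
# ("colonoscopy") in a synthetic, non-patient sentence.
tokens = "patient had melena noted on colonoscopy".split()
marked = mark_entities(tokens, e1_span=(2, 3), e2_span=(5, 6))
# marked → patient had [E1] melena [/E1] noted on [E2] colonoscopy [/E2]
```

Under this scheme, the hidden states at the marker (or entity) positions can be concatenated and passed to the classification layer, which is the sense in which an entity-aware representation differs from the traditional sequence-level representation.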

Highlights

  • Background: Bleeding refers to the escape of blood from the circulatory system either internally or externally

  • The contributions of this work can be summarized as follows: (1) this is the first study to identify the relations between bleeding events and other relevant medical concepts; (2) we provide comparative analyses of three different deep learning (DL) architectures for the relation classification task on a new electronic health record (EHR) data set; and (3) we explored the effects of additional domain knowledge on the attention-guided graph convolutional network (AGGCN) model, as well as how entity position representations influence Bidirectional Encoder Representations from Transformers (BERT) models’ predictions

  • We used three variants of BERT (BERT pretrained on biomedical data [BioBERT] [46], BioBERT pretrained on clinical text [Bio+Clinical BERT] [47], and BioBERT pretrained on EHR notes [EhrBERT] [45]), all of which have been shown to improve clinical natural language processing (NLP) applications


Introduction

Background: Bleeding refers to the escape of blood from the circulatory system either internally or externally. Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). Methods: We first expert annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art deep learning (DL) architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT).
