An ensemble of neural models for nested adverse drug events and medication extraction with subwords.

Meizhi Ju, Makoto Miwa, Sophia Ananiadou, Nhung T H Nguyen

doi:10.1093/jamia/ocz075

Abstract

Objective: This article describes an ensemble system that automatically extracts adverse drug events and drug-related entities from clinical narratives. It was developed for the 2018 n2c2 Shared Task Track 2.

Materials and Methods: We designed a neural model that handles both nested entities (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types), using MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting words into finer-grained subwords. We then combined all the models to boost performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble that combines its predictions with those of the neural model.

Results: Our method achieved a lenient micro F1-score of 92.78%, with a lenient precision of 95.99% and a lenient recall of 89.79%. Experimental results showed that combining the predictions of either multiple models or a single model with different settings can improve performance.

Discussion: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types benefit significantly from the subword representation, which also allows us to extract sparse entities, especially nested entities.

Conclusion: The overall results demonstrate that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level rather than at the word level.
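The abstract does not say which algorithm was used to split words into subwords. As a minimal sketch, assuming a WordPiece-style greedy longest-match-first segmenter (a swapped-in technique, not necessarily the authors' method) and a hypothetical toy vocabulary, the following shows how a rare clinical word can be represented by known pieces instead of a single unknown token. The function name `split_into_subwords`, the `##` continuation prefix, and `VOCAB` are illustrative assumptions.

```python
from typing import List, Set


def split_into_subwords(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first segmentation (WordPiece-style sketch).

    Non-initial pieces carry a '##' prefix. If some position cannot be
    matched by any vocabulary entry, the whole word maps to `unk`.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest candidate first and shrink it until a vocabulary hit.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from the corpus.
VOCAB = {"hepato", "##toxic", "##ity", "rash", "nausea"}

print(split_into_subwords("hepatotoxicity", VOCAB))  # ['hepato', '##toxic', '##ity']
print(split_into_subwords("rash", VOCAB))            # ['rash']
print(split_into_subwords("xyz", VOCAB))             # ['[UNK]']
```

In practice the vocabulary would be learned from the training text (for example with byte-pair encoding); the greedy matcher only illustrates why a sparse word stops being "unknown" once it decomposes into frequent pieces.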
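The abstract states that the predictions of several models were combined but not which combination rule was used. A minimal sketch, assuming simple majority voting over typed character spans (an assumption, not the confirmed method), shows why voting on each (start, end, type) triple independently preserves both nested and polysemous entities. The names `vote` and `min_votes`, and the example offsets and labels, are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# An entity is (start_offset, end_offset_exclusive, entity_type).
Entity = Tuple[int, int, str]


def vote(predictions: Iterable[Set[Entity]], min_votes: int) -> Set[Entity]:
    """Keep every typed span predicted by at least `min_votes` models.

    Each typed span is voted on independently, so a span nested inside
    another span, or one span carrying several types, can all survive.
    """
    counts: Counter = Counter()
    for model_output in predictions:
        counts.update(set(model_output))  # at most one vote per model per span
    return {ent for ent, n in counts.items() if n >= min_votes}


# Hypothetical outputs of three models on one sentence.
model_a = {(0, 14, "ADE"), (0, 14, "Reason"), (0, 5, "Drug")}
model_b = {(0, 14, "ADE"), (0, 14, "Reason"), (20, 26, "Dosage")}
model_c = {(0, 14, "ADE"), (0, 5, "Drug")}

print(sorted(vote([model_a, model_b, model_c], min_votes=2)))
# [(0, 5, 'Drug'), (0, 14, 'ADE'), (0, 14, 'Reason')]
```

Here the span (0, 5) nested inside (0, 14) survives, and the span (0, 14) keeps both of its types, because nothing in the voting step forces the output into a single flat, single-label sequence.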
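Lenient scores credit a predicted entity that overlaps a gold entity of the same type even when the boundaries differ. The sketch below assumes that definition and micro-averages over all entity types by pooling counts; the official n2c2 evaluation script may differ in its details, and the helper names and example spans are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, type)


def overlaps(a: Entity, b: Entity) -> bool:
    # Same type, and the two character ranges share at least one position.
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def lenient_micro_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Lenient micro precision, recall, and F1 pooled over all entity types."""
    matched_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = {(0, 14, "ADE"), (20, 26, "Dosage"), (30, 40, "Drug")}
pred = {(2, 14, "ADE"), (20, 26, "Drug"), (30, 40, "Drug"), (31, 35, "Drug")}

print(tuple(round(x, 3) for x in lenient_micro_prf(gold, pred)))  # (0.75, 0.667, 0.706)
```

Under this definition, two overlapping predictions can both be credited against one gold entity. As a rough consistency check, the harmonic mean of the reported 95.99% precision and 89.79% recall is about 92.79%, which agrees with the reported 92.78% F1-score up to rounding of the inputs.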

Highlights

Our conditional random fields (CRF) model achieved the best lenient F-score when using only cluster features and the highest recall when further combined with df

Summary

Introduction

Electronic health records (EHRs), digital versions of a patient’s information and medical history, are an important source of health data that can influence a patient’s care. Mining such data can improve the understanding of the treatment and diagnosis of disease.[1,2] Among the many known application areas of EHR mining,[2,3] adverse drug event detection has been shown to improve and complement drug safety surveillance strategies. The World Health Organization defines an adverse drug event (ADE) as “an injury resulting from medical intervention related to a drug.”[4] This work focuses on extracting such ADE mentions and their related medications from EHRs. We base our analysis on MIMIC III discharge summaries, the data used in the 2018 n2c2 Shared Task Track 2.

Methods

Results

Discussion

Conclusion