Mining Adverse Drug Reactions from Unstructured Mediums at Scale

Hasham Ul Haq, Veysel Kocaman, David Talby

doi:10.1007/978-3-031-14771-5_26

Abstract

Adverse drug reactions/events (ADR/ADE) have a major impact on patient health and health care costs. While most ADRs are not reported via formal channels, they are often documented in a variety of unstructured conversations such as social media posts or customer support call transcripts. In this paper, we propose a natural language processing (NLP) solution that detects ADRs in such unstructured free-text conversations and improves on previous work in three ways. First, a new Named Entity Recognition (NER) model obtains state-of-the-art accuracy for ADR and Drug entity extraction on the ADE, CADEC, and SMM4H benchmark datasets (91.75%, 78.76%, and 83.41% F1 scores, respectively). Second, two new Relation Extraction (RE) models are introduced, one based on BioBERT and the other using crafted features over a Fully Connected Neural Network (FCNN); both perform on par with existing state-of-the-art models and outperform them when trained with a supplementary clinician-annotated RE dataset. Third, a new text classification model obtains state-of-the-art accuracy on the CADEC dataset (86.69% F1 score). The complete solution is implemented as a unified NLP pipeline in a production-grade library built on top of Apache Spark, making it natively scalable for processing millions of records on commodity clusters.

Keywords

NLP, NER, Relation Extraction, Pharmacovigilance, Sparknlp

Full Text