Detecting Adverse Drug Reactions on Twitter with Convolutional Neural Networks and Word Embedding Features.

Aaron J. Masino, Daniel Forsyth, Alexander G. Fiks

doi:10.1007/s41666-018-0018-9

Open Access

Abstract

Motivated by the limitations of adverse drug reaction (ADR) detection in clinical trials and in passive post-market drug safety surveillance systems, a number of researchers have examined social media data as a potential source of ADR information. Twitter is a particularly attractive platform because it has a large, diverse user community. Two challenges in applying Twitter data are that ADR descriptions are infrequent relative to the overall number of user posts and that human review of all posts is impractical. To address these challenges, we framed ADR detection as a binary classification task, in which the objective was to develop a computational method that classifies user posts, known as tweets, according to whether they contain an ADR description. We developed a convolutional neural network model (ConvNet) that processes tweets represented as word vectors created using unsupervised learning on large datasets. The ConvNet model achieved an F1-score of 0.46 and a sensitivity of 0.78 for tweet ADR classification on the test dataset, compared with an F1-score of 0.37 and a sensitivity of 0.33 obtained by two baseline support vector machine (SVM) models that incorporated word embedding, n-gram, and lexicon features. We attribute the superior performance of the ConvNet model to its ability to process arbitrary-length inputs, which allows it to evaluate every word embedding in a given tweet and make better use of the embeddings' semantic content than the SVM models, which require a fixed-length, aggregated embedding input. The results demonstrate the feasibility of detecting infrequent ADR mentions in large-scale media data.
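
The contrast between the two input styles can be illustrated with a minimal NumPy sketch. It is not the model from the paper: the abstract does not specify filter widths, filter counts, activation, or pooling, so the ReLU activation, max-over-time pooling, zero-padding of short inputs, and the function names `mean_pool` and `conv_max_over_time` are all assumptions chosen for illustration. The sketch only shows how a convolution over word positions can consume a tweet of any length while still looking at each word embedding, whereas averaging collapses the tweet into a single vector and discards word order.

```python
import numpy as np


def mean_pool(embeddings):
    """Aggregate a (n_words, dim) matrix into one fixed-length vector.

    This stands in for the fixed-length, aggregated embedding input
    that a classifier such as an SVM requires (assumed aggregation: mean).
    """
    return embeddings.mean(axis=0)


def conv_max_over_time(embeddings, filters, bias):
    """Convolve over word positions, then keep each filter's maximum.

    embeddings: (n_words, dim) word vectors for one tweet
    filters:    (n_filters, width, dim) hypothetical convolution filters
    bias:       (n_filters,)
    returns:    (n_filters,) feature vector, independent of n_words
    """
    n_words, dim = embeddings.shape
    n_filters, width, _ = filters.shape
    # Assumption: zero-pad tweets shorter than the filter width
    # so that at least one window exists.
    if n_words < width:
        padding = np.zeros((width - n_words, dim))
        embeddings = np.vstack([embeddings, padding])
        n_words = width
    # Sliding windows of consecutive words: (n_windows, width, dim).
    windows = np.stack(
        [embeddings[i:i + width] for i in range(n_words - width + 1)]
    )
    # Dot each window with each filter: (n_windows, n_filters).
    feature_map = np.tensordot(windows, filters, axes=([1, 2], [1, 2])) + bias
    feature_map = np.maximum(feature_map, 0.0)  # ReLU (assumed activation)
    # Max-over-time pooling removes the dependence on tweet length.
    return feature_map.max(axis=0)


rng = np.random.default_rng(0)
dim, n_filters, width = 8, 4, 3
filters = rng.normal(size=(n_filters, width, dim))
bias = np.zeros(n_filters)

short_tweet = rng.normal(size=(5, dim))   # 5 words
long_tweet = rng.normal(size=(20, dim))   # 20 words

# Both tweets map to feature vectors of the same size.
print(conv_max_over_time(short_tweet, filters, bias).shape)
print(conv_max_over_time(long_tweet, filters, bias).shape)
```

Reversing the word order of a tweet leaves the output of `mean_pool` unchanged, which is one way to see what aggregation discards. The convolutional features, by contrast, are computed from windows of consecutive words and so generally depend on word order.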

Full Text