Abstract

With many adults using social media to discuss health information, researchers have begun to dive into this resource to monitor or detect health conditions at the population level. Twitter, in particular, has grown to several hundred million users and can attend rich source of information for detecting serious medical conditions, such as adverse drug reactions (ADRs). However, Twitter too presents unique challenges due to brevity, lack of structure, and informal language. We crawled data from Twitter presenting 10,822 freely available tweets, which can be used to train automated tools to mine Twitter for ADR. We collect tweets using drug names as keywords, but expanding it by applying the Natural Language Processing (NLP) algorithm to produce misspelled versions of drug names for and drug interactions. We annotate each tweet for the presence of mentioning interactions, and for those who have, mention annotations. Agreement between our annotators for binary classification. We evaluate the usefulness of the dataset with machine learning algorithm training classes: using C.45..

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call