Digital Epidemiology of Prescription Drug References on X (Formerly Twitter): Neural Network Topic Modeling and Sentiment Analysis.

Varun K Rao, Danny Valdez, Rasika Muralidharan, Jon Agley, Kate S Eddens, Aravind Dendukuri, Vandana Panth, Maria A Parker

doi:10.2196/57885

Open Access

https://doi.org/10.2196/57885

Journal: Journal of Medical Internet Research
Publication Date: Aug 23, 2024
License type: CC BY

Abstract

Data from the social media platform X (formerly Twitter) can provide insights into the types of language that are used when discussing drug use. In past research using latent Dirichlet allocation (LDA), we found that tweets containing "street names" of prescription drugs were difficult to classify due to the similarity to other colloquialisms and lack of clarity over how the terms were used. Conversely, "brand name" references were more amenable to machine-driven categorization.

This study sought to use next-generation techniques (beyond LDA) from natural language processing to reprocess X data and automatically cluster groups of tweets into topics to differentiate between street- and brand-name data sets. We also aimed to analyze the differences in emotional valence between the 2 data sets to study the relationship between engagement on social media and sentiment.

We used the Twitter application programming interface to collect tweets that contained the street and brand name of a prescription drug within the tweet. Using BERTopic in combination with Uniform Manifold Approximation and Projection and k-means, we generated topics for the street-name corpus (n=170,618) and brand-name corpus (n=245,145). Valence Aware Dictionary and Sentiment Reasoner (VADER) scores were used to classify whether tweets within the topics had positive, negative, or neutral sentiments. Two different logistic regression classifiers were used to predict the sentiment label within each corpus. The first model used a tweet's engagement metrics and topic ID to predict the label, while the second model used those features in addition to the top 5000 tweets with the largest term-frequency-inverse document frequency score.

Using BERTopic, we identified 40 topics for the street-name data set and 5 topics for the brand-name data set, which we generalized into 8 and 5 topics of discussion, respectively. Four of the general themes of discussion in the brand-name corpus referenced drug use, while 2 themes of discussion in the street-name corpus referenced drug use. From the VADER scores, we found that both corpora were inclined toward positive sentiment. In both corpora, adding the vectorized tweet text increased the accuracy of our models by around 40% compared with the models that did not incorporate the tweet text.

BERTopic was able to classify tweets well. As with LDA, the discussion using brand names was more similar between tweets than the discussion using street names. VADER scores could only be logically applied to the brand-name corpus because of the high prevalence of non-drug-related topics in the street-name data. Brand-name tweets discussed drugs either positively or negatively, with few posts having a neutral emotionality. From our machine learning models, engagement alone was not enough to predict the sentiment label; the added context from the tweets was needed to understand the emotionality of a tweet.
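The step of turning VADER scores into positive, negative, or neutral labels can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state which cutoffs were used, so the sketch assumes the conventional compound-score thresholds of 0.05 and -0.05 suggested in the VADER documentation. The function names and the example scores are hypothetical.

```python
from collections import Counter

# Assumed cutoffs: the conventional values from the VADER documentation.
# The abstract does not say which thresholds the study applied.
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def label_from_compound(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= POSITIVE_CUTOFF:
        return "positive"
    if compound <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def label_distribution(scores):
    """Count how many scores fall under each sentiment label."""
    return Counter(label_from_compound(score) for score in scores)


# Hypothetical compound scores, for illustration only (not study data).
example_scores = [0.62, -0.41, 0.0, 0.05, -0.05, 0.03]
distribution = label_distribution(example_scores)
```

In practice the compound scores would come from a VADER analyzer run over each tweet; they are hard-coded here so that the sketch has no third-party dependencies.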
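To give a sense of what "vectorized tweet text" means, the sketch below computes term-frequency-inverse document frequency weights for a few toy tweets. It is an assumption-laden illustration: it uses the textbook formulation (relative term frequency times log(N/df)) with whitespace tokenization, which may differ from the smoothing, normalization, and feature selection the authors applied. The function name and the example tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This is the unsmoothed textbook variant, chosen here for clarity.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tweets, for illustration only (not study data).
toy_tweets = [
    "took my drugname today",
    "drugname helps me focus",
    "forgot my drugname again",
]
weights = tfidf(toy_tweets)
```

A term that appears in every tweet (here the placeholder "drugname") receives a weight of zero, while rarer terms such as "focus" receive higher weights. This is why such features can carry context that engagement metrics and topic IDs lack.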
