Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study.

Shyam Visweswaran, Patrick O'Halloran, Kar-Hai Chu, Jason B Colditz, Jaime E Sidani, Joel Welling, Na-Rae Han, Brian A Primack, Sanya B Taneja

doi:10.2196/17478

Abstract

Background: Twitter is a valuable and relevant social media platform for studying the prevalence of information and sentiment on vaping, which may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize the sentiments in them can underpin a Twitter-based vaping surveillance system. Traditional machine learning classifiers rely on annotations that are expensive to obtain; deep learning classifiers can require fewer annotated tweets because they also leverage the large numbers of readily available unannotated tweets.

Objective: This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments.

Methods: We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set of tweets, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance.

Results: For relevance, LSTM-CNN performed the best, with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98). For distinguishing commercial from noncommercial tweets, all deep learning classifiers, including LSTM-CNN, performed better than the traditional classifiers, with an AUC of 0.99 (95% CI 0.98-0.99). For provape sentiment, BiLSTM performed the best, with an AUC of 0.83 (95% CI 0.78-0.89). Overall, LSTM-CNN performed the best across all 3 classification tasks.

Conclusions: We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.
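The results are reported as an AUC with a 95% CI for each task. The abstract does not say how those intervals were computed, so the following is only a minimal sketch under stated assumptions: AUC is computed as the probability that a randomly chosen positive tweet outscores a randomly chosen negative one, and the interval comes from a percentile bootstrap (an assumed method, not necessarily the authors'). The labels and scores are hypothetical toy values, and the function names `auc` and `bootstrap_ci` are illustrative.

```python
import random

def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win, which matches the trapezoidal ROC area.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical toy data: 1 = vape relevant, 0 = not relevant.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI {low:.2f}-{high:.2f})")
```

The pairwise loop is quadratic in the number of tweets, which is fine for a test set drawn from a few thousand annotated tweets; a rank-based formula would be the usual choice for larger sets.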

Highlights

Machine learning methods provide a valuable framework for systematic and automated processing and analysis of data on social media platforms such as Twitter for developing surveillance systems with application to public health.
Using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network.
We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.

Summary

Introduction

Background

Machine learning methods provide a valuable framework for systematic and automated processing and analysis of data on social media platforms such as Twitter for developing surveillance systems with application to public health. The continuous generation of an enormous amount of content by a vast number of users allows efficient real-time monitoring of information sources and user sentiment, provided that the monitoring can be automated. Such monitoring can lead to the discovery of emergent patterns of information flow and changes in sentiments that may occur in response to public health and policy interventions. We derived and evaluated traditional machine learning and deep learning classifiers that can be used to build a Twitter-based surveillance system to identify and monitor vaping-related content and sentiments. Twitter is a valuable and relevant social media platform for studying the prevalence of information and sentiment on vaping, which may be useful for public health surveillance. Traditional machine learning classifiers rely on annotations that are expensive to obtain; deep learning classifiers can require fewer annotated tweets because they also leverage the large numbers of readily available unannotated tweets.
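To make the reliance on annotations concrete, here is a minimal sketch of one traditional classifier named in the abstract, multinomial naive Bayes, trained only on labeled examples. It is not the authors' implementation: the whitespace tokenizer, the Laplace smoothing constant, the class name `MultinomialNB`, and the four example tweets with their labels are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real tweet preprocessing."""
    return text.lower().split()

class MultinomialNB:
    """Bag-of-words multinomial naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        class_counts = Counter(labels)
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_scores(self, text):
        """Unnormalized log posterior of each class for one tweet."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][word] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Hypothetical annotated tweets for the commercial vs noncommercial task.
texts = [
    "new vape juice flavors in stock",
    "free shipping on all vape mods today",
    "my friend will not stop vaping in the car",
    "thinking about quitting vaping this year",
]
labels = ["commercial", "commercial", "noncommercial", "noncommercial"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("vape juice free shipping"))
print(model.predict("my friend keeps vaping"))
```

Everything this model knows comes from the annotated tweets: a word such as "keeps" that never appears in the labeled set contributes nothing to the prediction. Word vectors derived from unannotated tweets are the mechanism the study describes for giving deep learning classifiers information beyond the labeled set.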

Journal: Journal of Medical Internet Research
Publication Date: Aug 12, 2020
Citations: 26
License type: cc-by
