Abstract

Background: Timely understanding of public perceptions allows public health agencies to provide up-to-date responses to health crises such as infectious disease outbreaks. Social media such as Twitter provide an unprecedented way for the prompt assessment of the large-scale public response.

Objective: The aims of this study were to develop a scheme for a comprehensive public perception analysis of a measles outbreak based on Twitter data and demonstrate the superiority of the convolutional neural network (CNN) models (compared with conventional machine learning methods) on measles outbreak-related tweet classification tasks with a relatively small and highly unbalanced gold standard training set.

Methods: We first designed a comprehensive scheme for the analysis of public perception of measles based on tweets, including 3 dimensions: discussion themes, emotions expressed, and attitude toward vaccination. All 1,154,156 tweets containing the word “measles” posted between December 1, 2014, and April 30, 2015, were purchased and downloaded from DiscoverText.com. Two expert annotators curated a gold standard of 1151 tweets (approximately 0.1% of all tweets) based on the 3-dimensional scheme. Next, a tweet classification system based on the CNN framework was developed. We compared the performance of the CNN models to those of 4 conventional machine learning models and another neural network model. We also compared the impact of different word embedding configurations for the CNN models: (1) Stanford GloVe embedding trained on billions of tweets in the general domain, (2) measles-specific embedding trained on our 1 million measles-related tweets, and (3) a combination of the 2 embeddings.

Results: Cohen kappa intercoder reliability values for the annotation were 0.78, 0.72, and 0.80 on the 3 dimensions, respectively. Class distributions within the gold standard were highly unbalanced for all dimensions. The CNN models performed better on all classification tasks than k-nearest neighbors, naïve Bayes, support vector machines, or random forest. Detailed comparison between support vector machines and the CNN models showed that the major contributor to the overall superiority of the CNN models is the improvement in recall, especially for classes with low occurrence. The CNN model with the 2 embedding combination led to better performance on discussion themes and emotions expressed (microaveraging F1 scores of 0.7811 and 0.8592, respectively), while the CNN model with Stanford embedding achieved the best performance on attitude toward vaccination (microaveraging F1 score of 0.8642).

Conclusions: The proposed scheme can successfully classify the public’s opinions and emotions in multiple dimensions, which would facilitate the timely understanding of public perceptions during the outbreak of an infectious disease. Compared with conventional machine learning methods, our CNN models showed superiority on measles-related tweet classification tasks with a relatively small and highly unbalanced gold standard. With the success of these tasks, our proposed scheme and CNN-based tweet classification system are expected to be useful for the analysis of tweets about other infectious diseases such as influenza and Ebola.
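The abstract reports Cohen kappa for annotator agreement and microaveraging F1 for classification on an unbalanced gold standard. The sketch below shows how these quantities are commonly computed for single-label data. It is an illustration only, not the authors' evaluation code; the function names and the toy labels are assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Toy, hypothetical labels: a classifier that always predicts the majority class.
y_true = ["news"] * 8 + ["personal"] * 2
y_pred = ["news"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.444

# Toy, hypothetical annotations from two coders.
coder_1 = ["pos", "pos", "neg", "neg"]
coder_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(coder_1, coder_2))  # 0.5
```

In the toy example the micro-averaged score stays high even though the rare class is never recovered, while the macro-averaged score drops sharply. This is why recall on low-occurrence classes matters when the gold standard is highly unbalanced.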

Highlights

  • 40 million cases of measles, caused by a highly contagious virus, lead to over 300,000 deaths worldwide every year [1]

  • The convolutional neural network (CNN) model with the combination of 2 embeddings achieved the best performance on emotions expressed and the highest macroaveraging F score on discussion themes

  • Table abbreviations (columns: discussion themes, emotions expressed, attitude toward vaccination): KNN, k-nearest neighbor; SVM, support vector machines; Bi-LSTM, bidirectional long short-term memory; CNN_M, convolutional neural network using the measles tweets embedding; CNN_S, convolutional neural network using the pretrained GloVe tweets embedding from Stanford; CNN_M+S, convolutional neural network using the combination of pretrained GloVe tweets embedding and measles tweets embedding
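The CNN_M+S configuration combines a general-domain tweet embedding with a measles-specific one. The text here does not say how the two were combined; the sketch below assumes per-token concatenation, which is one common option, and assumes a zero vector for tokens missing from either vocabulary. All names, dimensions, and vectors are hypothetical toy values.

```python
def combine_embeddings(token, general_emb, domain_emb, general_dim, domain_dim):
    """Concatenate a token's general-domain and domain-specific vectors.

    Assumption: tokens missing from one vocabulary get a zero vector
    of that embedding's dimension.
    """
    general_vec = general_emb.get(token, [0.0] * general_dim)
    domain_vec = domain_emb.get(token, [0.0] * domain_dim)
    return general_vec + domain_vec  # list concatenation, not element-wise sum


# Toy lookup tables standing in for the two pretrained embeddings.
general_emb = {"measles": [0.1, 0.2, 0.3], "vaccine": [0.4, 0.5, 0.6]}
domain_emb = {"measles": [0.7, 0.8], "outbreak": [0.9, 1.0]}

# Each tweet becomes a (num_tokens x (general_dim + domain_dim)) matrix,
# which is the kind of input a text CNN convolves over.
tokens = ["measles", "outbreak"]
matrix = [combine_embeddings(t, general_emb, domain_emb, 3, 2) for t in tokens]
for row in matrix:
    print(row)
```

Concatenation keeps both sources of information available to the convolution filters, at the cost of a wider input; other combination schemes (for example, separate input channels) are equally plausible.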


Summary

Introduction

40 million cases of measles, caused by a highly contagious virus, lead to over 300,000 deaths worldwide every year [1]. Prompt understanding of the public’s perceptions will allow public health agencies to respond to people’s attitudes, emotions, and needs in real time instead of relying on a predetermined timeline based on stages. Using traditional methods such as surveys to study public perceptions during an infectious disease outbreak is both costly and time-consuming [4,6]. Understanding of public perceptions allows public health agencies to provide up-to-date responses to health crises such as infectious disease outbreaks. Social media such as Twitter provide an unprecedented way for the prompt assessment of the large-scale public response.

