Discovering health topics in social media using topic models.

Michael J Paul, Mark Dredze

doi:10.1371/journal.pone.0103408

Abstract

By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed on Twitter. We describe a topic modeling framework for discovering health topics on Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with temporal surveillance data for seasonal influenza (r = 0.689) and allergies (r = 0.810), as well as with geographic survey data for exercise (r = 0.534) and obesity (r = −0.631) in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.
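To make the reported r values concrete, here is a minimal sketch of how a correlation between a topic's prevalence over time and an external surveillance series could be computed. It assumes that r denotes the Pearson correlation coefficient. The function name `pearson_r` and both data series are hypothetical illustrations, not values from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic example data (NOT from the paper): the share of tweets assigned
# to one topic per time period, and a surveillance rate for the same periods.
topic_share = [0.010, 0.012, 0.018, 0.025, 0.021, 0.014]
surveillance_rate = [1.1, 1.3, 2.0, 3.1, 2.4, 1.5]

r = pearson_r(topic_share, surveillance_rate)
print(f"r = {r:.3f}")
```

A value near 1 indicates that the topic's prevalence rises and falls with the surveillance series, while a negative value, such as the one reported for obesity, indicates an inverse relationship.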

Highlights

Several studies have utilized social media for tracking trends and analyzing real-world events, including news events [1], natural disasters [2], user sentiment [3], and political opinions [4,5]. Twitter is an especially compelling source of social media data, with over half a billion user-generated status messages ("tweets") posted every day, often publicly and accessible with streaming tools [6]. By aggregating the words used by millions of people to express what they are doing and thinking, automated systems can approximately infer what is happening around the world.
Our specific contributions are: (1) we describe a current end-to-end framework for data collection and analysis, which includes multiple data streams, keyword filters, and supervised classifiers for identifying relevant data; (2) we analyze a set of 144 million health-related tweets that we have been downloading continuously since August 2011; (3) we provide many previously unpublished details about the creation of our classifier for identifying health tweets and details of the Ailment Topic Aspect Model (ATAM), our specialized health topic model, including procedures for large-scale inference; (4) we evaluate this framework and topic model quality by comparing temporal and geographic trends in the data with external data sources.
These results show that topic models can discover a number of ailments that are significantly and often strongly correlated with ground truth surveillance and survey data.
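The keyword-filtering stage of the collection framework described above might look roughly like the following sketch. The keyword list, function names, and sample messages are all hypothetical; the authors' actual keyword set and supervised classifier are not reproduced here.

```python
import re

# Hypothetical keyword list for illustration only; not the authors' list.
HEALTH_KEYWORDS = {"flu", "fever", "cough", "allergy", "allergies", "headache"}

# Lowercase word tokens, keeping apostrophes so contractions stay whole.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def matches_keywords(text, keywords=HEALTH_KEYWORDS):
    """Return True if any whole token of the message is a health keyword."""
    return any(token in keywords for token in tokenize(text))


def filter_stream(messages, keywords=HEALTH_KEYWORDS):
    """Yield only the messages that contain at least one health keyword."""
    for message in messages:
        if matches_keywords(message, keywords):
            yield message


# Made-up sample messages.
sample = [
    "Stuck in bed with the flu and a fever",
    "Great game last night!",
    "I've got concert fever",
]
candidates = list(filter_stream(sample))
print(candidates)
```

In this sample, the third message matches a keyword without describing the author's health. Keyword matching alone over-selects in this way, which is one plausible reason to follow it with a supervised classifier as a second stage.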

Summary

Introduction

Several studies have utilized social media for tracking trends and analyzing real-world events, including news events [1], natural disasters [2], user sentiment [3], and political opinions [4,5]. Twitter is an especially compelling source of social media data, with over half a billion user-generated status messages ("tweets") posted every day, often publicly and accessible with streaming tools [6]. By aggregating the words used by millions of people to express what they are doing and thinking, automated systems can approximately infer what is happening around the world. Many researchers have tracked influenza in social media data, most commonly Twitter, using a variety of techniques such as linear regression [8,9,10], supervised classification [11,12], and social network analysis. We instead describe how to perform discovery of ailments and health topics. We do this using topic models, which automatically infer interesting patterns in large text corpora. A discovery-driven approach can serve as a useful starting point for medical data mining of social media, by automatically identifying and characterizing the health topics that are prominently discussed on social media. Our list of discovered illnesses contains several that have previously been unexplored in Twitter, suggesting new areas for directed research, described in the Discussion section.

Journal: PLoS ONE	Publication Date: Aug 1, 2014
Citations: 250	License type: CC BY 4.0
