Abstract

In retrospective assessments, internet news reports have been shown to capture early reports of unknown infectious disease transmission prior to official laboratory confirmation. In general, media interest and reporting peak and wane during the course of an outbreak. In this study, we quantify the extent to which media interest during infectious disease outbreaks is indicative of trends in reported incidence. We introduce an approach that uses supervised temporal topic models to transform large corpora of news articles into temporal topic trends. The key advantages of this approach are its applicability to a wide range of diseases and its ability to capture disease dynamics, including seasonality and abrupt peaks and troughs. We evaluated the method using data from multiple infectious disease outbreaks reported in the United States of America (U.S.), China, and India. We demonstrate that temporal topic trends extracted from disease-related news reports successfully capture the dynamics of multiple outbreaks, such as whooping cough in the U.S. (2012) and dengue outbreaks in India (2013) and China (2014). Our observations also suggest that, when news coverage is uniform, efficient modeling of temporal topic trends using time-series regression techniques can estimate disease case counts with increased precision before health organizations release official reports.

Highlights

  • Infectious diseases are a threat to public health and economic stability of many countries

  • We analyzed whether the temporal topic trends (ξ) extracted by the supervised topic model are able to capture disease dynamics, including seasonality, abrupt peaks and troughs

  • To understand how the supervised topic model discovers words from the HealthMap corpus related to these input seed words, we show some of the regular words having higher probabilities in the regular topic distribution φzr

Summary

Introduction

Infectious diseases are a threat to public health and economic stability of many countries. Open source indicators (e.g., news articles[1,2], blogs[3], search engine query volume[4–7], social media chatter[8–11], and other sources[12]) are an attractive option for monitoring infectious disease progression, primarily due to their sheer volume and capacity to capture early signals of disease outbreaks and, in some cases, trends in population health-seeking behavior. Traditional surveillance systems are not always effective at real-time monitoring of emerging public health threats. Informal digital sources, such as news media, blogs, and micro-blogging sites (Twitter), are typically available in (near) real-time. Proper mining of signals from these digital sources can help minimize the time lag between the start of an outbreak and its formal recognition, allowing for an accelerated response to public health threats. The gains in supplementing traditional surveillance using digital sources have been discussed in Nsoesie et al.[15], Salathé et al.[16,17] and Hartley et al.[18].
