Abstract

Introduction
Digital disease surveillance (DDS) detects public health events from internet-based data, e.g., online news. Event features depicting the epidemiological and social characteristics of health events can be extracted from news using natural language processing techniques. However, few studies have leveraged these event features to support anomaly detection in DDS. We aimed to understand the distribution of the event features and to explore anomaly detection using the frequency of these features.

Methods
We obtained event data from COVID-19-related news collected from October 1 to December 31, 2021, sourced from BioCaster, an infectious-disease-focused DDS system. The predefined event features in BioCaster include disease, pathogen, location and 14 binary features, such as whether an event was caused by an unclassified virus. We described the distribution of the features and detected changes in the frequency of event features using Bayesian online change point detection. We compared the change points with the number of new cases and the number of genomic samples collected.

Results
We included 170,168 news articles reporting COVID-19 in 155 countries. The event feature indicating that an event was caused by an unclassified virus was identified as positive in 3831 (2.25%) news articles, and 12.91% of news articles had a positive value for the feature indicating cases who had travelled across international borders. The change points detected from these two features were temporally correlated with the emergence of the Omicron variant in the corresponding countries, which was more significant in countries with at least 300 news articles. Conversely, event features irrelevant to this case study, e.g., whether the cases were military workers, were identified as negative in all news articles, and no change points were detected.
Conclusions
Our study highlights the potential of monitoring the frequency of event features extracted from online news for anomaly detection in DDS, an approach that relies on sufficient news coverage.

Key messages
• Monitoring the event features extracted from online news is a useful approach for automatic anomaly detection in digital disease surveillance.
• Increasing media coverage is fundamental for improving early detection in a digital disease surveillance system.
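
As an illustration of the change point step described in the Methods, the following is a minimal sketch of Bayesian online change point detection applied to a daily count of news articles with a positive event feature. It is not the implementation used in the study: the Poisson observation model with a Gamma prior, the constant hazard rate, the prior parameters, the function names and the synthetic counts are all assumptions made for this example.

```python
import math

def nb_log_pred(x, alpha, beta):
    """Log predictive probability of count x when the Poisson rate has a
    Gamma(alpha, beta) posterior (a negative binomial distribution)."""
    return (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
            + alpha * math.log(beta / (beta + 1.0)) - x * math.log(beta + 1.0))

def bocpd_map_run_lengths(counts, hazard=0.01, alpha0=1.0, beta0=1.0):
    """Return the most probable run length after each observation.

    hazard, alpha0 and beta0 are illustrative assumptions, not study values.
    """
    run_probs = [1.0]                  # P(run length = r), starting at r = 0
    alphas, betas = [alpha0], [beta0]  # Gamma parameters for each run length
    map_run_lengths = []
    for x in counts:
        # Predictive probability of x under each candidate run length.
        pred = [math.exp(nb_log_pred(x, a, b)) for a, b in zip(alphas, betas)]
        weighted = [p * q for p, q in zip(run_probs, pred)]
        # Either the current run grows by one or a new run starts.
        growth = [w * (1.0 - hazard) for w in weighted]
        change = sum(weighted) * hazard
        new_probs = [change] + growth
        total = sum(new_probs)
        run_probs = [p / total for p in new_probs]
        # Conjugate update: a new run keeps the prior, older runs absorb x.
        alphas = [alpha0] + [a + x for a in alphas]
        betas = [beta0] + [b + 1.0 for b in betas]
        map_run_lengths.append(
            max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_run_lengths

def change_points(map_run_lengths):
    """Flag a change whenever the most probable run length drops, and report
    the index at which the new run is estimated to have started."""
    cps = []
    for t in range(1, len(map_run_lengths)):
        if map_run_lengths[t] < map_run_lengths[t - 1]:
            cps.append(t - map_run_lengths[t] + 1)
    return cps

# Synthetic daily counts of positive articles: a low regime, then a jump.
counts = [2, 1, 3, 2, 2, 3] * 5 + [20, 22, 19, 21, 20, 23] * 5
run_lengths = bocpd_map_run_lengths(counts)
cps = change_points(run_lengths)
print(cps)
```

In this sketch a change is reported when the most probable run length falls, because with a constant hazard the probability of a run length of zero is always equal to the hazard itself and so carries no signal. For a binary feature whose daily article volume varies, a Beta-Binomial model on the proportion of positive articles may be a more natural choice than the Poisson count model assumed here.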