Abstract

Purpose The emergence and re-emergence of disease outbreaks of international concern in the last several years have raised the importance of health surveillance systems that exploit the open media for the timely and precise detection of events. However, one of the key barriers faced by current event-based health surveillance systems is identifying fine-grained terms for an outbreak's geographical location. In this article, we present a method to tackle this problem by associating each reported event with the most specific spatial information available in a news report. This would be useful not only for health surveillance systems, but also for other event-centered processing systems.
Methods To develop an automated spatial attribute annotation system, we first created a gold standard corpus for training a machine learning model. Because a qualitative analysis of the data suggested that the event class might have an impact on spatial attribute annotation, we also developed an event classification system to incorporate event class information into the spatial attribute annotation model. To automatically recognize the spatial attribute of events, we explored several approaches, ranging from a simple heuristic technique to a more sophisticated approach based on a state-of-the-art Conditional Random Fields (CRFs) model. Different feature sets were incorporated into the model and compared.
Results The evaluations were conducted on 100 outbreak news articles. Spatial attribute recognition performance was evaluated using three metrics: precision, recall, and the harmonic mean of precision and recall (F-score). Among the three strategies proposed in this article, the CRF model appeared to be the most promising for spatial attribute recognition, with a best performance of 85.5% F-score (86.3% precision and 84.7% recall).
Conclusion We presented a methodology for associating each event in media outbreak reports with its spatial attribute at the finest level of granularity. Our goal has been to provide a means for enhancing the spatial understanding of outbreak-related events. Evaluation studies showed promising results for automatic spatial attribute annotation. In the future, we plan to explore more features, such as semantic correlation between words, that may be useful for the spatial attribute annotation task.
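
As a quick check on the reported metric, the F-score described above is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f_score` is our own and is not taken from the article. It recomputes the best CRF result from the reported precision and recall.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Best CRF result reported in the abstract, in percent.
best_f = f_score(86.3, 84.7)
print(round(best_f, 1))  # 85.5, matching the reported F-score
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-score by trading away recall for precision or the reverse.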
