Abstract

Background: Previous studies have suggested that epidemiological reasoning needs a fine-grained modelling of events, especially their spatial and temporal attributes. While the temporal analysis of events has been intensively studied, far less attention has been paid to their spatial analysis. This article aims to fill the gap in automatic event-spatial attribute analysis in order to support health surveillance and epidemiological reasoning.

Results: In this work, we propose a methodology that provides a detailed analysis of each event reported in news articles to recover the most specific locations where it occurs. Various features for recognizing the spatial attributes of events were studied and incorporated into models trained with several machine learning techniques. The best performance for spatial attribute recognition is promising: an F-score of 85.9% (86.75% precision/85.1% recall).

Conclusions: We extended our work on event-spatial attribute recognition by focusing on three machine learning techniques: CRF, SVM, and Decision tree. Our approach avoided the costly development of an external knowledge base by employing feature sources that can be acquired locally from the analyzed document. The results showed that the CRF model performed the best. Our study indicated that the nearest location and the previous event location are the most important features for the CRF and SVM models, while the location extracted from the verb's subject is the most important for the Decision tree model.
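As a quick consistency check on the reported figures, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced (beta = 1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (in percent).
best_f = f_score(86.75, 85.1)
print(round(best_f, 1))  # 85.9
```

Under that assumption, the recomputed value agrees with the 85.9% reported above.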

Highlights

  • Previous studies have suggested that epidemiological reasoning needs a fine-grained modelling of events, especially their spatial and temporal attributes

  • For the spatial attribute recognition of reporting events, the results show that the performance was reduced the most when the spatial information of the subject and subject co-reference was removed from the model (F = 81.4% for Conditional Random Fields (CRF); F = 78.9% for Support Vector Machine (SVM), and F = 78.7% for Decision tree)

  • We found that the main causes of the errors can be grouped into 5 cases. Case 1: Incorrect event class prediction. Most errors in this group occurred when an event in the normal class was classified by the model as an information class or vice versa


Summary

Introduction

Previous studies have suggested that epidemiological reasoning needs a fine-grained modelling of events, especially their spatial and temporal attributes. For automatic encoding, systems tend to adopt ad-hoc strategies, generally in the form of detecting the first disease and location pair that matches predefined criteria, or similar heuristics. Although these strategies are effective for reducing computational time and pruning out locations that are irrelevant to the outbreaks, they may lead to under-reporting of the outbreak or to reports issued at sub-optimal levels of granularity. They also cause difficulties when extending the functionality of the health surveillance system to abstract the spatial attributes of every event reported in text, which is reported to be necessary for epidemiologic reasoning [8].
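To make the granularity problem concrete, here is a minimal sketch of a "first disease and location pair" heuristic of the kind described above. It is a toy illustration, not the method of any particular surveillance system: the term lists `DISEASES` and `LOCATIONS`, the function name, and the example sentence are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical toy gazetteers; a real system would use far larger resources.
DISEASES = {"cholera", "measles"}
LOCATIONS = {"vietnam", "hanoi", "bangkok"}


def first_disease_location_pair(tokens: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first disease term and the first location term found, if both exist."""
    disease = next((t for t in tokens if t.lower() in DISEASES), None)
    location = next((t for t in tokens if t.lower() in LOCATIONS), None)
    if disease is None or location is None:
        return None
    return disease, location


tokens = "Cholera cases rose in Vietnam , with most new patients in Hanoi".split()
pair = first_disease_location_pair(tokens)
print(pair)  # ('Cholera', 'Vietnam')
```

In this example the heuristic stops at the country-level mention and never considers the more specific city mentioned later in the sentence, which is one way a report can end up at a sub-optimal level of granularity.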

