Event Geoparser with Pseudo-Location Entity Identification and Numerical Argument Extraction Implementation and Evaluation in Indonesian News Domain

Agung Dewandaru, Saiful Akbar, Dwi Hendratmo Widyantoro

doi:10.3390/ijgi9120712

Open Access

https://doi.org/10.3390/ijgi9120712

Abstract

A geoparser is a fundamental component of a Geographic Information Retrieval (GIR) system: it performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, news articles that report several events across many place mentions in a single document are not yet adequately handled by regular geoparsers, whose scope of resolution is either toponym-level or document-level. The capacity to detect multiple events and geolocate their true coordinates along with their numerical arguments is still missing from modern geoparsers, and even more so for Indonesian news corpora. We propose an event geoparser model with three stages of processing, which tightly integrates an event extraction model into geoparsing and provides a precise event-level resolution scope. The model casts geotagging and event extraction as sequence labeling and uses an LSTM-CRF inferencer equipped with features derived using an Aggregated Topic Model from a large corpus to increase generalizability. Through the proposed workflow and features, the geoparser significantly improves the identification of pseudo-location entities, resulting in a 23.43% increase in weighted F1 score compared to baseline gazetteer and POS tag features. As a side effect of event extraction, various numerical arguments are also extracted, and the output from a single news document is easily projected onto a rich choropleth map.
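
To make the "sequence labeling" framing concrete, the following is a minimal sketch of how per-token tags, such as those an LSTM-CRF tagger might emit, can be grouped into labeled spans. It is not the authors' implementation: the BIO tagging scheme, the label names (EVENT, LOC, NUM), the function name decode_bio, and the example sentence are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Assumes a BIO scheme: "B-X" opens a span of type X, "I-X" continues
    an open span of the same type, and anything else closes the open span.
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush the previous one if it is still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag: close the open span, if any.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = None
            current_tokens = []

    # Flush a span that runs to the end of the sentence.
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical example: "Flood hits South Jakarta, 12 people died".
tokens = ["Banjir", "melanda", "Jakarta", "Selatan", ",", "12", "orang", "tewas"]
tags = ["B-EVENT", "O", "B-LOC", "I-LOC", "O", "B-NUM", "O", "O"]
spans = decode_bio(tokens, tags)
print(spans)
```

In a pipeline of this kind, the decoded location spans would then be passed to a disambiguation and coordinate-resolution stage, while the numerical spans would be attached to the event as arguments.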

Highlights

The exponential growth of information shared through the World Wide Web provides ample opportunities to automate the understanding and extraction of information from huge collections of unstructured text.
Recent geoparsers are increasingly equipped with natural language processing and machine learning techniques to better cope with the sheer size of unstructured text data.
Even in the modern geoparsers landscape, little has been studied on integration of geoparsing with event extraction framework for the event geolocation needs, especially in dealing with the resolution on the event-level scope where existing geoparsers are only

Summary

Introduction

The exponential growth of information shared through the World Wide Web provides ample opportunities to automate the understanding and extraction of information from huge collections of unstructured text. One estimate states that at least 20 percent of Web pages include recognizable geographic identifiers [1], which are mainly present in unstructured form. This explains the development of numerous Geographical Information Retrieval (GIR) models, methods, and prototypes that aim to extract, retrieve, and exploit location and geospatial information within unstructured textual data such as online news articles [2], tweets [3], social media posts, and blogs. These systems improve applications ranging from analytics [4] and health [5] to retrieval [6], categorization, and many others by leveraging the geospatial data that is prevalent on the internet. The result is further processed by a GIR application to infer associations between the varied information described in a document and the geographical coordinates of its resolved toponyms; documents are then served or ranked according to the geo-query input, typically in some form of thematic map.

Results

Discussion

Conclusion