Abstract

Geographic information extraction from textual data sources, called geoparsing, is a key task in text processing and central to subsequent spatial analysis approaches. Several geoparsers are available that support this task, each with its own (often limited or specialized) gazetteer and its own approaches to toponym detection and resolution. In this demonstration paper, we present HeidelPlace, an extensible framework in support of geoparsing. Key features of HeidelPlace include a generic gazetteer model that supports the integration of place information from different knowledge bases, and a pipeline approach that enables an effective combination of diverse modules tailored to specific geoparsing tasks. This makes HeidelPlace a valuable tool for testing and evaluating different gazetteer sources and geoparsing methods. In the demonstration, we show how to set up a geoparsing workflow with HeidelPlace and how it can be used to compare and consolidate the output of different geoparsing approaches.
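The following is a minimal Python sketch of what a generic gazetteer model could look like. It is not HeidelPlace's actual API. It assumes three hypothetical pieces:

- a source-independent Place record;
- a GazetteerSource interface that each knowledge-base adapter implements;
- a CompositeGazetteer that queries all sources and merges records lying within a small coordinate tolerance.

The source names, identifiers, and coordinates are toy data chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


@dataclass
class Place:
    """Source-independent place record (hypothetical model)."""
    name: str
    lat: float
    lon: float
    # Maps a knowledge-base name to that source's own identifier.
    source_ids: Dict[str, str] = field(default_factory=dict)


class GazetteerSource(Protocol):
    """Anything that can return candidate places for a toponym."""
    def lookup(self, toponym: str) -> List[Place]: ...


class InMemorySource:
    """Toy adapter standing in for one external knowledge base."""
    def __init__(self, name: str, records: List[Tuple[str, float, float, str]]):
        self.name = name
        self.records = records

    def lookup(self, toponym: str) -> List[Place]:
        key = toponym.lower()
        return [
            Place(n, lat, lon, {self.name: rid})
            for n, lat, lon, rid in self.records
            if n.lower() == key
        ]


class CompositeGazetteer:
    """Queries several sources and merges records that describe the same place."""
    def __init__(self, sources: List[GazetteerSource], tol_deg: float = 0.05):
        self.sources = sources
        self.tol_deg = tol_deg  # max coordinate difference for treating records as duplicates

    def lookup(self, toponym: str) -> List[Place]:
        merged: List[Place] = []
        for source in self.sources:
            for cand in source.lookup(toponym):
                match = next(
                    (p for p in merged
                     if abs(p.lat - cand.lat) <= self.tol_deg
                     and abs(p.lon - cand.lon) <= self.tol_deg),
                    None,
                )
                if match is None:
                    merged.append(cand)  # a new, distinct place
                else:
                    match.source_ids.update(cand.source_ids)  # same place seen in another source
        return merged


a = InMemorySource("kb_a", [("Heidelberg", 49.41, 8.69, "a1"), ("Paris", 48.86, 2.35, "a2")])
b = InMemorySource("kb_b", [("Heidelberg", 49.40, 8.70, "b7"), ("Paris", 33.66, -95.56, "b9")])
gazetteer = CompositeGazetteer([a, b])
for place in gazetteer.lookup("Paris"):
    print(place.name, place.lat, place.lon, place.source_ids)
# → Paris 48.86 2.35 {'kb_a': 'a2'}
# → Paris 33.66 -95.56 {'kb_b': 'b9'}
```

Merging by coordinate proximity is only a simple heuristic chosen for this sketch. A real integration of knowledge bases would likely also compare names, feature types, and cross-references between sources.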

Highlights

  • The ever-growing amount of available text data raises the need for automated Information Extraction (IE) to obtain structured information from text

  • Key features of HeidelPlace include a generic gazetteer model that supports the integration of place information from different knowledge bases, and a pipeline approach that enables an effective combination of diverse modules tailored to specific geoparsing tasks

  • We show how to set up a geoparsing workflow with HeidelPlace and how it can be used to compare and consolidate the output of different geoparsing approaches
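One simple way to compare and consolidate the output of several geoparsers is sketched below in Python. The sketch assumes that each parser's output has been reduced to a mapping from a mention's character offsets to a resolved place identifier.

The majority vote in the hypothetical consolidate function and the agreement measure in pairwise_agreement are illustrative choices. They are not the consolidation method described for HeidelPlace. The parser names and place ids are made up.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a toponym mention


def consolidate(outputs: Dict[str, Dict[Span, str]],
                min_votes: int = 2) -> Dict[Span, Optional[str]]:
    """Majority vote over the place ids that each geoparser assigned to each mention."""
    spans = {s for out in outputs.values() for s in out}
    result: Dict[Span, Optional[str]] = {}
    for span in sorted(spans):
        votes = Counter(out[span] for out in outputs.values() if span in out)
        place_id, count = votes.most_common(1)[0]
        # Leave the mention unresolved if too few parsers agree.
        result[span] = place_id if count >= min_votes else None
    return result


def pairwise_agreement(a: Dict[Span, str], b: Dict[Span, str]) -> float:
    """Share of mentions found by both parsers that they resolve to the same place."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[s] == b[s] for s in shared) / len(shared)


# Offsets refer to the sentence "We drove from Paris to Texas."
outputs = {
    "parser_a": {(14, 19): "paris_fr", (23, 28): "texas_us"},
    "parser_b": {(14, 19): "paris_us", (23, 28): "texas_us"},
    "parser_c": {(14, 19): "paris_fr"},
}
print(consolidate(outputs))
# → {(14, 19): 'paris_fr', (23, 28): 'texas_us'}
print(pairwise_agreement(outputs["parser_a"], outputs["parser_b"]))
# → 0.5
```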

Summary

Introduction

The ever-growing amount of available text data raises the need for automated Information Extraction (IE) to obtain structured information from text. Geoparsing describes the process of identifying place mentions in text (so-called toponyms) and linking them to unambiguous spatial references.

If the underlying gazetteer data model is not designed to be flexible, or if the geoparser is too tightly coupled with a specific gazetteer source, its components may not be reusable with other sources. CLAVIN, for example, strongly relies on GeoNames as its gazetteer, which is often not an ideal choice, and switching to another data source would entail a significant rewrite of the code base. While CLAVIN's modular design allows new modules to be added, its processing pipeline is too restrictive to enable complex geoparsing approaches that rely on information exchange between different modules.

We present HeidelPlace, a geoparsing framework that includes a generic gazetteer model and an implementation of the entire geoparsing process. The open-source project HeidelPlace and the data used for this demonstration are available for download from the EventAE project website.
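To illustrate what information exchange between modules can mean, here is a minimal Python sketch of a pipeline in which modules communicate through a shared document object. Any later module can therefore use everything that earlier modules produced.

All class names, the toy gazetteer, and the country-voting heuristic are hypothetical. They do not reflect the actual implementation of HeidelPlace or CLAVIN.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol, Tuple

# Hypothetical toy gazetteer: toponym -> candidate (place_id, country_code) pairs.
Gazetteer = Dict[str, List[Tuple[str, str]]]
GAZETTEER: Gazetteer = {
    "Paris": [("paris_fr", "FR"), ("paris_us", "US")],
    "Lyon": [("lyon_fr", "FR")],
    "Texas": [("texas_us", "US")],
}


@dataclass
class Document:
    text: str
    # Shared state: each module reads earlier layers and adds its own.
    layers: Dict[str, Any] = field(default_factory=dict)


class Module(Protocol):
    def process(self, doc: Document) -> None: ...


class DictionaryRecognizer:
    """Toponym detection: keeps tokens that are gazetteer entries."""
    def __init__(self, gazetteer: Gazetteer):
        self.gazetteer = gazetteer

    def process(self, doc: Document) -> None:
        tokens = [t.strip(".,;:!?") for t in doc.text.split()]
        doc.layers["toponyms"] = [t for t in tokens if t in self.gazetteer]


class CandidateRetriever:
    """Collects gazetteer candidates for every detected toponym."""
    def __init__(self, gazetteer: Gazetteer):
        self.gazetteer = gazetteer

    def process(self, doc: Document) -> None:
        doc.layers["candidates"] = {t: self.gazetteer[t] for t in doc.layers["toponyms"]}


class CountryContextResolver:
    """Toponym resolution: prefers the country supported by most mentions."""
    def process(self, doc: Document) -> None:
        cands = doc.layers["candidates"]
        # Each toponym votes once for every country among its candidates.
        votes = Counter(
            country for options in cands.values()
            for country in {c for _, c in options}
        )
        doc.layers["resolved"] = {
            t: max(options, key=lambda o: votes[o[1]])[0]
            for t, options in cands.items()
        }


class Pipeline:
    def __init__(self, modules: List[Module]):
        self.modules = modules

    def run(self, text: str) -> Document:
        doc = Document(text)
        for module in self.modules:
            module.process(doc)
        return doc


pipeline = Pipeline([
    DictionaryRecognizer(GAZETTEER),
    CandidateRetriever(GAZETTEER),
    CountryContextResolver(),
])
print(pipeline.run("We drove from Paris to Dallas, Texas.").layers["resolved"])
# → {'Paris': 'paris_us', 'Texas': 'texas_us'}
print(pipeline.run("Paris and Lyon are in France.").layers["resolved"])
# → {'Paris': 'paris_fr', 'Lyon': 'lyon_fr'}
```

The resolver's decision for "Paris" depends on the other toponyms the recognizer found, which a strictly linear, per-mention pipeline cannot express. Because the gazetteer is passed into the modules rather than hard-coded, switching to another source only requires a mapping of the same shape.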

Framework and System Overview
Demonstration Scenario
Conclusion and Ongoing Work