Abstract

Geocoding is an essential procedure in geographical information retrieval to associate place names with coordinates. Due to the inherent ambiguity of place names in natural language and the scarcity of place names in textual data, it is widely recognized that geocoding is challenging. Recent advances in deep learning have promoted the use of the neural network to improve the performance of geocoding. However, most of the existing approaches consider only the local context, e.g., neighboring words in a sentence, as opposed to the global context, e.g., the topic of the document. Lack of global information may have a severe impact on the robustness of the model. To fill the research gap, this paper proposes a novel global context embedding approach to generate linguistic and geospatial features through topic embedding and location embedding, respectively. A deep neural network called LGGeoCoder, which integrates local and global features, is developed to solve the geocoding as a classification problem. The experiments on a Wikipedia place name dataset demonstrate that LGGeoCoder achieves competitive performance compared with state-of-the-art models. Furthermore, the effect of introducing global linguistic and geospatial features in geocoding to alleviate the ambiguity and scarcity problem is discussed.

Highlights

  • Web and smartphone technologies have brought vast volumes of unstructured text information to the Web, which has gradually changed people’s needs for searching information, leading to changes in search services

  • This paper proposed a novel global context embedding approach, including topic embedding and location embedding, to introduce global information for linguistics and geospatial features

  • The location embedding uses the inherent spatial clustering or influence of place names to construct the rough boundary of the place name to enrich geospatial features

Read more

Summary

Introduction

Web and smartphone technologies have brought vast volumes of unstructured text information to the Web, which has gradually changed people’s needs for searching information, leading to changes in search services. Geoparsing is a procedure to detect the geographic information in texts and link with gazetteers, a database storing place names and their attributes, including coordinates, population, size, and type [4]. It is difficult to obtain the complete context of place names in the geocoding problem due to the lack of geographical location information in natural language. This paper proposes to use two global context embedding methods, including topic embedding and location embedding for linguistic and geospatial feature extraction, respectively. Traditional geocoding tasks ignore topic information and are limited to the syntax and semantics of text It employs location embedding from deep learning to transform spatial distribution around the place reference into low-dimensional vectors and enrich the geospatial features vector.

Related Work
Methodology
Preliminaries
Global Context Embedding for Linguistic Features and Geospatial Features
Word Embedding for Linguistic Features
Topic Embedding for Global Linguistic Features
Location Embedding for Geospatial Features
Training of Embedding Model
LGGeoCoder
Experimental Settings
Performance Comparison
Ablation Study
Findings
Conclusions and Future Work
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call