Abstract
Geocoding aims to assign unambiguous locations (i.e., geographic coordinates) to place names (i.e., toponyms) referenced within documents (e.g., within spreadsheet tables or textual paragraphs). This task comes with multiple challenges, such as dealing with referent ambiguity (multiple places with the same name) or the completeness of the reference database. In this work, we propose a geocoding approach based on modeling pairs of toponyms, which returns latitude-longitude coordinates. One of the input toponyms is geocoded, and the second one is used as context to reduce ambiguities. The proposed approach is based on a deep neural network that uses Long Short-Term Memory (LSTM) units to produce representations from sequences of character n-grams. To train our model, we use toponym co-occurrences collected from different contexts, namely textual (i.e., co-occurrences of toponyms in Wikipedia articles) and geographical (i.e., inclusion and proximity of places based on Geonames data). Experiments were conducted on multiple geographical areas of interest: France, the United States, Great Britain, Nigeria, Argentina and Japan. Results show that models trained with co-occurrence data obtained a higher geocoding accuracy, and that proximity relations in combination with co-occurrences can help to obtain a slightly higher accuracy in geographical areas with fewer places in the data sources.
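To make the input representation more concrete, the following is a minimal sketch of how a pair of toponyms could be turned into sequences of character n-gram identifiers before being fed to a sequence model such as an LSTM. The function names (`char_ngrams`, `encode_pair`), the n-gram size of 4, the lowercasing step, and the example pair are illustrative assumptions, not details taken from the paper.

```python
def char_ngrams(toponym, n=4):
    """Return the overlapping character n-grams of a toponym, in order.

    The n-gram size and the lowercasing are assumptions made for this sketch.
    """
    text = toponym.lower()
    if len(text) <= n:
        # Short names yield a single (possibly shorter) n-gram.
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encode_pair(target, context, vocab, n=4):
    """Map both toponyms of a pair to sequences of integer n-gram ids.

    Unseen n-grams are added to `vocab`; ids start at 1 so that 0 can be
    reserved for padding (a common convention, assumed here).
    """
    def to_ids(toponym):
        return [vocab.setdefault(g, len(vocab) + 1) for g in char_ngrams(toponym, n)]

    return to_ids(target), to_ids(context)


# Hypothetical pair: the first toponym is geocoded, the second gives context.
vocab = {}
target_ids, context_ids = encode_pair("Paris", "Texas", vocab)
```

In such a setup, the two identifier sequences would each be embedded and encoded, and the combined representation would be mapped to a latitude-longitude output; those later stages are not shown here.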
Highlights
Geocoding is a core part of the broader text geoparsing task (also known as toponym resolution), in addition to geotagging
In order to compare our model with other geocoding approaches, we evaluate our model on geocoding datasets used in the literature
Summary
Geocoding is a core part of the broader text geoparsing task (also known as toponym resolution), in addition to geotagging. While geotagging deals with the automatic recognition of named entities (i.e., named entity recognition) corresponding to places, geocoding aims to match the identified place entities to the corresponding locations. One important challenge in geocoding is toponym disambiguation [1], which faces, among other types of ambiguity issues [2], the problem of referent ambiguity (also known as geo/geo ambiguity). Referent ambiguity refers to toponyms having multiple locations [3] (e.g., Sofia (capital of Bulgaria) ≠ Sofia (province in Bulgaria)). In the case of historical document analysis, new methods were proposed in order to geocode toponyms and solve toponym ambiguity without using gazetteers [4,5,6].