Abstract

Georeferencing by place names (known as toponyms) is the most common way of associating textual information with geographic locations. While computers use numeric coordinates (such as longitude-latitude pairs) to represent places, people generally refer to places by their toponyms. Query by toponym is an effective way to find information about a geographic area. However, segmenting and parsing textual addresses to extract local toponyms is a difficult task in the geocoding field, especially in China. In this paper, a local spatial context-based framework is proposed to extract local toponyms and segment Chinese textual addresses. We collect urban points of interest (POIs) as the input data source; in this dataset, each textual address corresponds one-to-one with a geospatial coordinate, so the data can readily be used to explore the spatial distribution of local toponyms. The proposed framework involves two steps: address element identification and local toponym extraction. The first step identifies as many address element candidates as possible from the continuous textual address string of each urban POI. The second step merges neighboring candidate pairs into local toponyms. A series of experiments is conducted to determine the thresholds for local toponym extraction based on precision-recall curves. Finally, we evaluate our framework by comparing its performance with that of three well-known Chinese word segmentation models. The comparative experimental results demonstrate that our framework achieves better performance than the other models.

Highlights

  • In the information age, georeferenced information is considered to be an essential type of value-added information [1]

  • Toponyms are commonly used as part of a textual address to represent a geographic location, and queries for information about a geographic area are commonly expressed as text, an approach called query by toponym

  • Adding step two increases the precision to 0.957 and the recall to 0.945. This result occurs because the proposed framework extracts local toponyms by leveraging local spatial contextual information, which helps improve the precision of textual address segmentation for address data containing local toponyms
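
The precision and recall figures above are segmentation scores. As a rough illustration only, the following sketch computes them by exact span matching, a common convention for word segmentation; this page does not state how the paper counts correct segments, so the matching rule and the helper names `to_spans` and `precision_recall` are assumptions.

```python
def to_spans(segments):
    """Convert a list of segments into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for segment in segments:
        spans.add((start, start + len(segment)))
        start += len(segment)
    return spans


def precision_recall(predicted, gold):
    """Score predicted segmentations against gold ones by exact span match.

    Assumption: a predicted segment is correct only if both of its
    boundaries agree with a gold segment of the same address.
    """
    correct = predicted_total = gold_total = 0
    for predicted_segments, gold_segments in zip(predicted, gold):
        predicted_spans = to_spans(predicted_segments)
        gold_spans = to_spans(gold_segments)
        correct += len(predicted_spans & gold_spans)
        predicted_total += len(predicted_spans)
        gold_total += len(gold_spans)
    precision = correct / predicted_total if predicted_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    return precision, recall


# Toy example: the prediction splits the gold segment "cd" into "c" and "d".
predicted = [["ab", "c", "d"]]
gold = [["ab", "cd"]]
precision, recall = precision_recall(predicted, gold)
```

In this toy case only one of the three predicted segments matches a gold segment, so over-segmentation lowers precision more than recall.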


Summary

Introduction

In the information age, georeferenced information is considered to be an essential type of value-added information [1]. The semantic information of Chinese textual address data can be parsed by segmenting an address text into multiple address elements (including the local toponyms) for further analysis. Natural language processing (NLP) methods have been widely applied in areas such as information retrieval, information extraction, and machine translation [7]. All these methods rely on text string processing; when they are applied to textual address segmentation, the results cannot meet the accuracy requirement. We propose a local spatial context-based framework to extract local toponyms and segment Chinese textual addresses from urban points of interest (POI) data.
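
The idea of merging neighboring address element candidates with the help of spatial context can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the merge criteria (a minimum pair count, a co-occurrence ratio, and a cap on the mean pairwise distance between the POIs that contain the pair), the thresholds, the function names, and the toy data are all assumptions, and coordinates are assumed to be projected so that Euclidean distance is meaningful.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import hypot


def mean_pairwise_distance(points):
    """Average Euclidean distance over all point pairs (0.0 for fewer than two points)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in pairs) / len(pairs)


def find_local_pairs(pois, min_count=2, max_distance=1.0, min_ratio=0.9):
    """Return adjacent candidate pairs that look like local toponyms.

    pois: list of (elements, (x, y)); elements are step-one candidates.
    All three thresholds are hypothetical, not values from the paper.
    """
    element_counts = Counter()
    pair_points = defaultdict(list)
    for elements, xy in pois:
        element_counts.update(elements)
        for left, right in zip(elements, elements[1:]):
            pair_points[(left, right)].append(xy)

    local_pairs = set()
    for (left, right), points in pair_points.items():
        n = len(points)
        if n < min_count:
            continue  # too rare to judge
        if n / element_counts[left] < min_ratio or n / element_counts[right] < min_ratio:
            continue  # the two elements often occur apart
        if mean_pairwise_distance(points) <= max_distance:
            local_pairs.add((left, right))  # spatially concentrated
    return local_pairs


def segment(elements, local_pairs):
    """Greedily merge adjacent candidates that form a detected local pair."""
    result, i = [], 0
    while i < len(elements):
        if i + 1 < len(elements) and (elements[i], elements[i + 1]) in local_pairs:
            result.append(elements[i] + elements[i + 1])
            i += 2
        else:
            result.append(elements[i])
            i += 1
    return result


# Toy data: romanized placeholder tokens with made-up coordinates.
pois = [
    (["Wuhan", "Guang", "Gu", "1"], (0.0, 0.0)),
    (["Wuhan", "Guang", "Gu", "2"], (0.2, 0.1)),
    (["Wuhan", "Han", "Kou", "3"], (50.0, 50.0)),
    (["Wuhan", "Han", "Kou", "4"], (50.1, 49.9)),
    (["Wuhan", "Zhong", "Shan", "5"], (0.0, 0.0)),
    (["Wuhan", "Zhong", "Shan", "6"], (80.0, 80.0)),
]
local_pairs = find_local_pairs(pois)
merged_address = segment(["Wuhan", "Guang", "Gu", "1"], local_pairs)
```

In this toy data the pairs "Guang"+"Gu" and "Han"+"Kou" are merged because their POIs cluster tightly, while "Zhong"+"Shan" is left split because its two POIs lie far apart. A purely string-based segmenter has no access to that distinction.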

Related Work
Method
Input: Urban POI Data with Textual Addresses
Step Two
Exploration of Local Toponym Spatial Distribution Patterns
Merging Neighbor Candidate Pairs
Iteration
8: Calculate the pairwise distances dij between
Output
Experiments
Dataset
Experimental Designs
Performance Evaluation
Ground-Truth Dataset Preparation
Threshold Determination for Extracting Local Toponyms
Findings
Conclusions and Future Work
