Abstract

Aligning points of interest (POIs) from heterogeneous geographical data sources is an important task that helps extend map data with information from different datasets. This task poses several challenges, including differences in type hierarchies, differences in labels (formats, languages, and levels of detail), and deviations in the coordinates. Scalability is another major issue, as global-scale datasets may have tens or hundreds of millions of entities. In this paper, we propose the GeographicaL Entities AligNment (GLEAN) system for efficiently matching large geographical datasets based on spatial partitioning with an adaptable margin. In particular, we introduce a text similarity measure that combines the local-context relevance of tokens with sentence embeddings. We also propose a scalable type embedding model. Finally, we demonstrate that the proposed system can efficiently align large datasets while improving alignment quality through the proposed entity similarity measure.

Highlights

  • The three related works do not address the main issues covered by our work, i.e., the scalability problem, the use of the local relevance of tokens to improve text similarity measures, and the multilinguality of labels

  • We proposed GeographicaL Entities AligNment (GLEAN), a scalable approach to aligning geographical entities (i.e., points of interest (POIs)) from different sources based on four attributes

  • The offline method uses adaptive-margin partitioning to enable the scalable alignment of large datasets at a global scale
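
The idea of partitioning with a margin can be illustrated with a minimal sketch. This is not the GLEAN implementation: it assumes a uniform latitude/longitude grid, a fixed margin in degrees rather than an adaptive one, and hypothetical names (`cell_of`, `cells_with_margin`, `candidate_pairs`). Each target POI is registered in its own cell and in any neighboring cell whose boundary lies within the margin, so that pairs separated by a cell border are still compared.

```python
from collections import defaultdict
from math import floor


def cell_of(lat, lon, size):
    """Index of the grid cell (side length `size`, in degrees) containing a point."""
    return (floor(lat / size), floor(lon / size))


def cells_with_margin(lat, lon, size, margin):
    """Cells touched by the square of half-width `margin` around the point.

    Assumes margin <= size, so checking the nine offset points is sufficient.
    Longitude wrap-around at +/-180 degrees is ignored in this sketch.
    """
    cells = set()
    for dlat in (-margin, 0.0, margin):
        for dlon in (-margin, 0.0, margin):
            cells.add(cell_of(lat + dlat, lon + dlon, size))
    return cells


def candidate_pairs(source, target, size=0.01, margin=0.001):
    """Return (source_id, target_id) pairs that share a partition.

    `source` and `target` are iterables of (id, lat, lon) tuples.
    Only these candidate pairs would then be scored by a similarity measure.
    """
    buckets = defaultdict(list)
    for tid, lat, lon in target:
        for cell in cells_with_margin(lat, lon, size, margin):
            buckets[cell].append(tid)

    pairs = set()
    for sid, lat, lon in source:
        for tid in buckets.get(cell_of(lat, lon, size), []):
            pairs.add((sid, tid))
    return pairs


source = [("a", 52.0999, 13.005)]
target = [("x", 52.1001, 13.005), ("y", 52.155, 13.005)]
print(candidate_pairs(source, target))
```

In this example, POIs `a` and `x` fall on opposite sides of a cell boundary but are only about 0.0002 degrees apart; with a zero margin they would never be compared, whereas the margin places `x` in both cells. Each partition can be processed independently, which is what makes this style of blocking suitable for offline, parallel alignment.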



Introduction

A large number of points of interest (POIs) have been incorporated into geospatial databases [1]. Social networks, such as Facebook, host many businesses' pages, and posts contain a collection of businesses. Travel companies, such as TripAdvisor, host a selection of tourist attractions. All of these POIs are associated with specific geographical locations. Map service providers (such as Google Maps, TomTom, and HERE Maps) extend and enrich their datasets, often with the help of crowdsourcing
