Abstract

The amount of spatial data in open data portals has increased rapidly, raising the demand for spatial dataset search in large data repositories. In this paper, we tackle spatial dataset search by using the Earth Mover's Distance (EMD) to measure the similarity between datasets. EMD is a robust similarity measure between two distributions and has been successfully applied to multiple domains such as image retrieval, document retrieval, multimedia, etc. However, the existing EMD-based studies typically depend on a common filtering framework with a single pruning strategy, which still has a high search cost. To address this issue, we propose a Dual-Bound Filtering (DBF) framework to accelerate the EMD-based spatial dataset search. Specifically, we represent datasets by Z-order histograms and organize them as nodes in a tree structure. During a query, two levels of filtering are conducted based on pooling-based bounds and a TICT bound on EMD to prune dissimilar datasets efficiently. We conduct experiments on four real-world spatial data repositories and the experimental results demonstrate the efficiency and effectiveness of our DBF framework.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call