Abstract

The Earth Mover’s Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the extensive literature, most existing solutions are optimized for $L^p$ feature spaces (e.g., Euclidean space), whereas in a range of applications the relationships between features are better captured by networks. In this paper, we study the problem of answering $k$-nearest neighbor ($k$-NN) queries under network-based EMD metrics (NEMD). We propose Oasis, a new access method that leverages the network structure of the feature space to enable efficient NEMD-based similarity search. Specifically, Oasis employs three novel techniques: (i) Range Oracle, a scalable model that estimates the range of the $k$-th nearest neighbor under NEMD; (ii) Boundary Index, a structure that efficiently fetches the candidates within a given range; and (iii) Network Compression Hierarchy, an incremental filtering mechanism that prunes false-positive candidates to avoid unnecessary computation. Through extensive experiments on both synthetic and real data sets, we confirm that Oasis significantly outperforms state-of-the-art methods in query processing cost.
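To make the problem setting concrete, the following is a minimal sketch of a network-based EMD and a brute-force $k$-NN scan over it. It is not the Oasis method: it illustrates only the exhaustive baseline that an access method would aim to avoid. The sketch assumes that NEMD is the EMD whose ground distance is the shortest-path distance on an undirected weighted network, and it computes that as a minimum-cost flow on the network itself. The names `nemd` and `knn`, the toy four-node ring network, and the use of exact `Fraction` masses are all illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def nemd(n, edges, p, q):
    """EMD between p and q (masses on nodes 0..n-1, equal totals) with
    shortest-path ground distance on an undirected weighted network.
    Assumption: solved as min-cost flow via successive shortest paths."""
    s, t = n, n + 1                       # hypothetical super source / sink
    graph = [[] for _ in range(n + 2)]    # arc = [to, residual cap, cost, reverse idx]

    def add_arc(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    total = sum(p)                        # no arc ever needs more capacity than this
    for u, v, w in edges:                 # each undirected edge becomes two arcs
        add_arc(u, v, total, w)
        add_arc(v, u, total, w)
    for i in range(n):                    # only the surplus or deficit must move
        diff = p[i] - q[i]
        if diff > 0:
            add_arc(s, i, diff, 0)
        elif diff < 0:
            add_arc(i, t, -diff, 0)

    cost = 0
    while True:
        # Bellman-Ford on the residual graph (residual arcs may have negative cost).
        dist = [None] * (n + 2)
        prev = [None] * (n + 2)
        dist[s] = 0
        for _ in range(n + 2):
            changed = False
            for u in range(n + 2):
                if dist[u] is None:
                    continue
                for idx, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > 0 and (dist[v] is None or dist[u] + c < dist[v]):
                        dist[v] = dist[u] + c
                        prev[v] = (u, idx)
                        changed = True
            if not changed:
                break
        if dist[t] is None:               # all surplus mass has been delivered
            return cost
        # Find the bottleneck on the cheapest path, then push flow along it.
        flow, v = None, t
        while v != s:
            u, idx = prev[v]
            cap = graph[u][idx][1]
            flow = cap if flow is None else min(flow, cap)
            v = u
        v = t
        while v != s:
            u, idx = prev[v]
            arc = graph[u][idx]
            arc[1] -= flow
            graph[v][arc[3]][1] += flow
            v = u
        cost += flow * dist[t]


def knn(n, edges, query, database, k):
    """Brute-force k-NN: indices of the k database entries closest to the query."""
    order = sorted(range(len(database)),
                   key=lambda i: nemd(n, edges, query, database[i]))
    return order[:k]


# Toy network: a ring of four features with unit edge weights.
EDGES = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
half = Fraction(1, 2)
query = [1, 0, 0, 0]
database = [[0, 0, 1, 0], [0, 1, 0, 0], [half, 0, 0, half]]

distances = [nemd(4, EDGES, query, d) for d in database]
neighbors = knn(4, EDGES, query, database, 2)
print(distances, neighbors)
```

Every query in this sketch solves one flow problem per database entry, which is the per-candidate cost that estimating a search range, fetching only candidates within it, and pruning false positives are meant to reduce.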
