Abstract

As a robust similarity measurement, Earth Mover's Distance (EMD) has been widely adopted in many real-world applications, such as machine learning, computer vision and natural language processing. In this paper, we study the problem of EMD-based similarity search, which aims at finding all histogram objects from a dataset whose EMD is within a pre-defined threshold from the given query. Since the time complexity of computing EMD is rather high, it is essential to devise effective techniques to accelerate the query processing. To this end, we propose a filter-and-verification framework: In the filter step, we devise three effective strategies to prune dissimilar objects in batch by sharing the computation between multiple objects. In the verification step, we develop novel flow adjustment techniques to incrementally calculate the EMD of candidates and enable early termination. We justify our proposed framework by conducting both theoretical analysis and extensive experiments. The results on four real world datasets show that our proposed techniques achieve up to an order of magnitude performance gain than state-of-the-art approaches.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call