LiteHST: A Tree Embedding based Method for Similarity Search

Yuxiang Zeng,Yongxin Tong,Lei Chen

doi:10.1145/3588715

Abstract

Similarity search is getting increasingly useful in real applications. This paper focuses on the in-memory similarity search, i.e., the range query and k nearest neighbor (kNN) query, under arbitrary metric spaces, where the only known information is the distance function to measure the similarity between two objects. Although lots of research has studied this problem, the query efficiency of existing solutions is still unsatisfactory. To further improve the query efficiency, we are inspired by the tree embeddings, which map each object into a unique leaf of a well-structured tree solely based on the distances. Unlike existing embedding techniques (e.g., Lipschitz embeddings and pivot mapping) for similarity search, where an extra multi-dimensional index is needed to index the embedding space (e.g., Lp metrics), we directly use this tree to answer similarity search. This seems to be promising, but it is challenging to tailor tree embeddings for efficient similarity search. Specifically, we present a novel index called LiteHST, which is based on the most popular tree embedding (HST) and heavily customized for similarity search in the node structure and storage scheme. We propose a new construction algorithm with lower time complexity than existing methods and prove the optimality of LiteHST in the distance bound. Based on this new index, we also design optimization techniques that heavily reduce the number of distance computations and hence save running time. Finally, extensive experiments demonstrate that our solution outperforms the state-of-the-art in the query efficiency by a large margin.

Full Text