Abstract

To further improve approximate nearest neighbor (ANN) search performance, we propose accumulative quantization (AQ) and apply it to ANN search. AQ approximates a vector by the accumulation of several centroids, each selected from a different codebook. To approximate input vectors accurately, an iterative optimization is designed for codebook training to improve the codebooks' approximation power, and a further optimization is introduced into the offline vector quantization procedure to minimize the overall quantization error. For AQ-based exhaustive ANN search, a hypersphere-based filtration mechanism reduces the number of candidates entering the sorting stage, improving search time efficiency: a hypersphere centered at the query vector is constructed, and vectors lying outside it are filtered out. Experimental results on public datasets demonstrate that hypersphere-based filtration improves ANN search time efficiency without weakening search accuracy, and that the proposed AQ surpasses the state of the art in ANN search accuracy.
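
To make the accumulation idea concrete, here is a minimal sketch of greedy residual-style encoding: each codebook quantizes what the previous stages left unexplained, so the chosen centroids sum to an approximation of the input. This is only an illustration, not the paper's exact training or optimization procedure; the function names are ours, and the codebooks are assumed to be pre-trained (e.g., by k-means on successive residuals).

import numpy as np

def aq_encode(x, codebooks):
    """Greedily pick one centroid per codebook so that their sum
    approximates x; each stage quantizes the residual left by the
    previous stages."""
    residual = x.astype(float).copy()
    codes = []
    for cb in codebooks:                      # cb has shape (K, D)
        dists = np.linalg.norm(cb - residual, axis=1)
        k = int(np.argmin(dists))             # nearest centroid to the residual
        codes.append(k)
        residual -= cb[k]                     # what remains to be explained
    return codes, residual                    # residual is the quantization error

def aq_decode(codes, codebooks):
    """Reconstruct the vector as the accumulation of the selected centroids."""
    return sum(cb[k] for cb, k in zip(codebooks, codes))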

Highlights

  • The natural solution is to perform exact nearest neighbor search, which is inherently expensive for large-scale collections and high-dimensional vectors due to the “curse of dimensionality” [3]. This difficulty has led to the development of solutions for approximate nearest neighbor (ANN) search. The key idea shared by ANN methods is to find the nearest neighbor (NN) with high probability “only,” instead of with probability 1 [4]

  • Product quantization (PQ) was first introduced into ANN search [4], where the vector space is decomposed into the Cartesian product of low-dimensional subspaces

  • Given a query vector q, the distance between q and the vectors in the database is computed according to formula (8) when performing exhaustive ANN search. Then, a distance sort over all vectors returns a preset number of the closest ones (see the sketch after this list)

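Formula (8) itself appears only in the full text, so it is not reproduced here. As an illustration of the lookup-table style of exhaustive search the highlights describe, the sketch below uses the standard asymmetric distance computation (ADC) for PQ codes: one small distance table per subspace, then a sum of table lookups per database vector. All names, shapes, and parameters are our assumptions.

import numpy as np

def pq_adc_search(q, codes, codebooks, topk=10):
    """Exhaustive ANN search with precomputed lookup tables (ADC style).

    q         : (D,) query vector
    codes     : (N, M) centroid indices, one column per subspace
    codebooks : list of M arrays, each of shape (K, D // M)
    """
    M = len(codebooks)
    d_sub = q.shape[0] // M
    # One table per subspace: squared distance from the query subvector
    # to each of that subspace's K centroids.
    tables = [np.sum((codebooks[m] - q[m * d_sub:(m + 1) * d_sub]) ** 2, axis=1)
              for m in range(M)]
    # Approximate squared distance to a database vector = sum of M lookups.
    dists = np.zeros(codes.shape[0])
    for m in range(M):
        dists += tables[m][codes[:, m]]
    return np.argsort(dists)[:topk]           # indices of the closest vectors

Because the tables are computed once per query, the per-vector cost is only M table lookups and additions, which is what makes exhaustive search over compressed codes practical.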

Summary

Introduction

Nearest neighbor (NN) search is fundamental and important in many applications, such as machine learning, image classification, content-based image retrieval, deep learning, feature matching [1], and image interpolation [2]. The goal of NN search is to find the vector in a database whose distance to the query vector is smallest according to a predefined distance metric. Hash-based nearest neighbor search methods map vectors from Euclidean space into Hamming space, using binary codes to represent the vectors [8]. There are also ANN search methods that address the nearest neighbor search problem with efficient quantization technology [14], adopting the Euclidean distance, which offers better discrimination than the Hamming distance. To improve the approximation power of the codebooks, an optimization is introduced that minimizes the overall error between the original vector and the vector reconstructed by accumulative quantization. Given a query vector, its nearest neighbors are located only near the query in the vector space, so we propose a hypersphere filtration strategy, which has a simple but positive effect on search time efficiency.
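
A minimal sketch of that filtration step, assuming (approximate) squared distances to the query have already been computed for all database vectors; how the radius is chosen is specific to the paper, so it is left as a free parameter here, and the function name is ours.

import numpy as np

def hypersphere_filter_and_sort(sq_dists, radius, topk=10):
    """Filter-then-sort: keep only vectors inside the query-centered
    hypersphere (squared distance <= radius**2), and sort just those
    survivors instead of the whole database."""
    inside = np.flatnonzero(sq_dists <= radius ** 2)
    ranked = inside[np.argsort(sq_dists[inside])]
    return ranked[:topk]

Only the survivors enter the sort, so when the radius prunes aggressively the sorting cost shrinks from the full database to a small candidate set, while any true neighbor lying inside the sphere is retained.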

Accumulative Quantization
Fast Distance Computation
Hypersphere-Based Filtration for Exhaustive ANN Search
Experiments
1M GIST
Methods
Conclusions