Abstract

Inverted indexing based on vector quantization has been a popular technique in large-scale information retrieval. Vector quantization under a given similarity metric partitions the sample space into Voronoi cells, and the samples in each cell are indexed by an inverted list. The nearest neighbors of a query are then identified efficiently by looking up the cell in which the query falls. To improve recall, the sample space has been partitioned multiple times, using k-means with different initializations, to build multiple inverted indexes. However, when every partition uses the same single similarity metric, e.g., Euclidean distance, the resulting inverted indexes may be highly correlated, which limits the achievable gain in recall. This paper presents a new multiple inverted indexing method that partitions the sample space multiple times under multiple different similarity metrics. Furthermore, several techniques for defining the multiple metrics are investigated empirically. Experiments are conducted on three representative datasets, namely million-scale SIFT and GIST feature sets and a feature set produced by deep learning, to evaluate the effectiveness of the proposed method. Experimental results show that the proposed method achieves competitive performance compared with state-of-the-art inverted indexing methods in terms of recall and retrieval time, and that the Latin-Hypercube weighting method generates more diverse metrics and yields a larger gain in recall.
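The abstract does not say exactly how the Latin-Hypercube weighting defines each metric, so the following is only a minimal sketch under stated assumptions. It assumes each metric is a weighted squared Euclidean distance whose per-dimension weights are drawn by Latin hypercube sampling on [0, 1). It builds one k-means partition (plain Lloyd's algorithm) and one set of inverted lists per metric. At query time it takes the union of the candidates in the query's cell across all indexes and re-ranks them by unweighted Euclidean distance. All names (`latin_hypercube_weights`, `MultiMetricInvertedIndex`, etc.) are hypothetical, as are the re-ranking step and the parameter values.

```python
import random
from collections import defaultdict


def latin_hypercube_weights(n_metrics, dim, rng):
    """ASSUMPTION: one weight vector per metric via Latin hypercube sampling.

    For every dimension, the n_metrics weights fall one into each of the
    n_metrics equal-width strata of [0, 1), so the metrics differ systematically.
    """
    columns = []
    for _ in range(dim):
        strata = [(i + rng.random()) / n_metrics for i in range(n_metrics)]
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append(strata)
    # Transpose: weights[m][d] is the weight of dimension d under metric m.
    return [[columns[d][m] for d in range(dim)] for m in range(n_metrics)]


def wdist(x, c, w):
    """Weighted squared Euclidean distance (assumed form of each metric)."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def nearest(x, centroids, w):
    """Index of the Voronoi cell (centroid) containing x under metric w."""
    return min(range(len(centroids)), key=lambda j: wdist(x, centroids[j], w))


def weighted_kmeans(data, k, w, rng, iters=10):
    """Lloyd's k-means under a diagonal-weighted metric.

    With per-dimension weights the optimal centroid of a cell is still the
    arithmetic mean, so only the assignment step uses the weights.
    """
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for p in data:
            cells[nearest(p, centroids, w)].append(p)
        for j, cell in enumerate(cells):
            if cell:  # keep the previous centroid if a cell empties out
                centroids[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return centroids


class MultiMetricInvertedIndex:
    """Hypothetical multi-index: one k-means partition per (assumed) metric."""

    def __init__(self, data, k, n_metrics, seed=0):
        rng = random.Random(seed)
        self.data = data
        self.tables = []  # (weights, centroids, inverted lists) per metric
        for w in latin_hypercube_weights(n_metrics, len(data[0]), rng):
            centroids = weighted_kmeans(data, k, w, rng)
            lists = defaultdict(list)  # cell id -> ids of samples in that cell
            for idx, p in enumerate(data):
                lists[nearest(p, centroids, w)].append(idx)
            self.tables.append((w, centroids, lists))

    def search(self, q, topk):
        candidates = set()
        for w, centroids, lists in self.tables:
            # Look up only the cell that q falls into under this metric.
            candidates.update(lists.get(nearest(q, centroids, w), []))

        # ASSUMPTION: re-rank the union by plain squared Euclidean distance.
        def dist(i):
            return sum((a - b) ** 2 for a, b in zip(q, self.data[i]))

        return sorted(candidates, key=dist)[:topk]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(300)]
    index = MultiMetricInvertedIndex(data, k=8, n_metrics=4, seed=1)
    # The first id printed is 7, because the query is itself a database point.
    print(index.search(data[7], topk=5))
```

Reusing the arithmetic mean as the centroid update is valid only because the assumed metrics are diagonal weightings of Euclidean distance; other metric families would need a different update step.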
