Abstract

Vector quantization is an essential tool for tasks involving large scale data, for example, large scale similarity search, which is crucial for content-based information retrieval and analysis. In this paper, we propose a novel vector quantization framework that iteratively minimizes quantization error. First, we provide a detailed review on a relevant vector quantization method named residual vector quantization (RVQ). Next, we propose generalized residual vector quantization (GRVQ) to further improve over RVQ. Many vector quantization methods can be viewed as special cases of our proposed method. To enable GRVQ on billion scale data, we introduce a nonexhaustive search scheme named aggregating tree (A-Tree) for high dimensional data that uses GRVQ encodings to build a radix tree and perform the nearest neighbor search by beam search. To search accurately and efficiently, VQ-encodings should satisfy locally aggregating encoding criterion: For any node of the corresponding A-Tree, neighboring vectors should aggregate in fewer subtrees to make beam search efficient. We show that the proposed GRVQ encodings best satisfy the suggested criterion, and the joint use of GRVQ and A-Tree shows significantly better performances on billion scale datasets. Our methods are validated on several standard benchmark datasets. Experimental results and empirical analysis show the superior efficiency and effectiveness of our proposed methods compared to the state-of-the-art for large scale search.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.