Abstract

Constructing discriminative feature descriptors is crucial to effective image retrieval. The state-of-the-art global descriptor for this purpose is the Vector of Locally Aggregated Descriptors (VLAD). Given a set of local features (say, SIFT) extracted from an image, VLAD is generated by quantizing the local features with a small visual vocabulary (64 to 512 centroids), aggregating the residual statistics of the quantized features for each centroid, and concatenating the aggregated residual vectors across centroids. Search accuracy can be increased by enlarging the vocabulary (from hundreds to hundreds of thousands of centroids), which, however, incurs heavy computation cost with flat quantization. In this paper, we propose a hierarchical multi-VLAD approach that seeks a tradeoff between descriptor discriminability and computation complexity. We build a tree-structured hierarchical quantization (TSHQ) to accelerate VLAD computation with a large vocabulary. Since quantization error may propagate from the root to the leaf nodes (centroids) in TSHQ, we introduce multi-VLAD, which constructs a VLAD descriptor for each level of the vocabulary tree so as to compensate for the quantization error at that level. Extensive evaluation on benchmark datasets shows that the proposed approach outperforms the state of the art in terms of retrieval accuracy, extraction speed, and memory cost.
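To make the construction concrete, below is a minimal Python sketch of standard VLAD aggregation with flat quantization, together with a tree-descent quantizer in the spirit of TSHQ. The data layout (centroid arrays, node dicts) and all names are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def vlad(features, centroids):
    """VLAD with flat quantization: assign each local feature (e.g. 128-D SIFT)
    to its nearest centroid, accumulate the residual (feature - centroid) per
    centroid, then concatenate and L2-normalize. Output dimension is k * d."""
    k, d = centroids.shape
    agg = np.zeros((k, d))
    for f in features:
        c = int(np.argmin(np.linalg.norm(centroids - f, axis=1)))  # O(k) search
        agg[c] += f - centroids[c]          # residual statistics per centroid
    v = agg.reshape(-1)                     # concatenate per-centroid residuals
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def tree_quantize(feature, node):
    """Tree-structured quantization in the spirit of TSHQ: descend a vocabulary
    tree, choosing the nearest child centroid at each level. With branching
    factor b and depth L this costs O(b * L) per feature instead of O(b ** L)
    for a flat search over the same number of leaf centroids. Each `node` is
    assumed to be a dict {'centroids': (b, d) array, 'children': [node|None]}.
    Returns the root-to-leaf path of child indices; a multi-VLAD scheme would
    aggregate residuals at every level of this path, not only at the leaf."""
    path = []
    while node is not None:
        idx = int(np.argmin(np.linalg.norm(node['centroids'] - feature, axis=1)))
        path.append(idx)
        node = node['children'][idx]        # None once a leaf level is reached
    return path
```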
