Abstract

Image representation by the vector of locally aggregated descriptors (VLAD) has recently been shown to be highly efficient. However, because VLAD divides the feature space coarsely, its discriminative power is limited. One intuitive remedy is to construct the VLAD with a larger vocabulary, but this yields a higher-dimensional VLAD and increases the computational cost of learning the principal component analysis (PCA) parameters used to project the VLAD onto a low-dimensional space. In this paper, we propose a hierarchical scheme to build the VLAD. In our approach, a hidden-layer visual vocabulary is constructed by generating several subwords for each visual word of a coarse vocabulary. With the hidden-layer visual vocabulary, the feature space is divided more finely. We then aggregate the residues computed in the hidden-layer vocabulary to the coarse layer, obtaining an image descriptor of the same dimension as the original VLAD. In addition, we show that applying a whitening operation to the local descriptors further enhances the discriminative power of the VLAD. We validate our approach with experiments conducted mainly on three benchmark data sets, namely the Holidays data set, the UKBench data set, and the Oxford Building data set with Flickr1M as distractors, and compare it with related VLAD-based algorithms. The experimental results demonstrate the effectiveness of our algorithm.
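
The aggregation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how subwords are generated, how residues are combined, or how whitening parameters are learned, so the sketch assumes that the coarse words and subwords are already given, that residues are summed per coarse word, that the result is L2-normalized, and that whitening is PCA whitening estimated from the descriptors passed in. The names `whiten`, `nearest`, and `hierarchical_vlad` are hypothetical.

```python
import numpy as np


def whiten(descriptors, eps=1e-8):
    """PCA-whiten local descriptors (one per row).

    Assumption: statistics come from the input itself; in practice they
    would more likely be learned on a separate training set.
    """
    centered = descriptors - descriptors.mean(axis=0)
    cov = centered.T @ centered / len(descriptors)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Clamp tiny negative eigenvalues caused by round-off before the sqrt.
    scale = np.sqrt(np.maximum(eigvals, 0.0) + eps)
    return centered @ eigvecs / scale


def nearest(points, centers):
    """Index of the nearest center (Euclidean distance) for each point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def hierarchical_vlad(descriptors, coarse_words, subwords):
    """Two-layer VLAD sketch.

    descriptors:  (N, D) local descriptors of one image
    coarse_words: (K, D) coarse vocabulary
    subwords:     (K, M, D) assumed hidden layer, M subwords per coarse word
    returns:      (K * D,) vector, the same size as a single-layer VLAD
    """
    K, D = coarse_words.shape
    vlad = np.zeros((K, D))
    coarse_idx = nearest(descriptors, coarse_words)
    for k in range(K):
        members = descriptors[coarse_idx == k]
        if len(members) == 0:
            continue
        sub_idx = nearest(members, subwords[k])
        # Residues are measured against the finer subword, then summed
        # into the slot of the parent coarse word (assumed aggregation).
        vlad[k] = (members - subwords[k][sub_idx]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

Because every subword residue is folded back into its parent coarse word, the output length stays at K * D regardless of how many subwords M are used, which is the property the abstract highlights.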
