Abstract

Notwithstanding its great success and wide adoption in the Bag-of-Visual-Words representation, a visual vocabulary created from single image local features often proves ineffective, largely for three reasons. First, many detected local features are not stable enough, resulting in many noisy and non-descriptive visual words in images. Second, a single visual word discards the rich spatial contextual information among the local features, which has been proven valuable for visual matching. Third, the distance metric commonly used for generating the visual vocabulary does not take the semantic context into consideration, which renders it prone to noise. To address these three issues, we propose an effective visual vocabulary generation framework containing three novel contributions: 1) we propose an effective unsupervised local feature refinement strategy; 2) we consider local features in groups to model their spatial contexts; 3) we further learn a discriminant distance metric between local feature groups, which we call the discriminant group distance. This group distance is then leveraged to induce a visual vocabulary from groups of local features. We name the result the contextual visual vocabulary, which captures both the spatial and semantic contexts. We evaluate the proposed local feature refinement strategy and the contextual visual vocabulary in two large-scale image applications: large-scale near-duplicate image retrieval on a dataset containing 1.5 million images, and image search re-ranking. Our experimental results show that the contextual visual vocabulary achieves significant improvement over the classic visual vocabulary. Moreover, it outperforms the state-of-the-art Bundled Feature in terms of retrieval precision, memory consumption, and efficiency.
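For context, the classic baseline the paper improves upon quantizes each local descriptor independently against a visual vocabulary and represents an image as a histogram of visual words, discarding spatial and semantic context. A minimal sketch of that baseline (illustrative names and a random toy vocabulary, not the authors' code):

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Classic Bag-of-Visual-Words: hard-assign each local descriptor
    to its nearest visual word (Euclidean distance) and count words.
    No spatial grouping or learned metric is involved."""
    # Squared Euclidean distance from every descriptor to every word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                  # nearest-word index per feature
    return np.bincount(words, minlength=len(vocabulary))

rng = np.random.default_rng(0)
vocab = rng.standard_normal((8, 16))           # 8 visual words, 16-D descriptors
feats = rng.standard_normal((50, 16))          # 50 local features from one image
hist = bovw_histogram(feats, vocab)
print(hist.sum())                              # every feature maps to one word
```

The paper's contextual visual vocabulary replaces this per-feature quantization with quantization of feature *groups* under a learned discriminant group distance.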


