Modeling spatial and semantic cues for large-scale near-duplicated image retrieval

Shiliang Zhang,Qi Tian,Gang Hua,Wengang Zhou,Qingming Huang,Houqiang Li,Wen Gao

doi:10.1016/j.cviu.2010.11.003

Abstract

The Bag-of-visual-Words (BoW) image representation has been shown to be one of the most promising solutions for large-scale near-duplicated image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of local image features. This is not ideal because it largely ignores the semantic and spatial contexts between local features. In this paper, we propose the geometric visual vocabulary, which captures the spatial context by quantizing local features in bi-space, i.e., in both descriptor space and orientation space. Then, we propose to capture the semantic context by learning a semantic-aware distance metric between local features, which measures the semantic similarity between the image patches from which the local features are extracted. The learned distance is then used to cluster the local features to generate the semantic visual vocabulary. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configurations between the local features inside each group, and learning a semantic-aware distance between groups. The learned group distance is then used to cluster the extracted local feature groups to generate a novel visual vocabulary, i.e., the contextual visual vocabulary. The three proposed vocabularies, i.e., the geometric, semantic, and contextual visual vocabularies, are tested in large-scale near-duplicated image retrieval applications. The geometric visual vocabulary and the semantic visual vocabulary both achieve better performance than the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic cues, outperforms the state-of-the-art bundled feature in both retrieval precision and efficiency.
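The bi-space quantization mentioned in the abstract can be illustrated with a small sketch. The abstract states only that local features are quantized in descriptor space and in orientation space; everything else here is an assumption made for illustration: nearest-centroid assignment in descriptor space, equal-width orientation bins, the rule for combining the two indices into one word ID, and all function and parameter names (`nearest_centroid`, `orientation_bin`, `bispace_word`, `bag_of_words`, `num_bins`). It is not the authors' implementation.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to `descriptor`
    (squared Euclidean distance). Assumed descriptor-space quantizer."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def orientation_bin(angle, num_bins):
    """Map an angle in radians to one of `num_bins` equal-width bins.
    Equal-width binning is an assumption, not taken from the paper."""
    wrapped = angle % (2.0 * math.pi)  # wrap into [0, 2*pi)
    # The trailing modulo guards against rounding up to exactly 2*pi.
    return int(wrapped / (2.0 * math.pi) * num_bins) % num_bins


def bispace_word(descriptor, angle, centroids, num_bins):
    """Combine the descriptor index and the orientation bin into one ID.
    Hypothetical combination rule: one block of bins per centroid."""
    descriptor_index = nearest_centroid(descriptor, centroids)
    return descriptor_index * num_bins + orientation_bin(angle, num_bins)


def bag_of_words(features, centroids, num_bins):
    """Build a histogram over all (centroid, orientation-bin) pairs
    from an iterable of (descriptor, angle) features."""
    histogram = [0] * (len(centroids) * num_bins)
    for descriptor, angle in features:
        histogram[bispace_word(descriptor, angle, centroids, num_bins)] += 1
    return histogram


# Toy example: two 2-D centroids and four orientation bins.
centroids = [[0.0, 0.0], [1.0, 1.0]]
features = [
    ([0.1, 0.0], 0.1),      # near centroid 0, orientation near 0
    ([0.9, 1.0], math.pi),  # near centroid 1, orientation pi
    ([0.0, 0.1], -0.1),     # near centroid 0, orientation just below 2*pi
]
histogram = bag_of_words(features, centroids, num_bins=4)
```

Under this scheme, two features with similar descriptors but clearly different orientations fall into different visual words, which is one simple way that orientation could add spatial information to the vocabulary.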

Journal: Computer Vision and Image Understanding
Publication Date: Nov 11, 2010
Citations: 40

