Abstract
Most large-scale image retrieval systems are based on the Bag-of-Visual-Words (BoV) model. Typically, no spatial information about the visual words is used, despite the ambiguity of visual words. To address this problem, we introduce a spatial weighting framework for BoV that encodes spatial information, inspired by Geometry-preserving Visual Phrases (GVP). We first interpret the GVP method using this framework. We reveal that GVP gives too large a spatial weighting when calculating the L2-norm of images, due to its implicit assumption that co-occurring GVPs are independent. This makes GVP sensitive to images with a small number of visual words. We then propose an improved practical spatial weighting for BoV (PSW-BoV) that alleviates this effect while preserving efficiency. Experiments on Oxford 5K and MIR Flickr 1M show that PSW-BoV is robust to images with a small number of visual words and also improves general retrieval accuracy.