Abstract

Visual patterns, i.e., high-order combinations of visual words, provide a discriminative abstraction of the high-dimensional bag-of-words image representation. However, existing visual pattern mining schemes are built upon the ill-posed 2D photographic concurrences of visual words rather than their real-world 3D concurrences. This incorrectly binds words from different objects or depths into the same pattern and degrades mining precision. In addition, how to build a compact yet discriminative image representation from the mined patterns remains an open problem, although such a representation is in high demand for emerging applications such as mobile visual search. In this paper, we propose a novel compact bag-of-patterns (CBoP) descriptor to address both issues, with application to low bit rate mobile landmark search. First, to overcome the ill-posed 2D photographic configuration, we build a 3D point cloud from the reference images of each target, from which we extract accurate pattern candidates based on the 3D concurrences of visual words. Then, we introduce a novel gravity distance metric for pattern mining, which models the relative frequency between concurrent words as “gravity” to mine more discriminative visual patterns. Finally, we build the CBoP descriptor as a compact yet discriminative image representation by conducting sparse coding over the mined patterns, so as to maximally reconstruct the original bag-of-words histogram with a minimal coding length. The proposed CBoP paradigm is deployed in a large-scale low bit rate mobile landmark search prototype, where compact visual descriptors are extracted directly on the mobile end and sent in place of the query image to reduce query delivery latency. We quantify performance on several large-scale public benchmarks, with comparisons to state-of-the-art compact descriptors [112, 113], 2D visual patterns [71, 72], topic features [114, 115], and hashing descriptors [116]. We report superior performance, e.g., accuracy comparable to the million-scale bag-of-words histogram at a 1:160,000 descriptor compression rate (approximately 100 bits per descriptor).
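
To make the final step more concrete, the following is a minimal sketch of coding a bag-of-words histogram over a set of mined patterns. It is not the solver used in this work: greedy matching pursuit is swapped in as a simple stand-in for sparse coding, each pattern is assumed to be a plain set of visual word IDs, and the names `encode_with_patterns`, `histogram`, `patterns`, and `max_patterns` are hypothetical.

```python
def encode_with_patterns(histogram, patterns, max_patterns):
    """Greedily pick at most `max_patterns` patterns to reconstruct `histogram`.

    histogram: dict mapping visual word ID -> count (assumed input format).
    patterns:  list of non-empty sets of visual word IDs (assumed pattern format).
    Returns (code, residual): code maps pattern index -> coefficient,
    residual maps word ID -> count left unexplained.
    """
    residual = {word: float(count) for word, count in histogram.items()}
    code = {}
    for _ in range(max_patterns):
        best_index, best_coeff, best_gain = None, 0.0, 0.0
        for index, words in enumerate(patterns):
            if index in code:
                continue  # each pattern is used at most once
            total = sum(residual.get(word, 0.0) for word in words)
            # Least-squares coefficient for a binary indicator pattern.
            coeff = total / len(words)
            # Reduction in squared reconstruction error if this pattern is chosen.
            gain = coeff * total
            if coeff > 0 and gain > best_gain:
                best_index, best_coeff, best_gain = index, coeff, gain
        if best_index is None:
            break  # no remaining pattern reduces the error
        code[best_index] = best_coeff
        for word in patterns[best_index]:
            residual[word] = residual.get(word, 0.0) - best_coeff
    return code, residual


# Toy example: five visual words, three candidate patterns.
histogram = {1: 2, 2: 2, 3: 2, 7: 1, 8: 1}
patterns = [{1, 2, 3}, {7, 8}, {1, 7}]
code, residual = encode_with_patterns(histogram, patterns, max_patterns=2)
print(code)
```

In this toy case two patterns reconstruct the histogram exactly, so only two (index, coefficient) pairs would need to be transmitted instead of the full histogram. The `max_patterns` cap plays the role of the coding-length budget; how the budget and coefficients are actually quantized to reach roughly 100 bits is not specified in the abstract and is not modeled here.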
