Abstract

Matching duplicate visual content among images serves as the basis of many vision tasks. Researchers have proposed different local descriptors for image matching, e.g., floating point descriptors like SIFT and SURF, and binary descriptors like ORB and BRIEF. These descriptors suffer either from relatively expensive computation or from limited robustness due to their compact binary representation. This paper studies how to improve the matching efficiency and accuracy of floating point descriptors and the matching accuracy of binary descriptors. To achieve this goal, we embed the spatial clues among local descriptors into a novel local feature, i.e., the multi-order visual phrase, which contains two complementary clues: 1) the center visual clues extracted at each image keypoint and 2) the neighbor visual and spatial clues of multiple nearby keypoints. Different from existing visual phrase features, two multi-order visual phrases are flexibly matched by first matching their center visual clues, and then estimating a match confidence by checking the spatial and visual consistency of their neighbor keypoints. Therefore, the multi-order visual phrase does not sacrifice the repeatability of the classic visual word and is more robust to quantization error than existing visual phrase features. We extract multi-order visual phrases from both SIFT and ORB and test them in image matching and retrieval tasks on UKbench, Oxford5K, and 1 million distractor images collected from Flickr. Comparisons with recent retrieval approaches clearly demonstrate the competitive accuracy and significantly better efficiency of our approach.
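To make the two-step matching idea concrete, the sketch below illustrates one possible reading of it: match the center descriptors first, then keep only matches whose neighboring keypoints are also visually consistent. This is a minimal illustration, not the paper's exact formulation; the OpenCV-based detection, the choice of SIFT, and the parameters `k` and `dist_thresh` in the hypothetical `neighbor_consistency` helper are all assumptions made for the example.

```python
# Minimal sketch (assumptions noted above): center-descriptor matching followed
# by a neighbor-consistency check, loosely mirroring the described two-step scheme.
import cv2
import numpy as np

def neighbor_consistency(kps1, des1, kps2, des2, i, j, k=5, dist_thresh=250.0):
    """Fraction of the k nearest neighbors of keypoint i (image 1) that have a
    visually similar keypoint among the k nearest neighbors of j (image 2).
    The threshold and k are illustrative values, not from the paper."""
    p1, p2 = np.array(kps1[i].pt), np.array(kps2[j].pt)
    n1 = np.argsort([np.linalg.norm(np.array(kp.pt) - p1) for kp in kps1])[1:k + 1]
    n2 = np.argsort([np.linalg.norm(np.array(kp.pt) - p2) for kp in kps2])[1:k + 1]
    hits = 0
    for a in n1:
        dists = [np.linalg.norm(des1[a].astype(np.float32) - des2[b].astype(np.float32))
                 for b in n2]
        if min(dists) < dist_thresh:
            hits += 1
    return hits / k

# Detect keypoints and descriptors (SIFT here; ORB would use Hamming distance).
sift = cv2.SIFT_create()
img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
kps1, des1 = sift.detectAndCompute(img1, None)
kps2, des2 = sift.detectAndCompute(img2, None)

# Step 1: match the center visual clues with plain descriptor matching.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = matcher.match(des1, des2)

# Step 2: keep only matches whose neighborhoods also agree visually/spatially.
verified = [m for m in matches
            if neighbor_consistency(kps1, des1, kps2, des2,
                                    m.queryIdx, m.trainIdx) >= 0.6]
print(f"{len(verified)} of {len(matches)} center matches pass the neighbor check")
```

Because the neighbor check only re-weights matches already found by the center descriptors, it does not reduce the repeatability of the underlying visual word, which is the property the abstract emphasizes.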
