Abstract

Recent advances in visual place recognition (VPR) have exploited ConvNet features to improve recognition accuracy under significant environmental and viewpoint changes. However, it remains an open problem how to perform efficient image matching with high-dimensional ConvNet features. In this paper, we tackle the problem of matching efficiency using ConvNet features for VPR, where the task is to accurately and quickly recognize a given place in large-scale challenging environments. The paper makes two contributions. First, we propose an efficient solution to VPR, based on the well-known bag-of-words (BoW) framework, to speed up image matching with ConvNet features. Second, in order to alleviate the problem of perceptual aliasing in BoW, we adopt a coarse-to-fine approach: in the coarse stage, we search for the top-K candidate images via BoW; in the fine stage, we identify the best match among the candidates using a hash-based voting scheme. We conduct extensive experiments on six popular VPR datasets to validate the effectiveness of our method. Experimental results show that, in terms of recognition accuracy, our method is comparable to linear search, and outperforms other methods such as FAB-MAP and SeqSLAM by a significant margin. In terms of efficiency, our method achieves a significant speed-up over linear search, with an average matching time as low as 23.5 ms per query on a dataset with 21K images.
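The coarse-to-fine pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes images have already been quantized into visual-word IDs, scores the coarse stage by simple shared-word counts over an inverted index, and stands in for the hash-based voting scheme with a plain word-overlap vote among the top-K candidates.

```python
from collections import Counter, defaultdict

# Toy database: each image is a bag of visual-word IDs
# (assumed to come from quantized ConvNet features).
db = {"A": [1, 2, 3, 4], "B": [2, 3, 5], "C": [7, 8, 9]}

def build_inverted_index(db):
    """Map each visual word to the set of images containing it."""
    index = defaultdict(set)
    for img, words in db.items():
        for w in words:
            index[w].add(img)
    return index

def coarse_topk(query_words, index, k):
    """Coarse stage: BoW-style retrieval of the top-K candidates.

    Only images sharing at least one word with the query are scored;
    here the score is a simple shared-word count.
    """
    scores = Counter()
    for w, n in Counter(query_words).items():
        for img in index.get(w, ()):
            scores[img] += n
    return [img for img, _ in scores.most_common(k)]

def fine_vote(query_words, candidates, db):
    """Fine stage stand-in: re-rank candidates by matching-word votes.

    The paper uses a hash-based voting scheme; plain set overlap is
    used here purely for illustration.
    """
    qset = set(query_words)
    best, best_votes = None, -1
    for img in candidates:
        votes = len(qset & set(db[img]))
        if votes > best_votes:
            best, best_votes = img, votes
    return best

index = build_inverted_index(db)
candidates = coarse_topk([2, 3, 4], index, k=2)   # coarse: shortlist
match = fine_vote([2, 3, 4], candidates, db)      # fine: best match
```

The two-stage design is what buys the speed-up: the inverted index touches only images sharing words with the query, and the more expensive per-candidate voting runs on just K images rather than the whole database.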
