Abstract

Visual place recognition is the task of matching images that show the same place in the world. Combinations of appearance changes (e.g., changing illumination or weather) and geometric changes (e.g., viewpoint changes or occlusions) challenge existing approaches. Learning-based local image feature pipelines are a promising approach to this type of problem. We present a novel attentive feature pooling method that can be used to train a CNN to jointly detect and describe local image features. The network can be trained with weak supervision in a classification setup on small or moderately sized datasets (in our experiments, a set of 24k publicly available web-camera images). We propose a joint loss function that combines the cross-entropy loss of the classification task with a mean squared error term intended to increase the repeatability of feature detections. We show how the approach can be integrated into a place recognition pipeline and evaluate it on several standard place recognition datasets. Despite the small training dataset, we demonstrate a 15% improvement in average performance over the best of several compared state-of-the-art approaches and, probably more importantly, a 3x improvement in worst-case performance. Open source code is available.
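The combination of attentive pooling and a joint classification-plus-MSE loss can be illustrated with a small NumPy sketch. This is not the authors' implementation (their released code is the reference); it only shows the general pattern under stated assumptions: attention scores over a CNN feature map are softmax-normalized across all spatial positions and used to pool local descriptors into one vector for classification, and the mean-squared-error term compares two attention maps that are assumed to come from images of the same place. The names `attentive_pool` and `joint_loss`, the parameter `mse_weight`, the feature-map layout, and the random linear classifier are all hypothetical.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def attentive_pool(features, scores):
    """Attention-weighted pooling of a dense feature map (hypothetical sketch).

    features: (H, W, D) local descriptors from a CNN (assumed layout).
    scores:   (H, W) unnormalised attention/detection scores.
    Returns the pooled descriptor (D,) and the attention map (H, W),
    which sums to 1 over all spatial positions.
    """
    h, w, _ = features.shape
    attn = softmax(scores.reshape(-1)).reshape(h, w)
    pooled = (attn[..., None] * features).sum(axis=(0, 1))
    return pooled, attn


def cross_entropy(logits, label):
    # -log softmax(logits)[label], computed via log-sum-exp for stability.
    m = logits.max()
    return m + np.log(np.exp(logits - m).sum()) - logits[label]


def joint_loss(logits, label, attn_a, attn_b, mse_weight=1.0):
    """Cross-entropy plus a weighted MSE between two attention maps.

    ASSUMPTION: attn_a and attn_b are attention maps of two images of the
    same place; penalising their difference stands in for the
    repeatability term. The weighting scheme is also an assumption.
    """
    ce = cross_entropy(logits, label)
    mse = np.mean((attn_a - attn_b) ** 2)
    return ce + mse_weight * mse


# Usage with random data: an 8x8 grid of 16-D descriptors, 5 place classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 16))
pooled, attn = attentive_pool(features, rng.normal(size=(8, 8)))
classifier = rng.normal(size=(16, 5))  # hypothetical linear classifier
logits = pooled @ classifier
_, attn_other = attentive_pool(features, rng.normal(size=(8, 8)))
loss = joint_loss(logits, 2, attn, attn_other, mse_weight=10.0)
```

With identical attention maps the MSE term vanishes and the loss reduces to the plain classification loss; the extra term only pushes the detector toward consistent responses when the maps of corresponding images differ.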

