Abstract
Recent impressive studies on using ConvNet landmarks for visual place recognition take an approach that involves three steps: (a) detection of landmarks, (b) description of the landmarks by ConvNet features using a convolutional neural network, and (c) matching of the landmarks in the current view with those in the database views. Such an approach has been shown to achieve state-of-the-art accuracy even under significant viewpoint and environmental changes. However, the computational burden of step (c) largely prevents this approach from being applied in practice, because of the cost of linear search in the high-dimensional space of the ConvNet features. In this article, we propose two simple and efficient search methods to tackle this issue. Both methods are built upon tree-based indexing. Given a set of ConvNet features of a query image, the first method directly searches for the features’ approximate nearest neighbors in a tree structure that is constructed from the ConvNet features of the database images. The database images are voted on by the features in the query image, according to a lookup table that maps each ConvNet feature to its corresponding database image. The database image with the most votes is considered the solution. Our second method uses a coarse-to-fine procedure: the coarse step uses the first method to find the top-N database images, and the fine step performs a linear search in the Hamming space of the hash codes of the ConvNet features to determine the best match. Experimental results demonstrate that our methods achieve real-time search performance on five data sets with different sizes and various conditions. Most notably, by achieving an average search time of 0.035 seconds/query, our second method improves the matching efficiency by three orders of magnitude over a linear search baseline on a database with 20,688 images, with negligible loss in place recognition accuracy.
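The voting and coarse-to-fine steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy 2-D features, and the 4-bit hash codes are all hypothetical, an exact linear scan stands in for the approximate tree search, and the fine-step score (the sum, over query codes, of the smallest Hamming distance to any code of a candidate image) is an assumption, since the abstract does not specify how the Hamming distances are aggregated.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_feature(query_feature, db_features):
    """Index of the closest database feature.

    Assumption: an exact linear scan stands in here for the approximate
    nearest-neighbor search in a tree index.
    """
    return min(range(len(db_features)),
               key=lambda i: squared_distance(query_feature, db_features[i]))


def vote_for_images(query_features, db_features, feature_to_image):
    """Coarse step: each query feature votes for the image owning its neighbor."""
    votes = Counter()
    for q in query_features:
        votes[feature_to_image[nearest_feature(q, db_features)]] += 1
    return votes


def hamming(a, b):
    """Hamming distance between two hash codes stored as Python ints."""
    return bin(a ^ b).count("1")


def rerank_by_hamming(query_codes, candidates, db_codes, feature_to_image):
    """Fine step: linear search in Hamming space over the candidate images only.

    Assumption: a candidate's score is the sum, over query codes, of the
    smallest Hamming distance to any of that candidate's codes.
    """
    def score(image):
        codes = [c for c, img in zip(db_codes, feature_to_image) if img == image]
        return sum(min(hamming(q, c) for c in codes) for q in query_codes)
    return min(candidates, key=score)


# Hypothetical toy database: five features belonging to three images.
db_features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
feature_to_image = [0, 0, 1, 1, 2]          # lookup table: feature -> image
db_codes = [0b0000, 0b0001, 0b1111, 0b1110, 0b1010]

query_features = [(5.0, 4.9), (5.2, 5.1), (0.0, 0.1)]
query_codes = [0b1111, 0b1100, 0b0000]

votes = vote_for_images(query_features, db_features, feature_to_image)
top_n = [image for image, _ in votes.most_common(2)]   # coarse: top-N with N = 2
best = rerank_by_hamming(query_codes, top_n, db_codes, feature_to_image)
print("votes:", dict(votes), "top-N:", top_n, "best match:", best)
```

In this sketch the first method alone would return the image with the most votes, while the second method restricts the Hamming-space search to the top-N voted images, which is where the reported savings over a full linear search would come from.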
Highlights
Visual place recognition is a fundamental problem in computer vision as well as in mobile robotics
Since our aim is to achieve real-time search performance while keeping the original recognition accuracy, we carefully tuned the parameters following a rule: increasing parameters that contribute to improving search accuracy until the recognition accuracy is close to that of the baseline linear Gaussian random projection (GRP)
We have proposed two methods, the one-stage method and the two-stage method, to speed up visual place recognition using ConvNet landmarks
Summary
Visual place recognition is a fundamental problem in computer vision as well as in mobile robotics. A visual place recognition system matches the image acquired at the current location with those obtained at previously visited locations. Although great strides have been made, visual place recognition is still challenging because a range of environmental and conditional changes usually result in significant variations in image appearance. These changes are due to variations in weather, the time of day, season, camera viewpoint, and so on. Note that i = 1, 2, ..., N and m = 1, 2, ..., M in this article.