Abstract
Recognising previously visited locations is an important, but unsolved, task in autonomous navigation. Current visual place recognition (VPR) benchmarks typically challenge models to recover the position of a query image (or images) from sequential datasets that include both spatial and temporal components. Recently, Echo State Network (ESN) varieties have proven particularly powerful at solving machine learning tasks that require spatio-temporal modelling. These simple yet powerful neural architectures exhibit memory over multiple time-scales and non-linear high-dimensional representations, allowing them to discover temporal relations in the data while keeping training time linear. In this letter, we present a series of ESNs and analyse their applicability to the VPR problem. We report that adding ESNs to pre-processed convolutional neural networks led to a dramatic boost in performance over non-recurrent networks in five out of six standard benchmarks (GardensPoint, SPEDTest, ESSEX3IN1, Oxford RobotCar, and Nordland), demonstrating that ESNs are able to capture the temporal structure inherent in VPR problems. Moreover, we show that models that include ESNs can outperform class-leading VPR models which also exploit the sequential dynamics of the data. Finally, our results demonstrate that ESNs improve generalisation, robustness, and accuracy, further supporting their suitability for VPR applications.
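To make the general idea concrete, the sketch below shows, in NumPy, how a sequence of pre-computed CNN descriptors can be driven through a fixed random reservoir and classified with a ridge-regression readout. This is a minimal illustration of the standard ESN recipe, not the implementation described in the letter: the reservoir size, spectral radius, leak rate, and ridge penalty are illustrative defaults, and names such as make_reservoir and cnn_train are hypothetical.

# Minimal Echo State Network sketch (NumPy). Hyperparameters are illustrative
# defaults, not values reported in the letter; `cnn_train`/`cnn_query` stand
# for any pre-computed CNN descriptor sequences (one vector per frame).
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res=1000, spectral_radius=0.9, input_scale=0.1):
    """Random, fixed reservoir weights: only the readout is ever trained."""
    W_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    W = rng.standard_normal((n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo-state scaling
    return W_in, W

def run_reservoir(W_in, W, inputs, leak=0.3):
    """Leaky-integrator update: x <- (1 - a) x + a tanh(W_in u + W x)."""
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:                      # inputs: (T, n_in) CNN descriptors
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.stack(states)               # (T, n_res) reservoir states

def train_readout(states, targets, ridge=1e-4):
    """Closed-form ridge regression: training cost stays linear in time."""
    S, Y = states, targets                # Y: (T, n_places) one-hot place labels
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ Y)

# Usage sketch: train the readout on one traversal, match frames of another.
# W_in, W = make_reservoir(n_in=cnn_train.shape[1])
# W_out = train_readout(run_reservoir(W_in, W, cnn_train), labels_train)
# predictions = run_reservoir(W_in, W, cnn_query) @ W_out

Because only the linear readout is trained, the reservoir supplies the temporal memory while the learning problem remains a convex regression, which is the property the letter exploits for VPR sequences.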
Highlights
Visual Place Recognition (VPR) challenges algorithms to recognise previously visited places despite changes in appearance caused by illumination, viewpoint, and weather conditions [1]
The performance of Echo State Network (ESN) and ESN+SpaRCe models was first evaluated on three datasets (GardensPoint, SPEDTest and ESSEX3IN1)
We have demonstrated the viability of ESNs as a solution to the VPR problem
Summary
Visual Place Recognition (VPR) challenges algorithms to recognise previously visited places despite changes in appearance caused by illumination, viewpoint, and weather conditions [1] (see Fig. 2 for example images). Unlike in many machine learning domains, typical VPR benchmarks require position to be learned from images gathered during one route traversal and then recovered from images gathered during another traversal, so there are very few examples to learn from (typically only the images within a few metres of the correct location), which makes the task even more challenging. One approach is to recognise places by matching single views, using image processing methods to remove the variance between datasets. Models have been developed that use different image descriptors to obtain meaningful image representations that are robust to visual change (e.g. AMOSNet [2], DenseVLAD [3], and NetVLAD [4]).
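As an illustration of this single-view baseline, the snippet below matches each query descriptor to its nearest reference descriptor under cosine similarity. It is a generic sketch of descriptor-based place matching, assuming descriptors have already been extracted (for example from a CNN layer or a VLAD-style pooling stage); the function and variable names are hypothetical and it is not the implementation of the cited models.

# Single-view place matching via nearest-neighbour search over descriptors.
import numpy as np

def match_places(ref_descriptors, query_descriptors):
    """Return, for each query image, the index of the most similar
    reference image under cosine similarity."""
    ref = ref_descriptors / np.linalg.norm(ref_descriptors, axis=1, keepdims=True)
    qry = query_descriptors / np.linalg.norm(query_descriptors, axis=1, keepdims=True)
    similarity = qry @ ref.T              # (n_query, n_reference)
    return similarity.argmax(axis=1)

# e.g. matches = match_places(ref_desc, query_desc)
# A match is typically counted as correct if it falls within a few frames
# (or metres) of the ground-truth location.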