Abstract

Traditional visual place recognition (VPR) methods generally rely on frame-based cameras, which easily fail under rapid illumination changes or fast motion. To overcome this, we propose an end-to-end VPR network using event cameras, which achieves robust recognition performance in challenging environments such as large-scale driving scenes. The key idea of the proposed algorithm is to first characterize the event streams with the event spike tensor (EST) voxel grid representation, then extract features using a deep residual network, and finally aggregate the features using an improved VLAD network, realizing end-to-end place recognition directly on event streams. To verify the effectiveness of the proposed algorithm, we evaluate it on event-based driving datasets (MVSEC, DDD17, and Brisbane-Event-VPR) and synthetic event datasets (Oxford RobotCar and CARLA), analyzing its performance on large-scale driving sequences that include cross-weather, cross-season, and illumination-changing scenes, and then compare it with the state-of-the-art event-based VPR method Ensemble-Event-VPR to demonstrate its advantages. Experimental results show that the proposed method outperforms the event-based ensemble scheme in challenging scenarios. To the best of our knowledge, this is the first end-to-end, weakly supervised deep network architecture for VPR that directly processes event stream data.
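
As a rough, self-contained illustration of the pipeline described above, the sketch below voxelizes an event stream, feeds it through a residual trunk, and aggregates local features with a NetVLAD-style layer. This is a minimal sketch under stated assumptions, not the paper's implementation: the 346x260 sensor resolution, nine temporal bins, ResNet-34 backbone, and 16 VLAD clusters are illustrative choices, and a fixed bilinear temporal kernel stands in for the learned kernel of the full EST representation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


def events_to_voxel_grid(x, y, t, p, bins=9, height=260, width=346):
    """Accumulate events (x, y, timestamp, polarity) into a (2*bins, H, W) grid.

    Simplification: a fixed triangular (bilinear) kernel spreads each event
    over neighboring time bins, where the full EST learns this kernel.
    """
    t = t.float()
    grid = torch.zeros(2, bins, height, width)
    span = (t.max() - t.min()).clamp(min=1e-9)
    t_norm = (t - t.min()) / span * (bins - 1)     # map timestamps to [0, bins-1]
    for b in range(bins):
        w = (1.0 - (t_norm - b).abs()).clamp(min=0.0)  # per-event bilinear weight
        for pol in (0, 1):
            m = (p == pol) & (w > 0)
            grid[pol, b].index_put_((y[m].long(), x[m].long()), w[m], accumulate=True)
    return grid.reshape(2 * bins, height, width)


class NetVLAD(nn.Module):
    """Minimal NetVLAD layer: softly assign local features to clusters and
    aggregate feature-to-centroid residuals into one global descriptor."""

    def __init__(self, num_clusters=16, dim=512):
        super().__init__()
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, feat):                                   # feat: (B, D, H, W)
        B, D = feat.shape[:2]
        a = self.assign(feat).flatten(2).softmax(dim=1)        # (B, K, N)
        f = feat.flatten(2)                                    # (B, D, N)
        # residuals between each local feature and each centroid: (B, K, D, N)
        res = f.unsqueeze(1) - self.centroids.view(1, -1, D, 1)
        vlad = (res * a.unsqueeze(2)).sum(dim=-1)              # (B, K, D)
        vlad = F.normalize(vlad, dim=2)                        # intra-normalization
        return F.normalize(vlad.flatten(1), dim=1)             # (B, K*D) descriptor


class EventVPRNet(nn.Module):
    """Voxelized events -> ResNet-34 trunk -> NetVLAD global descriptor."""

    def __init__(self, bins=9, num_clusters=16):
        super().__init__()
        backbone = torchvision.models.resnet34(weights=None)
        # Replace the first conv so the trunk accepts 2*bins input channels.
        backbone.conv1 = nn.Conv2d(2 * bins, 64, 7, stride=2, padding=3, bias=False)
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])  # drop pool/fc
        self.vlad = NetVLAD(num_clusters=num_clusters, dim=512)

    def forward(self, voxels):                     # voxels: (B, 2*bins, H, W)
        return self.vlad(self.trunk(voxels))       # (B, num_clusters * 512)
```

A descriptor produced this way would be compared to reference descriptors by nearest-neighbor search in Euclidean space, which is how VLAD-style global descriptors are typically matched in place recognition.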
