Abstract

Accurate and robust localization is a fundamental need for mobile agents. Visual–inertial odometry (VIO) algorithms fuse camera and inertial sensor data to estimate pose, that is, position and orientation. Recent deep-learning-based VIO models have attracted attention because they estimate pose in a data-driven way, without the need to design hand-crafted algorithms. Existing learning-based VIO models rely on recurrent models to process sensor signals and fuse multimodal data, and these recurrent models are hard to train and inefficient. We propose a novel learning-based VIO framework with external memory attention that effectively and efficiently combines visual and inertial features for state estimation. The proposed model estimates pose accurately and robustly even in challenging scenarios, such as overcast days and water-filled ground, where traditional VIO algorithms struggle to extract visual features. Experiments show that it outperforms both traditional and learning-based VIO baselines across different scenes.
