Abstract

Multi-sensor fusion is a mainstream approach to localization for unmanned systems, yet achieving 6-degree-of-freedom (DOF) pose estimation remains challenging in GPS-denied environments. Although map-aided localization methods normally perform well for intelligent transportation systems, prior maps are unavailable in some GPS-denied scenes (e.g., dense forests, tunnels, and underground parking lots). In this paper, we present a robust and consistent monocular visual-inertial-depth odometry (VIDO) that performs 6-DOF pose estimation without the need for prior information. The system consists of a visual-inertial subsystem (VIS) based on tightly coupled optimization in a sliding window and a depth subsystem (DS) based on iterative closest point (ICP) estimation using 3D point clouds obtained from a LiDAR or depth camera. The uncertainties of the VIS and DS estimates are rigorously calculated to account for the measurement noise of the sensors. The resulting uncertainty estimates are fed into a covariance intersection (CI) filter for pose fusion, and the fused pose is further refined in the mapping process. We conduct experiments on public datasets and in various real-world outdoor and indoor scenes to verify localization and mapping performance in urban areas with buildings and cars, off-road environments with rugged terrain, and structured indoor environments. The results show that the proposed method provides both a robust 6-DOF pose estimate and a precise 3D map for fully autonomous navigation in different scenes without a prior map, making it an attractive complement to map-aided automated driving.
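
To illustrate the fusion step mentioned above, the sketch below shows a minimal covariance intersection (CI) update that combines a pose estimate from the VIS with one from the DS when their cross-correlation is unknown. This is an illustrative sketch only, not the authors' implementation: the function name, the trace-minimization criterion for the CI weight, and the numerical values in the example are assumptions.

```python
# Minimal covariance intersection (CI) sketch for fusing two pose estimates
# with unknown cross-correlation. Illustrative only; not the paper's code.
import numpy as np
from scipy.optimize import minimize_scalar

def covariance_intersection(x1, P1, x2, P2):
    """Fuse estimates (x1, P1) and (x2, P2) via CI:
        P_fused^{-1} = w * P1^{-1} + (1 - w) * P2^{-1}
        x_fused      = P_fused (w * P1^{-1} x1 + (1 - w) * P2^{-1} x2)
    The weight w in [0, 1] is chosen here to minimize trace(P_fused) (an assumption).
    """
    P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)

    def fused_trace(w):
        return np.trace(np.linalg.inv(w * P1_inv + (1.0 - w) * P2_inv))

    w = minimize_scalar(fused_trace, bounds=(0.0, 1.0), method="bounded").x
    P_fused = np.linalg.inv(w * P1_inv + (1.0 - w) * P2_inv)
    x_fused = P_fused @ (w * P1_inv @ x1 + (1.0 - w) * P2_inv @ x2)
    return x_fused, P_fused, w

if __name__ == "__main__":
    # Hypothetical 6-DOF pose increments [tx, ty, tz, roll, pitch, yaw]
    # from the VIS and DS; the covariances are purely illustrative.
    x_vis = np.array([0.10, 0.02, 0.00, 0.01, 0.00, 0.03])
    P_vis = np.diag([0.02, 0.02, 0.05, 0.001, 0.001, 0.002])
    x_ds  = np.array([0.12, 0.01, 0.01, 0.01, 0.01, 0.02])
    P_ds  = np.diag([0.01, 0.01, 0.01, 0.002, 0.002, 0.004])
    x_fused, P_fused, w = covariance_intersection(x_vis, P_vis, x_ds, P_ds)
    print("CI weight:", w)
    print("fused pose:", x_fused)
```

CI is attractive here because the VIS and DS estimates may be correlated in an unknown way; the weighted combination of inverse covariances yields a fused covariance that is guaranteed not to be over-confident, unlike a naive Kalman-style update that assumes independence.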
