A monocular visual SLAM system augmented by lightweight deep local feature extractor using in-house and low-cost LIDAR-camera integrated device

Jing Li, Chenhui Shi, Jun Chen, Ruisheng Wang, Zhiyuan Yang, Fan Zhang, Jianhua Gong

doi:10.1080/17538947.2022.2138591

Abstract

Simultaneous Localization and Mapping (SLAM) has been widely used in emergency response, self-driving, and city-scale 3D mapping and navigation. Recent deep-learning-based feature point extractors have demonstrated superior performance under challenging environmental conditions (e.g. extreme lighting) where traditional extractors struggle. In this paper, we improve the robustness and accuracy of a monocular visual SLAM system in various complex scenes by adding a deep-learning-based visual localization thread as an augmentation to the visual SLAM framework. In this thread, our feature extractor, built on an efficient lightweight deep neural network, is used for real-time absolute pose and scale estimation against a highly accurate georeferenced prior map database, with 20 cm geometric accuracy, created by our in-house, low-cost LiDAR and camera integrated device. The closed-loop error of our SLAM system with and without this enhancement is 1.03 m and 18.28 m, respectively. The scale estimation of the monocular visual SLAM is also significantly improved (0.01 versus 0.98). In addition, a novel camera-LiDAR calibration workflow is provided for large-scale 3D mapping. This paper demonstrates the application and research potential of deep-learning-based visual SLAM with image and LiDAR sensors.

Full Text