Abstract

Recently, deep learning techniques have been applied to solve visual or light detection and ranging (LiDAR) simultaneous localization and mapping (SLAM) problems. Supervised deep learning SLAM methods need ground truth data for training, but collecting such data is costly and labour-intensive. Unsupervised training strategies have been adopted by some visual or LiDAR SLAM methods. However, these methods exploit only a single sensor modality and therefore do not take advantage of the complementary strengths of LiDAR and visual data. In this paper, we propose a novel unsupervised multi-channel visual-LiDAR SLAM method (MVL-SLAM) that fuses visual and LiDAR data. Our SLAM system consists of an unsupervised multi-channel visual-LiDAR odometry (MVLO) component, a deep learning–based loop closure detection component, and a 3D mapping component. The visual-LiDAR odometry component adopts a multi-channel recurrent convolutional neural network (RCNN). Its input consists of RGB images and front, left, and right view depth images generated from 360° 3D LiDAR data. We use features from a deep convolutional neural network (CNN) for the loop closure detection component. Our SLAM method does not require ground truth data for training and directly constructs environmental 3D maps with the 3D mapping component. Experiments on the KITTI odometry dataset show that the rotation and translation errors of our method are lower than those of several other unsupervised methods, including UnMono, SfmLearner, DeepSLAM, and UnDeepVO. By fusing visual and LiDAR data, MVL-SLAM achieves higher pose estimation accuracy and robustness than single-modal SLAM systems.
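To make the multi-channel odometry design concrete, the following is a minimal PyTorch-style sketch of how an RGB stream and three LiDAR depth views (front, left, right) might be encoded per channel, fused, and passed through a recurrent layer to regress a 6-DoF relative pose. The paper does not publish this code; the class name MVLONet, the layer sizes, and the fusion scheme are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class MVLONet(nn.Module):
    """Hypothetical multi-channel recurrent convolutional odometry sketch:
    fuses an RGB image with front/left/right LiDAR depth views and
    regresses translation and rotation with an LSTM over time."""

    def __init__(self, hidden_size=256):
        super().__init__()
        # One small CNN encoder per input channel (RGB + three depth views).
        self.encoders = nn.ModuleList(
            [self._make_encoder(in_ch) for in_ch in (3, 1, 1, 1)]
        )
        # Recurrent layer models temporal dependencies across frames.
        self.rnn = nn.LSTM(input_size=4 * 128, hidden_size=hidden_size,
                           num_layers=2, batch_first=True)
        # Separate heads for translation and rotation (6-DoF pose).
        self.fc_trans = nn.Linear(hidden_size, 3)
        self.fc_rot = nn.Linear(hidden_size, 3)

    @staticmethod
    def _make_encoder(in_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, rgb, depth_front, depth_left, depth_right):
        # Each input has shape (batch, time, channels, H, W).
        feats = []
        for enc, x in zip(self.encoders,
                          (rgb, depth_front, depth_left, depth_right)):
            b, t, c, h, w = x.shape
            f = enc(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
            feats.append(f)
        fused = torch.cat(feats, dim=-1)   # channel-wise feature fusion
        out, _ = self.rnn(fused)           # temporal modelling
        return self.fc_trans(out), self.fc_rot(out)


if __name__ == "__main__":
    net = MVLONet()
    rgb = torch.randn(2, 5, 3, 128, 416)    # two 5-frame sequences
    depth = torch.randn(2, 5, 1, 128, 416)
    trans, rot = net(rgb, depth, depth, depth)
    print(trans.shape, rot.shape)           # (2, 5, 3) each
```

In an unsupervised setting such as the one described in the abstract, the predicted poses would be trained with a photometric or geometric consistency loss rather than ground truth trajectories; that loss is omitted here.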
