Current visual simultaneous localization and mapping (SLAM) systems rely on matched feature-point pairs to estimate camera pose and construct environmental maps, so their performance is limited by that of the visual feature matcher. To address this problem, a visual SLAM system using a deep feature matcher is proposed, composed mainly of three parallel threads: Visual Odometry, Backend Optimizer, and LoopClosing. In Visual Odometry, a deep feature extractor built on convolutional neural networks extracts feature points from each image frame, and the deep feature matcher then obtains the corresponding feature-landmark pairs. Afterwards, a fusion method based on the last frame and the reference frame is proposed for camera pose estimation. The Backend Optimizer executes local bundle adjustment over a subset of the camera poses and landmarks (map points). LoopClosing, which consists of a lightweight deep loop-closure detector and the same matcher used in Visual Odometry, performs loop correction based on pose-graph optimization. The proposed system has been tested extensively on most sequences of the benchmark KITTI odometry dataset. The experimental results show that it outperforms existing visual SLAM systems: it not only runs in real time at 0.08 s per frame, but also reduces estimation error by at least 0.1 m.
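The three-thread decomposition described above can be sketched in miniature with standard Python threading. This is an illustrative toy only, not the authors' implementation: the feature extraction, matching, bundle adjustment, and loop detection are all stubbed out, and every class and queue name is an assumption introduced for the sketch. It shows only the data flow: the front-end Visual Odometry feeds keyframes to the Backend Optimizer and LoopClosing threads through queues.

```python
# Toy sketch of the three-parallel-thread SLAM architecture (Visual Odometry,
# Backend Optimizer, LoopClosing). All names and data structures here are
# hypothetical; the real system's feature extraction, matching, local bundle
# adjustment, and loop-closure detection are replaced by stubs.
import queue
import threading

def visual_odometry(frames, backend_q, loop_q):
    """Front end: extract and match features per frame (stubbed) to
    estimate a pose, then hand the result to the other two threads."""
    for i, frame in enumerate(frames):
        pose = {"frame": i, "descriptor": frame}  # stub pose + descriptor
        backend_q.put(pose)   # input for local bundle adjustment
        loop_q.put(pose)      # input for loop-closure detection
    backend_q.put(None)       # shutdown sentinels
    loop_q.put(None)

def backend_optimizer(backend_q, optimized):
    """Back end: local bundle adjustment over recent poses (stubbed:
    it just records which poses were refined)."""
    while True:
        item = backend_q.get()
        if item is None:
            break
        optimized.append(item["frame"])

def loop_closing(loop_q, corrections):
    """Loop closing: detect revisited places (stubbed as exact descriptor
    matches) and record pose-graph correction pairs."""
    seen = {}
    while True:
        item = loop_q.get()
        if item is None:
            break
        key = item["descriptor"]
        if key in seen:
            corrections.append((seen[key], item["frame"]))
        else:
            seen[key] = item["frame"]

frames = ["a", "b", "c", "a"]  # descriptor "a" recurs -> one loop closure
backend_q, loop_q = queue.Queue(), queue.Queue()
optimized, corrections = [], []
workers = [
    threading.Thread(target=backend_optimizer, args=(backend_q, optimized)),
    threading.Thread(target=loop_closing, args=(loop_q, corrections)),
]
for t in workers:
    t.start()
visual_odometry(frames, backend_q, loop_q)
for t in workers:
    t.join()
print(optimized)    # every pose passed through the (stub) local BA
print(corrections)  # (0, 3): frame 3 closes a loop with frame 0
```

The queue-per-consumer layout mirrors the abstract's design choice of decoupling the real-time front end from the slower optimization and loop-correction stages.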