Abstract

Monocular visual SLAM methods can accurately track the camera pose and infer the scene structure by establishing sparse correspondences across two or more views of the scene. However, the 3D maps reconstructed by these methods are extremely sparse. On the other hand, deep learning is widely used to predict dense depth maps from single-view color images, but the results suffer from blurry depth boundaries, which severely deform the 3D scene structure. Therefore, this paper proposes a dense reconstruction method under the monocular SLAM framework (DRM-SLAM), in which a novel scene depth fusion scheme is designed to fully utilize both the sparse depth samples from monocular SLAM and the dense depth maps predicted by a convolutional neural network (CNN). Within the scheme, a CNN architecture is carefully designed for robust depth estimation. In addition, our approach accounts for the scale ambiguity inherent in monocular SLAM. Extensive experiments on benchmark datasets and on our captured dataset demonstrate the accuracy and robustness of the proposed DRM-SLAM. Evaluations of runtime and of adaptability under challenging environments further verify the practicability of our method.
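To make the scale-ambiguity issue concrete, below is a minimal sketch of one common way to reconcile scale-ambiguous SLAM depths with a CNN prediction before fusing them: solve for a single global scale factor in closed form by least squares over the pixels where sparse SLAM depth is available. This is an illustrative assumption, not the paper's actual fusion scheme, and the names (`estimate_global_scale`, `slam_depth`, `cnn_depth`, `mask`) are hypothetical.

```python
import numpy as np

def estimate_global_scale(slam_depth, cnn_depth, mask):
    """Closed-form least-squares scale s minimizing
    sum_{p in mask} (s * d_slam(p) - d_cnn(p))^2,
    so the scale-ambiguous sparse SLAM samples become
    metrically comparable to the dense CNN prediction.
    """
    d_s = slam_depth[mask]   # sparse SLAM depths (known only up to scale)
    d_c = cnn_depth[mask]    # CNN-predicted depths at the same pixels
    return float(np.dot(d_s, d_c) / np.dot(d_s, d_s))

# Toy usage: a 4x4 frame with SLAM depth available at three pixels.
cnn_depth = np.full((4, 4), 2.0)          # dense CNN prediction
slam_depth = np.zeros((4, 4))
mask = np.zeros((4, 4), dtype=bool)
slam_depth[0, 0] = slam_depth[1, 2] = slam_depth[3, 3] = 1.0
mask[0, 0] = mask[1, 2] = mask[3, 3] = True

s = estimate_global_scale(slam_depth, cnn_depth, mask)
aligned_sparse = s * slam_depth           # now on the CNN's scale
print(s)                                  # -> 2.0 for this toy example
```

After such an alignment, the sparse (now scale-consistent) samples can anchor the dense prediction during fusion, e.g. to sharpen the blurry depth boundaries the CNN alone produces.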
