Abstract

Monocular SLAM (simultaneous localization and mapping) systems cannot extract depth information directly from a single camera and require an initialization procedure to resolve scale ambiguity. This makes dense map reconstruction extremely difficult and limits their use in scenarios that demand navigation and obstacle avoidance. To address these problems, this paper proposes a simple monocular depth estimation network. The encoder applies transfer learning from a pre-trained ResNet, and a convolutional neural network (CNN) serves as the decoder; only a small number of trainable parameters and training iterations are needed to obtain fairly accurate depth information. In addition, a similarity-based filter denoises the surfels and improves the RGB-D SLAM system, which both reduces the impact of depth estimation errors on the surfels and preserves the quality of the dense map. Comparative experiments show that the proposed monocular depth estimation network outperforms current popular methods, and the associated SLAM system accomplishes both pose estimation and dense mapping. As a SLAM system based on a monocular camera, the proposed method is a promising and practical approach.
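To illustrate the encoder-decoder design summarized above, the following is a minimal PyTorch sketch of a depth estimation network with a pre-trained ResNet encoder and a plain CNN decoder. The specific ResNet variant (ResNet-18), the channel widths, and the bilinear upsampling scheme are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch: ResNet encoder + CNN decoder for monocular depth estimation.
# Illustrative only; layer sizes and upsampling are assumptions, not the
# paper's exact architecture.
import torch
import torch.nn as nn
import torchvision.models as models

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: pre-trained ResNet-18 with the classification head removed.
        resnet = models.resnet18(weights="DEFAULT")
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])  # (B, 512, H/32, W/32)

        # Decoder: a plain CNN that upsamples features back to input resolution.
        def up_block(c_in, c_out):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        self.decoder = nn.Sequential(
            up_block(512, 256),
            up_block(256, 128),
            up_block(128, 64),
            up_block(64, 32),
            up_block(32, 16),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),  # single-channel depth map
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Usage: predict a depth map for one RGB frame.
net = DepthNet().eval()
with torch.no_grad():
    depth = net(torch.randn(1, 3, 224, 224))  # output shape: (1, 1, 224, 224)
```

Freezing (or fine-tuning with a small learning rate) the pre-trained encoder is what keeps the number of trainable parameters and iterations low, as the abstract describes: only the lightweight decoder must be learned from scratch.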
