Abstract

Learning monocular depth in a self-supervised manner is desirable for numerous applications ranging from autonomous driving and robotics to augmented reality. However, current methods still struggle with scale ambiguity, dynamic scenes, and hardware limitations. To this end, we propose a self-supervised approach for monocular depth learning and estimation. Specifically, we first introduce a self-supervised depth learning framework that learns part of the camera intrinsics and the stereo extrinsics, which ensures that the predicted depth is absolute (metrically scaled) while also improving depth estimation accuracy (i.e., lowering the error between the predicted and ground-truth depth). Moreover, we further improve accuracy and efficiency (i.e., shorter inference time and smaller weight/activation footprints) via a specially designed network that exploits multi-scale context across multi-level feature maps. In addition, we propose a quantization scheme for our depth estimation networks that allows inference to be carried out in INT4-INT8 arithmetic while maintaining high depth estimation accuracy. Extensive experiments on the KITTI and Make3D datasets demonstrate that our approach substantially outperforms existing state-of-the-art methods for monocular depth estimation.
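The abstract does not give implementation details; as a rough illustration of how learning part of the intrinsics alongside a known stereo baseline can pin the predicted depth to an absolute scale, the following PyTorch sketch defines a photometric reprojection loss with a learnable focal length. The class name `ScaleAwareReprojection`, the normalised-focal initialisation, and the 0.54 m default baseline (the KITTI rig spacing) are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareReprojection(nn.Module):
    """Warps a rectified source (right) view into the target (left) view
    using predicted depth, a learnable focal length, and a fixed stereo
    baseline. Anchoring the baseline in metres is what ties the predicted
    depth to an absolute, metric scale (illustrative sketch)."""

    def __init__(self, height, width, init_focal=0.5, baseline_m=0.54):
        super().__init__()
        self.h, self.w = height, width
        # Focal length (normalised by image width) is learned jointly with
        # the depth network; the metric stereo baseline stays fixed.
        self.focal = nn.Parameter(torch.tensor(init_focal))
        self.register_buffer("baseline", torch.tensor(baseline_m))
        ys, xs = torch.meshgrid(
            torch.arange(height, dtype=torch.float32),
            torch.arange(width, dtype=torch.float32), indexing="ij")
        self.register_buffer("xs", xs)
        self.register_buffer("ys", ys)

    def forward(self, tgt_img, src_img, depth):
        # Pinhole stereo geometry: disparity d = f * B / Z, in pixels.
        f_pix = self.focal * self.w
        disp = f_pix * self.baseline / depth.clamp(min=1e-3)  # (B,1,H,W)
        # Shift the target pixel grid by the disparity, sample the source.
        x_src = self.xs.unsqueeze(0) - disp.squeeze(1)         # (B,H,W)
        y_src = self.ys.unsqueeze(0).expand_as(x_src)
        grid = torch.stack([2.0 * x_src / (self.w - 1) - 1.0,
                            2.0 * y_src / (self.h - 1) - 1.0], dim=-1)
        warped = F.grid_sample(src_img, grid, align_corners=True)
        # L1 photometric loss supervises depth and focal length together.
        return (warped - tgt_img).abs().mean()
```

Because the baseline is expressed in metres, any depth that minimises this loss must also be in metres, which is one standard way a stereo-trained model avoids the scale ambiguity of purely monocular self-supervision.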
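The abstract likewise only states that the network "exploits multi-scale context across multi-level feature maps" without specifying the architecture. One plausible reading is a dilated-convolution context module fused across encoder levels, sketched below; the module name `MultiScaleContextFusion`, the channel counts, and the dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleContextFusion(nn.Module):
    """Aggregates context at several dilation rates from each encoder
    level, resizes everything to the finest resolution, and fuses the
    result with a 1x1 convolution before the depth head (sketch)."""

    def __init__(self, in_channels=(64, 128, 256), mid=64, rates=(1, 2, 4)):
        super().__init__()
        # One small dilated-conv branch per (encoder level, rate) pair.
        self.branches = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(c, mid, 3, padding=r, dilation=r, bias=False),
                    nn.BatchNorm2d(mid),
                    nn.ReLU(inplace=True))
                for r in rates])
            for c in in_channels])
        self.fuse = nn.Conv2d(mid * len(in_channels) * len(rates), mid, 1)

    def forward(self, feats):
        # feats: list of feature maps, finest first (e.g. strides 4/8/16).
        target_size = feats[0].shape[-2:]
        ctx = []
        for level, branches in zip(feats, self.branches):
            for branch in branches:
                y = branch(level)
                ctx.append(F.interpolate(y, size=target_size,
                                         mode="bilinear",
                                         align_corners=False))
        return self.fuse(torch.cat(ctx, dim=1))
```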
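Finally, the INT4-INT8 quantization scheme itself is not described in the abstract. A common way to realise such mixed-precision inference is quantization-aware training with simulated ("fake") quantization, e.g. 4-bit weights and 8-bit activations as in this hedged sketch; the per-tensor max-abs scaling is our simplifying assumption, not the paper's scheme.

```python
import torch

def fake_quantize(x, num_bits=8, signed=True):
    """Simulates low-bit integer arithmetic in floating point: quantize
    to num_bits integers with a per-tensor scale, then dequantize.
    Training with this in the forward pass lets a network adapt to
    INT4/INT8 inference (illustrative sketch)."""
    if signed:
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** num_bits - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return q * scale

# Example: 4-bit signed weights, 8-bit unsigned (post-ReLU) activations.
w = torch.randn(16, 16)
a = torch.relu(torch.randn(4, 16))
w_q = fake_quantize(w, num_bits=4)
a_q = fake_quantize(a, num_bits=8, signed=False)
out = a_q @ w_q.t()  # emulates the low-bit matmul in floating point
```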
