Abstract

Obtaining dense ground-truth depth is non-trivial, which has motivated self-supervised monocular depth estimation. Most self-supervised methods use the photometric loss as the primary supervisory signal for optimizing a depth network. However, such self-supervised training often falls into an undesirable local minimum because the photometric loss is ambiguous. In this paper, we propose a novel self-distillation training scheme that provides the depth network with a new self-supervision signal: depth consistency across different input resolutions. We further introduce a gradient masking strategy that adjusts this consistency signal during back-propagation, boosting its effectiveness. Experiments demonstrate that our method brings meaningful performance improvements when applied to various depth network architectures. Furthermore, our method outperforms existing self-supervised methods on the KITTI, Cityscapes, and DrivingStereo datasets by a noteworthy margin.
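The core idea, depth consistency across input resolutions combined with a gradient mask, can be sketched in plain Python. This is a hypothetical illustration under assumptions not stated in the abstract: depth maps are flattened to lists of per-pixel values, the consistency loss is an L1 difference, and "gradient masking" is approximated by simply discarding the most inconsistent pixels from the loss (the paper's actual masking operates on gradients during back-propagation and may differ).

```python
def depth_consistency_loss(depth_full, depth_low_up, mask_ratio=0.2):
    """Illustrative self-distillation consistency loss (not the paper's exact formulation).

    depth_full  : per-pixel depths predicted from the full-resolution input (flat list)
    depth_low_up: depths predicted from a downsampled input, upsampled back to the
                  same resolution (flat list, same length)
    mask_ratio  : fraction of the most inconsistent pixels to exclude, a crude
                  stand-in for the gradient masking strategy
    """
    # Per-pixel absolute disagreement between the two resolution branches.
    diffs = [abs(a - b) for a, b in zip(depth_full, depth_low_up)]
    # "Mask" the top mask_ratio most inconsistent pixels so they do not
    # contribute to the supervision signal.
    diffs_sorted = sorted(diffs)
    keep = max(1, int(len(diffs) * (1.0 - mask_ratio)))
    kept = diffs_sorted[:keep]
    return sum(kept) / len(kept)


# Usage sketch: 10 pixels, one outlier disagreement that the mask suppresses.
full = [1.0] * 10
low_up = [1.0] * 9 + [3.0]
loss = depth_consistency_loss(full, low_up)  # outlier masked -> loss is 0.0
```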
