Abstract
Self-supervised monocular depth estimation has become an important branch of depth estimation in computer vision. However, the errors caused by depth pulling at object edges and by occlusion remain unsolved. The grayscale discontinuity at object edges leads to relatively high depth uncertainty for pixels in these regions. We improve geometric edge prediction by taking uncertainty into account in the depth-estimation task. To this end, we explore how uncertainty affects this task and propose a new self-supervised monocular depth estimation technique based on multi-scale uncertainty. In addition, we introduce a teacher–student architecture into the model and investigate the impact of different teacher networks on the depth and uncertainty results. We evaluate our approach in detail on the standard KITTI dataset. Compared with the Monodepth2 baseline, accuracy increased from 87.7% to 88.2%, the AbsRel error decreased from 0.115 to 0.11, the SqRel error decreased from 0.903 to 0.822, and the RMSE decreased from 4.863 to 4.686. Our approach reduces texture replication and inaccurate object boundaries, producing sharper and smoother depth images.
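For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how AbsRel, SqRel, RMSE, and a threshold accuracy are commonly computed on paired depth values. It assumes that "accuracy" means the fraction of pixels whose ratio to ground truth is below 1.25, which the abstract does not state explicitly. The function name `depth_metrics` and its arguments are hypothetical and not taken from the paper.

```python
import math

def depth_metrics(pred, gt, threshold=1.25):
    """Compute AbsRel, SqRel, RMSE and threshold accuracy over paired depths.

    pred, gt: equally long sequences of positive depth values.
    threshold: assumed accuracy threshold (1.25 is a common choice).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean squared error relative to the ground-truth depth.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root mean squared error in depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction/ground-truth ratio is within the threshold.
    acc = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "acc": acc}
```

Lower is better for the three error terms, and higher is better for the accuracy term.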
Highlights
Monocular depth estimation refers to the ability to learn a dense depth map at the pixel level from the video stream
We study the impact of multi-scale uncertainty on self-supervised monocular depth estimation and find that it yields more edge-depth uncertainty
Beyond the depth metrics above, we measure the Area Under the Sparsification Error (AUSE) and the Area Under the Random Gain (AURG) to evaluate the quality of uncertainty prediction
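The two uncertainty measures in the last highlight can be illustrated with a small sketch. It assumes the usual sparsification procedure: pixels are removed in order of decreasing uncertainty and the mean error of the remaining pixels is tracked, then compared against an oracle that removes pixels by true error and against a random baseline that stays at the overall mean error. The function names, the use of mean per-pixel error, and the `steps` parameter are illustrative assumptions, not the paper's evaluation code.

```python
def sparsification_curve(errors, ranking, steps=10):
    """Mean error of the pixels kept after removing the top-ranked fraction s/steps."""
    order = sorted(range(len(errors)), key=lambda i: ranking[i], reverse=True)
    n = len(errors)
    curve = []
    for s in range(steps):
        removed = (n * s) // steps          # number of pixels dropped at this step
        kept = order[removed:]              # always non-empty for n >= 1
        curve.append(sum(errors[i] for i in kept) / len(kept))
    return curve

def _area(values, dx):
    """Trapezoidal area under equally spaced samples."""
    return sum((a + b) / 2 * dx for a, b in zip(values, values[1:]))

def ause_aurg(errors, uncertainty, steps=10):
    """Return (AUSE, AURG) for per-pixel errors and predicted uncertainties."""
    by_unc = sparsification_curve(errors, uncertainty, steps)
    oracle = sparsification_curve(errors, errors, steps)   # ideal ranking
    random_level = sum(errors) / len(errors)               # flat random baseline
    dx = 1.0 / steps
    # AUSE: gap between the uncertainty-ranked curve and the oracle (lower is better).
    ause = _area([u - o for u, o in zip(by_unc, oracle)], dx)
    # AURG: gain over random removal (higher is better).
    aurg = _area([random_level - u for u in by_unc], dx)
    return ause, aurg
```

An uncertainty map that ranks pixels exactly as their true errors do gives an AUSE of zero, while a ranking that is worse than random gives a negative AURG.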
Summary
Monocular depth estimation refers to the ability to learn a dense depth map at the pixel level from the video stream. Uncertainty falls into two categories, epistemic and aleatoric [17]. The former helps identify examples that differ from those in the training set, such as new scenes or new targets for which the model will predict wrong depths with high probability; such wrong depth results need to be detected. The latter captures the uncertainty (confidence) of the depth at the edge of the object, which is exactly what we require. The depth uncertainty map correctly represents the uncertainty of geometric edges and can restrict the learning of depth to edge pixels with large uncertainty.
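One common way to let a predicted aleatoric uncertainty shape depth learning is to divide each pixel's residual by the predicted uncertainty and add a log penalty. The sketch below shows that generic formulation only. The summary does not give the paper's actual loss, so the function name, the log-sigma parameterization, and the choice of residual are all assumptions made for illustration.

```python
import math

def uncertainty_weighted_loss(residuals, log_sigmas):
    """Mean of r * exp(-s) + s over pixels (a generic heteroscedastic loss).

    residuals: non-negative per-pixel errors (e.g. a photometric error).
    log_sigmas: per-pixel predicted log-uncertainty s = log(sigma).
    """
    if len(residuals) != len(log_sigmas) or not residuals:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for r, s in zip(residuals, log_sigmas):
        # exp(-s) down-weights pixels the model marks as uncertain;
        # the + s term penalizes predicting high uncertainty everywhere.
        total += r * math.exp(-s) + s
    return total / len(residuals)
```

Under this formulation the loss for a pixel with residual r is minimized at sigma = r, so pixels that are hard to reconstruct, such as those at object edges, are pushed toward higher predicted uncertainty.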