Abstract
Learning-based supervised monocular depth estimation methods have shown promising results compared with traditional methods. However, they require large amounts of high-quality ground truth depth data as supervision labels. Because of the limitations of acquisition equipment, recording ground truth depth for different scenes is expensive and impractical. Self-supervised monocular depth estimation, which does not use ground truth depth, is a promising alternative, but estimating depth from a single image in a self-supervised manner is geometrically ambiguous and yields suboptimal results. In this paper, we propose a novel semi-supervised monocular stereo matching method, built on existing approaches, to improve the accuracy of depth estimation. The idea is inspired by experimental results reported in prior work showing that, within the same self-supervised network model, taking a stereo pair as input gives more accurate depth estimates than taking a monocular view. We therefore decompose monocular depth estimation into two sub-problems: right-view synthesis followed by semi-supervised stereo matching. To improve the accuracy of the synthesized right view, we extend the existing view synthesis method Deep3D with a left-right consistency constraint and a smoothness constraint. To reduce the error introduced by the reconstructed right view, we propose a semi-supervised stereo matching model that uses disparity maps generated by a self-supervised stereo matching model as supervision cues, jointly with self-supervised cues, to optimize the stereo matching network. At test time, the two networks are connected in a pipeline and predict the depth map directly from a single image. Both procedures not only obey geometric principles but also improve estimation accuracy. Test results on the KITTI dataset show that this method outperforms current mainstream self-supervised monocular depth estimation methods under the same conditions.
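The abstract names a left-right consistency constraint and a smoothness constraint but does not give their formulas. As a rough illustration only, the sketch below uses one formulation that is common in self-supervised stereo work: an edge-aware first-order smoothness term and an L1 left-right disparity consistency term. The function names, the single-channel image assumption, and the convention that a left pixel at column x matches the right pixel at column x - d are all assumptions made for this example and may differ from the paper's actual losses.

```python
import numpy as np


def warp_horizontal(values, disparity):
    """Sample `values` at column x - disparity(x), with linear interpolation.

    Both arrays are 2D (height, width). Sampling positions are clamped
    to the image border.
    """
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - frac) * values[rows, x0] + frac * values[rows, x1]


def smoothness_loss(disparity, image):
    """Edge-aware smoothness (assumed form): penalize disparity gradients,
    down-weighted where the image itself has strong gradients."""
    dx_d = np.abs(np.diff(disparity, axis=1))
    dy_d = np.abs(np.diff(disparity, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    return float((dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean())


def lr_consistency_loss(disp_left, disp_right):
    """Left-right consistency (assumed form): the left disparity should equal
    the right disparity sampled at the matching right-view pixel."""
    projected = warp_horizontal(disp_right, disp_left)
    return float(np.abs(disp_left - projected).mean())
```

In a training setup these two terms would typically be added, with weights, to a photometric reconstruction loss; the weights and the reconstruction term are not specified in the abstract and are left out here.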
Highlights
Depth estimation is the fundamental problem of 3D scene reconstruction, which is widely used in virtual reality, self-driving cars, and other fields
This paper proposes a novel monocular depth estimation method that does not use ground truth depth data; it combines a view synthesis network and a stereo matching network to produce a high-quality depth map from a single image
In this paper, we propose a novel semi-supervised stereo matching method from a single image
Summary
Depth estimation is a fundamental problem in 3D scene reconstruction, which is widely used in virtual reality, self-driving cars, and other fields, and it has become an active research direction as these fields have developed. Because depth estimation from a single image is an ill-posed and geometrically ambiguous problem, most traditional methods rely on feature registration algorithms based on epipolar geometry, applied to a binocular view or multiple views of the scene, such as stereo matching [2], structure from motion [3], photometric stereo [4], and depth cue fusion [5]. The 3D scenes reconstructed by these methods have low accuracy and are mostly sparse.
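To make the link between stereo matching and depth concrete: for a rectified stereo pair, depth follows from disparity as Z = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity in pixels. The minimal sketch below applies this standard relation; the function name, the `eps` guard against zero disparity, and the numeric values in the usage note are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np


def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (meters) for a rectified stereo pair.

    Disparities below `eps` are clamped to avoid division by zero, so such
    pixels map to very large depths.
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    return focal_px * baseline_m / np.maximum(disparity_px, eps)
```

For example, with a hypothetical focal length of 700 pixels and a baseline of 0.5 m, disparities of 10 and 20 pixels correspond to depths of 35 m and 17.5 m, which shows why small disparity errors matter most for distant points.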