Abstract

Self-supervised monocular depth estimation has gained popularity because it allows a network to be trained without dense ground-truth depth annotations. In particular, multi-frame monocular depth estimation achieves promising results by exploiting temporal information. However, existing multi-frame solutions ignore the varying impact that different pixels of the input frame have on depth estimation, and geometric information remains insufficiently explored. In this paper, a self-supervised monocular depth estimation framework with geometric priors and pixel-level sensitivity is proposed. Geometric constraints are introduced through a geometric pose estimator composed of a prior depth predictor and an optical flow predictor. Furthermore, an alternating learning strategy is designed to improve the training of the prior depth predictor by decoupling it from the ego-motion produced by the geometric pose estimator. On this basis, prior feature consistency regularization is introduced into the depth encoder: by using a dense prior cost volume built from the optical flow map and the ego-motion as the supervisory signal for feature consistency learning, the cost volume is obtained with more reasonable feature matching. To handle pixel-level differences in sensitivity within the input frame, a sensitivity-adaptive depth decoder is built by flexibly adding a shorter path from the cost volume to the final depth prediction. In this way, the back-propagation of gradients to the cost volume is adaptively adjusted, and an accurate depth map is decoded. The effectiveness of the proposed method is verified on public datasets.
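The abstract does not include code, but the cost volume it refers to is the standard ingredient of multi-frame depth estimation: features of a reference frame are matched against source-frame features warped by the ego-motion under a set of hypothesised depths. The following is a minimal NumPy sketch of such a plane-sweep cost volume under an assumed pinhole camera model; the function name, shapes, and the dot-product similarity are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def plane_sweep_cost_volume(feat_ref, feat_src, K, T, depths):
    """Toy plane-sweep cost volume (illustrative, not the paper's method).

    For each hypothesised depth d: back-project reference pixels to 3-D,
    transform them by the ego-motion T (4x4), reproject into the source
    view with intrinsics K (3x3), sample source features (nearest
    neighbour), and score similarity against the reference features by a
    channel-wise dot product. feat_* have shape (C, H, W).
    """
    C, H, W = feat_ref.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(np.float64)
    K_inv = np.linalg.inv(K)
    cost = np.zeros((len(depths), H, W))
    for i, d in enumerate(depths):
        pts = K_inv @ pix * d                          # back-project at depth d
        pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
        cam2 = (T @ pts_h)[:3]                         # apply ego-motion
        proj = K @ cam2                                # reproject to source view
        u = np.clip(np.round(proj[0] / proj[2]).astype(int), 0, W - 1)
        v = np.clip(np.round(proj[1] / proj[2]).astype(int), 0, H - 1)
        sampled = feat_src[:, v, u].reshape(C, H, W)   # nearest-neighbour sampling
        cost[i] = (feat_ref * sampled).sum(axis=0)     # correlation score
    return cost
```

The paper's contribution, as the abstract describes it, is to supervise this feature matching with a dense prior cost volume derived from optical flow and ego-motion, and to let the decoder adaptively weight how strongly gradients flow back into the cost volume.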
