Depth estimation from a camera is an important task for 3D perception. Recently, self-supervised deep networks have been trained without ground-truth depth maps: the relative pose is used to synthesize the target image from the reference image, and the photometric error between the synthesized image and the real one serves as the self-supervisory signal. In this paper, we propose a novel self-supervised depth estimation network that exploits a quadtree constraint to optimize the depth estimation network. Based on this constraint, a quadtree photometric loss and a quadtree depth loss are proposed. To address the problem that multiple depth values in repeated structures and uniformly textured regions can produce similarly low photometric losses, the quadtree-based photometric loss averages the photometric error within each quadtree block instead of computing it pixel-wise. To address the imbalanced depth distribution, the quadtree depth loss constrains depth inconsistency within each quadtree block. The depth estimation network consists of a deep fusion module and a cross-layer feature fusion module, which better extract features from the RGB image and the sparse keypoint depths, and exploit both the detail information of shallow feature maps and the semantic information of deep feature maps to enrich feature extraction. Experimental results demonstrate that our method outperforms state-of-the-art depth estimation approaches.
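The following is a minimal sketch of how the two quadtree losses described above could be computed, assuming a precomputed per-pixel photometric error map and a set of binary masks, one per quadtree leaf block; the function names, the mask-based block representation, and the mean-deviation depth penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def quadtree_photometric_loss(photometric_error, block_masks):
    """Average the per-pixel photometric error inside each quadtree block.

    photometric_error: (B, 1, H, W) per-pixel error (e.g. L1 or SSIM-based).
    block_masks: list of (B, 1, H, W) binary masks, one per quadtree leaf block.
    """
    losses = []
    for mask in block_masks:
        denom = mask.sum().clamp(min=1.0)  # avoid division by zero for empty blocks
        losses.append((photometric_error * mask).sum() / denom)
    return torch.stack(losses).mean()

def quadtree_depth_loss(depth, block_masks):
    """Penalize depth inconsistency (deviation from the block mean) inside each block."""
    losses = []
    for mask in block_masks:
        denom = mask.sum().clamp(min=1.0)
        block_mean = (depth * mask).sum() / denom
        losses.append((torch.abs(depth - block_mean) * mask).sum() / denom)
    return torch.stack(losses).mean()
```

Averaging the error over a block rather than per pixel removes the ambiguity where several different depth values inside a repeated or textureless region all reproject to visually similar pixels and thus yield equally low pixel-wise losses.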