Abstract
In this work, we tackle the task of monocular depth estimation, i.e., estimating a depth map from a single image. Without references to determine the scale of the scene, monocular depth estimation suffers from an inherent problem, depth ambiguity: objects with similar appearance in the image may lie at different depths. Depth ambiguity not only makes the depth estimation model hard to train, but also reduces its accuracy. We present a new method to alleviate this problem. We observe that the surface normal map is invariant with respect to the scale of the scene, and we therefore use the surface normal as a reference to assist depth prediction. First, we present a multitask CNN that simultaneously produces superpixel-wise depth and surface normal predictions. Then, we introduce a CRF with an autoencoder-based pairwise potential to refine the superpixel-wise predictions of the CNN. Finally, we propose a novel joint optimization algorithm that not only enhances the depth prediction in accordance with the surface normal prediction, but also transforms the superpixel-wise depth map into a fine-grained pixel-wise depth estimate. The proposed model is evaluated on the NYU-D2, SUN RGB-D and Make3D datasets. Experimental results show that the proposed model achieves state-of-the-art results while consuming relatively little GPU memory.
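The scale-invariance observation can be checked numerically. The following is a minimal sketch, not the method of the paper: it assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and estimates normals by finite differences on the back-projected points. The function name `normals_from_depth` is likewise an illustration only.

```python
import numpy as np

def normals_from_depth(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Estimate unit surface normals from a depth map (assumed pinhole camera).

    Returns an array of shape (H-2, W-2, 3); border pixels are dropped
    because central differences need both neighbours.
    """
    h, w = depth.shape
    # Hypothetical defaults: principal point at the image centre.
    cx = (w - 1) / 2.0 if cx is None else cx
    cy = (h - 1) / 2.0 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in camera coordinates.
    pts = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)
    # Tangent vectors along the image x and y directions.
    du = pts[1:-1, 2:] - pts[1:-1, :-2]
    dv = pts[2:, 1:-1] - pts[:-2, 1:-1]
    # The normal is perpendicular to both tangents; normalise to unit length.
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n

# A smooth synthetic surface, and the same surface with all depths scaled.
yy, xx = np.mgrid[0:48, 0:64]
depth = 2.0 + 0.01 * xx + 0.02 * yy + 0.1 * np.sin(xx / 5.0)
same = np.allclose(normals_from_depth(depth), normals_from_depth(3.7 * depth))
print("normals unchanged under global depth scaling:", same)
```

Scaling every depth by a constant scales every back-projected point by the same constant, so the cross product of the tangents grows by the square of that constant and the normalisation removes it. This is why a normal map carries no information about absolute scale but still constrains the shape of the depth map.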