Abstract

Learning depth from a single image is a challenging task in computer vision. Many recent works on monocular depth estimation explore increasingly large convolutional neural networks to learn monocular cues implicitly. Such methods may fail to generalize well around object boundaries, as large networks tend to distort fine details (such as edges and corners) in low-resolution layers, leading to poor depth predictions near object edges. To reduce depth errors near object boundaries, this paper proposes to explicitly decouple depth features into the body and edges of objects, corresponding to the low- and high-frequency regions of an image, respectively. To this end, we learn a flow field that warps depth features into consistent body features and residual edge features. Afterward, decoupled supervision is applied to both sets of features to learn body and edge depth maps explicitly. Moreover, we propose a lightweight encoder-decoder network that efficiently combines features at multiple scales to alleviate the loss of fine detail in the final feature map. Extensive experiments on the NYUD-v2 and KITTI datasets demonstrate that our proposed lightweight network with depth decoupling performs comparably to state-of-the-art methods while drastically reducing the number of parameters.
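To make the body/edge decoupling concrete, the following is a minimal toy sketch (not the paper's method): it splits a depth map into a low-frequency "body" component and a high-frequency residual "edge" component using a simple box blur. The paper instead learns a flow field that warps depth *features* toward consistent body features; all function names here (`box_blur`, `decouple`) are hypothetical illustrations of the decomposition idea only.

```python
import numpy as np

def box_blur(depth, k=3):
    """Smooth a 2-D depth map with a k x k box filter (reflect padding).

    Stands in for the low-frequency 'body' extraction; the paper learns
    this via a flow-field warp rather than a fixed filter.
    """
    pad = k // 2
    padded = np.pad(depth, pad, mode="reflect")
    out = np.zeros_like(depth, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + depth.shape[0], dx:dx + depth.shape[1]]
    return out / (k * k)

def decouple(depth, k=3):
    """Split depth into a smooth body part and a residual edge part."""
    body = box_blur(depth, k)
    edge = depth - body  # residual concentrates around object boundaries
    return body, edge

rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 10.0, size=(8, 8))
body, edge = decouple(depth)

# The two components recompose the original map exactly, so supervising
# them separately (as in decoupled supervision) loses no information:
assert np.allclose(body + edge, depth)
```

The key property illustrated is that the decomposition is lossless: the body and edge maps sum back to the original depth, so separate losses on each component can target smooth regions and boundaries independently.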
