Abstract

Depth estimation from a single monocular image is attracting increasing attention in autonomous driving and computer vision. While most existing approaches regress depth values or classify depth labels based on features extracted from a limited image area, the resulting depth maps remain perceptually unsatisfying: neither local context nor low-level semantic information alone is sufficient to predict depth, and learning-based approaches suffer from inherent defects in their supervision signals. This paper addresses monocular depth estimation with a general information-exchange convolutional neural network. We maintain a high-resolution prediction throughout the network, while both low-resolution features capturing long-range context and fine-grained features describing local context are refined stage by stage along an information exchange path. A mutual channel attention mechanism is applied to emphasize interdependent feature maps and improve the feature representation of specific semantics. The network is trained under the supervision of an improved log-cosh loss and gradient constraints, so that abnormal predictions have less impact and the estimation remains consistent in higher orders. Ablation studies verify the effectiveness of each proposed component. Experiments on popular indoor and street-view datasets show competitive results compared with recent state-of-the-art approaches.
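The abstract mentions supervision by a log-cosh loss combined with gradient constraints. The paper's exact "improved" formulation is not given here, but a minimal NumPy sketch of the standard log-cosh loss (robust to outlying predictions) and a first-order gradient-matching term might look as follows; the function names and the L1 penalty on gradient differences are illustrative assumptions, not the authors' definitions:

```python
import numpy as np

def log_cosh_loss(pred, target):
    """Log-cosh loss: behaves like L2 near zero, like L1 for large errors,
    so abnormal predictions contribute less than under plain L2.
    Uses the stable identity log(cosh(x)) = x + softplus(-2x) - log(2)."""
    x = pred - target
    return np.mean(x + np.logaddexp(0.0, -2.0 * x) - np.log(2.0))

def gradient_loss(pred, target):
    """Gradient constraint (assumed form): penalize mismatches between the
    horizontal/vertical depth differences of prediction and ground truth,
    encouraging higher-order consistency of the estimated depth map."""
    dx_p, dy_p = np.diff(pred, axis=1), np.diff(pred, axis=0)
    dx_t, dy_t = np.diff(target, axis=1), np.diff(target, axis=0)
    return np.mean(np.abs(dx_p - dx_t)) + np.mean(np.abs(dy_p - dy_t))
```

In such a scheme, the total training objective would combine the two terms with a weighting factor, e.g. `log_cosh_loss(p, t) + lam * gradient_loss(p, t)`.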
