Abstract

Deep convolutional neural networks (CNNs) have shown great potential to provide accurate depth estimation from stereo images. Previous work has focused on developing robust stereo matching architectures, while little attention has been paid to improving network efficiency. In this paper, we propose an efficient Siamese CNN architecture that combines low-resolution disparity estimation with depth-discontinuity-aware super-resolution. Specifically, we propose to construct, filter and perform regression on a low-resolution cost volume through the designed stereo matching backbone network. A fast depth-discontinuity-aware super-resolution subnetwork then upsamples the low-resolution disparity map to the desired resolution. Under the guidance of intensity edge features extracted from the left color image, depth edge residuals are hierarchically learned to refine the upsampled depth map. A delayed upsampling structure is designed to ensure that the computational complexity is proportional to the spatial size of the input disparity map. We also propose a loss that supervises the first derivative of the predicted disparity map, which makes the network adaptively aware of depth discontinuity edges. Experiments show that the proposed stereo matching network achieves prediction accuracy comparable to state-of-the-art methods while running much faster.
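The idea of supervising the first derivative of the disparity map can be illustrated with a small sketch. The abstract does not give the exact formulation, so the version below is an assumption: it takes forward differences of the predicted and ground-truth disparity maps in the horizontal and vertical directions and averages their absolute differences. The function names (disparity_gradients, mean_abs_diff, first_derivative_loss) are hypothetical and are not taken from the paper.

```python
def disparity_gradients(disp):
    """Forward-difference horizontal (dx) and vertical (dy) derivatives
    of a 2D disparity map given as a list of rows."""
    h, w = len(disp), len(disp[0])
    dx = [[disp[y][x + 1] - disp[y][x] for x in range(w - 1)] for y in range(h)]
    dy = [[disp[y + 1][x] - disp[y][x] for x in range(w)] for y in range(h - 1)]
    return dx, dy


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count if count else 0.0


def first_derivative_loss(pred, target):
    """Assumed form of the loss: L1 distance between the first derivatives
    of the predicted and ground-truth disparity maps."""
    pdx, pdy = disparity_gradients(pred)
    tdx, tdy = disparity_gradients(target)
    return mean_abs_diff(pdx, tdx) + mean_abs_diff(pdy, tdy)


# A sharp depth edge in the ground truth and a blurred prediction of it.
target = [[1.0, 1.0, 5.0, 5.0],
          [1.0, 1.0, 5.0, 5.0]]
blurred = [[1.0, 2.0, 4.0, 5.0],
           [1.0, 2.0, 4.0, 5.0]]
loss = first_derivative_loss(blurred, target)
```

In this toy case the blurred prediction is penalized because its horizontal derivative is spread over three pixels instead of concentrated at the edge, which is the behavior a discontinuity-aware loss is meant to encourage.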

Highlights

  • Depth estimated from stereo images has been the core information for vision-based practical applications, such as obstacle avoidance for robot navigation [1], 3D scene reconstruction for augmented and virtual reality systems [2], and 3D visual object tracking and location [3], [4].

  • We propose an end-to-end convolutional neural network (CNN) architecture that combines low-resolution disparity estimation with depth-discontinuity-aware super-resolution.

  • In this paper, we present ESMNet to address the issue of fast stereo matching by designing a new end-to-end Siamese convolutional neural network architecture


Summary

Introduction

Depth estimated from stereo images has been the core information for vision-based practical applications, such as obstacle avoidance for robot navigation [1], 3D scene reconstruction for augmented and virtual reality systems [2], and 3D visual object tracking and location [3], [4]. With the rapid development of deep learning, many convolutional neural network (CNN) based methods have been proposed to solve the stereo matching problem since the milestone work of MC-CNN [6]. Early deep stereo networks are designed to learn similarity metrics from a large set of cropped image patches [6]–[10]. According to the taxonomy of Scharstein et al. [5], traditional stereo matching algorithms typically include four consecutive steps: matching cost computation, cost aggregation, disparity computation and disparity refinement.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhenhua Guo.
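The four steps of the traditional pipeline named in the introduction can be sketched in a few lines. The taxonomy only names the steps, so every concrete choice below is an assumption made for illustration: absolute intensity difference as the matching cost, a box filter for aggregation, winner-takes-all for disparity computation, and a 3-tap horizontal median filter for refinement. The function names are hypothetical, and images are plain lists of grayscale rows.

```python
def matching_cost(left, right, max_disp):
    """Step 1: absolute-difference cost volume indexed as cost[d][y][x]."""
    h, w = len(left), len(left[0])
    big = 255.0  # cost assigned where pixel x - d falls outside the right image
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else big
              for x in range(w)] for y in range(h)]
            for d in range(max_disp + 1)]


def aggregate(cost, radius=1):
    """Step 2: average each disparity slice over a (2r+1) x (2r+1) window."""
    out = []
    for sl in cost:
        h, w = len(sl), len(sl[0])
        agg = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                vals = [sl[j][i]
                        for j in range(max(0, y - radius), min(h, y + radius + 1))
                        for i in range(max(0, x - radius), min(w, x + radius + 1))]
                agg[y][x] = sum(vals) / len(vals)
        out.append(agg)
    return out


def winner_takes_all(cost):
    """Step 3: per pixel, pick the disparity with the lowest aggregated cost."""
    h, w = len(cost[0]), len(cost[0][0])
    return [[min(range(len(cost)), key=lambda d: cost[d][y][x])
             for x in range(w)] for y in range(h)]


def refine(disp):
    """Step 4: 3-tap horizontal median filter to suppress isolated outliers."""
    out = [row[:] for row in disp]
    for y, row in enumerate(disp):
        for x in range(1, len(row) - 1):
            out[y][x] = sorted(row[x - 1:x + 2])[1]
    return out


def stereo_match(left, right, max_disp):
    """Run the four steps in sequence and return an integer disparity map."""
    cost = aggregate(matching_cost(left, right, max_disp))
    return refine(winner_takes_all(cost))


# Toy pair: the left view is the right view shifted by 2 pixels.
right_row = [10, 50, 20, 90, 30, 70, 40, 80, 0, 60]
left_row = [5, 15] + right_row[:-2]
left_img = [left_row[:] for _ in range(3)]
right_img = [right_row[:] for _ in range(3)]
disparity = stereo_match(left_img, right_img, max_disp=3)
```

On this toy pair the recovered disparity is 2 away from the left border, where the matching window still overlaps pixels that have no counterpart in the right image. Learned methods replace one or more of these hand-crafted steps with network layers.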
