Abstract
Stereo matching has been addressed as a supervised learning task with convolutional neural networks (CNNs). However, CNN-based approaches generally require a large amount of memory. In addition, it is still challenging to find correct correspondences between images in ill-posed regions, such as dim areas and areas affected by sensor noise. To solve these problems, we propose the Sparse Cost Volume Net (SCV-Net), which achieves high accuracy, low memory cost, and fast computation. The idea of the cost volume for stereo matching was initially proposed in GC-Net. In our work, by making the cost volume compact and proposing an efficient similarity evaluation for the volume, we achieve faster stereo matching while improving accuracy. Moreover, we propose using weight normalization instead of the commonly used batch normalization for stereo matching tasks. This improves robustness not only to sensor noise in images but also to the batch size used in training. We evaluated the proposed network on the Scene Flow and KITTI 2015 datasets, where its overall performance surpasses that of GC-Net. Compared with GC-Net, SCV-Net (1) reduces GPU memory cost by 73.08%; (2) reduces processing time by 61.11%; and (3) improves the 3-pixel error (3PE) from 2.87% to 2.61% on the KITTI 2015 dataset.
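The claim that weight normalization is less sensitive to batch size than batch normalization can be illustrated with a small sketch. The code below is not the SCV-Net implementation. It assumes a plain linear layer in NumPy, and the names weight_norm_linear and batch_norm are hypothetical helpers introduced only for this illustration. Weight normalization reparameterizes each weight vector as w = g * v / ||v||, so a sample's output depends only on the parameters, whereas training-mode batch normalization depends on the statistics of whichever samples share the batch.

```python
import numpy as np

def weight_norm_linear(x, v, g, b):
    """Linear layer with weight normalization: w = g * v / ||v|| per output unit."""
    # v: (out, in) direction parameters, g: (out,) learned scales, b: (out,) biases
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    w = g[:, None] * v / norm
    return x @ w.T + b

def batch_norm(x, eps=1e-5):
    """Training-mode batch normalization (no learned scale or shift)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))       # illustrative layer: 8 inputs, 4 outputs
g = np.ones(4)
b = np.zeros(4)
batch = rng.normal(size=(16, 8))  # illustrative batch of 16 samples

# Output for the same first sample, computed in a batch of 16 and in a batch of 2.
wn_full = weight_norm_linear(batch, v, g, b)[0]
wn_small = weight_norm_linear(batch[:2], v, g, b)[0]
bn_full = batch_norm(batch @ v.T)[0]
bn_small = batch_norm(batch[:2] @ v.T)[0]

# Weight normalization gives the same output regardless of batch size;
# batch normalization does not, because its statistics change with the batch.
wn_matches = np.allclose(wn_full, wn_small)
bn_matches = np.allclose(bn_full, bn_small)
print("weight norm output independent of batch size:", wn_matches)
print("batch norm output independent of batch size:", bn_matches)
```

In this toy setting the weight-normalized output of a sample is unchanged when the batch shrinks, while the batch-normalized output shifts. That is one plausible reading of why the abstract reports better robustness to batch size in training.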
Highlights
Depth images have been widely used as an input to many computer vision applications such as 3D reconstruction [1], object detection [2], and visual odometry [3]
We propose the Sparse Cost Volume Network (SCV-Net), which uses less GPU memory and less runtime while achieving accuracy comparable to state-of-the-art methods
We briefly review state-of-the-art stereo matching methods based on deep learning
Summary
Depth images have been widely used as an input to many computer vision applications such as 3D reconstruction [1], object detection [2], and visual odometry [3]. By leveraging recent advances in machine learning techniques such as deep learning, stereo matching methods using neural networks have been proposed [5,6]. Such methods have shown a strong ability in correspondence matching because they take advantage of massive training data [7]. Our detailed evaluation indicated that matching in dim and noisy regions is still challenging even with state-of-the-art methods. This is one of the main reasons for decreased accuracy and is especially crucial in outdoor environments, yet it has not been much discussed in the literature. Further improvements are required to widen the use of these methods.