Abstract

Binocular stereo matching is a challenging problem in computer vision. Recently, convolutional neural networks (CNNs) have emerged as a promising approach. However, matching ambiguities in ill-posed regions remain an intractable challenge for current methods. In this paper, we propose a novel network model consisting of two main parts: a wide context learning network and stacked encoder–decoder 2D CNNs with a spatial diffusion module. The first part uses dilated convolutional layers and spatial pyramid pooling to extract global context information and construct a matching cost volume. The second part performs contextual aggregation over this matching cost volume to optimize and smooth the matching cost. Finally, we estimate a disparity map by computing the probability of each disparity from the predicted matching cost. The proposed network can be trained end-to-end without any further post-processing or refinement. In the experiments, we evaluate our method on the synthetic Scene Flow dataset and the real-world KITTI datasets. Our proposed method achieves results that are competitive with state-of-the-art methods while maintaining a fast run time.
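
The final step, turning a matching cost into per-disparity probabilities and then into a disparity map, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula, so this assumes the commonly used soft-argmin regression: a softmax over the negated cost along the disparity axis, followed by the probability-weighted mean of the disparity indices. The function name `soft_argmin_disparity` and the `(D, H, W)` cost layout are hypothetical choices made for illustration, not the paper's implementation.

```python
import numpy as np


def soft_argmin_disparity(cost):
    """Regress a disparity map from a matching cost volume.

    Assumption (not stated in the abstract): cost has shape (D, H, W),
    where D is the number of candidate disparities and a lower cost
    means a better match.
    """
    cost = np.asarray(cost, dtype=np.float64)

    # Softmax over the negated cost along the disparity axis.
    # Subtracting the per-pixel maximum keeps exp() numerically stable.
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=0, keepdims=True)

    # Expected disparity: sum over d of d * P(d) at every pixel.
    disparities = np.arange(cost.shape[0], dtype=np.float64).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)


if __name__ == "__main__":
    # Toy volume: 5 candidate disparities on a 2x3 image,
    # with disparity 2 clearly cheapest at every pixel.
    toy_cost = np.full((5, 2, 3), 50.0)
    toy_cost[2] = 0.0
    print(soft_argmin_disparity(toy_cost))
```

Because the output is a weighted average rather than a hard argmin, it is differentiable and can yield sub-pixel disparities, which is one reason this kind of regression suits end-to-end training.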
