Abstract

Recently, deep convolutional neural networks (CNNs) have emerged as powerful tools for the correspondence problem in the stereo matching task. However, the existence of multiscale objects and inevitable ill-conditioned regions, such as textureless regions, in real-world scene images continues to challenge current CNN architectures. In this article, we present a robust scale-aware stereo matching network, which aims to predict multiscale disparity maps and fuse them to achieve a more accurate disparity map. To this end, powerful feature representations are extracted from stereo images and are concatenated into a 4-D feature volume. The feature volume is then fed into a series of connected encoder–decoder cost aggregation structures for the construction of multiscale cost volumes. Following this, we regress multiscale disparity maps from the multiscale cost volumes and feed them into a fusion module to predict the final disparity map. However, uncertainty estimations at each scale and complex disparity relationships among neighboring pixels pose a challenge to disparity fusion. To overcome this challenge, we design a robust learning-based scale-aware disparity map fusion model, which seeks to map multiscale disparity maps onto the ground-truth disparity map by leveraging their complementary strengths. Experimental results show that the proposed network is more robust and outperforms recent methods on standard stereo evaluation benchmarks.
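For orientation, the sketch below illustrates the three generic building blocks the abstract names: a concatenation-based 4-D feature volume, soft-argmin disparity regression from a cost volume, and a learned per-pixel fusion of multiscale disparity maps. It is a minimal PyTorch sketch under assumed shapes and module sizes, not the authors' implementation; the fusion network in particular is a hypothetical stand-in for the paper's scale-aware fusion model.

```python
# Minimal sketch of the pipeline described in the abstract.
# All layer sizes, names, and the fusion design are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_concat_volume(left_feat, right_feat, max_disp):
    """Concatenate left features with right features shifted by each candidate
    disparity, yielding a 4-D feature volume of shape (B, 2C, D, H, W)."""
    B, C, H, W = left_feat.shape
    volume = left_feat.new_zeros(B, 2 * C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, :C, d] = left_feat
            volume[:, C:, d] = right_feat
        else:
            volume[:, :C, d, :, d:] = left_feat[:, :, :, d:]
            volume[:, C:, d, :, d:] = right_feat[:, :, :, :-d]
    return volume


def soft_argmin(cost, max_disp):
    """Regress a disparity map from a (B, D, H, W) cost volume with the
    differentiable soft argmin: an expectation over disparity candidates."""
    prob = F.softmax(-cost, dim=1)
    disps = torch.arange(max_disp, device=cost.device, dtype=cost.dtype)
    return (prob * disps.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)


class DisparityFusion(nn.Module):
    """Hypothetical learned fusion: predict per-pixel weights over the
    multiscale disparity maps and blend them into the final map."""

    def __init__(self, num_scales):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_scales, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, num_scales, 3, padding=1),
        )

    def forward(self, disp_maps):  # disp_maps: (B, S, H, W)
        weights = F.softmax(self.net(disp_maps), dim=1)
        return (weights * disp_maps).sum(dim=1)  # (B, H, W)


# Toy usage with random features at 1/4 resolution; a real network would
# aggregate the 4-D volume with stacked 3-D encoder-decoder convolutions
# rather than the mean reduction used here as a placeholder.
left = torch.randn(1, 32, 64, 128)
right = torch.randn(1, 32, 64, 128)
vol = build_concat_volume(left, right, max_disp=48)  # (1, 64, 48, 64, 128)
cost = vol.mean(dim=1)                               # (1, 48, 64, 128)
disp = soft_argmin(cost, max_disp=48)                # (1, 64, 128)
fused = DisparityFusion(num_scales=3)(disp.unsqueeze(1).repeat(1, 3, 1, 1))
```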
