Abstract

Stereo matching depth estimation for rectified image pairs is of great importance to many computer vision tasks, particularly autonomous driving. With the rise of convolutional neural networks, reliable learning-based depth estimation via stereo matching has become a key challenge for autonomous driving in recent years. Previous research on end-to-end trainable stereo matching networks has usually used cascaded convolution blocks with down-sampling or pooling operations to extract the unary features required for matching cost construction. Such approaches lack a reconstruction stage that would improve pixel-wise alignment and the expressiveness of the feature maps, both of which play an important role in representing the similarity between stereo image pairs. To address this issue, in this paper, we propose the progressive fusion stereo matching network (PFSM-Net). We exploit an encoder-decoder feature extraction architecture for multi-stage, multi-scale dynamic feature extraction. Moreover, we propose a group-wise concatenation method to construct the cost volume, which provides a more efficient cost volume for cost aggregation. Furthermore, we propose multi-scale cost aggregation networks with a progressive fusion strategy: the aggregated cost volume is progressively fused with the multi-stage, multi-scale cost volumes as the size of the cost volume increases. The multi-stage, multi-scale outputs are supervised and learned in a coarse-to-fine manner. Experimental results demonstrate that our method outperforms previous methods on the SceneFlow, KITTI 2012, and KITTI 2015 datasets.
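The abstract mentions a group-wise cost volume but does not spell out its construction. Purely as an illustration, the following is a minimal PyTorch sketch of a group-wise cost volume in the style of group-wise correlation (as in GwcNet): channels are split into groups and left/right features are matched at each candidate disparity. The function name, tensor shapes, and the mean-based group reduction are assumptions for this sketch, not the paper's exact "group-wise concatenation" operation.

```python
import torch

def groupwise_cost_volume(left_feat, right_feat, max_disp, num_groups):
    """Build a 4D cost volume of shape [B, G, D, H, W].

    Sketch only: channels are split into G groups, and for each candidate
    disparity d the right features are shifted and matched group-wise
    against the left features (here via a per-group mean of products,
    following group-wise correlation; PFSM-Net's exact construction may differ).
    """
    B, C, H, W = left_feat.shape
    assert C % num_groups == 0, "channels must divide evenly into groups"
    ch_per_group = C // num_groups
    volume = left_feat.new_zeros(B, num_groups, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            prod = (left_feat * right_feat)
            prod = prod.view(B, num_groups, ch_per_group, H, W).mean(dim=2)
            volume[:, :, d] = prod
        else:
            # Shift right features by d pixels; only columns >= d are valid.
            l = left_feat[:, :, :, d:]
            r = right_feat[:, :, :, :-d]
            prod = (l * r).view(B, num_groups, ch_per_group, H, W - d).mean(dim=2)
            volume[:, :, d, :, d:] = prod
    return volume

if __name__ == "__main__":
    # Hypothetical feature maps at 1/4 resolution.
    left = torch.randn(1, 64, 32, 64)
    right = torch.randn(1, 64, 32, 64)
    vol = groupwise_cost_volume(left, right, max_disp=24, num_groups=8)
    print(vol.shape)  # torch.Size([1, 8, 24, 32, 64])
```

Compared with full concatenation volumes (which store 2C channels per disparity), a group-wise volume keeps only G channels per disparity, which is one plausible reading of the abstract's claim of a more efficient volume for cost aggregation.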
