Abstract

Stereo matching aims to estimate the disparity between input stereo images, which is then used to recover the geometric structure of the scene. Based on a disparity range hypothesis, most existing methods construct cost volumes over a predefined disparity range and introduce 3D convolutional neural networks (CNNs) to predict the best disparity for each pixel within that range. Although these methods achieve notable performance on standard datasets, the predefined disparity range limits their generalization to scenes with different disparity ranges. Moreover, the introduced 3D CNNs significantly increase the computational complexity of stereo matching systems. In this paper, instead of constructing cost volumes over a given disparity range, we tackle stereo matching as feature matching between stereo images under the epipolar constraint, and we thus present the Cascaded Feature Interaction Network (CFINet) for efficient, general-purpose stereo matching. Specifically, we first propose a Cross Fusion Module (CFM) to fuse cross-image features and model the cross-image similarity of each pixel along its epipolar line. Second, we design an Upsampling Fusion Module (UFM) that performs cascaded multi-level feature aggregation from low to high resolution, yielding our efficient coarse-to-fine network architecture. Experiments demonstrate that the proposed method generalizes well across different disparity ranges with encouraging efficiency. Specifically, our CFINet is 7× faster than the well-known STTR with lower memory usage, and it achieves competitive results on the SceneFlow and KITTI 2015 datasets compared to cost-volume-based methods.
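The core idea of matching features along epipolar lines, rather than over a fixed disparity range, can be illustrated with a minimal sketch. For rectified stereo pairs the epipolar line of a left-image pixel is simply the same row in the right image, so cross-image similarity reduces to a per-row dot product between feature maps. The function below is purely illustrative (it uses a plain dot-product similarity and an argmax match, not the paper's learned CFM/UFM modules), and all names are our own:

```python
import numpy as np

def epipolar_similarity(feat_left, feat_right):
    """Similarity along epipolar lines of a rectified stereo pair.

    feat_left, feat_right: (H, W, C) feature maps.
    Returns a (H, W, W) tensor where out[y, x, v] is the dot-product
    similarity between left pixel (y, x) and right pixel (y, v).
    """
    # For rectified images, the epipolar line of row y in the left image
    # is row y in the right image, so this is a batched per-row matmul.
    return np.einsum('ywc,yvc->ywv', feat_left, feat_right)

# Toy example with random features (shapes only; no learned network here).
H, W, C = 4, 8, 16
fl = np.random.randn(H, W, C).astype(np.float32)
fr = np.random.randn(H, W, C).astype(np.float32)

sim = epipolar_similarity(fl, fr)          # (H, W, W) similarity
match = sim.argmax(axis=-1)                # best right column per left pixel
disparity = np.arange(W)[None, :] - match  # x_left - x_right
```

Note that the similarity tensor covers every column of the epipolar line, so no disparity range needs to be fixed in advance; a learned method would replace the raw dot product and argmax with trainable feature interaction and soft matching.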
