Abstract

Although convolutional neural networks have brought significant progress to stereo matching, accurate and robust disparity estimation in real time remains difficult. In this article, we study how to achieve more accurate and robust disparity estimation under real-time constraints. To this end, a Multi-scale Volume Fusion (MVF) module is proposed and embedded into the network to improve matching accuracy. To achieve real-time performance, we propose an innovative use of 3D convolution: it is applied only during training for guidance and supervision, keeping inference lightweight. Based on these two structures, we design an end-to-end stereo matching method called the 3D Convolution Guided and Multi-scale Cost Volume Fusion Network (CGFNet). Experimental results show that CGFNet generalizes better across cross-domain datasets, achieving more accurate disparity estimation in challenging regions without an additional fine-tuning process. On the KITTI benchmark, CGFNet reaches D1-all = 1.98%, a substantial improvement over state-of-the-art (SOTA) real-time models, and processes a pair of images within 38 ms (26 fps). These results are notable when considering both matching accuracy and real-time performance.
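
To make the multi-scale cost volume fusion idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the names build_cost_volume and MultiScaleVolumeFusion, the assumption that all scales share the same feature channel count, and the trilinear upsampling and 1x1x1 fusion convolution are illustrative assumptions only.

```python
# Hypothetical sketch of multi-scale cost volume fusion (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_cost_volume(left_feat, right_feat, max_disp):
    """Concatenation-based cost volume of shape (B, 2C, max_disp, H, W)."""
    b, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = left_feat
            volume[:, c:, d] = right_feat
        else:
            # shift the right features by d pixels before pairing with the left features
            volume[:, :c, d, :, d:] = left_feat[:, :, :, d:]
            volume[:, c:, d, :, d:] = right_feat[:, :, :, :-d]
    return volume

class MultiScaleVolumeFusion(nn.Module):
    """Fuse cost volumes built from features at several scales (a guess at the MVF idea)."""
    def __init__(self, channels, max_disp, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.max_disp = max_disp
        # 1x1x1 convolution that mixes the concatenated per-scale volumes
        self.fuse = nn.Conv3d(2 * channels * len(scales), 2 * channels, kernel_size=1)

    def forward(self, left_feats, right_feats):
        # left_feats / right_feats: lists of feature maps, one per scale, finest first
        target_size = None
        volumes = []
        for s, lf, rf in zip(self.scales, left_feats, right_feats):
            vol = build_cost_volume(lf, rf, self.max_disp // s)
            if target_size is None:
                target_size = vol.shape[2:]  # (D, H, W) at the finest scale
            # upsample coarser volumes to the finest resolution before fusion
            vol = F.interpolate(vol, size=target_size, mode='trilinear', align_corners=False)
            volumes.append(vol)
        return self.fuse(torch.cat(volumes, dim=1))
```

A fused volume of this kind could then be regularized by an aggregation network; in the spirit of the abstract, the heavier 3D-convolution branch would be attached only at training time as an auxiliary supervision path and dropped at inference.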
