Matching cost aggregation plays a critical role in stereo matching. Existing CNN-based methods commonly use 3D convolutions to aggregate matching costs from a local 3D space. However, the high computational cost of 3D convolutions limits their applicability. Traditional methods show that a slanted support window in 3D space helps to aggregate matching costs from informative regions, i.e., the surfaces of objects. Motivated by this idea, we propose SPNet, a network with differentiable slanted plane aggregation. Our slanted plane aggregation layers aggregate matching costs from a learnable slanted plane in a local 3D space to reduce computational and memory costs. Experimental results show that our slanted plane aggregation layers can learn to fit the surfaces of objects and effectively aggregate matching costs. Comparisons with previous stereo matching methods show that our network achieves competitive performance with higher efficiency.