Abstract
Existing stereo matching methods achieve satisfactory average accuracy over the whole predicted disparity map under common global metrics, but they overlook fine-grained, region-level performance, especially in far regions of the scene, which matter more in real autonomous-driving scenarios. Two factors account for this problem: 1) Depth resolution. Existing methods use disparity-based sampling, which draws matching candidates uniformly over the disparity range. Because depth is inversely proportional to disparity, this yields a sparser sampling density in far regions than in close regions when measured in depth, and hence low depth resolution in far regions. 2) Feature discriminability. Limited image resolution and inferior feature extraction in far regions produce features with low discriminability, which degrades the subsequent matching between the stereo images. To improve estimation accuracy in far regions and thereby balance performance across regions, we design a novel two-stage Balanced Stereo Matching Network (BSMNet). The coarse stage of BSMNet introduces a direct depth-based sampling strategy that generates matching candidates according to scene depth instead of disparity, improving depth resolution and producing an initial depth map with more balanced accuracy. A depth refinement stage then addresses low feature discriminability and further optimizes the initial depth map from the coarse stage. It selects matching candidates and computes their similarity scores from a carefully designed adaptive feature volume guided by a learnable scale map, making the final estimate more accurate. Unlike existing methods, which build stereo matching around disparity prediction, our pipeline optimizes depth directly.
Experiments show that BSMNet clearly improves performance in far regions without sacrificing performance in close regions, and thereby substantially outperforms existing state-of-the-art methods.
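
The depth-resolution argument can be illustrated with a small numerical sketch. It is not the BSMNet implementation: the focal length, baseline, and disparity range are assumed values chosen for illustration, the helper names are hypothetical, and uniform spacing in depth is only one possible reading of "depth-based sampling", since the abstract does not specify the spacing. The sketch relies only on the standard pinhole relation Z = f * B / d.

```python
# Illustrative values only (assumptions, not taken from the paper).
FOCAL_PX = 720.0      # focal length in pixels
BASELINE_M = 0.54     # stereo baseline in metres
MAX_DISP = 192        # disparity search range in pixels


def depth_from_disparity(disp_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Standard pinhole stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disp_px


def disparity_based_candidates(max_disp=MAX_DISP):
    """Depths implied by integer disparities 1..max_disp, sorted near to far."""
    return sorted(depth_from_disparity(d) for d in range(1, max_disp + 1))


def depth_based_candidates(z_min, z_max, count):
    """Hypothetical depth-based sampling: candidates spaced uniformly in depth."""
    step = (z_max - z_min) / (count - 1)
    return [z_min + i * step for i in range(count)]


def largest_gap(candidates, lo, hi):
    """Largest spacing between consecutive candidates that fall inside [lo, hi]."""
    inside = [z for z in candidates if lo <= z <= hi]
    return max(b - a for a, b in zip(inside, inside[1:]))


disp_cands = disparity_based_candidates()
depth_cands = depth_based_candidates(disp_cands[0], disp_cands[-1], len(disp_cands))

for name, cands in (("disparity-based", disp_cands), ("depth-based", depth_cands)):
    near = largest_gap(cands, 2.0, 10.0)    # close region, metres
    far = largest_gap(cands, 40.0, 80.0)    # far region, metres
    print(f"{name:16s} near gap {near:6.2f} m   far gap {far:6.2f} m")
```

With these assumed camera values, the disparity-based candidates are spaced a fraction of a metre apart in the close region but more than ten metres apart in the far region, whereas the uniformly spaced depth candidates keep the same gap in both. That gap is coarser than disparity-based sampling up close and much finer far away, which is the trade-off behind the balanced region-level accuracy described above.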