Stereo matching plays an essential role in 3D reconstruction from very-high-resolution (VHR) remote sensing images. However, it still faces significant challenges posed by multi-scale objects in large scenes and by multi-modal probability distributions in difficult regions, especially occluded and textureless areas. Accurate disparity estimation for multi-scale objects has therefore become a difficult but crucial task. In this paper, we design a novel confidence-aware unimodal cascade and fusion pyramid network for stereo matching to tackle these problems. The fused cost volume at the coarsest scale is used to generate the initial disparity map; learnable confidence maps are then produced to construct unimodal cost distributions, which narrow down the disparity search range at the next stage. Moreover, we design a cross-scale interaction aggregation module to leverage multi-scale information. A smooth-L1 loss and a stereo focal loss are applied to regularize the disparity map and the unimodal cost distribution, respectively. Extensive experiments show that the proposed network outperforms two state-of-the-art stereo matching networks in terms of average endpoint error (EPE) and the fraction of erroneous pixels (D1).
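To make the core idea concrete, the following is a minimal NumPy sketch of how a confidence-driven unimodal cost distribution and a focal-weighted distribution loss could look. The abstract does not give the exact formulation, so the peaked-softmax construction, the spread parameter `sigma`, and the focal exponent `alpha` are all illustrative assumptions, not the authors' actual equations.

```python
import numpy as np

def unimodal_distribution(d0, sigma, max_disp):
    """Build a unimodal probability distribution over disparity candidates.

    d0:    (H, W) initial disparity estimate from the coarsest scale
    sigma: (H, W) confidence-derived spread (larger = less confident)
    Returns a (max_disp, H, W) softmax over -|d - d0| / sigma, peaked at d0.
    (Illustrative construction; the paper's exact form may differ.)
    """
    d = np.arange(max_disp, dtype=np.float64).reshape(-1, 1, 1)  # candidates
    logits = -np.abs(d - d0[None]) / sigma[None]   # sharper peak for small sigma
    logits -= logits.max(axis=0, keepdims=True)    # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=0, keepdims=True)

def stereo_focal_loss(pred, target, alpha=2.0, eps=1e-8):
    """Focal-weighted cross-entropy between predicted and target distributions.

    The (1 - pred)^alpha weighting (analogous to focal loss) is a plausible
    stand-in for the paper's stereo focal loss, not its verified definition.
    """
    weight = (1.0 - pred) ** alpha
    return -(weight * target * np.log(pred + eps)).sum(axis=0).mean()
```

A downstream stage would then restrict its disparity search to the candidates where this distribution carries most of the mass, which is how the cascade narrows the search range from coarse to fine.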