Abstract

Learnable PatchMatch formulations have recently made progress in Multi-View Stereo (MVS). However, their performance often degrades in complex wild scenes. In this study, we observe that the degradation is mainly caused by noisy depth hypothesis sampling during the iterations of PatchMatch MVS: (i) Within a single iteration, the method mixes all information from a fixed-shape region in the spatial dimension, which introduces sampling noise, especially in regions with abrupt depth changes; (ii) Between iterations, the candidate depth hypotheses are sampled indiscriminately in the depth dimension, which fails to filter out noise in the sampling range and propagates errors to the next iteration. Accordingly, we propose a deformable sampling learning strategy (DefLearn) guided by matching uncertainty to determine the hypothesis set from two perspectives: (i) Encoding the matching uncertainty within the image allows the spatial search region to adapt to the local depth distribution; (ii) Modeling the cross-image matching uncertainty more accurately from the obtained valid hypotheses captures scene-structure information for fine-grained hypothesis search in the depth dimension. With DefLearn, we develop a deep probabilistic PatchMatch MVS network (PPMNet) for accurate depth estimation with a high awareness of diverse scene structures. Extensive experiments on three challenging datasets demonstrate that our PPMNet significantly outperforms state-of-the-art methods in accuracy and generalization ability, even in particularly challenging wild scenes. Code is available at https://github.com/Geo-Tell/PPMNet.
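To make the core idea of uncertainty-guided, per-pixel hypothesis sampling concrete, the following is a minimal NumPy sketch. It is an illustration under assumptions, not the paper's implementation (the function name sample_hypotheses and parameters n_hyp and base_range are hypothetical; consult the linked repository for PPMNet's actual DefLearn module). The sketch shows only the depth-dimension aspect: narrowing the candidate search range where matching is confident and widening it where matching is uncertain, rather than sampling a fixed range at every pixel.

```python
import numpy as np

def sample_hypotheses(depth, uncertainty, n_hyp=8, base_range=0.2):
    """Sample per-pixel depth candidates around the current estimate.

    depth       : (H, W) depth map from the previous PatchMatch iteration
    uncertainty : (H, W) matching uncertainty in [0, 1]; higher = less confident
    n_hyp       : number of candidate hypotheses per pixel (hypothetical default)
    base_range  : maximum search range as a fraction of the current depth
    Returns     : (n_hyp, H, W) array of depth candidates
    """
    # Uncertainty-scaled search range: tight around confident pixels,
    # wide around uncertain ones (e.g., near abrupt depth changes).
    per_pixel_range = base_range * depth * uncertainty          # (H, W)

    # Uniformly spaced offsets in [-1, 1], broadcast over the image.
    offsets = np.linspace(-1.0, 1.0, n_hyp).reshape(n_hyp, 1, 1)
    return depth[None] + offsets * per_pixel_range[None]

# Toy usage: a 4x4 depth map with a confident left half and an
# uncertain right half (mimicking a region with abrupt depth changes).
depth = np.full((4, 4), 10.0)
uncertainty = np.full((4, 4), 0.1)
uncertainty[:, 2:] = 0.9
hyps = sample_hypotheses(depth, uncertainty)
print(hyps.shape)     # (8, 4, 4)
print(hyps[:, 0, 0])  # tight candidates around 10.0 (confident pixel)
print(hyps[:, 0, 3])  # wide candidates around 10.0 (uncertain pixel)
```

In this toy setting, confident pixels receive candidates spanning roughly ±0.2 around the current depth, while uncertain pixels receive candidates spanning roughly ±1.8, which matches the abstract's goal of filtering noise out of the sampling range before it propagates to the next iteration.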
