Although learning-based multi-view stereo algorithms have produced exciting results in recent years, few researchers have explored the specific role of deep sampling in the network. We posit that depth sampling accuracy more directly impacts the quality of scene reconstruction. To address this issue, we proposed NTPP-MVSNet, which utilizes normal vector and depth information from neighboring pixels to propagate tangent planes. Based on this, we obtained a more accurate depth estimate through homography transformation. We used deformable convolution to acquire continuous pixel positions on the surface and 3D-UNet to account for the regression of depth and normal vector maps without consuming additional GPU memory. Finally, we applied homography transformation to complete the mapping of the imaging plane and the neighborhood surface tangent plane to generate a depth hypothesis. Experimental trials on the DTU and Tanks and Temples datasets demonstrate the feasibility of NTPP-MVSNet, and ablation experiments confirm the superior performance of our deep sampling methodology.
Read full abstract