Deep learning-based multi-view stereo (MVS) has become an increasingly popular approach to 3D reconstruction, and existing methods have made significant progress in pixel-level depth estimation. However, occlusions and non-Lambertian surfaces in the input images still hinder accurate confidence estimation, and cost volume regularization often over-smooths object boundaries. To address these challenges, we integrate a High-Frequency Information Compensator and a 3D Channel Attention Module into a multi-view stereo network, which we term HFCA-MVS. First, in the feature volume aggregation stage, the high-frequency information compensator enhances the correlation between 2D semantics and 3D space. Second, in the cost volume regularization stage, the 3D channel attention module strengthens the representation of channel features by capturing relationships among different channels. Finally, the 3D CNN uses the GELU activation function to boost the activation response and mitigate excessive smoothing at object boundaries. HFCA-MVS achieves competitive 3D reconstruction performance on three benchmark datasets: DTU, BlendedMVS, and Tanks&Temples. In particular, on the DTU benchmark, HFCA-MVS improves completeness by a relative 33%, 6.5%, and 0.4% over CasMVSNet, MVSTER, and Geo-MVSNet, respectively, and improves the overall score by a relative 15% and 4.2% over CasMVSNet and MVSTER. On the Tanks&Temples dataset, our model yields reconstruction results comparable to those of existing models.
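To make the idea of channel attention over a 3D cost volume concrete, the following is a minimal NumPy sketch, not the HFCA-MVS implementation. The abstract does not specify the module's internals, so the sketch assumes a squeeze-and-excitation-style design: global average pooling over the depth and spatial axes, a two-layer bottleneck with GELU, and a sigmoid gate that rescales each channel. The function names, the reduction ratio `r`, the tanh approximation of GELU, and the random weights are all illustrative assumptions.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU (an assumption; the exact form used
    # in the paper is not stated in the abstract).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_3d(volume, w1, w2):
    """Rescale the channels of a cost volume (hypothetical SE-style gate).

    volume: array of shape (C, D, H, W) = channels, depth hypotheses, height, width
    w1:     array of shape (C // r, C), bottleneck projection
    w2:     array of shape (C, C // r), expansion projection
    """
    # Squeeze: one descriptor per channel, averaged over depth and space.
    squeezed = volume.mean(axis=(1, 2, 3))          # shape (C,)
    # Excite: model relationships among channels through a bottleneck.
    hidden = gelu(w1 @ squeezed)                    # shape (C // r,)
    weights = sigmoid(w2 @ hidden)                  # shape (C,), each in (0, 1)
    # Rescale: broadcast the per-channel weights over D, H, W.
    return volume * weights[:, None, None, None], weights


# Toy example with random data and random (untrained) weights.
rng = np.random.default_rng(0)
C, D, H, W, r = 8, 4, 5, 6, 2
volume = rng.standard_normal((C, D, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

out, weights = channel_attention_3d(volume, w1, w2)
print(out.shape, weights.shape)
```

In this sketch the gate only reweights whole channels, so the output keeps the input shape and can be dropped between regularization layers without changing the rest of the network; a trained module would learn `w1` and `w2` rather than sample them.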