Abstract

In recent years, learning-based methods have driven significant progress in multi-view stereo (MVS). However, current state-of-the-art approaches incur high computational and memory costs because they rely on semantic segmentation networks, neural rendering, and other heavyweight techniques to improve depth-map prediction. To use resources more efficiently while preserving depth-prediction accuracy, we present a lightweight self-supervised multi-view stereo framework that combines a depth feature extraction module, an edge feature extraction module, and data augmentation. Specifically, the depth feature extraction module adaptively extracts the information in the input images that is most relevant to depth estimation, while the edge feature extraction module extracts object contour features to improve depth inference at object boundaries. Experimental results on the DTU dataset show that our framework, referred to as LS-MVSNet, outperforms state-of-the-art self-supervised MVS methods in accuracy. Furthermore, extensive experiments on the Tanks and Temples dataset demonstrate the model's generalization ability. The code will be released.
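The abstract does not specify how the edge feature extraction module computes contour features. As a purely illustrative stand-in, the classical Sobel operator is one way to obtain an edge-response map that a boundary-aware depth module could consume; the function name and the use of Sobel kernels here are assumptions, not the paper's actual design.

```python
import numpy as np

def sobel_edge_features(image: np.ndarray) -> np.ndarray:
    """Return a gradient-magnitude edge map for a 2-D grayscale image.

    Illustrative only: the paper's edge feature extraction module is learned,
    whereas this sketch uses fixed Sobel kernels.
    """
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    ky = kx.T                                  # vertical gradient kernel
    pad = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)  # gradient magnitude = edge strength

# A sharp vertical intensity step yields strong responses at the boundary
# columns and zero response in flat regions.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edges = sobel_edge_features(img)
```

In a lightweight MVS pipeline such as the one described, a map like this (learned rather than hand-crafted) can be fused with depth features so that depth discontinuities align with object contours.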
