Abstract

Deep learning has recently been shown to deliver excellent performance in multi-view stereo (MVS). However, deep learning-based MVS approaches find it difficult to balance efficiency and effectiveness. To this end, we propose DSC-MVSNet, a novel coarse-to-fine, end-to-end framework for more efficient and more accurate depth estimation in MVS. In particular, we propose an attention-aware 3D U-Net-shaped network which, for the first time, applies depthwise separable convolutions to cost volume regularization. This mechanism enables effective aggregation of information and significantly reduces model parameters and computation by factorizing the ordinary convolution on the cost volume into a depthwise convolution and a pointwise convolution. In addition, a 3D-Attention module is proposed to alleviate the feature-mismatching problem in cost volume regularization and to aggregate important information from the cost volume along three dimensions (i.e., channel, space, and depth). Moreover, we propose an efficient Feature Transfer Module that upsamples the low-resolution (LR) depth map to a high-resolution (HR) depth map to achieve higher accuracy. Extensive experiments on two benchmark datasets, i.e., DTU and Tanks & Temples, demonstrate that our model requires only 25% of the parameters of the state-of-the-art MVSNet, while outperforming or remaining on par with state-of-the-art models in accuracy. Our source code is available at https://github.com/zs670980918/DSC-MVSNet.
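The efficiency idea behind the regularization network can be illustrated with a minimal, hypothetical PyTorch sketch (not the authors' actual implementation; the class name and hyperparameters are assumptions). A standard 3D convolution with a k×k×k kernel over C_in input and C_out output channels costs roughly k³·C_in·C_out weights, whereas the factorized version costs only k³·C_in + C_in·C_out, which is the source of the parameter and computation savings.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Sketch of a 3D depthwise separable convolution block.

    A standard Conv3d is replaced by (1) a depthwise Conv3d that filters each
    input channel independently (groups=in_channels) and (2) a 1x1x1 pointwise
    Conv3d that mixes information across channels.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # Depthwise: one k*k*k filter per input channel (spatial/depth aggregation).
        self.depthwise = nn.Conv3d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=padding, groups=in_channels, bias=False)
        # Pointwise: 1x1x1 convolution that combines channels.
        self.pointwise = nn.Conv3d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm3d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: cost volume of shape (B, C, D, H, W)
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    # Toy cost volume: batch 1, 32 feature channels, 48 depth hypotheses, 32x40 spatial.
    volume = torch.randn(1, 32, 48, 32, 40)
    block = DepthwiseSeparableConv3d(32, 8)
    print(block(volume).shape)  # torch.Size([1, 8, 48, 32, 40])
```

Such blocks could be stacked in place of ordinary 3D convolutions inside a U-Net-style regularization network; the exact layer configuration used by DSC-MVSNet is given in the paper and repository, not in this sketch.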
