Abstract

In this paper, we propose the octave deep plane-sweeping network (OctDPSNet). OctDPSNet is a novel learning-based plane-sweeping stereo method that drastically reduces the required GPU memory and computation time while achieving state-of-the-art depth estimation accuracy. Inspired by octave convolution, we divide image features into high and low spatial frequency features, and two cost volumes are generated from them using our proposed plane-sweeping module. To reduce spatial redundancy, the resolution of the cost volume built from the low spatial frequency features is set to half that of the high spatial frequency features, which reduces both memory consumption and computational cost. After refinement, the two cost volumes are integrated into a final cost volume through our proposed pixel-wise “squeeze-and-excitation” based attention mechanism, and depth maps are estimated from this final cost volume. We evaluate the proposed model on five datasets: SUN3D, RGB-D SLAM, MVS, Scenes11, and ETH3D. Our model outperforms previous methods on all five datasets while drastically reducing memory consumption and computational cost. Our source code is available at https://github.com/matsuren/octDPSNet.
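The fusion step described in the abstract can be pictured with a small sketch. Below is a minimal, hypothetical NumPy version of a pixel-wise squeeze-and-excitation style gate: the half-resolution cost volume is upsampled, a per-pixel descriptor is "squeezed" from both volumes, and a sigmoid gate blends them. The function name and the weights `w1`, `w2` are illustrative assumptions, not the actual OctDPSNet layers.

```python
import numpy as np

def se_fuse(cost_high, cost_low, w1, w2):
    """Pixel-wise squeeze-and-excitation style fusion of two cost volumes.

    Illustrative sketch only: w1 and w2 play the role of 1x1-convolution
    weights. cost_high has shape (D, H, W); cost_low has shape (D, H//2, W//2).
    """
    up = cost_low.repeat(2, axis=1).repeat(2, axis=2)    # nearest-neighbour upsample to (D, H, W)
    stacked = np.concatenate([cost_high, up], axis=0)    # (2D, H, W)
    # Squeeze: reduce the 2D depth channels to a small per-pixel descriptor.
    hidden = np.maximum(0.0, np.einsum('cd,dhw->chw', w1, stacked))
    # Excite: one sigmoid gate per pixel deciding how to blend the two volumes.
    gate = 1.0 / (1.0 + np.exp(-np.einsum('cd,dhw->chw', w2, hidden)))
    return gate * cost_high + (1.0 - gate) * up
```

Because the gate lies in (0, 1), the fused cost at every pixel is a convex combination of the two input volumes, so the fusion can favour the detailed high-frequency volume where it is reliable and fall back on the smoother low-frequency one elsewhere.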

Highlights

  • Depth estimation is a fundamental task in the fields of computer vision and robotics, especially for autonomous navigation and autonomous driving, as it is necessary to understand the surrounding environment

  • Our motivation is to reduce the resolution of the cost volume to address the memory consumption and computational cost problems

  • To address the trade-off between computation time and accuracy, we focus on reducing spatial redundancy in a manner inspired by octave convolution (OctConv) [14]
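The OctConv idea the highlights refer to can be sketched in a few lines: a fraction of the feature channels is stored at half resolution, quartering their spatial footprint, which is where the memory savings come from. The function name and the use of `alpha` as the low-frequency channel ratio are illustrative assumptions in this NumPy sketch.

```python
import numpy as np

def octave_split(feat, alpha=0.5):
    """Split a (C, H, W) feature map into high- and low-frequency parts.

    OctConv-style sketch: the first alpha fraction of channels is
    average-pooled to half resolution (low frequency); the remaining
    channels keep full resolution (high frequency).
    """
    c, h, w = feat.shape
    c_low = int(alpha * c)
    # 2x2 average pooling: each low-frequency channel shrinks to (H/2, W/2),
    # so it occupies only a quarter of the original spatial memory.
    low = feat[:c_low].reshape(c_low, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    high = feat[c_low:]
    return high, low
```

With `alpha = 0.5`, half the channels cost only a quarter of their full-resolution memory, so the feature map as a whole shrinks to 62.5% of its original size before any cost volume is even built.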

Introduction

Depth estimation is a fundamental task in the fields of computer vision and robotics, especially for autonomous navigation and autonomous driving, as it is necessary to understand the surrounding environment. RGB cameras, RGB-D cameras, and LiDAR are commonly employed for depth estimation. RGB cameras are the most popular sensors owing to their low cost, light weight, and availability. Depth estimation from multi-view images has been studied comprehensively over a long period [1]–[5]. One approach is plane-sweeping stereo, in which multi-view images are projected onto virtual planes at several distances from the reference image plane to generate a cost volume. Depth maps are then estimated from this cost volume.
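As a concrete illustration of plane-sweeping stereo, the following NumPy sketch builds a cost volume for a two-view case: each candidate depth induces a plane homography from the reference view to the source view, the source image is sampled accordingly, and a per-pixel photometric cost is recorded; a winner-take-all depth map then picks the cheapest depth per pixel. The function name, the absolute-difference cost, and the nearest-neighbour sampling are illustrative assumptions, not the paper's learned implementation, and homography sign conventions vary between references.

```python
import numpy as np

def plane_sweep_cost_volume(ref_img, src_img, K, R, t, depths):
    """Build a plane-sweep cost volume from a reference/source image pair.

    For each candidate depth d, the source view is sampled through the
    plane-induced homography H_d = K (R + t n^T / d) K^{-1} with plane
    normal n = (0, 0, 1)^T, and compared to the reference image with an
    absolute-difference photometric cost.
    """
    h, w = ref_img.shape
    n = np.array([0.0, 0.0, 1.0])
    K_inv = np.linalg.inv(K)
    # Homogeneous pixel grid (x, y, 1) of the reference view, shape (3, H*W).
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    cost = np.full((len(depths), h, w), np.inf)
    for i, d in enumerate(depths):
        H = K @ (R + np.outer(t, n) / d) @ K_inv
        warped = H @ pix.astype(float)
        u = warped[0] / warped[2]
        v = warped[1] / warped[2]
        ui = np.round(u).astype(int)   # nearest-neighbour sampling
        vi = np.round(v).astype(int)
        valid = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
        sampled = np.zeros(h * w)
        sampled[valid] = src_img[vi[valid], ui[valid]]
        diff = np.abs(ref_img.reshape(-1) - sampled)
        diff[~valid] = np.inf          # ignore pixels falling outside the source
        cost[i] = diff.reshape(h, w)
    # Winner-take-all: per pixel, the depth whose plane gives the lowest cost.
    depth_map = np.asarray(depths)[np.argmin(cost, axis=0)]
    return cost, depth_map
```

In the fronto-parallel case with identity rotation and a pure horizontal baseline b, the homography reduces to a shift of f·b/d pixels, which is exactly the classical disparity-depth relation; learned methods such as OctDPSNet replace the handcrafted photometric cost with deep feature matching over the same swept-plane structure.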
