Abstract

In stereo matching, a high-quality cost volume is key to improving matching accuracy. Current stereo matching networks use either traditional methods or neural networks to generate one or more cost volumes; they do not consider combining different matching cost computation methods to improve the quality of the cost volume. Therefore, we propose BSDCNet, a real-time stereo matching network consisting of two main modules: a Double Matching Cost Computation module and a Bidirectional Cost Aggregation Network. The Double Matching Cost Computation module generates two different cost volumes, one with a traditional method and one with a neural network. The Bidirectional Cost Aggregation Network is a two-branch structure that aggregates these two cost volumes with branches of different depths. Finally, we design a feature fusion module (FFM) to fuse the two-branch features and use the result for disparity estimation. Our network uses only 3D cost volumes and two-dimensional convolutions, so it is much faster than state-of-the-art stereo networks (e.g., 36× faster than GC-Net, 16× faster than PSMNet, and 72× faster than GA-Net). Meanwhile, according to the official KITTI leaderboard, our network is more accurate than other fast stereo networks (e.g., Fast DS-CS, RTSNet, and DispNetC), demonstrating that it can produce real-time and accurate stereo matching results.
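To make the overall data flow described above concrete, the sketch below shows how two 3D cost volumes could be aggregated by a two-branch 2D-convolutional network, fused, and turned into a disparity map. Module names, channel counts, branch depths, and the soft-argmax regression step are illustrative assumptions, not the authors' published implementation.

```python
# Illustrative sketch only: names, channel counts, depths, and the soft-argmax
# disparity regression are assumptions, not the published BSDCNet code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv2d_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TwoBranchAggregation(nn.Module):
    """Aggregate two 3D cost volumes (stored as (B, D, H, W) tensors) with 2D convolutions."""

    def __init__(self, max_disp=192, ch=64, deep_layers=6, shallow_layers=3):
        super().__init__()
        self.max_disp = max_disp
        # Deeper branch for one cost volume, shallower branch for the other.
        self.deep_branch = nn.Sequential(
            conv2d_block(max_disp, ch), *[conv2d_block(ch, ch) for _ in range(deep_layers - 1)])
        self.shallow_branch = nn.Sequential(
            conv2d_block(max_disp, ch), *[conv2d_block(ch, ch) for _ in range(shallow_layers - 1)])
        # Fuse the two branches and predict per-disparity scores.
        self.fuse = nn.Conv2d(2 * ch, max_disp, 3, padding=1)

    def forward(self, cost_a, cost_b):
        # Both cost volumes are assumed to carry max_disp disparity channels here.
        feat_a = self.deep_branch(cost_a)
        feat_b = self.shallow_branch(cost_b)
        # The two cost volumes may live at different resolutions; align before fusion.
        feat_b = F.interpolate(feat_b, size=feat_a.shape[-2:], mode='bilinear', align_corners=False)
        scores = self.fuse(torch.cat([feat_a, feat_b], dim=1))          # (B, D, H, W)
        prob = F.softmax(scores, dim=1)
        disparities = torch.arange(self.max_disp, device=prob.device, dtype=prob.dtype)
        return (prob * disparities.view(1, -1, 1, 1)).sum(dim=1)        # (B, H, W) disparity map
```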

Highlights

  • Stereo matching is the process of finding the pixels in multiscopic views that correspond to the same 3D point in the scene

  • Considering the high computational cost caused by the 4D cost volume and 3D convolutions, we use the traditional census transform [9], [10] and the correlation1D layer proposed by DispNetC [16] to generate two 3D cost volumes with different resolutions, and we design a bidirectional cost aggregation network based on 2D convolutions (see the sketch below this list)

  • We design a novel Bidirectional Stereo Matching Real-time Network, which uses two branches of different depths to aggregate the two cost volumes, and we design a feature fusion module to fuse the features generated by the two branches

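As a rough illustration of the two matching-cost computations named in the highlights (a census transform compared via Hamming distance, and a DispNetC-style correlation1D layer), the minimal sketch below builds both 3D cost volumes; window size, disparity range, and tensor layouts are assumptions made for illustration.

```python
# Illustrative sketch: window size, disparity range, and layouts are assumptions.
import torch
import torch.nn.functional as F

def census_transform(img, window=5):
    """Bit pattern of 'neighbor < center' comparisons per pixel; img is grayscale, (B, 1, H, W)."""
    pad = window // 2
    patches = F.unfold(F.pad(img, [pad] * 4, mode='replicate'), window)  # (B, window*window, H*W)
    b, _, hw = patches.shape
    center = img.reshape(b, 1, hw)
    return (patches < center).float()                                     # (B, window*window, H*W)

def census_cost_volume(left, right, max_disp):
    """3D cost volume (B, max_disp, H, W): Hamming distance between census signatures."""
    b, _, h, w = left.shape
    cl = census_transform(left).reshape(b, -1, h, w)
    cr = census_transform(right).reshape(b, -1, h, w)
    cost = left.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (cl != cr).float().sum(1)
        else:
            # Left pixel x matches right pixel x - d.
            cost[:, d, :, d:] = (cl[..., d:] != cr[..., :-d]).float().sum(1)
    return cost

def correlation_1d(feat_l, feat_r, max_disp):
    """DispNetC-style 1D correlation: per-disparity dot product of left/right feature maps."""
    b, c, h, w = feat_l.shape
    cost = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_l * feat_r).mean(1)
        else:
            cost[:, d, :, d:] = (feat_l[..., d:] * feat_r[..., :-d]).mean(1)
    return cost
```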

Summary

INTRODUCTION

Stereo matching is the process of finding the pixels in multiscopic views that correspond to the same 3D point in the scene. Most current stereo matching networks use a single-branch network to generate the dense disparity map. Considering the high computational cost caused by 4D cost volumes and 3D convolutions, we instead use the traditional census transform [9], [10] and the correlation1D layer proposed by DispNetC [16] to generate two 3D cost volumes with different resolutions, and we design a bidirectional cost aggregation network based on 2D convolutions. This Bidirectional Stereo Matching Real-time Network aggregates the two cost volumes with a two-branch network of different depths, and a feature fusion module fuses the features generated by the two branches.
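One plausible realization of such a feature fusion module is sketched below, assuming a simple upsample-concatenate-convolve design that aligns the shallower branch's features to the deeper branch's resolution before fusing them; the paper's actual FFM may differ.

```python
# One common fusion pattern; the paper's actual FFM design may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModule(nn.Module):
    def __init__(self, ch_deep, ch_shallow, ch_out):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(ch_deep + ch_shallow, ch_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(ch_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_deep, feat_shallow):
        # Align spatial resolution before fusing the two branches.
        feat_shallow = F.interpolate(feat_shallow, size=feat_deep.shape[-2:],
                                     mode='bilinear', align_corners=False)
        return self.fuse(torch.cat([feat_deep, feat_shallow], dim=1))
```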

RELATED WORK
TRAINING LOSS
EXPERIMENTS
NETWORK SETTINGS AND DETAILS
RESULTS ON SCENE FLOW
Findings
CONCLUSION