Abstract

Binocular stereo matching is a computer vision task that estimates disparity, and hence depth, typically from a cost volume constructed from the left and right feature maps; it is widely applied in 3D reconstruction, autonomous driving and robot navigation. Although recent studies show that convolutional neural networks and attention mechanisms have brought great progress to this field, existing methods still struggle to meet the demands of high-precision applications. In particular, they tend to ignore intermediate feature maps at other scales, pay little attention to the relationship between the left and right feature maps, and often train the model with only one type of cost volume. In this article, we focus on solving these three problems. First, we present the Multi-scale Feature Extraction and Fusion Module (MFEFM), which obtains informative feature maps by fusing the feature maps of all scales. Next, we design the Effective Channel Attention Module (ECAM) to better capture and exploit channel-wise interdependencies. Finally, we adopt the Hybrid Cost Volume Computation Module (HCVCM) to construct and aggregate the cost volume. Combining these modules, we build an end-to-end stereo matching network named HCVNet. Compared with other state-of-the-art models, it achieves an end-point error (EPE) of 0.714 px on the SceneFlow dataset, 0.376 px lower than PSMNet (1.09 px EPE), a relative reduction of about 34.5%.
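The abstract does not say how HCVCM builds its hybrid cost volume. One widely used pairing, introduced in GwcNet, combines a group-wise correlation volume with a concatenation volume; the sketch below assumes that design and uses plain Python lists indexed `feat[channel][row][col]` instead of a deep-learning framework. The function names `concat_volume`, `gwc_volume`, `hybrid_volume` and `epe` are hypothetical, and the aggregation stage (typically 3D convolutions) is omitted. The last function computes the end-point error (EPE), the metric quoted in the abstract.

```python
from typing import List, Optional

# Hypothetical layout: feature maps are nested lists indexed feat[channel][row][col].
Feature = List[List[List[float]]]


def concat_volume(left: Feature, right: Feature, max_disp: int):
    """Concatenation volume: at disparity d, stack the left features with the
    right features shifted d columns; columns w < d have no match and are 0."""
    C, H, W = len(left), len(left[0]), len(left[0][0])
    vol = []
    for d in range(max_disp):
        lf = [[[left[c][h][w] if w >= d else 0.0 for w in range(W)]
               for h in range(H)] for c in range(C)]
        rf = [[[right[c][h][w - d] if w >= d else 0.0 for w in range(W)]
               for h in range(H)] for c in range(C)]
        vol.append(lf + rf)  # 2C channels at this disparity
    return vol  # indexed vol[d][channel][row][col]


def gwc_volume(left: Feature, right: Feature, max_disp: int, groups: int):
    """Group-wise correlation volume: split the C channels into `groups`
    groups and take the mean product of left and shifted-right features
    within each group, giving one channel per group."""
    C, H, W = len(left), len(left[0]), len(left[0][0])
    if C % groups:
        raise ValueError("channel count must be divisible by groups")
    cpg = C // groups  # channels per group
    vol = []
    for d in range(max_disp):
        per_group = []
        for g in range(groups):
            chans = range(g * cpg, (g + 1) * cpg)
            per_group.append([[
                sum(left[c][h][w] * right[c][h][w - d] for c in chans) / cpg
                if w >= d else 0.0
                for w in range(W)] for h in range(H)])
        vol.append(per_group)
    return vol


def hybrid_volume(left: Feature, right: Feature, max_disp: int, groups: int):
    """Hybrid volume (assumed design): group-wise correlation channels
    followed by concatenation channels, i.e. groups + 2C channels per disparity."""
    gwc = gwc_volume(left, right, max_disp, groups)
    cat = concat_volume(left, right, max_disp)
    return [gwc[d] + cat[d] for d in range(max_disp)]


def epe(pred, gt, max_gt: Optional[float] = None) -> float:
    """End-point error: mean absolute disparity error over valid pixels.
    If max_gt is given, pixels whose ground truth is >= max_gt are skipped."""
    errs = [abs(p - t)
            for prow, trow in zip(pred, gt)
            for p, t in zip(prow, trow)
            if max_gt is None or t < max_gt]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    left = [[[1.0, 2.0, 3.0]], [[0.5, 0.5, 0.5]]]   # C=2, H=1, W=3
    right = [[[2.0, 3.0, 4.0]], [[1.0, 1.0, 1.0]]]
    vol = hybrid_volume(left, right, max_disp=2, groups=1)
    print(len(vol), len(vol[0]))  # 2 disparities, 1 + 2*2 = 5 channels
    print(vol[1][0][0])           # correlation row at d=1: [0.0, 2.25, 4.75]
    print(epe([[1.0, 2.0], [3.0, 4.0]], [[1.5, 2.0], [2.0, 4.0]]))  # 0.375
    # Relative EPE reduction implied by the abstract's figures (1.09 -> 0.714 px).
    print(round((1.09 - 0.714) / 1.09 * 100, 1))  # 34.5
```

In a real network these volumes would be built from learned, downsampled feature tensors and passed to a 3D-convolution aggregation stage; the explicit loops here only make the indexing of each channel and disparity visible.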
