Abstract

Existing deep-learning-based stereo matching networks do not apply attention to feature information, or integrate it, across multiple levels and modules. We therefore propose an attention-guided aggregation stereo matching network that encodes and integrates feature information multiple times. Specifically, we design a residual network built on a 2D channel attention block that adaptively calibrates the channel weight responses, improving the robustness of the feature representation. We also construct a 3D stacked hourglass structure built on a 3D channel attention block that calibrates the weight responses of the 4D cost volume along the channel dimension, further enhancing the guidance and aggregation capabilities of the network. In addition, we introduce a 4D guided cost volume, which pre-groups the extracted image features and uses the similarity measure within each group to guide the concatenated features, enabling interactive learning within the cost volume. Experimental results on the Scene Flow and KITTI benchmark datasets show that the proposed network significantly improves disparity prediction accuracy with only a small increase in computation time.
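
As a rough illustration of the feature pre-grouping step mentioned above, the following NumPy sketch builds a group-wise similarity volume from left and right feature maps. It is not the authors' implementation: the function name, the (C, H, W) tensor layout, the use of a per-group mean of channel products as the similarity measure, and the zero padding for out-of-range pixels are all assumptions made for illustration. The step in which these similarities guide the concatenated features is omitted because the abstract does not specify it.

```python
import numpy as np


def groupwise_similarity_volume(left, right, max_disp, num_groups):
    """Hypothetical sketch of a group-wise similarity volume.

    left, right: feature maps of shape (C, H, W), C divisible by num_groups.
    Returns an array of shape (num_groups, max_disp, H, W).
    """
    channels, height, width = left.shape
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = channels // num_groups

    volume = np.zeros((num_groups, max_disp, height, width), dtype=left.dtype)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at x - d.
        prod = np.zeros_like(left)
        if d == 0:
            prod[:] = left * right
        else:
            prod[:, :, d:] = left[:, :, d:] * right[:, :, :-d]
        # Assumed similarity measure: mean channel product within each group.
        grouped = prod.reshape(num_groups, per_group, height, width)
        volume[:, d] = grouped.mean(axis=1)
    return volume


# Toy input: the right view is the left view shifted by two pixels.
rng = np.random.default_rng(0)
left_feat = rng.standard_normal((8, 4, 10)).astype(np.float32)
right_feat = np.roll(left_feat, -2, axis=2)
vol = groupwise_similarity_volume(left_feat, right_feat, max_disp=4, num_groups=2)
```

With channels split into groups, each group contributes its own similarity score per disparity, so the volume keeps more information than a single full-channel correlation while staying much smaller than a volume of concatenated features.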
