Abstract

In recent years, convolutional neural network (CNN) algorithms have driven great progress in stereo matching, but mismatches still occur in textureless, occluded and reflective regions. In feature extraction and cost aggregation, CNNs can greatly improve the accuracy of stereo matching by exploiting global context information and high-quality feature representations. In this paper, we design a novel end-to-end stereo matching algorithm named Multi-Attention Network (MAN). To capture global context information in detail at the pixel level, we propose a Multi-Scale Attention Module (MSAM), which combines a spatial pyramid module with an attention mechanism during image feature extraction. In addition, we introduce a feature refinement module (FRM) and a 3D attention aggregation module (3D AAM) during cost aggregation so that the network can extract informative features with high representational ability and high-quality channel attention vectors. Finally, we obtain the final disparity through bilinear interpolation and disparity regression. We evaluate our method on the Scene Flow, KITTI 2012 and KITTI 2015 stereo datasets. The experimental results show that our method achieves state-of-the-art performance and that every component of our network is effective.

Highlights

  • Binocular stereo vision simulates the operating principle of biological vision systems

  • To make better use of the global context information for stereo matching, we propose a novel convolutional neural network

  • We introduce an image feature refinement module to enhance the representation of feature maps at each stage


Summary

INTRODUCTION

Binocular stereo vision simulates the operating principle of biological vision systems. The MC-CNN disparity estimation method proposed by Zbontar et al. [24] pioneered the use of a Siamese network to compute the similarity between two image patches for stereo matching. Mayer et al. created a large synthetic dataset to train an end-to-end network called DispNet [26] to estimate disparity; DispNet consists of a set of convolution layers to extract features, a cost volume formed by patch-wise correlation, an encoder-decoder structure for the second-stage processing, and a classification layer to estimate disparity. To make better use of the global context information for stereo matching, we propose a novel convolutional neural network. We introduce a 3D aggregation attention module, which uses high-level information to guide low-level texture information and produces high-quality channel attention vectors. Our MAN achieves state-of-the-art performance on the Scene Flow dataset and the KITTI stereo 2012 and KITTI stereo 2015 benchmarks.

RELATED WORK
FEATURE EXTRACTION
DISPARITY REGRESSION AND LOSS FUNCTION
EXPERIMENTS AND DISCUSSION
Findings
CONCLUSION

