Abstract

In this article, we propose an end-to-end real-time stereo matching network (RTSMNet). RTSMNet consists of three modules. The global and local feature extraction (GLFE) module captures hierarchical context information and generates a coarse cost volume. The initial disparity estimation module is a compact three-dimensional convolution architecture that rapidly produces a low-resolution (LR) disparity map. The feature-guided spatial attention upsampling module takes the LR disparity map and the shared features from the GLFE module as guidance: it first estimates residual disparity values and then applies an attention mechanism to generate context-aware adaptive kernels for each upsampled pixel. The adaptive kernels place higher attention weights on reliable areas, which significantly reduces blurred edges and recovers thin structures. The proposed networks achieve 66∼175 fps on a 2080 Ti GPU and 11∼42 fps on edge computing devices, with accuracy competitive with state-of-the-art methods on multiple benchmarks.
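To make the upsampling idea concrete, below is a minimal PyTorch sketch of feature-guided, attention-based disparity upsampling as the abstract describes it: a residual head refines the LR disparity, and an attention head predicts context-aware adaptive kernels that blend neighboring disparity hypotheses for each upsampled pixel. The module name, channel counts, window size, and upsampling factor are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch: feature-guided spatial attention upsampling for disparity maps.
# Layer names, channel counts, and the 3x3 window are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureGuidedUpsampler(nn.Module):
    """Upsample an LR disparity map by `scale`, guided by shared GLFE-style features.

    1. Predict a residual disparity from the LR disparity and guidance features.
    2. Predict per-pixel attention weights over a local window (adaptive kernels).
    3. Combine neighboring disparity hypotheses with softmax-normalized weights.
    """

    def __init__(self, feat_channels=32, scale=4, ksize=3):
        super().__init__()
        self.scale, self.ksize = scale, ksize
        in_ch = feat_channels + 1  # guidance features + LR disparity
        self.residual_head = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # One attention weight per (upsampled sub-pixel, window position).
        self.attn_head = nn.Conv2d(in_ch, scale * scale * ksize * ksize, 3, padding=1)

    def forward(self, disp_lr, guide_feat):
        b, _, h, w = disp_lr.shape
        x = torch.cat([disp_lr, guide_feat], dim=1)

        # Step 1: refine the LR disparity with a predicted residual.
        disp_ref = disp_lr + self.residual_head(x)

        # Step 2: context-aware adaptive kernels, softmax over the local window.
        attn = self.attn_head(x).view(b, self.scale ** 2, self.ksize ** 2, h, w)
        attn = F.softmax(attn, dim=2)

        # Gather a ksize*ksize neighborhood of disparity hypotheses per LR pixel.
        neigh = F.unfold(disp_ref, self.ksize, padding=self.ksize // 2)
        neigh = neigh.view(b, 1, self.ksize ** 2, h, w)

        # Step 3: weighted combination, then rearrange sub-pixels into the HR grid.
        up = (attn * neigh).sum(dim=2)                     # (b, scale^2, h, w)
        up = F.pixel_shuffle(up, self.scale) * self.scale  # scale disparity values too
        return up                                          # (b, 1, h*scale, w*scale)


# Usage sketch: upsample a 4x-downsampled disparity map with 32-channel guidance features.
if __name__ == "__main__":
    disp_lr = torch.rand(1, 1, 64, 128) * 48.0
    guide = torch.rand(1, 32, 64, 128)
    disp_hr = FeatureGuidedUpsampler(feat_channels=32, scale=4)(disp_lr, guide)
    print(disp_hr.shape)  # torch.Size([1, 1, 256, 512])
```

Because the softmax weights are predicted from the guidance features, the window positions that lie across an object boundary can be suppressed, which is the mechanism the abstract credits for reducing blurred edges and recovering thin structures.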
