Abstract

Stereo matching is a fundamental and long-standing task in computer vision. Although learning-based stereo matching algorithms have made remarkable progress, two major challenges persist. First, existing cost aggregation methods that use stacked three-dimensional convolutions are complex, leading to heavy computation and memory costs. Second, these methods still struggle to establish reliable matches in weakly matchable regions such as edges and thin structures. To overcome these limitations, we propose an accurate and efficient network called the Attention-guided Aggregation and Error-aware Enhancement Network (AAEE-Net). Our approach designs an Attention-guided Aggregation Mechanism (AAM) based on simple image features. This mechanism uses attention weights generated from image features to guide cost aggregation with a more efficient and effective strategy. Additionally, we propose an Error-aware Enhancement Module (EEM) that refines the raw disparity by combining high-frequency information from the original image with the warping error between the left and right views. The EEM enables the network to learn error-correction capabilities that recover subtle details and sharp edges. Experimental results on the SceneFlow and KITTI benchmark datasets demonstrate that AAEE-Net achieves state-of-the-art performance with low inference time. Qualitative results show that AAEE-Net significantly improves predictions, especially for thin structures.
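To make the two mechanisms described above more concrete, the sketch below illustrates (in PyTorch) how attention weights derived from image features could modulate a 4D cost volume, and how the left-right warping error consumed by an error-aware refinement step could be computed. The module names, tensor shapes, and layer choices are illustrative assumptions under this reading of the abstract, not the paper's actual implementation.

```python
# Minimal, hypothetical sketch of attention-guided aggregation and warp-error
# computation; names and shapes are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGuidedAggregation(nn.Module):
    """Modulate a 4D cost volume with attention weights from image features."""
    def __init__(self, feat_channels, cost_channels):
        super().__init__()
        # Lightweight head that turns image features into per-pixel attention weights.
        self.attn = nn.Sequential(
            nn.Conv2d(feat_channels, cost_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, cost_volume, image_feat):
        # cost_volume: (B, C, D, H, W); image_feat: (B, F, H, W)
        w = self.attn(image_feat).unsqueeze(2)  # (B, C, 1, H, W)
        return cost_volume * w                  # broadcast over the disparity dim


def warp_right_to_left(right_img, disparity):
    """Warp the right view to the left view by shifting samples horizontally."""
    b, _, h, w = right_img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=right_img.device, dtype=right_img.dtype),
        torch.arange(w, device=right_img.device, dtype=right_img.dtype),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - disparity.squeeze(1)  # sample locations shifted by disparity
    grid = torch.stack(
        (2.0 * xs / (w - 1) - 1.0,
         2.0 * ys.unsqueeze(0).expand(b, -1, -1) / (h - 1) - 1.0),
        dim=-1,
    )                                            # normalized sampling grid (B, H, W, 2)
    return F.grid_sample(right_img, grid, align_corners=True)


# A refinement module could then take |left_img - warp_right_to_left(right_img, disp)|
# together with high-frequency image content as its error-aware input.
```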
