Confidence-Aware Stereo Matching for Realistic Cluttered Scenarios
Recent advances in stereo matching algorithms, driven by new deep neural architectures, have revitalized interest in binocular stereo. However, effectively deploying stereo vision in applications that demand precise, high-confidence 3D sensing remains challenging. In this paper, we introduce a novel deep stereo matching method designed to address this challenge efficiently. Our approach estimates disparities together with implicitly inferred confidence levels. This capability is enabled by our newly developed U-Net transformer, which incorporates several attention mechanisms to extract both global and local context from rectified image pairs. In addition, we present a novel real-world stereo dataset captured with a commercially available stereo sensor. The dataset comprises 1,000 challenging scenes spanning 20 object categories, each annotated with accurate, dense ground-truth disparities and captured as both active and passive stereo pairs. Through extensive experiments, we demonstrate the effectiveness of our proposed matching algorithm and the value of our dataset.
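The abstract does not specify the attention formulation used inside the U-Net transformer, but such hybrids are typically built on scaled dot-product attention. As a minimal, dependency-free sketch (all names here are illustrative, not the paper's API), the core primitive can be written as:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of feature vectors.

    Each query attends to all keys; the output is a weighted
    average of the value vectors. This is the generic building
    block, not the paper's specific architecture.
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A query aligned with the first key attends almost entirely to it.
result = attention(Q=[[10.0, 0.0]],
                   K=[[10.0, 0.0], [0.0, 10.0]],
                   V=[[1.0, 0.0], [0.0, 1.0]])
```

In a stereo context, queries and keys would come from feature maps of the left and right rectified images, so that each pixel can aggregate matching evidence globally across the epipolar line as well as locally.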