Abstract

In recent years, depth maps have shown compelling performance as complementary information in semantic segmentation of indoor scenes. This benefit stems largely from the geometric relationships among objects captured by the depth sensor. However, holes and sparsity in depth maps, when fused directly with RGB images, reduce the accuracy of the segmentation model. It is therefore necessary to design an efficient complementary feature fusion module that dynamically adjusts the fusion weights between the two modalities according to the quality of the input image, so as to avoid irreparable depth defects, and to exploit cross-modality correlation to further correct depth map noise. To tackle these challenges, we propose the attention-guided adaptive channel shuffle gate and feature warp network (AGWNet) for indoor scene RGB-D semantic segmentation with low-quality depth maps. Specifically, our network efficiently captures accurate features in both the RGB and depth modalities using gating and channel fusion attention modules. Furthermore, the fused features are rectified by a multilevel feature correction and alignment module and passed through skip connections to the decoder. Extensive quantitative and qualitative evaluations on the NYU-Depth V2 and SUN RGB-D datasets show that our model outperforms previous state-of-the-art RGB-D semantic segmentation methods.
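
To make the fusion idea concrete, the sketch below shows one plausible way a gated, channel-shuffled fusion of RGB and depth features could be implemented in PyTorch. It is a minimal illustration under our own assumptions, not the paper's actual module: the class name `GatedShuffleFusion`, the reduction ratio, and the layer sizes are hypothetical, and only the general principle (shuffling channels across modalities, then predicting per-channel gates so that low-quality depth features are down-weighted) follows the abstract.

```python
import torch
import torch.nn as nn


def channel_shuffle(x, groups):
    # Interleave channels across groups (as in ShuffleNet) so that
    # RGB-derived and depth-derived channels mix before gating.
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)


class GatedShuffleFusion(nn.Module):
    """Hypothetical sketch of an attention-guided channel shuffle gate.

    The gate predicts per-channel weights from the shuffled concatenation
    of RGB and depth features, so channels from a noisy depth map can be
    down-weighted before fusion. Names and sizes are illustrative only.
    """

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        fused = torch.cat([rgb_feat, depth_feat], dim=1)   # (B, 2C, H, W)
        fused = channel_shuffle(fused, groups=2)            # mix modalities
        weights = self.gate(fused)                          # (B, 2C, 1, 1)
        w_rgb, w_depth = weights.chunk(2, dim=1)
        # Adaptive weighting: defective depth contributes less when its gate is low.
        return w_rgb * rgb_feat + w_depth * depth_feat


# Example usage on one encoder stage:
# rgb = torch.randn(2, 64, 120, 160); depth = torch.randn(2, 64, 120, 160)
# out = GatedShuffleFusion(64)(rgb, depth)   # -> (2, 64, 120, 160)
```

The fused output of such a module would then be refined and aligned at multiple levels before reaching the decoder, as described above.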
