Abstract
In recent years, convolutional neural networks have substantially improved the accuracy of semantic segmentation. However, semantic segmentation of indoor scenes remains challenging due to the complexity of indoor environments. With the advent of depth sensors, the use of depth information to improve semantic segmentation has attracted increasing attention. Most previous studies fuse RGB and depth features with simple equal-weight concatenation or summation, failing to make full use of the complementary information between the two modalities. In this paper, we propose an attention-aware multimodal fusion network for RGB-D indoor semantic segmentation. An attention-aware multimodal fusion module effectively fuses multilevel RGB and depth features. A cross-modal attention mechanism is introduced into the fusion module, and features are fused at multiple scales so that RGB and depth features guide and refine each other through their complementary information, yielding feature representations rich in spatial location information. The method effectively promotes the synergistic interaction of multimodal features and further improves segmentation performance. Experimental results on the publicly available SUN RGB-D, NYU Depth v2, and ScanNet datasets show that the proposed algorithm outperforms other RGB-D image semantic segmentation algorithms.
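To illustrate the idea of cross-modal attention fusion described above, the following is a minimal sketch in PyTorch, not the paper's actual module: it assumes a hypothetical class CrossModalFusion in which each modality produces channel-wise attention weights that re-weight the other modality before the two refined feature maps are summed. All names and hyperparameters (e.g. the reduction ratio) are illustrative assumptions.

# Minimal sketch of a cross-modal attention fusion block (hypothetical,
# not the paper's implementation), assuming same-scale RGB and depth
# feature maps with an equal number of channels.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse RGB and depth features of one encoder scale: each modality
    computes channel attention that gates the other, then the refined
    maps are merged by summation."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Gate predicted from RGB features, applied to depth features.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Gate predicted from depth features, applied to RGB features.
        self.depth_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Each modality guides the other before fusion.
        rgb_refined = rgb * self.depth_gate(depth)
        depth_refined = depth * self.rgb_gate(rgb)
        return rgb_refined + depth_refined


if __name__ == "__main__":
    # One fusion block per encoder scale; here a single 1/8-resolution level.
    fuse = CrossModalFusion(channels=256)
    rgb_feat = torch.randn(2, 256, 60, 80)
    depth_feat = torch.randn(2, 256, 60, 80)
    print(fuse(rgb_feat, depth_feat).shape)  # torch.Size([2, 256, 60, 80])

In a multiscale setting, one such block would be instantiated per encoder stage and its output passed to the decoder, so that fusion happens at several resolutions rather than only at the final feature map.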