Abstract
Red-green-blue and depth (RGB-D) semantic segmentation is essential for indoor service robots to perceive and understand their surroundings accurately. Numerous RGB-D indoor semantic segmentation methods have been proposed since depth maps became widely available. These methods focus mainly on integrating the multiscale and crossmodal features extracted from RGB images and depth maps in the encoder, and then apply unified strategies to progressively recover local details in the decoder. However, by emphasizing crossmodal fusion in the encoder, they neglect the distinguishability between RGB and depth features during decoding, which undermines segmentation performance. To exploit the features fully, we propose an efficient encoder-decoder architecture, the asymmetric multiscale and crossmodal fusion network (AMCFNet), which adopts a differentiated feature-integration strategy. Unlike existing methods, we use simple crossmodal fusion in the encoder and design an elaborate decoder to improve semantic segmentation performance. Specifically, to treat high- and low-level features differently, we propose a semantic aggregation module (SAM) that processes the multiscale and crossmodal features from the last three network layers and aggregates high-level semantic information through a cascaded pyramid structure. Moreover, we design a spatial detail supplement module that adaptively fuses low-level spatial details from the depth maps with the information produced by the SAM. Extensive experiments demonstrate that the proposed AMCFNet outperforms state-of-the-art approaches.
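Because the abstract only outlines the decoder design, the following is a minimal PyTorch sketch of how a cascaded-pyramid semantic aggregation stage and an adaptive spatial detail supplement stage might be wired together. The module names, channel sizes, gating mechanism, and feature shapes are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticAggregationModule(nn.Module):
    """Hypothetical SAM-style block: fuses RGB and depth features at one
    scale and adds in coarser semantics passed down the cascaded pyramid."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels * 2, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat, higher_level=None):
        x = self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))
        if higher_level is not None:
            # Upsample the coarser-level semantics and accumulate them.
            x = x + F.interpolate(higher_level, size=x.shape[-2:],
                                  mode="bilinear", align_corners=False)
        return x


class SpatialDetailSupplement(nn.Module):
    """Hypothetical detail-supplement block: adaptively blends low-level
    depth details with the aggregated semantics via a learned gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, semantics, depth_detail):
        # Bring the coarse semantics up to the resolution of the details.
        semantics = F.interpolate(semantics, size=depth_detail.shape[-2:],
                                  mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([semantics, depth_detail], dim=1))
        return g * semantics + (1.0 - g) * depth_detail


# Toy forward pass with made-up shapes: last three encoder stages at
# 28x28, 14x14, 7x7, all with 64 channels (assumed for simplicity).
rgb_feats = [torch.randn(1, 64, s, s) for s in (28, 14, 7)]
dep_feats = [torch.randn(1, 64, s, s) for s in (28, 14, 7)]
sams = [SemanticAggregationModule(64, 64) for _ in range(3)]

x = None
for rgb, dep, sam in zip(reversed(rgb_feats), reversed(dep_feats), reversed(sams)):
    x = sam(rgb, dep, higher_level=x)          # coarse-to-fine cascade

low_level_depth = torch.randn(1, 64, 56, 56)   # hypothetical low-level depth detail
out = SpatialDetailSupplement(64)(x, low_level_depth)
print(out.shape)                               # torch.Size([1, 64, 56, 56])
```

The sketch only mirrors the data flow described above: the three deepest feature pairs are fused and aggregated coarse-to-fine, and the result is then combined with low-level depth detail through an adaptive gate before producing the full-resolution prediction.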