Abstract

RGB-D indoor multiclass scene understanding is a pixelwise labeling task that exploits depth information to complement RGB features and improve segmentation performance. We propose a novel asymmetric encoder structure for RGB-D indoor scene understanding: a reverse fusion network (RFNet) with an attention mechanism and a simplified feature extraction block. Specifically, pre-trained ResNet34 and VGG16 networks serve as the backbones of the two asymmetric input streams for feature extraction, while additive fusion and attention modules further enhance network performance. The strong feature extraction ability of these classical networks and the two-way reverse fusion allow the network to narrow the gap between low- and high-level features, so that the features merge more effectively for segmentation. We achieved mean intersection-over-union (mIoU) scores of 53.5% and 50.7% on the SUN RGB-D and NYUDv2 datasets, respectively, thereby outperforming other state-of-the-art approaches.
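To make the asymmetric two-stream idea concrete, the sketch below shows a minimal PyTorch encoder in which RGB passes through ResNet34 and depth through VGG16, with additive fusion and an attention module applied at the deepest stage. This is an illustrative assumption, not the authors' RFNet: the module names (ChannelAttention, AsymmetricEncoder), the single fusion point, and the SE-style attention are stand-ins, whereas the paper fuses features in reverse across multiple stages with its own attention and simplified feature extraction blocks.

```python
# Minimal sketch of an asymmetric RGB-D encoder (NOT the authors' exact RFNet).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class ChannelAttention(nn.Module):
    """SE-style channel attention: reweights fused feature channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # global average pool -> (B, C)
        return x * w[:, :, None, None]    # channel-wise reweighting


class AsymmetricEncoder(nn.Module):
    """RGB through ResNet34, depth through VGG16; additive fusion + attention."""

    def __init__(self, num_classes: int = 37):
        super().__init__()
        # weights=None keeps the sketch self-contained; the paper starts
        # from ImageNet-pretrained backbones.
        resnet = models.resnet34(weights=None)
        self.rgb_stream = nn.Sequential(*list(resnet.children())[:-2])  # (B, 512, H/32, W/32)
        self.depth_stream = models.vgg16(weights=None).features         # (B, 512, H/32, W/32)
        self.attention = ChannelAttention(512)
        self.classifier = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_stream(rgb)
        # Replicate the 1-channel depth map to 3 channels so VGG16's
        # first conv applies unchanged (one simple convention; the paper
        # may encode depth differently).
        f_depth = self.depth_stream(depth.repeat(1, 3, 1, 1))
        fused = self.attention(f_rgb + f_depth)  # additive fusion, then attention
        logits = self.classifier(fused)
        # Upsample coarse logits to input resolution for per-pixel labels.
        return F.interpolate(logits, size=rgb.shape[2:], mode="bilinear",
                             align_corners=False)


model = AsymmetricEncoder(num_classes=37)  # 37 = SUN RGB-D class count
out = model(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 37, 224, 224])
```

Because both backbones emit 512-channel maps at 1/32 resolution for a 224x224 input, additive fusion needs no projection at this stage; fusing at shallower stages, as the full network does, would require 1x1 convolutions to align channel counts.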
