Abstract

Scene parsing is a challenging task for complex and diverse scenes. In this study, the authors address semantic segmentation of indoor scenes in red, green, blue‐depth (RGB‐D) images. Most existing works use only the colour or photometric information for this problem. Here, the authors present an approach that fuses feature maps between a colour network branch and a depth network branch, integrating photometric and geometric information and thereby improving semantic segmentation performance. They propose a novel convolutional neural network that uses ResNet as its baseline. The proposed network adopts a spatial pyramid pooling module to make full use of different sub‐region representations, and multiple feature‐map fusion modules to integrate texture and structure information between the colour and depth branches. Moreover, it attaches multiple auxiliary loss branches alongside the main loss function to prevent the gradients of the earlier layers from vanishing and to accelerate training of the fusion part. Comprehensive experimental evaluations show that the proposed network, ‘ResFusion’, greatly improves performance over the baseline network and achieves competitive results compared with other state‐of‐the‐art methods on the challenging SUN RGB‐D benchmark.
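The two core operations described above, fusing feature maps from the colour and depth branches and pooling over sub-regions of different sizes, can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes element-wise summation as the fusion rule and PSPNet-style bin sizes (1, 2, 3, 6) for the pyramid, neither of which is specified in the abstract.

```python
import numpy as np

def fuse_feature_maps(color_feat, depth_feat):
    """Hypothetical fusion of same-stage colour- and depth-branch feature
    maps, shape (C, H, W), by element-wise summation (an assumption here)."""
    assert color_feat.shape == depth_feat.shape
    return color_feat + depth_feat

def spatial_pyramid_pool(feat, bin_sizes=(1, 2, 3, 6)):
    """Average-pool a (C, H, W) feature map into several sub-region grids,
    one pooled map per pyramid level (bin sizes are assumed, PSPNet-style)."""
    c, h, w = feat.shape
    pooled = []
    for b in bin_sizes:
        out = np.zeros((c, b, b))
        for i in range(b):
            for j in range(b):
                hs, he = i * h // b, (i + 1) * h // b
                ws, we = j * w // b, (j + 1) * w // b
                out[:, i, j] = feat[:, hs:he, ws:we].mean(axis=(1, 2))
        pooled.append(out)
    return pooled

# Toy usage: fuse two 4-channel 12x12 feature maps, then build the pyramid.
rgb_feat = np.ones((4, 12, 12))
depth_feat = np.full((4, 12, 12), 2.0)
fused = fuse_feature_maps(rgb_feat, depth_feat)
levels = spatial_pyramid_pool(fused)
print(fused[0, 0, 0])               # 3.0
print([p.shape for p in levels])    # [(4, 1, 1), (4, 2, 2), (4, 3, 3), (4, 6, 6)]
```

In the full network these pooled sub-region representations would be upsampled and concatenated back onto the fused feature map before the final classification layers.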
