Abstract

Scene understanding is one of the foundations for robots to achieve true artificial intelligence. Semantic segmentation, which imitates the mechanisms of the human visual system, can effectively improve the accuracy of scene understanding. It conforms to the basic principles of human environment perception and enables robots to better serve human society. In this paper, we propose a multilevel cross-aware network (MCA-Net) for RGBD semantic segmentation. It uses basic residual structures to encode texture information and depth geometric information separately. Inspired by how different visual features flow through the visual pathways of the human brain, MCA-Net jointly reasons about 2D appearance and depth geometry. To this end, multilevel cross-aware fusion modules are designed to fuse multi-scale complementary features extracted from RGB and depth images. Because depth and color carry independent information, a reasonable combination of depth and RGB images can improve the quality of semantic labeling. Experiments on the ScanNetv2 dataset show that the proposed network produces high-quality segmentation results on RGBD images and outperforms state-of-the-art methods. Furthermore, semantic labeling results obtained with a bionic binocular robot in real office scenes further demonstrate the effectiveness of the proposed MCA-Net.
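The abstract does not specify the internals of the cross-aware fusion modules. As a minimal illustrative sketch only, the NumPy snippet below shows one plausible form of cross-modal fusion applied at multiple encoder levels: each branch derives a gate from the other modality's features, so appearance and geometry modulate each other before being summed. The function name `cross_aware_fuse` and the gating scheme are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_aware_fuse(rgb_feat, depth_feat):
    """Hypothetical cross-aware fusion of one encoder level.

    Each modality gates the other: a depth-derived gate reweights the
    RGB features and vice versa, then the gated features are summed.
    Feature maps are (channels, height, width) arrays of equal shape.
    """
    # Channel-averaged map from the *other* modality acts as a spatial gate.
    rgb_gate = sigmoid(depth_feat.mean(axis=0, keepdims=True))    # (1, H, W)
    depth_gate = sigmoid(rgb_feat.mean(axis=0, keepdims=True))    # (1, H, W)
    return rgb_feat * rgb_gate + depth_feat * depth_gate

# Multilevel use: fuse complementary features at each encoder scale.
rng = np.random.default_rng(0)
levels_rgb = [rng.random((8, 32, 32)) for _ in range(3)]
levels_depth = [rng.random((8, 32, 32)) for _ in range(3)]
fused_levels = [cross_aware_fuse(r, d) for r, d in zip(levels_rgb, levels_depth)]
```

In a real network these gates would be learned (e.g., small convolutions rather than a fixed channel mean), and the fused maps at each level would feed the decoder for semantic labeling.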
