Abstract

RGBD semantic segmentation is a popular task in computer vision with applications in autonomous vehicles and virtual reality. The problem is challenging because indoor scenes are cluttered, dense, and diverse. To counter the loss of context information in dense semantic scene segmentation, we propose a novel architecture built on a multi-scale feature representation that captures both global and local context cues. The multi-scale features, generated by aggregating 3D region features and sparse-coded SIFT features extracted from multi-resolution RGB and depth images, are fed into a softmax classifier that labels each region produced by a hierarchical segmentation with a predefined class, yielding the final semantic scene segmentation. In addition, instead of the coarse four categories previously derived from the 894 pixel categories in the NYUD2 dataset, we define 40 detailed pixel classes that cover the most common object categories and enable fine-grained semantic segmentation. Extensive experiments on the standard NYUD2 benchmark demonstrate the effectiveness of our method.
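
To make the region-labeling stage concrete, the following is a minimal Python sketch under simplifying assumptions: per-region features are aggregated over several image resolutions and concatenated, and a softmax (multinomial logistic regression) classifier assigns each region a class. The RGB/depth mean descriptor, the synthetic image, and the toy labels are illustrative placeholders, not the paper's actual sparse-coded SIFT or 3D region features.

import numpy as np
from sklearn.linear_model import LogisticRegression

def multiscale_region_features(rgb, depth, regions, scales=(1.0, 0.5, 0.25)):
    # Aggregate one feature vector per segmented region at each scale and
    # concatenate across scales.  The per-pixel descriptor used here
    # (mean RGB + mean depth) is a hypothetical stand-in for the paper's
    # sparse-coded SIFT and 3D region features.
    region_ids = np.unique(regions)
    feats = []
    for s in scales:
        h = max(1, int(rgb.shape[0] * s))
        w = max(1, int(rgb.shape[1] * s))
        ys = np.linspace(0, rgb.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, rgb.shape[1] - 1, w).astype(int)
        rgb_s, depth_s, reg_s = rgb[ys][:, xs], depth[ys][:, xs], regions[ys][:, xs]
        per_region = []
        for r in region_ids:
            mask = reg_s == r
            if mask.any():
                per_region.append(np.concatenate([rgb_s[mask].mean(axis=0),
                                                  [depth_s[mask].mean()]]))
            else:
                per_region.append(np.zeros(4))  # region vanished at this scale
        feats.append(np.stack(per_region))
    return np.concatenate(feats, axis=1)  # one multi-scale vector per region

# Toy data: a synthetic RGB-D image with a 4-region "hierarchical segmentation".
rng = np.random.default_rng(0)
rgb = rng.random((64, 64, 3))
depth = rng.random((64, 64))
regions = np.zeros((64, 64), dtype=int)
regions[:32, 32:] = 1
regions[32:, :32] = 2
regions[32:, 32:] = 3

X = multiscale_region_features(rgb, depth, regions)
y = np.array([0, 1, 2, 1])  # hypothetical ground-truth class per region

# Softmax (multinomial logistic regression) classifier over region features;
# in the full pipeline each region would be assigned one of the 40 NYUD2 classes.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))

This sketch only mirrors the overall flow (multi-scale per-region features followed by a softmax classifier); swapping the placeholder descriptor for dense SIFT sparse coding and 3D region features would recover the architecture described above.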
