Abstract

RGBD semantic segmentation is a popular task in computer vision with applications in autonomous vehicles and virtual reality. The problem is challenging because indoor scenes are cluttered, dense, and diverse. To address the loss of context information in dense semantic scene segmentation, we propose a novel architecture built on a multi-scale feature representation that captures richer global and local context cues. The multi-scale features, generated by aggregating 3D region features and sparse-coded SIFT features extracted from multiresolution RGB and depth images, are fed into a softmax classifier that labels each region produced by hierarchical segmentation with a predefined class, yielding the final semantic scene segmentation. In addition, instead of the coarse four categories originally derived from the 894 pixel categories in the NYUD2 dataset, we define 40 detailed pixel classes that cover the most common object categories and enable fine-grained semantic segmentation. Extensive experiments on the standard NYUD2 benchmark demonstrate the effectiveness of our method.
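To make the pipeline described above concrete, the following is a minimal Python sketch of how multi-scale region features could be aggregated and scored with a softmax classifier. The feature extractors are simple stand-ins, and all function names, shapes, and parameters are illustrative assumptions rather than the authors' released implementation.

import numpy as np

def sparse_code_sift(rgb, region_mask, codebook_size=128):
    # Stand-in for sparse-coded SIFT descriptors pooled over one region
    # at one pyramid level (a real extractor would compute dense SIFT and
    # encode it against a learned codebook).
    rng = np.random.default_rng(0)
    return rng.random(codebook_size)

def region_3d_stats(depth, region_mask):
    # Stand-in for 3D geometric cues of the region derived from depth,
    # here just simple depth statistics.
    vals = depth[region_mask]
    return np.array([vals.mean(), vals.std(), vals.min(), vals.max()])

def multi_scale_region_features(rgb_pyramid, depth_pyramid, region_masks):
    # Aggregate appearance and geometric cues across all pyramid levels
    # into a single feature vector for one segmented region.
    feats = []
    for rgb, depth, mask in zip(rgb_pyramid, depth_pyramid, region_masks):
        feats.append(sparse_code_sift(rgb, mask))
        feats.append(region_3d_stats(depth, mask))
    return np.concatenate(feats)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def label_region(features, W, b):
    # Score the region feature vector against the 40 predefined classes
    # and return the most likely label index.
    return int(np.argmax(softmax(W @ features + b)))

In this sketch, W and b would be the learned parameters of the softmax classifier, with one row of W per class (40 rows for the fine-grained label set); each region produced by hierarchical segmentation is labeled independently by label_region.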
