Abstract

With the availability of consumer RGB-D sensors, much research has used both color and depth information for semantic segmentation. However, most previous studies simply fuse RGB features and depth features by equal-weight concatenation or summation, which may fail to exploit the complementary information between the two modalities. Moreover, previous works construct multi-scale representations with multi-scale convolution kernels whose parameters are fixed, which can lead to parameter redundancy and cannot adapt online to the input. To effectively exploit the internal context information of multi-modal features, an RGB-D image semantic segmentation network is proposed that introduces a multi-modal adaptive convolution module. Multi-scale adaptive convolution kernels are generated dynamically, and the context information of the multi-modal features is embedded into the multi-scale convolution filters. Compared with traditional multi-scale convolution kernels, the proposed method achieves higher computational efficiency and better accuracy. Experimental results on the public RGB-D indoor semantic segmentation datasets SUN RGB-D and NYU Depth v2 show that the pixel accuracy, mean pixel accuracy, and mean IoU of the proposed method are 82.5%, 62.0%, 50.6% and 77.1%, 64.2%, 50.8%, respectively, outperforming existing RGB-D semantic segmentation methods.
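To make the idea of a multi-modal adaptive convolution concrete, the following is a minimal sketch (not the authors' implementation) of how multi-scale kernels might be generated dynamically from fused RGB and depth features and applied as sample-specific depthwise convolutions. All class, parameter, and method names here are hypothetical, and the fusion and kernel-prediction details are assumptions for illustration only.

```python
# Hypothetical sketch of a multi-modal adaptive convolution module:
# a global descriptor from RGB and depth features predicts depthwise
# kernels at several scales, which are applied per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalAdaptiveConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.channels = channels
        self.kernel_sizes = kernel_sizes
        # One head per scale predicts a depthwise kernel from the fused descriptor.
        self.kernel_heads = nn.ModuleList(
            nn.Linear(2 * channels, channels * k * k) for k in kernel_sizes
        )
        # 1x1 convolution merges the multi-scale responses back to `channels`.
        self.fuse = nn.Conv2d(len(kernel_sizes) * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from the two encoders.
        b, c, h, w = rgb_feat.shape
        # Global multi-modal context descriptor (B, 2C) via average pooling.
        ctx = torch.cat(
            [rgb_feat.mean(dim=(2, 3)), depth_feat.mean(dim=(2, 3))], dim=1
        )
        x = rgb_feat + depth_feat  # simple additive fusion of the feature maps
        outs = []
        for k, head in zip(self.kernel_sizes, self.kernel_heads):
            # Per-sample depthwise kernels of shape (B*C, 1, k, k).
            kernels = head(ctx).view(b * c, 1, k, k)
            # Fold the batch into channels so grouped conv applies each
            # sample's own dynamically generated kernels.
            y = F.conv2d(
                x.reshape(1, b * c, h, w), kernels, padding=k // 2, groups=b * c
            )
            outs.append(y.view(b, c, h, w))
        return self.fuse(torch.cat(outs, dim=1))


# Example usage with dummy encoder outputs:
# module = MultiModalAdaptiveConv(channels=256)
# out = module(torch.randn(2, 256, 30, 40), torch.randn(2, 256, 30, 40))
```

In this sketch, the dynamic kernels replace a bank of fixed-parameter multi-scale filters, which is one way the parameter redundancy mentioned in the abstract could be reduced while conditioning the filters on the multi-modal context.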
