Abstract
With the availability of consumer RGB-D sensors, much research uses both color and depth information for semantic segmentation. However, most previous studies simply fuse RGB features and depth features by equal-weight concatenation or summation, which may fail to exploit the complementary information between the two modalities. Moreover, previous works construct multi-scale representations with fixed-parameter multi-scale convolution kernels, which leads to parameter redundancy and prevents online self-adaption. To effectively utilize the internal context information of multi-modal features, an RGB-D image semantic segmentation network is proposed that introduces a multi-modal adaptive convolution module. The multi-scale adaptive convolution kernels are generated dynamically, so the context information of the multi-modal features is embedded effectively into the multi-scale convolution filters. Compared with traditional multi-scale convolution kernels, the proposed method achieves higher computational efficiency and better accuracy. Experimental results on the public RGB-D indoor semantic segmentation datasets SUN RGB-D and NYU Depth v2 show that the pixel accuracy, mean pixel accuracy, and mean IoU of the proposed method are 82.5%, 62.0%, 50.6% and 77.1%, 64.2%, 50.8%, respectively, outperforming existing RGB-D semantic segmentation methods.
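To make the idea of dynamically generated multi-scale kernels concrete, the following is a minimal sketch of one possible multi-modal adaptive convolution module. The module name, the kernel sizes, the use of pooled global context to predict per-sample depthwise kernels, and the simple additive fusion of the RGB and depth streams are all assumptions for illustration; the abstract does not specify the paper's exact design.

```python
# Hypothetical sketch of a multi-modal adaptive convolution module.
# All names, dimensions, and the kernel-generation strategy are assumptions;
# the paper's actual architecture is not described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalAdaptiveConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.channels = channels
        self.kernel_sizes = kernel_sizes
        # One head per scale predicts a depthwise kernel from the fused global context.
        self.kernel_heads = nn.ModuleList([
            nn.Linear(2 * channels, channels * k * k) for k in kernel_sizes
        ])
        # 1x1 convolution to aggregate the multi-scale responses.
        self.fuse = nn.Conv2d(len(kernel_sizes) * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from the two encoder streams.
        b, c, h, w = rgb_feat.shape
        # Multi-modal context vector from globally pooled RGB and depth features.
        context = torch.cat([
            F.adaptive_avg_pool2d(rgb_feat, 1).flatten(1),
            F.adaptive_avg_pool2d(depth_feat, 1).flatten(1),
        ], dim=1)                                    # (B, 2C)
        x = rgb_feat + depth_feat                    # simple fusion of the two modalities
        outs = []
        for k, head in zip(self.kernel_sizes, self.kernel_heads):
            # Dynamically generated per-sample depthwise kernels for this scale.
            kernels = head(context).view(b * c, 1, k, k)
            # Grouped convolution trick: fold the batch into channels so each
            # sample is filtered by its own predicted kernels.
            y = F.conv2d(x.reshape(1, b * c, h, w), kernels,
                         padding=k // 2, groups=b * c)
            outs.append(y.view(b, c, h, w))
        return self.fuse(torch.cat(outs, dim=1))

# Example usage with assumed feature-map shapes.
module = MultiModalAdaptiveConv(channels=256)
rgb = torch.randn(2, 256, 30, 40)
depth = torch.randn(2, 256, 30, 40)
out = module(rgb, depth)  # (2, 256, 30, 40)
```

Because the kernels are predicted from pooled context rather than stored as fixed parameters for every scale, such a design keeps the parameter count of the kernel heads small while still adapting the filters to each input, which is consistent with the efficiency argument made in the abstract.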