Abstract

RGB-D salient object detection aims to segment salient objects from an image with the aid of depth information. Although a number of effective approaches have been proposed, two challenges remain: (1) fully and effectively fusing RGB and depth features, especially in challenging scenes; and (2) enhancing the semantic information of low-level features while enriching the spatial information of high-level features. Most existing approaches design separate modules to address these two challenges. In this paper, a unified discriminative feature fusion module is proposed that serves both multimodal and multiscale feature fusion; the same module also increases the semantic information in low-level features and enriches the spatial information in high-level features. In addition, a multi-scale contextual perception module is embedded in the network to accurately localize objects at different scales. Unlike other methods, the depth branch of the network uses pure convolution for complementary feature extraction. We compare the proposed approach with 14 state-of-the-art methods on 8 datasets, and the experimental results demonstrate its effectiveness and superiority.
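
The abstract does not specify the internal design of the unified fusion module. As a purely illustrative sketch of what attention-weighted fusion of two feature streams (RGB/depth, or low-/high-level) might look like, consider the following PyTorch code; the class name, layer choices (1x1 projections, channel attention, 3x3 fusion convolution), and all hyperparameters are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Hypothetical sketch of a two-stream feature fusion module.

    This does NOT reproduce the paper's module; it only illustrates
    the general idea of fusing two feature maps with learned
    per-channel weights. All design choices here are assumptions.
    """
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions project both streams into a common space
        self.proj_a = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_b = nn.Conv2d(channels, channels, kernel_size=1)
        # Channel attention produces per-channel fusion weights in [0, 1]
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # 3x3 convolution refines the fused result
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_a, feat_b):
        a, b = self.proj_a(feat_a), self.proj_b(feat_b)
        # The weights decide, per channel, how much each stream contributes
        w = self.attn(torch.cat([a, b], dim=1))
        return self.fuse(w * a + (1.0 - w) * b)

# Usage: fuse an RGB feature map with a depth feature map of the same shape
rgb_feat = torch.randn(1, 64, 56, 56)
depth_feat = torch.randn(1, 64, 56, 56)
out = FusionSketch(64)(rgb_feat, depth_feat)
print(out.shape)  # torch.Size([1, 64, 56, 56])
```

Because the gating weight w and its complement (1 - w) sum to one per channel, such a design lets the network discriminate, channel by channel, which stream to trust; this is one common way to realize the kind of discriminative fusion the abstract describes.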
