Abstract

RGB-based salient object detection (SOD) algorithms have shown a good ability to segment salient objects from images, but their performance remains unsatisfactory in challenging scenes, such as those with ambiguous object contours or low color contrast between foreground and background. To overcome this problem, RGB-D and RGB-T SOD have been studied. However, the two are usually treated as separate visual tasks, and most existing methods directly extract and fuse raw features from backbones. In this paper, we explore the potential commonalities between the two tasks and propose a novel end-to-end unified framework that can be used for both RGB-D and RGB-T SOD. The framework consists of three key components: a multi-modal interactive attention (MIA) unit, a joint attention guided cross-modal decoding (JAGCD) module, and a multi-level feature progressive decoding (MFPD) module. Specifically, the MIA units effectively capture rich multi-layered context features from each modality, serving as a bridge between feature encoding and cross-modal decoding. Moreover, the proposed JAGCD and MFPD modules progressively integrate complementary features from multi-source features and from different levels of fused features, respectively. To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on both RGB-D and RGB-T saliency detection benchmarks. Experimental results show that our approach outperforms other state-of-the-art methods and generalizes well. Moreover, the proposed framework can provide a potential solution for other cross-modal complementary tasks. The code will be available at https://github.com/Liangyh18/MIA_DPD.
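
The abstract does not spell out the internals of the MIA unit, so the following is only a minimal, hypothetical PyTorch sketch of a cross-modal interactive attention block in the spirit described above, where each modality stream is reweighted by global context from the other stream; the class name, gating design, and all parameters are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical sketch of an MIA-style unit: each modality's feature map
    is gated by channel attention derived from the other modality, so the
    same block serves RGB-D and RGB-T inputs interchangeably."""

    def __init__(self, channels: int):
        super().__init__()
        # Squeeze-and-gate branch computed from the RGB stream.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Squeeze-and-gate branch computed from the auxiliary (depth/thermal) stream.
        self.aux_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_aux: torch.Tensor):
        # Cross-modal gating: each stream attends to the other's global context.
        f_rgb_out = f_rgb * self.aux_gate(f_aux)
        f_aux_out = f_aux * self.rgb_gate(f_rgb)
        return f_rgb_out, f_aux_out

# Usage: the auxiliary input may be a depth or a thermal feature map,
# which is what makes a single framework applicable to both tasks.
mia = CrossModalAttention(channels=64)
rgb_feat = torch.randn(1, 64, 32, 32)
aux_feat = torch.randn(1, 64, 32, 32)  # depth (RGB-D) or thermal (RGB-T)
out_rgb, out_aux = mia(rgb_feat, aux_feat)
```

Treating the depth and thermal streams through one symmetric interface is the design choice that lets the two tasks share a single encoder-decoder pipeline.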
