Abstract

RGB-D saliency detection aims to jointly exploit RGB images and depth maps to detect salient objects. This field still faces two challenges: 1) how to extract representative multimodal features and 2) how to effectively fuse them. Most previous methods treat RGB and depth information equally as two modalities, without considering their differences in the frequency domain, and may therefore lose complementary information. In this paper, we introduce a frequency channel attention mechanism into the fusion process. First, we design a frequency-aware cross-modality attention (FACMA) module to interweave channel features across modalities and select representative ones. Within the FACMA module, we also propose a spatial frequency channel attention (SFCA) module to introduce more complementary information in different channels. Second, we develop a weighted cross-modality fusion (WCMF) module that adaptively fuses multimodal features by learning content-dependent weight maps. Comprehensive experiments on several benchmark datasets demonstrate that the proposed framework outperforms seventeen state-of-the-art methods.
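
The abstract only names the modules; their internals are not given here. As a loose illustration, the PyTorch sketch below shows the two ideas the abstract describes: a frequency-based channel attention (assumed here to follow FcaNet-style DCT pooling, since the paper builds on frequency channel attention) and a weighted fusion that predicts content-dependent weight maps. All class names, layer choices, and hyperparameters are assumptions made for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    def dct_basis(h, w, freq_u, freq_v):
        """2D DCT-II basis of frequency (freq_u, freq_v) on an h x w grid."""
        ys = torch.arange(h).float()
        xs = torch.arange(w).float()
        basis_y = torch.cos((2 * ys + 1) * freq_u * torch.pi / (2 * h))
        basis_x = torch.cos((2 * xs + 1) * freq_v * torch.pi / (2 * w))
        return basis_y[:, None] * basis_x[None, :]  # (h, w)


    class FrequencyChannelAttention(nn.Module):
        """SFCA-like attention (assumed form): channels are pooled with several
        DCT frequency components instead of global average pooling alone."""

        def __init__(self, channels, pool_size=7,
                     freqs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction=16):
            super().__init__()
            self.pool_size = pool_size
            self.n_freqs = len(freqs)
            assert channels % self.n_freqs == 0
            # One fixed DCT basis per frequency; channel groups are each
            # pooled by a different frequency, as in FcaNet.
            basis = torch.stack([dct_basis(pool_size, pool_size, u, v)
                                 for u, v in freqs])
            self.register_buffer("basis", basis)  # (F, h, w)
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):
            b, c, _, _ = x.shape
            x_small = F.adaptive_avg_pool2d(x, self.pool_size)  # (B, C, h, w)
            groups = x_small.view(b, self.n_freqs, c // self.n_freqs,
                                  self.pool_size, self.pool_size)
            # Project each channel group onto its DCT basis -> (B, F, C/F).
            pooled = (groups * self.basis[None, :, None]).sum(dim=(-2, -1))
            weights = self.fc(pooled.reshape(b, c))  # (B, C) channel weights
            return x * weights.view(b, c, 1, 1)


    class WeightedCrossModalityFusion(nn.Module):
        """WCMF-like fusion (assumed form): predict two content-dependent
        spatial weight maps and take a weighted sum of RGB and depth features."""

        def __init__(self, channels):
            super().__init__()
            self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

        def forward(self, f_rgb, f_depth):
            w = torch.softmax(
                self.weight_head(torch.cat([f_rgb, f_depth], dim=1)), dim=1)
            return w[:, 0:1] * f_rgb + w[:, 1:2] * f_depth


    # Example: fuse 64-channel RGB/depth features after frequency attention.
    sfca = FrequencyChannelAttention(64)
    wcmf = WeightedCrossModalityFusion(64)
    f_rgb, f_d = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    out = wcmf(sfca(f_rgb), sfca(f_d))  # (2, 64, 32, 32)

Under these assumptions, pooling each channel group with a different DCT frequency lets the attention weights depend on more than the zero-frequency (global average) component, which is the frequency-domain complementarity the abstract refers to, while the learned weight maps let the fusion favor whichever modality is more reliable at each spatial location.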
