Abstract

The ability to capture the complementary information in multi-modality data is critical to the development of multi-modality salient object detection (SOD). Most existing studies attempt to integrate multi-modality information through various fusion strategies. However, most of these methods ignore the inherent differences between modalities, resulting in poor performance in challenging scenarios. In this paper, we propose a novel Modality-Induced Transfer-Fusion Network (MITF-Net) for RGB-D and RGB-T SOD that fully exploits the complementarity of multi-modality data. Specifically, we first deploy a modality transfer fusion (MTF) module to bridge the semantic gap between single- and multi-modality data and to mine cross-modality complementarity based on point-to-point structural similarity. We then design a cycle-separated attention (CSA) module that recurrently refines cross-layer information and measures the effectiveness of cross-layer features through point-wise convolution-based multi-scale channel attention. Furthermore, we refine object boundaries in the decoding stage to obtain high-quality saliency maps with sharp boundaries. Extensive experiments on 13 RGB-D and RGB-T SOD datasets show that the proposed MITF-Net achieves strong, competitive performance.
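To make the "point-wise convolution-based multi-scale channel attention" idea more concrete, the following is a minimal PyTorch sketch of one common way such attention is built: a local branch and a globally pooled branch, each formed from 1x1 (point-wise) convolutions, jointly producing channel weights. The module name, branch structure, and reduction ratio here are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    """Illustrative sketch (hypothetical, not the paper's exact module):
    point-wise-convolution multi-scale channel attention with a local
    branch and a global (pooled) branch whose sum gates the input."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = max(channels // reduction, 1)
        # Local context: point-wise convs applied at every spatial position.
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )
        # Global context: pool to 1x1, then the same point-wise bottleneck.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse the two scales and squash to (0, 1) attention weights.
        w = torch.sigmoid(self.local_branch(x) + self.global_branch(x))
        return x * w

In a fusion network of this kind, such a module would typically be applied to combined cross-layer or cross-modality features so that more reliable channels are up-weighted before decoding; the exact placement in MITF-Net is described in the full paper.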
