Abstract

Salient Object Detection (SOD) has witnessed remarkable progress over the past decade. However, RGB-based SOD methods may fail in real-world applications under extreme conditions such as low illumination and cluttered backgrounds. Thermal (T) images capture heat radiation from object surfaces and can cope with such extreme situations, so some researchers have introduced the T modality into the SOD task. However, existing RGB-T SOD methods neither explicitly explore multi-scale complementary saliency cues across the two modalities nor fully exploit the individual RGB and T modalities. To address these problems, we propose the Three-stream Interaction Decoder Network (TIDNet) for RGB-T SOD. Specifically, feature maps from the encoder branches are fed into a three-stream interaction decoder for in-depth saliency exploration, capturing both single-modality and multi-modality saliency cues. In the single-modality decoder streams, Contextual-enhanced Channel Reduction (CCR) units first reduce the channel dimension of the RGB and T feature maps, lowering the computational burden while discriminatively enriching multi-scale information. In the multi-modality decoder stream, a Multi-scale Cross Modality Fusion (MCMF) unit is proposed to explore complementary multi-scale information from the RGB and T modalities. Internal and Multiple Decoder Interaction (IMDI) units then further mine modality-specific and complementary saliency cues across the three decoder streams. Three-stream deep supervision is applied at each feature level to facilitate training. Comprehensive experiments show that our method outperforms fifteen state-of-the-art methods in terms of seven metrics. The code and models are available at https://github.com/huofushuo/TIDNet.
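The channel-reduction step described above can be illustrated with a minimal sketch. This is not the authors' CCR unit (which also performs contextual enhancement); it only shows the core idea of a pointwise (1x1) convolution projecting a high-channel encoder feature map to fewer channels to cut computation. All shapes, names, and weights here are hypothetical.

```python
import numpy as np

def channel_reduction(feat, weight):
    """Pointwise (1x1) convolution over channels: (C_in, H, W) -> (C_out, H, W).

    Equivalent to applying a learned linear projection independently at every
    spatial location, which is how channel reduction is typically realized.
    """
    c_in, h, w = feat.shape
    c_out = weight.shape[0]
    # Flatten spatial dims, project channels, restore spatial dims.
    out = weight @ feat.reshape(c_in, h * w)
    return out.reshape(c_out, h, w)

rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((256, 16, 16))  # hypothetical encoder feature
w = rng.standard_normal((64, 256)) * 0.01      # hypothetical learned 1x1 kernel
reduced = channel_reduction(rgb_feat, w)
print(reduced.shape)  # (64, 16, 16)
```

In the paper, one such reduction would be applied per feature level in each of the RGB and T decoder streams before the interaction units operate on the reduced maps.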
