Abstract

RGB-thermal salient object detection (RGB-T SOD) has unique advantages in handling challenging scenes with cluttered backgrounds, low illumination, and low contrast. However, existing RGB-T SOD methods generally ignore the significant differences between the two imaging mechanisms and the inherent characteristics of thermal images; as a result, they cannot handle diverse feature fusion demands and may yield unsatisfactory performance. To overcome this problem and achieve more effective RGB-T SOD, we propose an asymmetric cross-modal activation network that exploits the interactions of modality-specific features through an asymmetric feature fusion strategy. Specifically, a two-stream asymmetric feature aggregation encoder module is proposed to fuse multimodality features adaptively and extract complementary information. The self-attention of multimodality features is leveraged to guide cross-modal interactions, which propagates long-range contextual dependencies and extracts effective saliency cues. Furthermore, a multitask decoder is proposed to perform SOD and thermal image reconstruction in a unified framework, so that salient objects can be located and segmented accurately from the reconstructed high-resolution feature representations. Extensive experiments on public RGB-T and RGB-D SOD datasets demonstrate the superiority of the proposed network, and ablation experiments highlight the effectiveness of each component. Our code and saliency maps are available at: www.github.com/xanxuso/ACMANet.
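The abstract only sketches the mechanism, but the core idea of letting one modality's self-attention activate the other can be illustrated concretely. The following is a minimal PyTorch sketch under stated assumptions: the module name `CrossModalActivation`, the channel sizes, and the non-local-style attention layout are illustrative choices of ours, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalActivation(nn.Module):
    """Hypothetical cross-modal activation block (not the authors' code).

    Self-attention is computed on a "guide" modality (e.g. RGB) and used to
    re-aggregate a "target" modality (e.g. thermal), propagating long-range
    context across modalities as the abstract describes at a high level.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = max(channels // reduction, 1)
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Zero-initialized scale: start from identity and learn how much
        # cross-modal activation to mix in.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, guide: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        b, c, h, w = target.shape
        q = self.query(guide).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.key(guide).flatten(2)                    # (B, C', HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)         # (B, HW, HW)
        v = self.value(target).flatten(2)                 # (B, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2))          # (B, C, HW)
        return target + self.gamma * out.view(b, c, h, w)

# Toy usage on one encoder stage: RGB attention activates thermal features.
rgb = torch.randn(2, 64, 32, 32)
thermal = torch.randn(2, 64, 32, 32)
block = CrossModalActivation(channels=64)
activated_thermal = block(rgb, thermal)
print(activated_thermal.shape)  # torch.Size([2, 64, 32, 32])
```

Asymmetry here means the two streams are not treated interchangeably: attention computed on the RGB features (typically richer in texture) re-aggregates the thermal features, rather than both modalities being fused by an identical symmetric operation.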
