Abstract

Aiming at the issues of insufficient cross-modality feature interaction and ineffective utilization of cross-modality data in RGB-D salient object detection (SOD), we propose a Bidirectional Attentional Interaction Network (BAINet) for RGB-D SOD, which adopts an encoder-decoder structure and achieves bidirectional interaction of cross-modality features through a dual-branch progressive fusion approach. First, since the RGB and depth information streams complement each other, the bidirectional attention interaction module captures complementary cues from each modality to accomplish bidirectional interaction between cross-modality features. To enhance the expressiveness of the fused RGB-D features, the global feature perception module enlarges the receptive field and endows the features with rich multi-scale contextual semantic information. In addition, exploring the correlation of cross-level features is vital for accurate saliency inference. Specifically, we introduce a cross-level guidance aggregation module to capture inter-layer dependencies and integrate cross-level features, which effectively suppresses shallow cross-modality features and refines the saliency map during decoding. To accelerate model training, a hybrid loss function is employed to supervise the multi-branch saliency inference maps simultaneously. Extensive experiments on five publicly available datasets show that the proposed model outperforms 18 state-of-the-art methods.
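
The following is a minimal conceptual sketch (in PyTorch) of how a bidirectional attention interaction step between same-level RGB and depth features could look; the module name, gating design, and all parameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of bidirectional cross-modality interaction, not the
# paper's actual BAINet code. Each modality derives a channel-attention gate
# that re-weights the other modality, and the two enhanced streams are fused.
import torch
import torch.nn as nn

class BidirectionalAttentionInteraction(nn.Module):
    """Exchange complementary cues between same-level RGB and depth features."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel attention computed from one modality to modulate the other.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.depth_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
        # Depth-guided enhancement of RGB features, and vice versa (bidirectional).
        rgb_enhanced = f_rgb + f_rgb * self.depth_gate(f_depth)
        depth_enhanced = f_depth + f_depth * self.rgb_gate(f_rgb)
        # Fuse the mutually enhanced streams into one RGB-D feature.
        return self.fuse(torch.cat([rgb_enhanced, depth_enhanced], dim=1))

# Example usage: fuse one encoder level with 64-channel features.
# fused = BidirectionalAttentionInteraction(64)(rgb_feat, depth_feat)
```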
