Abstract

RGB-D salient object detection (SOD) models based on a two-stream structure have achieved good performance in single-object scenes. In multi-object scenes, however, the RGB and depth modalities often disagree on which objects are salient, and this inconsistency degrades the accuracy of subsequent fusion. The inconsistency stems from three issues: first, artifacts, missing values, and confusion in depth maps make the depth modality unreliable, forcing the model to rely more heavily on the RGB modality; second, neither modality is guided toward salient objects during detection; third, the two modalities lack interaction. To address these issues, we first propose a depth recovery (DR) block to mitigate the negative effects of both original and estimated depth maps. Next, we design a saliency detection (SD) block, which uses semantic information to guide each modality to focus on salient objects; the SD block also aggregates multi-scale information to strengthen each modality's ability to detect objects at different scales. Finally, a specific fusion block (SFB) fuses the salient object information obtained from the RGB and depth modalities. Quantitative and qualitative experiments demonstrate that our method achieves state-of-the-art (SOTA) performance against 10 competing methods.
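The abstract names three components (DR, SD, SFB) but does not specify their internals. Below is a minimal, hypothetical PyTorch sketch of how such a two-stream pipeline could be wired together: the DR block applies a residual correction to the raw depth map using RGB cues, the SD block uses dilated branches as a stand-in for multi-scale aggregation plus a sigmoid gate as a stand-in for semantic guidance, and the SFB concatenates and fuses the two streams. All layer choices, channel sizes, and class names are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of the two-stream RGB-D SOD pipeline from the abstract.
# DR/SD/SFB internals are assumptions; only the overall wiring follows the text.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DRBlock(nn.Module):
    """DR block (assumed): refine an unreliable depth map using RGB cues."""
    def __init__(self, ch=32):
        super().__init__()
        self.rgb = nn.Conv2d(3, ch, 3, padding=1)
        self.dep = nn.Conv2d(1, ch, 3, padding=1)
        self.out = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, rgb, depth):
        f = torch.cat([F.relu(self.rgb(rgb)), F.relu(self.dep(depth))], 1)
        return depth + self.out(f)  # residual correction of the depth map


class SDBlock(nn.Module):
    """SD block (assumed): guide one modality toward salient regions
    while aggregating multi-scale context."""
    def __init__(self, in_ch, ch=32):
        super().__init__()
        # Dilated branches stand in for multi-scale aggregation.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        )
        self.merge = nn.Conv2d(3 * ch, ch, 1)
        self.gate = nn.Conv2d(ch, 1, 1)  # stand-in for semantic guidance

    def forward(self, x):
        f = self.merge(torch.cat([F.relu(b(x)) for b in self.branches], 1))
        return f * torch.sigmoid(self.gate(f))  # re-weight salient regions


class SFB(nn.Module):
    """SFB (assumed): fuse salient features from the RGB and depth streams."""
    def __init__(self, ch=32):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.pred = nn.Conv2d(ch, 1, 1)

    def forward(self, fr, fd):
        return self.pred(F.relu(self.fuse(torch.cat([fr, fd], 1))))


class TwoStreamSOD(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.dr = DRBlock(ch)
        self.sd_rgb = SDBlock(3, ch)
        self.sd_dep = SDBlock(1, ch)
        self.sfb = SFB(ch)

    def forward(self, rgb, depth):
        depth = self.dr(rgb, depth)  # 1) recover a cleaner depth map
        fr = self.sd_rgb(rgb)        # 2) saliency-guided per-modality features
        fd = self.sd_dep(depth)
        return self.sfb(fr, fd)      # 3) cross-modal fusion -> saliency map


if __name__ == "__main__":
    model = TwoStreamSOD()
    rgb, depth = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)
    print(model(rgb, depth).shape)  # torch.Size([2, 1, 64, 64])
```

Note the ordering the abstract implies: depth is recovered before either stream computes saliency features, so the SD block on the depth stream operates on the corrected map rather than the raw one.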
