Abstract

The introduction and popularity of depth maps have brought new vitality and growth into salient object detection (SOD), and plentiful RGB-D SOD methods have been proposed, mainly focusing on how to utilize and integrate the depth map. Although existing methods have achieved promising performance, the negative effects of low-quality depth maps have not been effectively addressed. In this paper, we address the problem with a strategy that judges the quality of depth maps and assigns low weighting factors to low-quality ones. To this end, we propose a novel Transformer-based SOD framework, namely the Depth-aware Assessment and Synthesis Transformer (DAST), to further improve the performance of RGB-D SOD. The proposed DAST involves two primary designs: 1) a Swin Transformer-based encoder is employed instead of a convolutional neural network for more effective feature extraction and capture of long-range dependencies; 2) a Depth Assessment and Synthesis (DAS) module is proposed to judge the quality of depth maps and fuse the multi-modality salient features by computing the difference of the saliency maps from the RGB and depth streams in a coarse-to-fine manner. Extensive experiments on five benchmark datasets demonstrate that the proposed DAST achieves favorable performance compared with other state-of-the-art (SOTA) methods.

Keywords: Salient object detection; Swin transformer; Low-quality; Depth map; Assessment and synthesis
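The abstract does not specify how the DAS module turns the disagreement between the two streams into a weighting factor. The sketch below is a minimal, hypothetical illustration of the general idea only, not the authors' implementation: it assumes coarse per-pixel saliency maps from the RGB and depth streams, scores depth quality by their mean absolute difference per image, and down-weights the depth features accordingly before fusion. All names (e.g. depth_quality_weighted_fusion) are illustrative.

```python
import torch

def depth_quality_weighted_fusion(sal_rgb, sal_depth, feat_rgb, feat_depth):
    """Hypothetical sketch of the assessment-and-synthesis idea: the larger
    the disagreement between the coarse RGB and depth saliency predictions,
    the lower the factor assigned to the depth features before fusion.

    sal_rgb, sal_depth: coarse saliency maps, shape (B, 1, H, W), in [0, 1]
    feat_rgb, feat_depth: modality features, shape (B, C, H, W)
    """
    # Mean absolute difference between the two saliency predictions,
    # giving one disagreement score per image in the batch.
    diff = (sal_rgb - sal_depth).abs().mean(dim=(1, 2, 3))  # (B,)

    # Map disagreement to a factor in [0, 1]: a large difference (likely a
    # low-quality depth map) yields a small factor. The clamp is a safeguard
    # in case the inputs are not strictly in [0, 1].
    w = (1.0 - diff).clamp(min=0.0).view(-1, 1, 1, 1)

    # Down-weight the depth stream before fusing with the RGB stream.
    return feat_rgb + w * feat_depth

# Example usage with random tensors:
fused = depth_quality_weighted_fusion(
    torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64),
    torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64))
```

In the paper this assessment is described as coarse-to-fine, i.e. applied at multiple decoder stages rather than once as above; the sketch shows a single stage for brevity.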

