Abstract

With the development of deep learning, salient object detection methods have made great progress. However, two challenges remain: 1) the lack of rich features extracted from multiple perspectives at different encoder levels leads to the omission of salient objects of varying scales; 2) the ineffective fusion of multi-level features during decoding dilutes the saliency features, which degrades the purity of the predicted maps. In this paper, we design a Condensing-and-Filtering Network (CFNet), in which a saliency pyramid condensing module (SPCM) and a saliency filtering module (SFM) are proposed to solve these two problems, respectively. Specifically, SPCM introduces pyramid convolution as the basic unit to condense full-scale features from global and local perspectives at each level of the encoder. SFM is equipped with an ingenious ‘funnel’ structure to effectively filter multi-level features and supplement details, which makes the fusion of features more robust. The two modules complement each other, so that the full-scale features can be used effectively to predict salient objects. Extensive experimental results on five benchmark datasets demonstrate that our method performs favourably against state-of-the-art approaches, and also shows superiority in terms of speed (16.18 ms) and FLOPs (21.19 G). Meanwhile, we extend our CFNet to the task of RGB-D salient object detection and achieve better results, which further demonstrates its effectiveness. The code will be made available.
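The pyramid-convolution idea underlying SPCM — running parallel convolutions with progressively larger kernels over the same feature map so that both local and more global context are captured at one encoder level — can be sketched as follows. This is a minimal illustrative sketch in plain NumPy, not the paper's implementation; all function names and the choice of averaging kernels are assumptions made for clarity:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D convolution with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def pyramid_conv(x, kernel_sizes=(3, 5, 7)):
    """Apply parallel convolutions with growing kernel sizes and stack
    the responses, so small kernels keep local detail while large
    kernels aggregate wider (more global) context.  Uniform averaging
    kernels stand in for learned weights in this sketch."""
    maps = [conv2d_same(x, np.ones((s, s)) / (s * s)) for s in kernel_sizes]
    return np.stack(maps)  # shape: (len(kernel_sizes), H, W)

feat = np.random.rand(16, 16)          # a single-channel feature map
pyr = pyramid_conv(feat)
print(pyr.shape)                       # (3, 16, 16)
```

In a trained network the stacked multi-scale responses would then be fused (e.g. by a 1x1 convolution) into the condensed full-scale features that SPCM produces at each encoder level.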
