Abstract

Despite their wide success in many vision tasks, Convolutional Neural Networks (CNNs) still find saliency detection challenging because their receptive fields are limited and sufficiently discriminative context is unavailable until very late layers. In this paper, going beyond spatial convolution, we propose a Spatio-Frequency Network (SFNet) that exploits spatio-frequency cues to effectively enlarge the receptive fields of CNN layers and, more importantly, to strengthen their spatial discrimination for better saliency detection. In particular, the proposed SFNet contains a carefully designed Frequency Residual Module (FRM) that captures a holistic representation of the whole image in the frequency domain. The FRM uses the discrete wavelet transform and its inverse to alternately map global spatial features into the frequency domain, supporting fast and accurate salient object detection. In addition, SFNet includes an Aggregation of Frequency and Spatial Feature (AFSF) module that jointly integrates the features of the two domains, guided by saliency results in a top-down manner. In this way, the aggregated features at each layer contain rich holistic context, and the network can eventually recover more complete salient object parts and details by progressively integrating saliency predictions. Extensive experiments on six widely used saliency detection datasets demonstrate the advantages of the proposed model over state-of-the-art methods.
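To make the wavelet step concrete, the following is a minimal sketch of a single-level 2D discrete wavelet transform and its inverse applied to a feature map. It is not the authors' FRM: the Haar wavelet, the NumPy implementation, and the function names `haar_dwt2` and `haar_idwt2` are all assumptions chosen for illustration, since the abstract does not specify the wavelet family or the module's internals.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT over the last two axes (assumed even-sized).

    Returns four sub-bands, each at half the spatial resolution:
    one low-frequency approximation and three detail bands.
    """
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0        # low-pass in both directions
    detail_cols = (a - b + c - d) / 2.0   # difference across columns
    detail_rows = (a + b - c - d) / 2.0   # difference across rows
    detail_diag = (a - b - c + d) / 2.0   # diagonal difference
    return approx, detail_cols, detail_rows, detail_diag

def haar_idwt2(approx, detail_cols, detail_rows, detail_diag):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = approx.shape[-2:]
    x = np.empty(approx.shape[:-2] + (2 * h, 2 * w), dtype=approx.dtype)
    x[..., 0::2, 0::2] = (approx + detail_cols + detail_rows + detail_diag) / 2.0
    x[..., 0::2, 1::2] = (approx - detail_cols + detail_rows - detail_diag) / 2.0
    x[..., 1::2, 0::2] = (approx + detail_cols - detail_rows - detail_diag) / 2.0
    x[..., 1::2, 1::2] = (approx - detail_cols - detail_rows + detail_diag) / 2.0
    return x

# Hypothetical feature map with shape (channels, height, width).
features = np.random.default_rng(0).standard_normal((16, 32, 32))
bands = haar_dwt2(features)
restored = haar_idwt2(*bands)
```

Each sub-band coefficient summarizes a 2x2 block of the input, so a convolution applied to the sub-bands covers twice the spatial extent per axis compared with the same kernel on the original map. The transform is also exactly invertible, so no information is lost when features are moved into the frequency domain and back.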
