Abstract

Multi-layer feature integration has demonstrated its superiority in salient object detection. However, the salient regions generated by most models still suffer from inconsistencies and coarse region boundaries. In this paper, to solve these two problems, we propose a wider and finer network named WFNet. First, a wider feature enhancement module (WFE) is designed to expand the receptive fields of deep semantic features, allowing the network to look wider and locate salient regions more accurately. Second, to improve regional continuity and reduce background noise, we introduce a finer feature fusion module (F3M), which consists of scale-invariant average pooling and a detailed feature integration module with channel-wise attention. Finally, we propose an edge-region complementary strategy (ERC) and an edge-focused loss (EL), which supplement the diluted deep semantics and make the network pay more attention to the boundary pixels of salient objects. Benefiting from rich deep semantics and more detailed edge features, WFNet can predict saliency maps with clear boundaries under the guidance of the edge-focused loss. Experimental results show that the proposed method outperforms state-of-the-art methods on five benchmarks without any post-processing.

Highlights

  • Salient object detection (SOD) aims to find the most distinctive regions which draw human visual attention in natural scene images or videos

  • To address the above problems, we propose a novel WFNet with four components: the wider feature enhancement module (WFE), the finer feature fusion module (F3M), the edge-region complementary strategy (ERC), and the edge-focused loss (EL)

  • The main contributions of this paper can be summarized as follows: (1) We propose a WFE module built with a dilated convolution group and an efficient fusion strategy, which gives deep features larger receptive fields while keeping the same spatial scale
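As a rough single-channel illustration of how a dilated convolution group enlarges the receptive field without changing feature-map resolution, here is a minimal sketch; the kernel sizes, dilation rates (1, 2, 4), and additive fusion are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def dilated_conv2d(x, k, d=1):
    """'Same'-padded 2-D convolution of a single-channel map x with an
    odd-sized kernel k at dilation rate d; output keeps x's shape."""
    kh, kw = k.shape
    ph, pw = d * (kh - 1) // 2, d * (kw - 1) // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i * d:i * d + H, j * d:j * d + W]
    return out

def wfe_sketch(x, kernels, rates=(1, 2, 4)):
    """Hypothetical WFE-style group: parallel dilated branches over the
    same input, fused by summation so the spatial scale is unchanged."""
    return sum(dilated_conv2d(x, k, d) for k, d in zip(kernels, rates))
```

For instance, a 3×3 kernel at dilation rate 2 covers a 5×5 area of the input while the output map stays the same size, which is the sense in which the network "looks wider".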


Summary

INTRODUCTION

Salient object detection (SOD) aims to find the most distinctive regions that draw human visual attention in natural scene images or videos. Wei et al. [23] proposed a novel U-Net-based network that extracts multi-scale features and focuses on boundary pixels, which can effectively fuse features from different levels and suppress noise. To supplement semantics and make the network focus on object boundaries, we introduce an edge-region complementary strategy and an edge-focused loss function to accurately predict saliency maps with clear boundaries. We apply both global average pooling and global max pooling to the same features to generate feature maps with different semantics. The first term of the loss is the mean of the two main predictions' losses, and the second is the weighted sum of the four sub-side predictions' losses.
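The dual-pooling idea described above can be sketched as a CBAM-style channel attention; this is an assumption about the exact design, not the paper's stated architecture. Global average and global max pooling produce two channel descriptors, a shared two-layer MLP scores each, and a sigmoid of the summed scores rescales the channels:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are
    the shared MLP weights. Returns feat rescaled per channel by an
    attention value in (0, 1)."""
    avg = feat.mean(axis=(1, 2))                         # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))                           # global max pooling -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)         # shared two-layer MLP (ReLU)
    att = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))    # sigmoid -> (C,)
    return feat * att[:, None, None]
```

Using both pooled descriptors lets the attention react to the average response of a channel as well as to its strongest activation, which is why the two poolings are said to carry different semantics.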


