Although salient object detection methods have steadily improved with the rapid development of deep learning, obtaining effective feature representations that yield accurate saliency maps remains an open problem. Most previous works address it with skip-based architectures that integrate hierarchical information across different scales and layers. However, simply concatenating high-level and low-level features is insufficient, because cluttered and noisy information can degrade the fused representation. To address this issue, we propose a Multi-Attention guided Feature-fusion network (MAF) that alleviates the problem in two ways. First, a novel Channel-wise Attention Block (CAB) governs layer-by-layer message passing from a global view, using the semantic cues of higher convolutional blocks to guide feature selection in lower blocks. Second, a Position Attention Block (PAB) operates on the integrated features to model pixel-wise relationships and capture rich contextual dependencies. Guided by these attention mechanisms, discriminative features are selected to build an end-to-end, densely supervised encoder-decoder network that detects salient objects more uniformly and precisely. Experimental results on five benchmark datasets show that our method performs favorably against other state-of-the-art approaches.
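The abstract does not specify how the two attention blocks are implemented. As a rough illustration only, the PyTorch sketch below shows one common way such blocks are realized: a channel-gating module in which globally pooled high-level context reweights low-level channels, and a non-local spatial self-attention module over pixel positions. All class names, layer sizes, and the reduction factor are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttentionBlock(nn.Module):
    """Hypothetical CAB sketch: global context from the higher-level feature
    map produces channel weights that gate the lower-level features."""
    def __init__(self, high_channels, low_channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(high_channels, low_channels),
            nn.ReLU(inplace=True),
            nn.Linear(low_channels, low_channels),
            nn.Sigmoid(),
        )

    def forward(self, low_feat, high_feat):
        # Global average pooling summarizes the semantic cues of the higher block.
        ctx = F.adaptive_avg_pool2d(high_feat, 1).flatten(1)    # (B, C_high)
        weights = self.fc(ctx).unsqueeze(-1).unsqueeze(-1)      # (B, C_low, 1, 1)
        # Channel-wise reweighting selects informative low-level features.
        return low_feat * weights

class PositionAttentionBlock(nn.Module):
    """Hypothetical PAB sketch: self-attention over spatial positions to
    capture long-range contextual dependencies between pixels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                     # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW) pixel affinities
        v = self.value(x).flatten(2)                   # (B, C, HW)
        # Aggregate values over all positions weighted by the affinities.
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

# Example: gate stage-3 features with stage-4 context, then refine spatially.
cab = ChannelAttentionBlock(high_channels=512, low_channels=256)
pab = PositionAttentionBlock(channels=256)
low, high = torch.randn(2, 256, 44, 44), torch.randn(2, 512, 22, 22)
fused = pab(cab(low, high))  # (2, 256, 44, 44)
```

In this reading, the CAB supplies the "global view" by deriving its gate from higher-level semantics rather than from the low-level features themselves, while the PAB's residual connection lets the network start from the fused features and gradually mix in long-range context as gamma is learned.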