Abstract

Fully convolutional neural networks (FCNs) have shown clear advantages in the salient object detection task. However, most existing FCN-based methods still produce unsatisfactory predictions, such as coarse object boundaries or even incorrect detections, because they ignore the differences between multi-level features during feature aggregation or underutilize the spatial details needed to locate boundaries. In this paper, we propose a novel end-to-end multi-level context aggregation network (MLCANet) to address these problems, in which bottom-up and top-down message passing cooperate in a joint manner. The bottom-up process, which aggregates low-level fine-detail features into high-level, semantically richer features, enhances the high-level features; in turn, the top-down process, which passes refined features from deeper layers to shallower ones, benefits from the enhanced high-level features. Considering that features from different layers may not be equally important, we further propose a multi-level feature aggregation mechanism with channel-wise attention, which aggregates multi-level features by flexibly adjusting their contributions and absorbing useful information to refine each level. The features after message passing, which encode both semantic information and spatial details, are used to predict saliency maps in our network. Extensive experiments demonstrate that our method obtains high-quality saliency maps with clear boundaries and performs favorably against state-of-the-art methods without any pre-processing or post-processing.
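To make the channel-wise attention aggregation more concrete, the snippet below is a minimal sketch, assuming a PyTorch setting; the module name ChannelAttentionFusion, the reduction ratio, and the layer choices are illustrative and not the authors' exact design. It upsamples a high-level feature map, adds it to a low-level map, and re-weights the merged channels with a learned gate so that more informative channels contribute more to the fused features.

    # Hypothetical sketch of channel-wise attention fusion of two feature levels.
    # Names and layer choices are illustrative, not the paper's exact design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ChannelAttentionFusion(nn.Module):
        """Fuse a low-level and a high-level feature map with per-channel weighting."""

        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            # Squeeze-and-excitation style gate: global pooling + bottleneck MLP.
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
            # Upsample the semantically richer (but coarser) high-level map to the
            # low-level resolution before element-wise aggregation.
            high = F.interpolate(
                high, size=low.shape[-2:], mode="bilinear", align_corners=False
            )
            merged = low + high
            weights = self.gate(merged)          # per-channel importance in [0, 1]
            return self.fuse(merged * weights)   # re-weighted, then refined


    # Example: fuse a 64-channel low-level map (88x88) with a high-level map (22x22).
    if __name__ == "__main__":
        fusion = ChannelAttentionFusion(channels=64)
        low = torch.randn(1, 64, 88, 88)
        high = torch.randn(1, 64, 22, 22)
        print(fusion(low, high).shape)  # torch.Size([1, 64, 88, 88])

In the network described above, such a gate is one way each level could flexibly adjust the contribution of the features it absorbs from other levels, as the abstract describes.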

Highlights

  • The goal of salient object detection (SOD) is to find the one or more objects that attract the most attention in a given image or video and to segment these objects out

  • To demonstrate the performance of the multi-level context aggregation network (MLCANet), we report experimental results on five popular SOD datasets and compare our method with 13 state-of-the-art salient object detection networks

  • Experimental results demonstrate that the proposed MLCANet achieves state-of-the-art performance on five datasets, which demonstrates the effectiveness and superiority of the proposed method


Summary

INTRODUCTION

The goal of salient object detection (SOD) is to find the one or more objects that attract the most attention in a given image or video and to segment these objects out. Traditional methods mostly use hand-crafted features (e.g., color, texture, or contrast) to capture local details and global context, either separately or simultaneously. Because such hand-crafted features can hardly capture high-level semantic relations and context information, these methods usually fail to detect salient objects in complex scenes. We propose a novel grid-like multi-level context aggregation network that integrates bottom-up and top-down message passing in a joint and cooperative manner; saliency is inferred in a coarse-to-fine manner by gradually integrating upper-layer saliency estimates with lower-layer features, as sketched below. Experimental results demonstrate that the proposed MLCANet achieves state-of-the-art performance on five datasets, which demonstrates the effectiveness and superiority of the proposed method. The rest of this paper is organized as follows: Section 2 briefly presents the related work.
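The coarse-to-fine inference mentioned above can be sketched as follows: a coarse saliency estimate is produced from the deepest features and then repeatedly upsampled and refined with features from progressively shallower layers. This is a hypothetical illustration, again assuming PyTorch; CoarseToFineDecoder and its layer configuration are placeholders rather than the paper's exact components.

    # Hypothetical coarse-to-fine saliency decoder: the coarsest estimate is
    # progressively refined with features from shallower layers.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class CoarseToFineDecoder(nn.Module):
        def __init__(self, feature_channels):
            # feature_channels: channel counts ordered from deepest to shallowest level.
            super().__init__()
            self.init_predict = nn.Conv2d(feature_channels[0], 1, kernel_size=3, padding=1)
            self.refine = nn.ModuleList(
                # +1 because each refinement also sees the previous saliency estimate.
                [nn.Conv2d(c + 1, 1, kernel_size=3, padding=1) for c in feature_channels]
            )

        def forward(self, features):
            # features: list of maps ordered deepest (coarsest) to shallowest (finest).
            saliency = self.init_predict(features[0])            # coarse estimate
            for feat, head in zip(features, self.refine):
                saliency = F.interpolate(
                    saliency, size=feat.shape[-2:], mode="bilinear", align_corners=False
                )
                saliency = head(torch.cat([feat, saliency], dim=1))  # refine with details
            return torch.sigmoid(saliency)                        # final fine saliency map


    # Example with three backbone levels (deep to shallow).
    decoder = CoarseToFineDecoder([512, 256, 64])
    feats = [
        torch.randn(1, 512, 11, 11),
        torch.randn(1, 256, 44, 44),
        torch.randn(1, 64, 88, 88),
    ]
    print(decoder(feats).shape)  # torch.Size([1, 1, 88, 88])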

RELATED WORK
JOINT MULTI-LEVEL COARSE-TO-FINE
EXPERIMENTS
IMPLEMENTATION DETAILS
DATASETS AND EVALUATION METRIC
Findings
CONCLUSION