Abstract

Deep encoder-decoder networks have been adopted for saliency detection and achieved state-of-the-art performance. However, most existing saliency models fail to detect very small salient objects. In this paper, we propose a multitask architecture, M2Net, and a novel centerness-aware loss for salient object detection. The proposed M2Net aims to solve saliency prediction and centerness prediction simultaneously. Specifically, the network architecture is composed of a bottom-up encoder module, a top-down decoder module, and a centerness prediction module. In addition, unlike binary cross entropy, the proposed centerness-aware loss guides M2Net to uniformly highlight entire salient regions with well-defined object boundaries. Experimental results on five benchmark saliency datasets demonstrate that M2Net outperforms state-of-the-art methods across multiple evaluation metrics.

Highlights

  • Salient object detection (SOD) [1,2,3] aims to extract the most visually distinctive objects in an image or video

  • Our contributions are as follows: (i) We propose a multiscale and multitask deep framework with a centerness-aware loss for salient object detection. The M2Net consists of an encoder module, a decoder module, and a centerness prediction module

  • We introduce centerness prediction to saliency detection
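The paper does not reproduce its centerness formula in this excerpt, but a common per-pixel definition (popularized by FCOS-style detectors) scores how far a pixel sits from an object's borders: the geometric mean of the min/max ratios of its left-right and top-bottom distances, which is 1 at the center and decays to 0 at the boundary. The sketch below is an illustrative numpy implementation under that assumption, not M2Net's exact target; `centerness_map` and its box format `(y0, x0, y1, x1)` are hypothetical names.

```python
import numpy as np

def centerness_map(h, w, box):
    """Per-pixel centerness target for a bounding box (y0, x0, y1, x1).

    Inside the box: sqrt(min(l,r)/max(l,r) * min(t,b)/max(t,b)),
    where l, r, t, b are distances to the box's four sides.
    Outside the box: 0.
    """
    y0, x0, y1, x1 = box
    ys = np.arange(h, dtype=float)[:, None]
    xs = np.arange(w, dtype=float)[None, :]
    l, r = xs - x0, x1 - xs   # horizontal distances to left/right sides
    t, b = ys - y0, y1 - ys   # vertical distances to top/bottom sides
    inside = (l >= 0) & (r >= 0) & (t >= 0) & (b >= 0)
    eps = 1e-6
    prod = (np.minimum(l, r) / (np.maximum(l, r) + eps)) * \
           (np.minimum(t, b) / (np.maximum(t, b) + eps))
    # clip guards against negative products for pixels outside the box
    return np.where(inside, np.sqrt(np.clip(prod, 0.0, 1.0)), 0.0)
```

Such a map peaks at the object center and vanishes at the edges, which is why supervising it can sharpen object boundaries relative to plain binary cross entropy.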


Introduction

Salient object detection (SOD) [1,2,3] aims to extract the most visually distinctive objects in an image or video. Saliency detection results often serve as the first step for a variety of downstream computer vision tasks, including object recognition [4], visual tracking [5], image retrieval [6], no-reference synthetic image quality assessment [7], robot navigation [8], image and video compression [9,10], and object discovery [11,12,13]. Earlier SOD methods mostly rely on hand-crafted features (e.g., color, brightness, and texture) to produce saliency maps. Such low-level features can hardly capture high-level semantic information and are not robust to complex scenarios. The encoder-decoder framework [3,15,16,17,18,19] is frequently used to extract and combine enriched feature blocks and can generate more accurate saliency maps.
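To make the encoder-decoder pattern concrete, the toy sketch below (plain numpy, single-channel; not M2Net itself, whose modules are convolutional) shows the characteristic structure: an encoder that repeatedly downsamples, and a decoder that upsamples while fusing the stored encoder features via skip connections. The function names are illustrative only.

```python
import numpy as np

def downsample(x):
    """Encoder step: 2x average pooling over non-overlapping windows."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Decoder step: 2x nearest-neighbor upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encoder_decoder(x, depth=2):
    """Toy bottom-up/top-down pass with additive skip connections."""
    skips = []
    for _ in range(depth):          # bottom-up: shrink, keep features
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):    # top-down: grow, fuse encoder features
        x = upsample(x) + skip
    return x
```

The skip connections are the key idea: they reinject high-resolution encoder features into the decoder, which is what lets such networks recover fine detail (e.g., object boundaries) lost during downsampling.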
