Abstract
Deep encoder-decoder networks have been adopted for saliency detection and achieved state-of-the-art performance. However, most existing saliency models often fail to detect very small salient objects. In this paper, we propose a multitask architecture, M2Net, and a novel centerness-aware loss for salient object detection. The proposed M2Net aims to solve saliency prediction and centerness prediction simultaneously. Specifically, the network architecture is composed of a bottom-up encoder module, a top-down decoder module, and a centerness prediction module. In addition, unlike binary cross entropy, the proposed centerness-aware loss can guide the proposed M2Net to uniformly highlight entire salient regions with well-defined object boundaries. Experimental results on five benchmark saliency datasets demonstrate that M2Net outperforms state-of-the-art methods on different evaluation metrics.
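The exact form of the centerness-aware loss is not reproduced in this abstract. As a rough illustration only, the sketch below uses a FCOS-style centerness target to re-weight per-pixel binary cross entropy, so that pixels near an object's centre contribute more than boundary pixels; the function names and the `(1 + weight)` weighting scheme are assumptions, not the paper's definition.

```python
import math

def centerness_target(l, r, t, b):
    """FCOS-style centerness for a pixel whose distances to the
    left/right/top/bottom edges of its object are l, r, t, b:
    1.0 at the exact centre, decaying towards 0 at the boundary."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def weighted_bce(pred, target, weight, eps=1e-7):
    """Per-pixel binary cross entropy scaled by (1 + weight), so that
    high-centerness pixels carry a larger loss (illustrative only)."""
    p = min(max(pred, eps), 1.0 - eps)
    bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return (1.0 + weight) * bce

# A pixel at the centre of a 10x10 object vs. one near its border:
c_centre = centerness_target(5, 5, 5, 5)   # -> 1.0
c_border = centerness_target(1, 9, 1, 9)   # -> 1/9
loss_centre = weighted_bce(0.3, 1.0, c_centre)
loss_border = weighted_bce(0.3, 1.0, c_border)
```

Under this toy weighting, the same prediction error is penalised more heavily at the object centre (`loss_centre > loss_border`), which is one plausible way a centerness signal could encourage uniformly highlighted salient regions.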
Highlights
Salient object detection (SOD) [1,2,3] aims to extract the most visually distinctive objects in an image or video
Our contributions are as follows: (i) We propose a multiscale and multitask deep framework with a centerness-aware loss for salient object detection. The M2Net consists of an encoder module, a decoder module, and a centerness prediction module
We introduce centerness into saliency detection
Summary
Salient object detection (SOD) [1,2,3] aims to extract the most visually distinctive objects in an image or video. Saliency detection results often serve as the first step for a variety of downstream computer vision tasks, including object recognition [4], visual tracking [5], image retrieval [6], no-reference synthetic image quality assessment [7], robot navigation [8], image and video compression [9, 10], and object discovery [11,12,13]. Earlier SOD methods mostly relied on hand-crafted features (e.g., color, brightness, and texture) to produce saliency maps. Such low-level features can hardly capture high-level semantic information and are not robust to complex scenarios. The encoder-decoder framework [3, 15,16,17,18,19] is frequently used to extract and combine enriched feature blocks and can generate more accurate saliency maps.
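The bottom-up encoder / top-down decoder pattern described above can be sketched in plain Python, without any deep-learning framework: the encoder reduces spatial resolution, and the decoder restores it while fusing the matching encoder feature through a skip connection. All names and the 2x2 pooling / nearest-neighbour upsampling choices here are illustrative assumptions, not the paper's architecture.

```python
def avg_pool_2x2(fmap):
    """Encoder step: 2x2 average pooling halves height and width."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[i][j] + fmap[i][j + 1] +
              fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample_2x(fmap):
    """Decoder step: nearest-neighbour upsampling doubles height and width."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse(decoder_feat, encoder_feat):
    """Skip connection: element-wise average of decoder and encoder features."""
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)]
            for ra, rb in zip(decoder_feat, encoder_feat)]

# A 4x4 toy "feature map" standing in for an image feature.
x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]

enc1 = avg_pool_2x2(x)    # bottom-up: 4x4 -> 2x2 bottleneck
dec1 = upsample_2x(enc1)  # top-down: 2x2 -> 4x4
out = fuse(dec1, x)       # skip connection from the encoder
```

The skip connection is what lets decoder features recover fine spatial detail lost in pooling; real SOD decoders combine several such stages with learned convolutions instead of fixed pooling and upsampling.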