Abstract

To obtain more accurate saliency maps, current methods mainly focus on aggregating multi-level features with U-Net-like structures and on introducing edge information as auxiliary supervision. In contrast to existing methods, in this paper we study the distinct roles of semantics and details in saliency detection. The task is decomposed into two parallel sub-tasks, internal semantic estimation and boundary detail prediction, and these sub-goals are optimized simultaneously via explicit constraints. Specifically, we propose a novel semantic and detail collaborative learning network (SDCLNet) for salient object detection. To this end, a backbone network (e.g., VGG-16) with an additional layer is first adopted as a shared encoder to extract features from each image. Two asymmetric decoders, designed without bells and whistles, then process these features: the semantic decoder generates a coarse semantic mask, and the detail decoder generates a fine-grained object boundary. Finally, a collaborative learning block with carefully designed components adaptively selects discriminative features to carry out the final saliency prediction. In this way, semantic and detail information are effectively fused, and accurate, consistent saliency maps are generated. SDCLNet can be trained end-to-end and requires no post-processing. Extensive experiments on six benchmark datasets demonstrate the effectiveness and superiority of the proposed method in terms of both subjective visual perception and objective evaluation metrics. SDCLNet runs in real time at 51 FPS on a single GPU.
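The core idea of the abstract — fusing a coarse semantic mask with a fine boundary map into one saliency map — can be illustrated with a minimal NumPy sketch. This is a toy stand-in under stated assumptions: the function name `collaborative_fuse`, the fixed weights, and the additive-gating fusion rule are all illustrative inventions, not the paper's actual learned collaborative learning block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def collaborative_fuse(semantic, detail, w_s=1.0, w_d=1.0):
    """Toy stand-in for the collaborative learning block: combine a
    coarse semantic mask (object interior) and a fine-grained boundary
    map (object edges) into a single saliency map.

    In SDCLNet this fusion is a learned module that adaptively selects
    discriminative features; the fixed weighted sum here is only an
    assumption for illustration.
    """
    fused = w_s * semantic + w_d * detail
    return sigmoid(fused)  # squash logits into [0, 1] saliency scores

# Coarse semantic mask: strong response inside the object, weak outside.
semantic = np.array([[0.0, 2.0, 0.0]])
# Boundary map: strong response only near the object edges.
detail = np.array([[1.5, 0.0, 1.5]])

saliency = collaborative_fuse(semantic, detail)
```

Here the interior pixel gets its evidence from the semantic branch and the edge pixels from the detail branch, so the fused map covers both the object body and its contour, which is the complementarity the two decoders are meant to exploit.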
