Abstract

Deep learning-based salient object detection techniques have shown impressive results compared to conventional saliency detection by handcrafted features. Integrating hierarchical features of Convolutional Neural Networks (CNN) to achieve fine-grained saliency detection is a current trend, and various deep architectures are proposed by researchers, including “skip-layer” architecture, “top-down” architecture, “short-connection” architecture and so on. While these architectures have achieved progressive improvement on detection accuracy, it is still unclear about the underlying distinctions and connections between these schemes. In this paper, we review and draw underlying connections between these architectures, and show that they actually could be unified into a general framework, which simply just has side structures with different depths. Based on the idea of designing deeper side structures for better detection accuracy, we propose a unified framework called Deepside that can be deeply supervised to incorporate hierarchical CNN features. Additionally, to fuse multiple side outputs from the network, we propose a novel fusion technique based on segmentation-based pooling, which severs as a built-in component in the CNN architecture and guarantees more accurate boundary details of detected salient objects. The effectiveness of the proposed Deepside scheme against state-of-the-art models is validated on 8 benchmark datasets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call