Abstract

Deep salient object detection (SOD) methods typically use an end-to-end network to extract global or local image information such as contrast, spatial distribution, and objectness. We find that depth is an important saliency cue that previous work has neglected. In a single image, depth denotes the relative distance of objects from the observer, and objects that are relatively closer usually attract more human attention and therefore have higher saliency. In this paper, we propose a deep convolutional network that extracts depth information from the image to predict saliency. Our network consists of two streams: a depth stream and a contrast stream. The depth stream predicts the saliency induced by object depth using two deep networks. The contrast stream extracts the contrast information of the image using a multi-scale network. Saliency predictions from the depth stream often have blurred boundaries, whereas the results of the contrast stream are more accurate at the pixel level. We therefore obtain the final saliency map by combining the results of the two streams. We compare our method with state-of-the-art deep SOD methods on four public datasets. The experimental results show that combining the two streams yields more accurate predictions.
