Abstract

Depth information has been shown to affect the identification of visually salient regions in images. In this paper, we investigate the role of depth in saliency detection in the presence of (i) competing saliencies due to appearance, (ii) depth-induced blur and (iii) centre-bias. Having established through experiments that depth continues to be a significant contributor to saliency in the presence of these cues, we propose a 3D-saliency formulation that takes into account structural features of objects in an indoor setting to identify regions at salient depth levels. The computed 3D-saliency is used in conjunction with 2D-saliency models through non-linear regression using an SVM to improve saliency maps. Experiments on benchmark datasets containing depth information show that the proposed fusion of 3D-saliency with 2D-saliency models results in an average improvement in ROC scores of about 9% over state-of-the-art 2D-saliency models.

The main contributions of this paper are: (i) the development of a 3D-saliency model that integrates depth and geometric features of object surfaces in indoor scenes; (ii) fusion of appearance (RGB) saliency with depth saliency through non-linear regression using an SVM; (iii) experiments to support the hypothesis that depth improves saliency detection in the presence of blur and centre-bias.

The effectiveness of the 3D-saliency model and its fusion with RGB-saliency is illustrated through experiments on two benchmark datasets that contain depth information. Current state-of-the-art saliency detection algorithms perform poorly on these datasets, which depict indoor scenes, because of competing saliencies in the form of color contrast. For example, Fig. 1 shows saliency maps of [1] for different scenes, along with human eye fixations and our proposed saliency map after fusion. The first scene of Fig. 1 shows that illumination plays a spoiler role in the RGB-saliency map. In the second scene, the RGB-saliency is focused on the cap even though multiple salient objects are present in the scene. The last scene, at the bottom of Fig. 1, shows the limitation of RGB-saliency when the object is similar in appearance to the background.

Figure 1: Four different scenes and their saliency maps. For each scene, from top left: (i) original image, (ii) RGB-saliency map using RC [1], (iii) human fixations from an eye-tracker and (iv) fused RGBD-saliency map.

Effect of depth on Saliency: In [4], it is shown that depth is an important cue for saliency. In this paper, we go further and verify whether depth alone influences saliency. Different scenes were captured for experimentation using a Kinect sensor. These experiments yielded the following observations: (i) humans fixate on objects at closer depth in the presence of visually competing salient objects in the background; (ii) early attention goes to objects at closer depth; (iii) effective fixations are higher on a low-contrast foreground than on high-contrast objects in the background that are blurred; (iv) a low-contrast object placed at the centre of the field of view gets more attention than at other locations. Motivated by these observations, we develop a 3D-saliency model that captures the depth information of regions in the scene.

3D-Saliency: We adapt the region-based contrast method of Cheng et al. [1] to compute contrast strengths for the segmented 3D surfaces, or regions. Each segmented region is assigned a contrast score using surface normals as the feature. The structure of a surface can be described by the distribution of normals in the region. We compute a histogram of the angular distances formed by every pair of normals in the region, so every region Rk is associated with a histogram Hk, as sketched below.
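To make this step concrete, the following sketch (illustrative code, not the authors' implementation; the bin count is an assumed parameter) computes such a histogram from a region's unit surface normals:

```python
import numpy as np

def normal_angle_histogram(normals, n_bins=10):
    """Histogram H_k of pairwise angular distances between unit surface
    normals of one region. A sketch of the step described above; the
    number of bins and the use of all pairs are assumptions."""
    # Pairwise cosines between all unit normals in the region.
    cos = np.clip(normals @ normals.T, -1.0, 1.0)
    # Angular distance for every distinct pair (upper triangle only).
    iu = np.triu_indices(len(normals), k=1)
    angles = np.arccos(cos[iu])
    # Normalised histogram over [0, pi], so histograms of regions with
    # different point counts remain comparable under dot products.
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, np.pi))
    return hist / max(hist.sum(), 1)
```

For large regions the all-pairs computation grows quadratically, so in practice one would subsample the normals before building the histogram.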
The contrast score Ck of a region Rk is computed as the sum of the dot products of its histogram with the histograms of the other regions in the scene. Since the depth of a region influences visual attention, the contrast score is scaled by a value Zk, the depth of region Rk from the sensor. To define saliency, the sizes of the regions, i.e. the number of points in each region, must also be considered; we use the ratio of the region size to half of the scene size. With nk denoting the number of 3D points in region Rk, the contrast score becomes a size-weighted, depth-scaled sum of these histogram dot products.
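One plausible form consistent with this description (the exact weighting in the paper may differ) is

C_k = (n_k / (N/2)) · Z_k · Σ_{i ≠ k} (H_k · H_i),

where N is the total number of 3D points in the scene, Z_k is the depth of region Rk from the sensor, and H_k · H_i is the dot product of the normalised histograms of regions Rk and Ri.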

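The SVM-based fusion mentioned above can likewise be sketched in a few lines. The example below is a minimal illustration using scikit-learn's SVR; the per-pixel features, kernel and training protocol are assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.svm import SVR

def fit_fusion(rgb_maps, depth_maps, fixation_maps):
    """Fit a non-linear regressor mapping per-pixel (2D-saliency,
    3D-saliency) pairs to ground-truth fixation density.
    A sketch of the fusion idea only; in practice the training pixels
    would be subsampled, since SVR scales poorly with sample count."""
    X = np.column_stack([np.concatenate([m.ravel() for m in rgb_maps]),
                         np.concatenate([m.ravel() for m in depth_maps])])
    y = np.concatenate([m.ravel() for m in fixation_maps])
    return SVR(kernel="rbf").fit(X, y)  # non-linear regression

def predict_fused(reg, rgb_map, depth_map):
    """Apply the fitted regressor to produce a fused RGBD-saliency map."""
    X = np.column_stack([rgb_map.ravel(), depth_map.ravel()])
    return reg.predict(X).reshape(rgb_map.shape)
```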