Abstract

Depth information has been widely used to improve RGB-D salient object detection by extracting attention maps that encode the positions of objects in an image. However, non-salient objects may lie close to the depth sensor and therefore present high pixel intensities in the depth map. Such depth maps inevitably lead to the erroneous emphasis of non-salient areas and may degrade the saliency results. To mitigate this problem, we propose a hybrid attention neural network that fuses middle- and high-level RGB features with depth features to generate a hybrid attention map that removes background information. The proposed network extracts multilevel features from RGB images using the Res2Net architecture and integrates high-level features from depth maps using the Inception-v4-ResNet2 architecture. The fused high-level RGB and depth features generate the hybrid attention map, which is then multiplied with the low-level RGB features. After decoding through several convolution and upsampling layers, we obtain the final saliency prediction, achieving state-of-the-art performance on the NJUD and NLPR datasets. Moreover, the proposed network generalizes better than competing methods. An ablation study demonstrates that the proposed network performs saliency prediction effectively even when non-salient objects interfere with detection: after removing the branch with high-level RGB features, the RGB attention map that guides the network for saliency prediction is lost, and all performance measures decline. The prediction map resulting from this ablation shows the adverse effect of non-salient objects close to the depth sensor, an effect that is absent when the complete hybrid attention network is used. Therefore, RGB information can correct and supplement depth information, and the corresponding hybrid attention map is more robust than a conventional attention map constructed from depth information alone.
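The core operation described above, fusing high-level RGB and depth features into a spatial attention map and multiplying it with low-level RGB features, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the fusion by channel-wise averaging, the sigmoid squashing, and all shapes are assumptions made for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_attention(rgb_high, depth_high, rgb_low):
    """Fuse high-level RGB and depth features (shape (C, H, W)) into a
    single-channel spatial attention map and apply it to low-level RGB
    features. Fusion by averaging is an illustrative assumption."""
    # Fuse the two high-level feature maps (hypothetical fusion rule)
    fused = 0.5 * (rgb_high + depth_high)               # (C, H, W)
    # Collapse channels and squash to [0, 1] to form the attention map
    attn = sigmoid(fused.mean(axis=0, keepdims=True))   # (1, H, W)
    # Broadcast-multiply: attenuate background in the low-level features
    return attn * rgb_low

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
out = hybrid_attention(rng.standard_normal((C, H, W)),
                       rng.standard_normal((C, H, W)),
                       rng.standard_normal((C, H, W)))
print(out.shape)  # (4, 8, 8)
```

In the actual network the fused features come from Res2Net and depth branches at matching resolutions, and the attention map multiplies the low-level features before the decoder's convolution and upsampling layers.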

Highlights

  • Saliency detection extracts relevant objects with pixel-level details from an image

  • RGB-D salient object detection based on handcrafted features generally uses depth maps to determine edges, textures, and histogram statistics, and bottom-up [9] or top-down [10] approaches are used to predict whether a pixel belongs to a salient object

  • Based on spatial attention maps, we propose stereoscopic salient object detection using a hybrid attention network (HANet)

Introduction

Saliency detection extracts relevant objects with pixel-level detail from an image. It has been widely used in many fields, such as object segmentation [1], region proposal [2], object recognition [3], image quality assessment [4], and video analysis [5]. However, when the background shares similar colors with a salient object or is highly complex, and when salient objects are very large or small, saliency detection based solely on RGB images often fails to provide accurate results. RGB-D salient object detection based on handcrafted features generally uses depth maps to determine edges, textures, and histogram statistics, and bottom-up [9] or top-down [10] approaches are used to predict whether a pixel belongs to a salient object. Various methods consider the rarity of pixels at local and global regions of an image [11], while others use prior knowledge to support prediction.
