RGB-D salient object detection (SOD) is typically formulated as a classification or regression problem over two modalities, RGB and depth. Existing RGB-D SOD methods use depth cues to improve detection performance, but they pay little attention to the quality of the depth maps. In practical applications, interference introduced during acquisition degrades depth-map quality, which in turn sharply reduces detection accuracy. In this paper, to suppress interference in the depth map and emphasize salient objects in the RGB image, we propose a layered interactive attention network (LIANet). The network adopts a dual-branch structure to integrate RGB and depth information and consists of three parts: feature encoding, a layered fusion mechanism, and feature decoding. In the feature encoding stage, a simple attention module (SAM) is added; it defines an energy function over both the channel and spatial dimensions, enabling the network to learn more discriminative neurons without adding parameters. By refining these neurons, the high-level semantic features of the image can be fully exploited. The layered interactive fusion module (LIFM) is the core contribution of this paper: it strengthens cross-modal interaction between RGB and depth features, and its RGB-depth-RGB modulation feedback mechanism suppresses interference in the depth map and accurately highlights the features of salient objects. In addition, we use a mixed loss to further optimize and train the model. Finally, extensive experiments on six standard datasets demonstrate the effectiveness of the method, which runs at a real-time speed of 30 fps on every dataset.
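The abstract describes SAM as an attention module that computes an energy function over channel and spatial dimensions without adding learnable parameters. The paper's exact formulation is not given here, so the following is only a minimal sketch assuming a SimAM-style parameter-free energy, where neurons that deviate most from their channel mean receive the highest weights; the module name `SimAMAttention` and the regularizer `e_lambda` are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class SimAMAttention(nn.Module):
    """Sketch of a parameter-free, energy-based attention module (SimAM-style)."""

    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer assumed for numerical stability

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the RGB or depth encoder branch
        b, c, h, w = x.shape
        n = h * w - 1
        # squared deviation of each spatial position from its per-channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # per-channel variance estimate over spatial positions
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # inverse energy: more distinctive neurons get larger attention weights
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        # reweight the input features without introducing any learnable parameters
        return x * torch.sigmoid(e_inv)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 56, 56)
    refined = SimAMAttention()(feats)
    print(refined.shape)  # torch.Size([2, 64, 56, 56])
```

Because the weighting is derived analytically from the feature statistics, this kind of module adds no parameters to the encoder, which is consistent with the abstract's claim that SAM learns more discriminative neurons at no parameter cost.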