Abstract

Salient object detection in RGB-D images aims to identify, in a pair of color and depth images, the objects that most attract an observer's attention. As an important branch of salient object detection, it faces two major challenges: how to achieve cross-modal fusion that is both efficient and beneficial for saliency detection, and how to effectively extract information from depth images of relatively poor quality. This paper proposes a cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection using color and depth images. Specifically, the generator adopts a two-stream encoder-decoder network that receives the RGB and depth images simultaneously. A proposed depthwise separable residual convolution module processes the deep semantic information, and the processed features are progressively combined with the side-output features of the encoder network. To compensate for the poor quality of the depth image, the proposed method adds cross-modal guidance from the side-output features of the RGB stream to the decoder of the depth stream. The features of the two streams are adaptively fused by a gated fusion module, and the resulting gated fusion saliency map is fed to the discriminator, which judges its similarity to the ground-truth map. Adversarial learning yields stronger generator and discriminator networks, and the gated fusion saliency map produced by the best generator serves as the final result. Experiments on five public RGB-D datasets demonstrate the effectiveness of the cross-modal fusion, the depthwise separable residual convolution, and the adaptive gated fusion. Compared with state-of-the-art methods, our method achieves better performance.
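
To make the architectural description concrete, the following is a minimal PyTorch sketch of the two modules named in the abstract: the depthwise separable residual convolution and the adaptive gated fusion. All class names, layer sizes, and the exact gating formula are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the two modules named in the abstract.
# Layer sizes, names, and the gating formula are assumptions.
import torch
import torch.nn as nn


class DepthwiseSeparableResidualConv(nn.Module):
    """Depthwise separable convolution wrapped in a residual connection:
    a depthwise 3x3 conv followed by a pointwise 1x1 conv, with the
    input added back before the final activation."""

    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1,
                                   bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn(self.pointwise(self.depthwise(x)))
        return self.relu(out + x)  # residual shortcut


class GatedFusion(nn.Module):
    """Adaptive gated fusion of RGB and depth features: a learned gate
    in [0, 1] weights the two modalities per pixel, so unreliable
    depth regions can be suppressed."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor,
                depth_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return g * rgb_feat + (1 - g) * depth_feat


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 56, 56)    # dummy RGB-stream features
    depth = torch.randn(1, 64, 56, 56)  # dummy depth-stream features
    block = DepthwiseSeparableResidualConv(64)
    fuse = GatedFusion(64)
    fused = fuse(block(rgb), block(depth))
    print(fused.shape)  # torch.Size([1, 64, 56, 56])
```

The sigmoid gate gives a per-pixel convex combination of the two streams, which matches the abstract's motivation: where the depth map is unreliable, the network can learn to down-weight depth features in favor of RGB features.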
