Because salient objects usually occupy only a small portion of a scene, the problem of class imbalance often arises in salient object detection (SOD). To address this issue and obtain consistent salient objects, we propose an adversarial focal loss network based on an improved generative adversarial network for RGB-D SOD (called AFLNet), in which the color and depth branches constitute the generator that produces the saliency map, and an adversarial branch with higher-order potentials, rather than a pixel-wise loss function, refines the generator's output to capture contextual information about objects. We derive an adversarial focal loss function to alleviate foreground–background class imbalance. To sufficiently fuse the high-level features of the color and depth cues, an inception module is adopted in the deep layers. We conduct extensive experiments with our proposed model and its variants, and compare them with state-of-the-art methods. Quantitative and qualitative results demonstrate that our approach improves the accuracy of salient object detection and yields consistent salient objects.
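For reference, a common form of the focal loss on which such an adversarial variant can build (the abstract does not state the exact formulation used in AFLNet) is, for a pixel with ground-truth label $y \in \{0, 1\}$ and predicted foreground probability $p$,

\[
  \mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t),
  \qquad
  p_t =
  \begin{cases}
    p     & \text{if } y = 1,\\
    1 - p & \text{otherwise},
  \end{cases}
\]

where the focusing parameter $\gamma \ge 0$ down-weights easy, well-classified pixels and the weight $\alpha_t$ rebalances the foreground and background classes.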