Abstract

Weakly supervised salient object detection (SOD) is a challenging task that has drawn much attention from several research perspectives. While driving the rapid development of saliency detection, it has also revealed two problems. (1) Salient regions diverge widely in location, shape and size, which makes them difficult to recognize. (2) Convolutional neural networks are by nature insensitive to various transformations, which makes it hard to balance the application of different disturbances. To tackle these limitations, this paper proposes a novel seminar learning framework with consistent transformation ensembling (SLF-CT) for scribble-supervised SOD. The framework consists of a teacher–student model and a student–student model for segmenting the salient objects. Specifically, we first design a cross attention guided network (CAGNet) as a baseline model for saliency prediction. We then assign CAGNet to the teacher–student model, where the teacher network is based on the exponential moving average and guides the training of the student network. Moreover, we adopt multiple pseudo labels to transfer information among students trained under different conditions. To further enhance the regularization of the network, a consistency transformation mechanism is also incorporated, which encourages the network's saliency prediction and its input image to remain consistent. The experimental results demonstrate that the proposed approach performs favorably compared with state-of-the-art weakly supervised methods. As far as we know, the proposed approach is the first application of seminar learning in the SOD area.
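
The two mechanisms named above, an exponential-moving-average teacher and a transformation consistency term, can be illustrated with a minimal sketch. It is not the SLF-CT implementation: the function names, the decay value, the choice of a horizontal flip as the transformation, and the mean squared error as the consistency measure are all assumptions made for illustration, and the "model" is any callable mapping an image to a saliency map of the same size.

```python
from typing import Callable, List

Image = List[List[float]]  # a 2-D grid of pixel or saliency values


def ema_update(teacher: List[float], student: List[float],
               decay: float = 0.99) -> List[float]:
    """Move each teacher weight toward the student weight by (1 - decay).

    `decay` is a hypothetical value; the abstract does not state one.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def hflip(img: Image) -> Image:
    """Horizontal flip, used here as one example transformation T."""
    return [row[::-1] for row in img]


def consistency_loss(model: Callable[[Image], Image], img: Image) -> float:
    """Mean squared difference between model(T(x)) and T(model(x)).

    The loss is zero when predicting on the flipped image gives the same
    result as flipping the prediction of the original image.
    """
    pred_of_flipped = model(hflip(img))
    flipped_pred = hflip(model(img))
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(pred_of_flipped, flipped_pred)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)
```

In a training loop of this kind, the student would typically be updated by gradient descent on a supervised loss plus a weighted consistency term, and `ema_update` would then be called on the teacher after each student step.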
