Adversarial network integrating dual attention and sparse representation for semi-supervised semantic segmentation

Xu Chen, Chuancai Liu, Ge Jin

doi:10.1016/j.ipm.2021.102680

Abstract

Semantic segmentation is an important task that assigns a semantic label to each pixel. However, semantic segmentation based on deep neural networks usually requires a large amount of annotated data to achieve good performance. To reduce this cost, algorithms based on weakly-supervised and semi-supervised settings have been proposed in recent years, with steadily improving performance. In this paper, we propose a novel semi-supervised adversarial network that alleviates the shortage of labeled data and requires only a few labeled images to achieve competitive performance. The model is composed of two parts: a segmentation network and a discriminator network. The segmentation network generates a semantic segmentation result of the same size as the input color image. The discriminator network is fully convolutional and is designed to distinguish the predicted probability maps from the ground-truth distribution. In particular, the probability maps are regarded as focal attention maps and fed back to the segmentation network, which makes the model converge faster and induces it to focus on pixels that are hard to segment. To enhance the representational ability of image features, sparse representation and dual attention are adopted in the segmentation network. The sparse representation module emphasizes object edges and locations by learning a convolutional sparse representation of the input color image, and the dual attention module exploits semantic interdependencies in two different dimensions. Moreover, a semi-supervised mechanism is introduced into the network: an adaptive parameter T is proposed to control the sensitivity of the self-taught phase, and the training dataset is split into two parts, one for fully-supervised learning and one for semi-supervised learning.
Specifically, one part is treated as unlabeled data, which provides supervisory signals for semi-supervised training, while the labeled data drawn from the other part are used for fully-supervised learning. The proposed semi-supervised adversarial framework improves the learning ability of the model, achieves higher performance, and provides a new approach to the semantic segmentation task. Finally, comprehensive experiments on the PASCAL VOC 2012 and Cityscapes datasets are conducted to verify the effectiveness of the proposed model.
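The abstract does not state exactly how the parameter T enters the self-taught phase. A common formulation in earlier adversarial semi-supervised segmentation work uses the discriminator's per-pixel confidence as a mask: on an unlabeled image, only pixels whose confidence exceeds a threshold receive a pseudo-label (the argmax class of the segmentation output) and contribute to a cross-entropy loss. The minimal Python sketch below assumes that reading. The function name `self_taught_loss` is hypothetical, each image is flattened to a list of per-pixel class probabilities, and T is passed as a fixed argument because the adaptive rule for T is not described here.

```python
import math
from typing import Sequence


def self_taught_loss(probs: Sequence[Sequence[float]],
                     conf: Sequence[float],
                     T: float) -> float:
    """Masked pseudo-label cross-entropy for one unlabeled image (hypothetical sketch).

    probs: per-pixel class probabilities from the segmentation network.
    conf:  per-pixel discriminator confidence in [0, 1].
    T:     threshold (assumed fixed here); only pixels with conf > T are used.
    """
    total, count = 0.0, 0
    for p, c in zip(probs, conf):
        if c > T:
            # Pseudo-label = most probable class (first one on ties).
            label = max(range(len(p)), key=p.__getitem__)
            total -= math.log(p[label])
            count += 1
    # No pixel passed the mask: contribute nothing to the loss.
    return total / count if count else 0.0
```

Raising T makes the self-taught phase more conservative, since fewer pixels pass the mask; lowering it admits more, but noisier, pseudo-labels.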
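The abstract says the probability maps act as focal attention that steers training toward hard pixels, without giving a formula. One plausible realization, by analogy with focal loss, is to scale each pixel's cross-entropy by (1 - d)^gamma, where d is the map value at that pixel, so that low-valued (assumed hard) pixels dominate the loss. This is an illustrative assumption, not the paper's stated method; `focal_weighted_ce` and `gamma` are hypothetical names.

```python
import math
from typing import Sequence


def focal_weighted_ce(probs: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      conf: Sequence[float],
                      gamma: float = 2.0) -> float:
    """Cross-entropy re-weighted per pixel by an attention map (hypothetical sketch).

    probs:  per-pixel class probabilities from the segmentation network.
    labels: per-pixel ground-truth class indices.
    conf:   per-pixel map values in [0, 1]; low values are treated as hard pixels.
    gamma:  focusing strength; gamma = 0 recovers plain mean cross-entropy.
    """
    total = 0.0
    for p, y, d in zip(probs, labels, conf):
        weight = (1.0 - d) ** gamma  # low map value -> large weight
        total += -weight * math.log(p[y])
    return total / len(labels)
```

With gamma = 0 every weight is 1, so the function reduces to ordinary cross-entropy; increasing gamma shifts more of the loss onto pixels the map marks as unreliable.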
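The "two different dimensions" exploited by a dual attention module are commonly the spatial (position) and channel dimensions, as in DANet-style dual attention. Assuming that design, the plain-Python sketch below computes both attentions on a C x N feature matrix (channels x flattened pixels). It simplifies the usual module: the learned 1x1 convolutions that produce query, key, and value features are replaced by the identity, the residual scales `alpha` and `beta` are plain arguments rather than learned parameters, and the two branch outputs are summed directly.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major list of rows


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(m x k) @ (k x n) -> (m x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def residual(agg: Matrix, feat: Matrix, scale: float) -> Matrix:
    """Elementwise scale * agg + feat."""
    return [[scale * g + f for g, f in zip(rg, rf)] for rg, rf in zip(agg, feat)]


def position_attention(feat: Matrix, alpha: float) -> Matrix:
    """Each pixel aggregates features from all pixels, weighted by similarity."""
    energy = matmul(transpose(feat), feat)   # N x N pixel affinities
    attn = [softmax(row) for row in energy]  # normalize over source pixels
    agg = matmul(feat, transpose(attn))      # out[c][i] = sum_j feat[c][j] * attn[i][j]
    return residual(agg, feat, alpha)


def channel_attention(feat: Matrix, beta: float) -> Matrix:
    """Each channel aggregates all channels, weighted by channel similarity."""
    energy = matmul(feat, transpose(feat))   # C x C channel affinities
    attn = [softmax(row) for row in energy]
    agg = matmul(attn, feat)                 # C x N
    return residual(agg, feat, beta)


def dual_attention(feat: Matrix, alpha: float, beta: float) -> Matrix:
    """Sum of position- and channel-attention outputs (same C x N shape as input)."""
    pos = position_attention(feat, alpha)
    ch = channel_attention(feat, beta)
    return [[p + c for p, c in zip(rp, rc)] for rp, rc in zip(pos, ch)]
```

Because the position branch builds an N x N affinity matrix, its cost grows quadratically with the number of pixels, which is why such modules are usually applied to downsampled feature maps rather than full-resolution images.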

Full Text