Self-supervised representation learning can boost the performance of a pre-trained network on downstream tasks for which labeled data is limited. A popular method based on this paradigm, known as contrastive learning, works by constructing sets of positive and negative pairs from the data, and then pulling the representations of positive pairs closer together while pushing apart those of negative pairs. Although contrastive learning has been shown to improve performance in various classification tasks, its application to image segmentation has been more limited. This stems in part from the difficulty of defining positive and negative pairs for dense feature maps without access to pixel-wise annotations.

In this work, we propose a novel self-supervised pre-training method that overcomes these challenges of contrastive learning in image segmentation. Our method leverages Invariant Information Clustering (IIC) as an unsupervised task to learn a local representation of images in the decoder of a segmentation network, but addresses three important drawbacks of this approach: (i) the difficulty of optimizing the loss based on mutual information maximization; (ii) the lack of clustering consistency across different random transformations of the same image; and (iii) the poor correspondence between the clusters obtained by IIC and region boundaries in the image. To this end, we first introduce a regularized mutual information maximization objective that encourages the learned clusters to be balanced and consistent across different image transformations. We also propose a boundary-aware loss based on cross-correlation, which helps the learned clusters better represent important regions in the image.
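The IIC objective that the method builds on can be illustrated with a minimal NumPy sketch. It implements only the standard IIC loss (negative mutual information between the cluster assignments of two views), not the regularized objective or the boundary-aware cross-correlation loss proposed here, whose exact forms are not given in this abstract. It assumes per-pixel soft cluster assignments for two views that are already spatially aligned (for example, photometric transforms only, or geometric transforms inverted beforehand), and it omits the spatial-neighbourhood term used in IIC's segmentation variant. The helper names `iic_loss` and `dense_to_rows` are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dense_to_rows(logits):
    """Turn decoder logits of shape (B, C, H, W) into per-pixel
    cluster probabilities of shape (B*H*W, C)."""
    b, c, h, w = logits.shape
    probs = softmax(logits, axis=1)
    return probs.transpose(0, 2, 3, 1).reshape(-1, c)


def iic_loss(p1, p2, eps=1e-10):
    """Negative mutual information I(Z; Z') between two soft cluster
    assignments p1, p2 of shape (N, C); row i of each refers to the
    same pixel seen under two different transformations."""
    n, _ = p1.shape
    # Joint distribution over cluster pairs, averaged over all pixels.
    joint = p1.T @ p2 / n
    # Symmetrize, since the two views play interchangeable roles.
    joint = (joint + joint.T) / 2.0
    # Avoid log(0), then renormalize to a valid distribution.
    joint = np.clip(joint, eps, None)
    joint = joint / joint.sum()
    pi = joint.sum(axis=1, keepdims=True)  # marginal of view 1, (C, 1)
    pj = joint.sum(axis=0, keepdims=True)  # marginal of view 2, (1, C)
    mi = np.sum(joint * (np.log(joint) - np.log(pi) - np.log(pj)))
    return -mi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical decoder output with C = 4 clusters.
    logits = rng.normal(size=(2, 4, 8, 8))
    # Second view: a small perturbation stands in for a photometric transform.
    logits_aug = logits + 0.1 * rng.normal(size=logits.shape)
    loss = iic_loss(dense_to_rows(logits), dense_to_rows(logits_aug))
    print(f"IIC loss: {loss:.4f}")
```

Because I(Z; Z') = H(Z) - H(Z | Z'), the loss is minimized (at -log C) only when assignments are both confident and consistent across views and spread evenly over the C clusters; assigning every pixel to a single cluster, or predicting uniform probabilities everywhere, yields zero mutual information. This is the balance-and-consistency behavior that the regularized objective described above is meant to encourage more reliably.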
Compared to contrastive learning applied to dense features, our method does not require computing positive and negative pairs, and it also enhances interpretability through the visualization of the learned clusters.

Comprehensive experiments on four different medical image segmentation tasks demonstrate the effectiveness of our self-supervised representation learning method. Our results show that the proposed method outperforms several state-of-the-art self-supervised and semi-supervised approaches for segmentation by a large margin, reaching a performance close to full supervision with only a few labeled examples.