Segmenting the parotid glands and tumors in MR images is essential for treating parotid gland tumors. However, parotid gland segmentation is particularly challenging because the glands vary in shape and have low contrast with surrounding structures. The lack of large, well-annotated datasets limits the development of deep learning for medical imaging. Contrastive learning, an unsupervised learning method, has developed rapidly in recent years. It makes better use of unlabeled images and is a promising way to improve parotid gland segmentation. We propose Swin MoCo, a momentum contrastive learning network with a Swin Transformer backbone. Swin MoCo is initialized with the weights of an ImageNet supervised model, which improves training on small medical image datasets. Swin MoCo trained with transfer learning improves parotid gland segmentation to 89.78% DSC, 85.18% mIoU, 3.60 HD, and 90.08% mAcc. On the Synapse multi-organ computed tomography (CT) dataset, using Swin MoCo as the pre-trained model for Swin-Unet yields 79.66% DSC and 12.73 HD, which outperforms the best result of Swin-Unet on the Synapse dataset. These improvements require only 4 h of training on a single NVIDIA Tesla V100, which is computationally cheap. Swin MoCo provides a new approach to improving performance on tasks with small datasets. The code is publicly available at https://github.com/Zian-Xu/Swin-MoCo.
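For readers unfamiliar with the overlap metrics reported above, the following is a minimal sketch of how the Dice similarity coefficient (DSC) and intersection over union (IoU) can be computed for a single pair of binary masks. The function name `dice_and_iou`, the flattened-list input format, and the convention for two empty masks are assumptions made for illustration. This is not taken from the Swin MoCo repository, and the exact evaluation protocol behind the reported numbers (for example, how per-class values are averaged into mIoU) is not described in the abstract.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (DSC, IoU) for two flattened binary masks of equal length.

    Hypothetical helper for illustration; nonzero entries count as foreground.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty. Treating this as perfect agreement is an
        # assumed convention; other evaluation code may skip such cases.
        return 1.0, 1.0

    dsc = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dsc, iou


# Toy 2x2 masks, flattened row by row.
prediction = [1, 1, 0, 0]
ground_truth = [1, 0, 1, 0]
dsc, iou = dice_and_iou(prediction, ground_truth)
print(f"DSC={dsc:.3f}, IoU={iou:.3f}")
```

For a single pair of masks the two metrics are related by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU. That relation does not carry over exactly to scores averaged across classes or cases, such as the DSC and mIoU figures quoted above.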