Abstract

Deep learning methods have shown promising performance in medical image semantic segmentation. High-quality annotations, however, remain costly and hard to obtain, as clinicians are pressed for time. In this paper, we propose to harness the power of the Vision Transformer (ViT) within a semi-supervised framework for medical image semantic segmentation. The framework consists of a student model and a teacher model: the student model learns from image feature information and helps the teacher model update its parameters. The consistency between the student and teacher models' inferences on unlabeled data is studied, so the whole framework is trained to minimize a supervised segmentation loss and a semi-supervised consistency loss. To improve semi-supervised performance, an uncertainty estimation scheme is introduced so that the student model learns only from reliable inferences when computing the consistency loss. The approach of filtering inconclusive images via an uncertainty value, together with the weighted sum of the two losses during training, is studied further. In addition, ViT is selected and adapted as the backbone of the semi-supervised framework because of its ability to model long-range dependencies. Our proposed method is evaluated with a variety of metrics on a public benchmark MRI dataset, and the results demonstrate competitive performance against other state-of-the-art semi-supervised algorithms as well as several segmentation backbones.

Keywords: Semi-supervised learning; Image semantic segmentation; Vision transformer
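The teacher–student training scheme described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: it assumes prediction entropy as the uncertainty measure, mean-squared error as the consistency criterion, and illustrative values for the EMA decay, entropy threshold, and loss weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_update(teacher, student, alpha=0.99):
    """Exponential moving average: the teacher's weights track the student's."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def masked_consistency_loss(student_probs, teacher_probs, threshold=0.7):
    """MSE between student and teacher class probabilities, restricted to
    pixels where the teacher is confident (entropy below `threshold`)."""
    eps = 1e-8
    entropy = -np.sum(teacher_probs * np.log(teacher_probs + eps), axis=-1)
    mask = entropy < threshold                    # keep only reliable pixels
    sq_err = np.sum((student_probs - teacher_probs) ** 2, axis=-1)
    return np.sum(sq_err * mask) / (np.sum(mask) + eps)

# Toy example: a 4x4 "image" with 2-class softmax outputs per pixel.
student_probs = rng.dirichlet([1.0, 1.0], size=(4, 4))
teacher_probs = rng.dirichlet([1.0, 1.0], size=(4, 4))
loss_cons = masked_consistency_loss(student_probs, teacher_probs)

# Overall objective: supervised segmentation loss on labeled data plus a
# weighted consistency loss on unlabeled data (placeholder values here).
loss_sup = 0.5
lam = 0.1
loss_total = loss_sup + lam * loss_cons
```

In practice the consistency weight is usually ramped up over training so that the early, unreliable teacher does not dominate the objective.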
