Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.

Abstract

Supervised deep learning-based methods yield accurate results for medical image segmentation. However, they require large labeled datasets, and obtaining these is a laborious task that requires clinical expertise. Semi/self-supervised learning-based approaches address this limitation by exploiting unlabeled data along with limited annotated data. Recent self-supervised learning methods use contrastive loss to learn good global-level representations from unlabeled images and achieve high performance in classification tasks on popular natural image datasets like ImageNet. In pixel-level prediction tasks such as segmentation, it is crucial to learn good local-level representations along with global representations to achieve better accuracy. However, existing local contrastive loss-based methods have limited impact on learning good local representations, because they define similar and dissimilar local regions based on random augmentations and spatial proximity rather than on the semantic labels of local regions, which are unavailable for lack of large-scale expert annotations in the semi/self-supervised setting. In this paper, we propose a local contrastive loss to learn good pixel-level features useful for segmentation by exploiting semantic label information obtained from pseudo-labels of unlabeled images alongside limited annotated images with ground truth (GT) labels. In particular, we define the proposed contrastive loss to encourage similar representations for pixels that have the same pseudo-label/GT label, while making them dissimilar to the representations of pixels with a different pseudo-label/GT label in the dataset. We perform pseudo-label based self-training and train the network by jointly optimizing the proposed contrastive loss on both labeled and unlabeled sets and the segmentation loss on only the limited labeled set.
We evaluate the proposed approach on three public medical datasets of cardiac and prostate anatomies, and obtain high segmentation performance with a limited labeled set of one or two 3D volumes. Extensive comparisons with state-of-the-art semi-supervised and data augmentation methods and concurrent contrastive learning methods demonstrate the substantial improvement achieved by the proposed method. The code is made publicly available at https://github.com/krishnabits001/pseudo_label_contrastive_training.
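
To make the idea in the abstract concrete, the following is a minimal sketch of a label-aware pixel contrastive loss, written in the style of a supervised contrastive (InfoNCE-like) objective. It is not the authors' implementation (see the linked repository for that); the function name `label_contrastive_loss`, the `temperature` parameter, and the use of cosine similarity are assumptions made for illustration. Each sampled pixel embedding is pulled towards embeddings that share its label (GT label or pseudo-label) and pushed away from all others.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def label_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative supervised-contrastive-style loss over sampled pixels.

    features: list of embedding vectors, one per sampled pixel.
    labels:   class index per pixel (a GT label or a pseudo-label).
    Both the name and the temperature default are assumptions for this sketch.
    """
    z = [normalize(f) for f in features]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other pixel.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # no same-label pixel to pull towards
        # Numerically stable log-sum-exp over all other pixels.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking a same-label pixel.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


# Toy usage: four "pixels" with 2-D embeddings forming two clusters.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(label_contrastive_loss(feats, [0, 0, 1, 1]))  # labels agree with clusters: small loss
print(label_contrastive_loss(feats, [0, 1, 0, 1]))  # labels cut across clusters: large loss
```

In the joint training described above, a term of this kind would be computed on pixels from both labeled and pseudo-labeled images and added, with some weighting factor (a hypothetical hyperparameter here), to a segmentation loss computed on the labeled set only.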

Similar Papers
  • Research Article
  • Citations: 14
  • 10.1109/jbhi.2023.3340956
Two-Stage Self-Supervised Contrastive Learning Aided Transformer for Real-Time Medical Image Segmentation.
  • Feb 1, 2026
  • IEEE journal of biomedical and health informatics
  • Abdul Qayyum + 5 more

The availability of large, high-quality annotated datasets in the medical domain poses a substantial challenge in segmentation tasks. To mitigate the reliance on annotated training data, self-supervised pre-training strategies have emerged, particularly employing contrastive learning methods on dense pixel-level representations. In this work, we propose to capitalize on intrinsic anatomical similarities within medical image data and develop a semantic segmentation framework through a self-supervised fusion network, where the availability of annotated volumes is limited. In a unified training phase, we combine segmentation loss with contrastive loss, enhancing the distinction between significant anatomical regions that adhere to the available annotations. To further improve the segmentation performance, we introduce an efficient parallel transformer module that leverages multiview multiscale feature fusion and depth-wise features. The proposed transformer architecture, based on multiple encoders, is trained in a self-supervised manner using contrastive loss. Initially, the transformer is trained using an unlabeled dataset. We then fine-tune one encoder using data from the first stage and another encoder using a small set of annotated segmentation masks. These encoder features are subsequently concatenated for the purpose of brain tumor segmentation. The multiencoder-based transformer model yields significantly better outcomes across three medical image segmentation tasks. We validated our proposed solution by fusing images across diverse medical image segmentation challenge datasets, demonstrating its efficacy by outperforming state-of-the-art methodologies.

  • Research Article
  • 10.3390/bioengineering13010104
MedSegNet10: A Publicly Accessible Network Repository for Split Federated Medical Image Segmentation
  • Jan 15, 2026
  • Bioengineering
  • Chamani Shiranthika + 3 more

Machine Learning (ML) and Deep Learning (DL) have shown significant promise in healthcare, particularly in medical image segmentation, which is crucial for accurate disease diagnosis and treatment planning. Despite their potential, challenges such as data privacy concerns, limited annotated data, and inadequate training data persist. Decentralized learning approaches such as federated learning (FL), split learning (SL), and split federated learning (SplitFed/SFL) address these issues effectively. This paper introduces “MedSegNet10,” a publicly accessible repository designed for medical image segmentation using split-federated learning. MedSegNet10 provides a collection of pre-trained neural network architectures optimized for various medical image types, including microscopic images of human blastocysts, dermatoscopic images of skin lesions, and endoscopic images of lesions, polyps, and ulcers. MedSegNet10 implements SplitFed versions of ten established segmentation architectures, enabling collaborative training without centralizing raw data and labels, reducing the computational load required at client sites. This repository supports researchers, practitioners, trainees, and data scientists, aiming to advance medical image segmentation while maintaining patient data privacy.

  • Research Article
  • Citations: 16
  • 10.1038/s41598-025-89096-9
MedFuseNet: fusing local and global deep feature representations with hybrid attention mechanisms for medical image segmentation
  • Feb 11, 2025
  • Scientific Reports
  • Ruiyuan Chen + 10 more

Medical image segmentation plays a crucial role in addressing emerging healthcare challenges. Although several impressive deep learning architectures based on convolutional neural networks (CNNs) and Transformers have recently demonstrated remarkable performance, there is still potential for further performance improvement due to their inherent limitations in capturing feature correlations of input data. To address this issue, this paper proposes a novel encoder-decoder architecture called MedFuseNet that aims to fuse local and global deep feature representations with hybrid attention mechanisms for medical image segmentation. More specifically, the proposed approach contains two branches for feature learning in parallel: one leverages CNNs to learn local correlations of input data, and the other utilizes Swin-Transformer to capture global contextual correlations of input data. For feature fusion and enhancement, the designed hybrid attention mechanisms combine four different attention modules: (1) an atrous spatial pyramid pooling (ASPP) module for the CNN branch, (2) a cross attention module in the encoder for fusing local and global features, (3) an adaptive cross attention (ACA) module in skip connections for further performing fusion, and (4) a squeeze-and-excitation attention (SE-attention) module in the decoder for highlighting informative features. We evaluate our proposed approach on the public ACDC and Synapse datasets, and achieve average DSC of 89.73% and 78.40%, respectively. Experimental results on these two datasets demonstrate the effectiveness of our proposed approach on medical image segmentation tasks, outperforming the state-of-the-art approaches used for comparison.

  • Research Article
  • Citations: 9
  • 10.1166/jmihi.2019.2843
Application of Deep Convolutional Neural Networks in Attention-Deficit/Hyperactivity Disorder Classification: Data Augmentation and Convolutional Neural Network Transfer Learning
  • Oct 1, 2019
  • Journal of Medical Imaging and Health Informatics
  • Li Zhu + 1 more

Attention-deficit/hyperactivity disorder (ADHD) is one of the most common and controversial diseases in paediatric psychiatry. Recently, computer-aided diagnosis methods have become increasingly popular in the clinical diagnosis of ADHD. In this paper, we introduce a recent, powerful method: deep convolutional neural networks (CNNs). Some data augmentation methods and a CNN transfer learning technique were used to address the application problem of deep CNNs in the ADHD classification task, given the limited annotated data. In addition, we encoded all gray-scale images into 3-channel images beforehand via two image enhancement methods to leverage the pre-trained CNN models designed for 3-channel images. All CNN models were evaluated on the published testing dataset from the ADHD-200 sample. Evaluation results show that our proposed deep CNN method achieves a state-of-the-art accuracy of 66.67% by using data augmentation methods and the CNN transfer learning technique, and outperforms existing methods in the literature. The result can be improved by building a special CNN structure. Furthermore, the trained deep CNN model can be used to clinically diagnose ADHD in real-time. We suggest that the use of CNN transfer learning and data augmentation will be an effective solution to the application problem of deep CNNs in medical image analysis.

  • Research Article
  • Citations: 1
  • 10.3390/a17040168
CCFNet: Collaborative Cross-Fusion Network for Medical Image Segmentation
  • Apr 21, 2024
  • Algorithms
  • Jialu Chen + 1 more

The Transformer architecture has gained widespread acceptance in image segmentation. However, it sacrifices local feature details and necessitates extensive data for training, posing challenges to its integration into computer-aided medical image segmentation. To address the above challenges, we introduce CCFNet, a collaborative cross-fusion network, which continuously fuses a CNN and Transformer interactively to exploit context dependencies. In particular, when integrating CNN features into Transformer, the correlations between local and global tokens are adaptively fused through collaborative self-attention fusion to minimize the semantic disparity between these two types of features. When integrating Transformer features into the CNN, it uses the spatial feature injector to reduce the spatial information gap between features due to the asymmetry of the extracted features. In addition, CCFNet implements the parallel operation of Transformer and the CNN and independently encodes hierarchical global and local representations when effectively aggregating different features, which can preserve global representations and local features. The experimental findings from two public medical image segmentation datasets reveal that our approach exhibits competitive performance in comparison to current state-of-the-art methods.

  • Research Article
  • Citations: 3
  • 10.1016/j.compbiomed.2022.106326
Learning to segment subcortical structures from noisy annotations with a novel uncertainty-reliability aware learning framework
  • Nov 16, 2022
  • Computers in Biology and Medicine
  • Xiang Li + 4 more

  • Research Article
  • Citations: 61
  • 10.1016/j.compbiomed.2023.107717
LM-Net: A light-weight and multi-scale network for medical image segmentation
  • Nov 23, 2023
  • Computers in Biology and Medicine
  • Zhenkun Lu + 3 more

  • Research Article
  • Citations: 2
  • 10.1088/1361-6501/adb76d
Zero-shot learning based on the fusion of global and local representations
  • Feb 28, 2025
  • Measurement Science and Technology
  • Wang Qiang + 4 more

Zero-shot learning (ZSL) endeavors to extend knowledge to novel classes by capitalizing on the semantic overlap across different categories. Contemporary ZSL approaches concentrate on isolating image-specific local features pertinent to attributes and using these to align class semantic vectors. However, existing methods often overlook the integration of global and local representation, a synthesis that could significantly enhance zero-shot recognition accuracy. This paper introduces an innovative ZSL technique, termed ZSL based on the fusion of global representation and local representation (ZGLR). Our approach incorporates a Transformer encoder constructed upon attribute prototypes to extract attribute-level features, which are then mapped to local representation to align with class semantic vectors. Concurrently, we introduce a discriminative semantic-visual mapping network that embeds class semantic vectors into the visual domain, thereby aligning global representation. During training, both global and local representation are optimized in tandem, while in the testing phase, the outcomes from both representational levels are consolidated to bolster classification precision. Practical results from three benchmark ZSL datasets demonstrate the superiority of our put-forward ZGLR solution.

  • Research Article
  • Citations: 16
  • 10.1109/tnnls.2023.3296652
Exploring Feature Representation Learning for Semi-Supervised Medical Image Segmentation.
  • Nov 1, 2024
  • IEEE transactions on neural networks and learning systems
  • Huimin Wu + 2 more

This article presents a simple yet effective two-stage framework for semi-supervised medical image segmentation. Unlike prior state-of-the-art semi-supervised segmentation methods that predominantly rely on pseudo supervision directly on predictions, such as consistency regularization and pseudo labeling, our key insight is to explore the feature representation learning with labeled and unlabeled (i.e., pseudo labeled) images to regularize a more compact and better-separated feature space, which paves the way for low-density decision boundary learning and therefore enhances the segmentation performance. A stage-adaptive contrastive learning method is proposed, containing a boundary-aware contrastive loss that takes advantage of the labeled images in the first stage, as well as a prototype-aware contrastive loss to optimize both labeled and pseudo labeled images in the second stage. To obtain more accurate prototype estimation, which plays a critical role in prototype-aware contrastive learning, we present an aleatoric uncertainty-aware method to generate higher quality pseudo labels. Aleatoric-uncertainty adaptive (AUA) adaptively regularizes prediction consistency by taking advantage of image ambiguity, which, given its significance, is underexplored by existing works. Our method achieves the best results on three public medical image segmentation benchmarks.

  • Research Article
  • Citations: 41
  • 10.1109/tmm.2021.3126146
Multisample-Based Contrastive Loss for Top-K Recommendation
  • Jan 1, 2023
  • IEEE Transactions on Multimedia
  • Hao Tang + 3 more

Top-k recommendation is a fundamental task in recommendation systems that is generally learned by comparing positive and negative pairs. The contrastive loss (CL) is the key in contrastive learning that has recently received more attention, and we find that it is well suited for top-k recommendations. However, CL is problematic because it treats the importance of the positive and negative samples the same. On the one hand, CL faces the imbalance problem of one positive sample and many negative samples. On the other hand, there are so few positive items in sparser datasets that their importance should be emphasized. Moreover, the other important issue is that the sparse positive items are still not sufficiently utilized in recommendations. Consequently, we propose a new data augmentation method by using multiple positive items (or samples) simultaneously with the CL loss function. Therefore, we propose a multisample-based contrastive loss (MSCL) function that solves the two problems by balancing the importance of positive and negative samples and data augmentation. Based on the graph convolution network (GCN) method, experimental results demonstrate the state-of-the-art performance of MSCL. The proposed MSCL is simple and can be applied in many methods. Our code is available at https://github.com/haotangxjtu/MSCL.

  • Research Article
  • Citations: 89
  • 10.1109/tmi.2022.3161681
Retinal Vessel Segmentation With Skeletal Prior and Contrastive Loss.
  • Sep 1, 2022
  • IEEE Transactions on Medical Imaging
  • Yubo Tan + 3 more

The morphology of retinal vessels is closely associated with many kinds of ophthalmic diseases. Although huge progress in retinal vessel segmentation has been achieved with the advancement of deep learning, some challenging issues remain. For example, vessels can be disturbed or covered by other components present in the retina (such as the optic disc or lesions). Moreover, some thin vessels are also easily missed by current methods. In addition, existing fundus image datasets are generally tiny, due to the difficulty of vessel labeling. In this work, a new network called SkelCon is proposed to deal with these problems by introducing skeletal prior and contrastive loss. A skeleton fitting module is developed to preserve the morphology of the vessels and improve the completeness and continuity of thin vessels. A contrastive loss is employed to enhance the discrimination between vessels and background. In addition, a new data augmentation method is proposed to enrich the training samples and improve the robustness of the proposed model. Extensive validations were performed on several popular datasets (DRIVE, STARE, CHASE, and HRF), recently developed datasets (UoA-DR, IOSTAR, and RC-SLO), and some challenging clinical images (from RFMiD and JSIEC39 datasets). In addition, some specially designed metrics for vessel segmentation, including connectivity, overlapping area, consistency of vessel length, revised sensitivity, specificity, and accuracy were used for quantitative evaluation. The experimental results show that the proposed model achieves state-of-the-art performance and significantly outperforms compared methods when extracting thin vessels in the regions of lesions or optic disc. Source code is available at https://www.github.com/tyb311/SkelCon.

  • Research Article
  • Citations: 1
  • 10.3233/jifs-231554
Local and global character representation enhanced model for Chinese medical named entity recognition
  • Aug 24, 2023
  • Journal of Intelligent & Fuzzy Systems
  • Yan Xiang + 3 more

Chinese medical named entity recognition (CMNER) aims to extract entities from Chinese unstructured medical texts. Existing character-based NER models do not comprehensively consider characters’ characteristics from different perspectives, which limits their performance when applied to CMNER. In this paper, we propose a local and global character representation enhanced model for CMNER. For the input sentence, the model fuses the spatial and sequential character representation using an autoencoder to get the local character representation; extracts the global character representation according to the corresponding domain words; and integrates the local and global representation through a gating mechanism to obtain the enhanced character representation, which has better ability to perceive medical entities. Finally, the model sends the enhanced character representation to the Bi-LSTM and CRF layers for context encoding and tag decoding, respectively. The experimental results demonstrate that our model achieves a significant improvement over the best baseline, increasing the F1 values by 1.04% and 0.62% on the IMCS21 and CMeEE datasets, respectively. In addition, we verify the effectiveness of each component of our model by ablation experiments.

  • Research Article
  • Citations: 1000
  • 10.1016/j.media.2020.101693
Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation.
  • Apr 3, 2020
  • Medical Image Analysis
  • Nima Tajbakhsh + 5 more

  • Research Article
  • Citations: 8
  • 10.1109/tnnls.2025.3568479
Retrieval-Augmented Few-Shot Medical Image Segmentation With Foundation Models.
  • Oct 1, 2025
  • IEEE transactions on neural networks and learning systems
  • Lin Zhao + 5 more

Medical image segmentation is crucial for clinical decision-making, but the scarcity of annotated data presents significant challenges. Few-shot segmentation (FSS) methods show promise but often require training on the target domain and struggle to generalize across different modalities. Similarly, adapting foundation models such as the segment anything model (SAM) for medical imaging has limitations, including the need for fine-tuning and domain-specific adaptation. To address these issues, we propose a novel method that adapts DINOv2 and SAM 2 for retrieval-augmented few-shot medical image segmentation. Our approach uses DINOv2's feature as query to retrieve similar samples from limited annotated data, which are then encoded as memories and stored in memory bank. With the memory attention mechanism of SAM 2, the model leverages these memories as conditions to generate accurate segmentation of the target image. We evaluated our framework on three medical image segmentation tasks, demonstrating superior performance and generalizability across various modalities without the need for any retraining or fine-tuning. Overall, this method offers a practical and effective solution for few-shot medical image segmentation and holds significant potential as a valuable annotation tool in clinical applications.

  • Research Article
  • 10.1109/embc40787.2023.10341018
Semi-supervised Medical Image Segmentation with Multiscale Contrastive Learning and Cross-Supervision.
  • Jul 24, 2023
  • Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
  • Wenxia Wu + 4 more

We propose a semi-supervised segmentation method based on multiscale contrastive learning to solve the problem of shortage of annotations in medical image segmentation tasks. We apply perturbations to the input image and encoded features and make the output as consistent as possible by cross-supervision, which is a way to improve the generalizability of the model. Two scales of contrastive learning, patch-level and pixel-level, are employed to enhance the intra-class compactness and inter-class separability of the features. We evaluate the proposed model using three public datasets for brain tumor, left atrial, and cellular nuclei segmentation. The experiments showed that our model outperforms state-of-the-art methods. Clinical relevance: The proposed method can be used for medical image segmentation with limited annotated data and achieve comparable performance to the fully annotated situation. Such an approach can be easily extended to other clinical applications.
