Medical image segmentation has achieved remarkable progress with large-scale datasets, which enable the training of powerful deep convolutional neural networks (DCNNs). However, labeling such large-scale datasets is laborious and error-prone, making noisy (or incorrect) labels a ubiquitous problem in real-world scenarios. In addition, data collected from different sites usually exhibit significant distribution shift (or domain shift). As a result, noisy labels and domain shift are two common problems in medical imaging applications, especially in medical image segmentation, and both significantly degrade the performance of deep learning models. In this paper, we identify a novel problem hidden in medical image segmentation, namely unsupervised domain adaptation on noisily labeled data, and propose a novel algorithm named "Self-Cleansing Unsupervised Domain Adaptation" (S-CUDA) to address it. S-CUDA sets up a realistic scenario in which the above problems are solved simultaneously: the training data (i.e., source domain) not only exhibit domain shift w.r.t. the unsupervised test data (i.e., target domain) but also contain noisy labels. The key idea of S-CUDA is to learn noise-excluding and domain-invariant knowledge from noisily supervised data, which is then applied to the heavily corrupted data for label cleansing and subsequent data recycling, as well as to the domain-shifted test data for supervised propagation. To this end, we propose a novel framework that leverages noisy-label learning and domain adaptation techniques to cleanse the noisy labels and learn from trustworthy clean samples, thus enabling robust adaptation and prediction on the target domain. Specifically, we train two peer adversarial networks that identify high-confidence clean data and exchange them with each other, eliminating the error-accumulation problem and narrowing the domain gap simultaneously. Meanwhile, high-confidence noisy data are detected and cleansed so that the contaminated training data can be reused. Therefore, our proposed method not only cleanses the noisy labels in the training set but also takes full advantage of the existing noisy data to update the network parameters. For evaluation, we conduct experiments on two popular datasets (REFUGE and Drishti-GS) for optic disc (OD) and optic cup (OC) segmentation, and on another public multi-vendor dataset for spinal cord gray matter (SCGM) segmentation. Experimental results show that our proposed method can cleanse noisy labels efficiently while obtaining a model with better generalization performance, outperforming previous state-of-the-art methods by a large margin. Our code can be found at https://github.com/zzdxjtu/S-cuda.
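The peer-network sample exchange described above follows a co-teaching-style pattern. The sketch below illustrates one possible training step under that pattern; the function name, keep ratio, and confidence threshold for relabeling are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch (not the official S-CUDA code): two peer networks each
# select their low-loss (likely clean) samples for the other to train on, and
# high-confidence pixels predicted by the peer are used to cleanse noisy masks.
import torch
import torch.nn.functional as F


def peer_exchange_step(net_a, net_b, opt_a, opt_b, images, noisy_masks,
                       keep_ratio=0.7, relabel_conf=0.95):
    with torch.no_grad():
        logits_a = net_a(images)  # (B, C, H, W) segmentation logits
        logits_b = net_b(images)

        # Per-sample loss: average pixel-wise cross-entropy over each image.
        loss_a = F.cross_entropy(logits_a, noisy_masks, reduction="none").mean(dim=(1, 2))
        loss_b = F.cross_entropy(logits_b, noisy_masks, reduction="none").mean(dim=(1, 2))

        # Small-loss samples are treated as high-confidence clean data.
        n_keep = max(1, int(keep_ratio * images.size(0)))
        clean_for_b = torch.argsort(loss_a)[:n_keep]  # selected by A, trains B
        clean_for_a = torch.argsort(loss_b)[:n_keep]  # selected by B, trains A

        # Cleanse noisy masks where the peer network is highly confident.
        conf_b, pseudo_b = logits_b.softmax(dim=1).max(dim=1)
        cleansed_masks = torch.where(conf_b > relabel_conf, pseudo_b, noisy_masks)

    # Each network updates on the samples its peer selected, with cleansed labels.
    opt_a.zero_grad()
    F.cross_entropy(net_a(images[clean_for_a]), cleansed_masks[clean_for_a]).backward()
    opt_a.step()

    opt_b.zero_grad()
    F.cross_entropy(net_b(images[clean_for_b]), cleansed_masks[clean_for_b]).backward()
    opt_b.step()
```

Because each network learns only from samples its peer judged clean, labeling errors memorized by one network are less likely to be reinforced in the other, which is the intuition behind avoiding error accumulation.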