Abstract

In semi-supervised learning (SSL), many approaches follow the effective self-training paradigm with consistency regularization, using threshold heuristics to alleviate label noise. However, such threshold heuristics discard the excluded data and, with it, crucial discriminative information. In this paper, we present OTAMatch, a novel SSL framework that reformulates pseudo-labeling as an optimal transport (OT) assignment problem while simultaneously exploiting high-confidence data to mitigate confirmation bias. First, OTAMatch models pseudo-label allocation as a convex minimization problem, enabling end-to-end optimization over all pseudo-labels, and employs the Sinkhorn-Knopp algorithm for efficient approximation. We further incorporate epsilon-greedy posterior regularization and curriculum bias correction to constrain the distribution of OT assignments, improving robustness to noisy pseudo-labels. Second, we propose PseudoNCE, which explicitly exploits pseudo-label consistency under threshold heuristics to maximize mutual information within self-training, substantially improving the trade-off between convergence speed and performance. As a result, our approach achieves competitive performance on various SSL benchmarks. In particular, OTAMatch outperforms previous state-of-the-art SSL algorithms in realistic and challenging scenarios, exemplified by a notable 9.45% error-rate reduction over SoftMatch on ImageNet with a 100K-label split, underlining its robustness and effectiveness.
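To make the OT view of pseudo-labeling concrete, the sketch below shows how the Sinkhorn-Knopp algorithm can turn a batch of classifier logits into soft pseudo-labels under an entropy-regularized transport objective. This is a minimal illustration assuming uniform class marginals; the function name `sinkhorn_pseudo_labels` and the parameters `epsilon` and `n_iters` are illustrative, not from the paper, and OTAMatch's epsilon-greedy posterior regularization and curriculum bias correction are not modeled here.

```python
# Minimal Sinkhorn-Knopp sketch for OT-based pseudo-label assignment.
# Assumption: uniform class marginals (each of the C classes receives
# an equal share B / C of the batch mass); not the paper's exact objective.
import torch

def sinkhorn_pseudo_labels(logits: torch.Tensor,
                           epsilon: float = 0.05,
                           n_iters: int = 3) -> torch.Tensor:
    """Assign soft pseudo-labels to a batch of unlabeled samples by
    approximately solving an entropy-regularized OT problem between
    samples and classes.

    logits: (B, C) classifier outputs for B unlabeled samples, C classes.
    Returns a (B, C) assignment matrix whose rows each sum to 1.
    """
    B, C = logits.shape
    # Kernel of the entropic OT problem; epsilon controls smoothness.
    # Subtract the max before exponentiating for numerical stability.
    Q = torch.exp((logits - logits.max()) / epsilon).T  # (C, B)
    Q /= Q.sum()
    for _ in range(n_iters):
        # Row normalization: each class receives mass 1 / C in total.
        Q /= Q.sum(dim=1, keepdim=True)
        Q /= C
        # Column normalization: each sample distributes mass 1 / B.
        Q /= Q.sum(dim=0, keepdim=True)
        Q /= B
    Q *= B  # rescale so each sample's assignment sums to 1
    return Q.T  # (B, C) soft pseudo-labels
```

In a self-training loop, such soft assignments would serve as targets for the consistency loss in place of argmax-and-threshold pseudo-labels, which is what allows every unlabeled sample to contribute to training.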
