Abstract

Pseudo-labeling-based approaches are gaining prominence in Semi-Supervised Learning (SSL). Recent studies have identified that the key bottleneck in this methodology lies in insufficient, incorrect, and imbalanced pseudo-labels. In this paper, we argue that the intrinsic problem behind this bottleneck is classifier bias, i.e., the classifier's prototypes suffer from poor uniformity. Inspired by neural collapse, which reveals an optimal geometric structure under supervised training, we address this classifier bias by utilizing an offline simplex Equiangular Tight Frame (ETF) classifier with maximally and equally separated prototypes. During training, we keep the classifier's prototypes fixed and focus on refining the feature encoder. Specifically, we integrate a straightforward clustering-based pseudo-labeling strategy with information maximization for feature learning. In practice, the fixed ETF classifier prevents the model from falling into a detrimental cycle in which a biased classifier produces misaligned features that, in turn, reinforce the bias. Furthermore, the clustering-based pseudo-labeling strategy reduces the dependency on complex threshold-adjusting mechanisms and effectively navigates the quantity-quality trade-off that plagues existing SSL methods. Combining these components, we develop a simple yet powerful approach, termed ETF-SSL. Extensive experiments across image, text, and audio datasets demonstrate that ETF-SSL achieves competitive or superior performance compared to existing approaches. This success highlights the benefits of using a fixed ETF classifier in SSL and points to promising directions for future research. The code is available at: https://github.com/yichenwang231/ETFSSL.
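
To make the fixed-classifier idea concrete, below is a minimal PyTorch sketch of a simplex ETF head, using the standard construction from the neural collapse literature, M = sqrt(K/(K-1)) U (I_K - (1/K) 11^T), where U has orthonormal columns. The names (`simplex_etf_prototypes`, `ETFHead`) and the cosine-similarity logits are illustrative assumptions, not the authors' exact implementation; see the linked repository for the real one.

```python
import torch


def simplex_etf_prototypes(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a (feat_dim, num_classes) matrix of maximally and equally
    separated unit-norm prototypes: M = sqrt(K/(K-1)) * U @ (I - (1/K) 11^T)."""
    assert feat_dim >= num_classes, "simplex ETF requires feat_dim >= num_classes"
    # U: (feat_dim, num_classes) with orthonormal columns, via QR factorization.
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    k = num_classes
    center = torch.eye(k) - torch.ones(k, k) / k
    return (k / (k - 1)) ** 0.5 * u @ center  # pairwise cosine = -1/(K-1)


class ETFHead(torch.nn.Module):
    """Classifier head whose prototypes stay frozen; only the encoder trains."""

    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # register_buffer => no gradients, so the prototypes remain fixed.
        self.register_buffer("prototypes",
                             simplex_etf_prototypes(num_classes, feat_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Logits are cosine similarities between L2-normalized features
        # and the equiangular prototypes.
        feats = torch.nn.functional.normalize(features, dim=-1)
        return feats @ self.prototypes
```

Because only the encoder receives gradients, the prototypes act as fixed, uniformly spread targets, which is what breaks the bias-amplifying feedback loop between the classifier and the features described above.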
