Recent methods in network pruning have indicated that a dense neural network involves a sparse subnetwork (called a winning ticket), which can achieve similar test accuracy to its dense counterpart with much fewer network parameters. Generally, these methods search for the winning tickets on well-labeled data. Unfortunately, in many real-world applications, the training data are unavoidably contaminated with noisy labels, thereby leading to performance deterioration of these methods. To address the above-mentioned problem, we propose a novel two-stream sample selection network (TS3-Net), which consists of a sparse subnetwork and a dense subnetwork, to effectively identify the winning ticket with noisy labels. The training of TS3-Net contains an iterative procedure that switches between training both subnetworks and pruning the smallest magnitude weights of the sparse subnetwork. In particular, we develop a multistage learning framework including a warm-up stage, a semisupervised alternate learning stage, and a label refinement stage, to progressively train the two subnetworks. In this way, the classification capability of the sparse subnetwork can be gradually improved at a high sparsity level. Extensive experimental results on both synthetic and real-world noisy datasets (including MNIST, CIFAR-10, CIFAR-100, ANIMAL-10N, Clothing1M, and WebVision) demonstrate that our proposed method achieves state-of-the-art performance with very small memory consumption for label noise learning. Code is available at https://github.com/Runqing-forMost/TS3-Net/tree/master.
Read full abstract