Abstract

Self-distillation has gained widespread attention in recent years because it progressively transfers knowledge within a single network in an end-to-end training scheme. However, self-distillation methods are susceptible to label noise, which leads to poor generalization performance. To address this problem, this paper proposes a novel self-distillation method, called GEIKD, which combines a gated ensemble self-teacher network with influence-based label noise removal. Specifically, we design a gated ensemble self-teacher network composed of multiple teacher branches, whose knowledge is fused through a gating mechanism built on a weighted bi-directional feature pyramid network. Moreover, we introduce influence estimation into the distillation process to quantify the effect of noisy labels on the distillation loss, and then reject unfavorable instances as noisily labeled samples according to the estimated influence. Our influence-based label noise removal can be integrated with any existing knowledge distillation training scheme. The impact of noisy labels on knowledge distillation is thus significantly alleviated, with little extra training effort. Experiments show that the proposed GEIKD method outperforms state-of-the-art methods on CIFAR-100, TinyImageNet, the fine-grained datasets CUB200, MIT-67, and Stanford40, and the FERC dataset, using both clean data and data with noisy labels.
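To give a concrete picture of how the two ideas described above could fit together, the sketch below shows, in plain PyTorch, a distillation step that (a) fuses several teacher branches with learnable, normalized gate weights and (b) rejects suspected noisy samples by an influence-style score before the loss is reduced. This is a minimal illustration, not the authors' implementation: the names `GatedTeacherEnsemble`, `kd_per_sample_loss`, and `head_gradients` are hypothetical, the gating follows a generic fast-normalized-fusion style rather than the paper's weighted bi-directional feature pyramid design, and the influence score is a simple first-order proxy (the dot product between each sample's last-layer gradient and the mean gradient of a small trusted batch), which may differ from the paper's influence estimation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedTeacherEnsemble(nn.Module):
    """Hypothetical stand-in for a gated ensemble self-teacher: several branch
    heads whose logits are fused with normalized, learnable gate weights."""

    def __init__(self, in_dim, num_classes, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(in_dim, num_classes) for _ in range(num_branches)
        )
        self.gates = nn.Parameter(torch.ones(num_branches))

    def forward(self, x):
        w = F.relu(self.gates)
        w = w / (w.sum() + 1e-6)  # normalized fusion weights
        return sum(wi * branch(x) for wi, branch in zip(w, self.branches))


def kd_per_sample_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Per-sample KD objective: CE on the (possibly noisy) labels plus KL to the
    # teacher's softened predictions; no reduction so each sample can be scored.
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl


def head_gradients(per_sample_loss, features, logits):
    # Analytic per-sample gradient of the loss w.r.t. the final linear layer's
    # weight (logits = features @ W.T + b): outer product of dL/dlogits and features.
    dlogits = torch.autograd.grad(per_sample_loss.sum(), logits, retain_graph=True)[0]
    return torch.einsum("bo,bf->bof", dlogits, features).flatten(1)  # [B, out*in]


# ---- toy usage ------------------------------------------------------------
torch.manual_seed(0)
in_dim, feat_dim, num_classes, B = 64, 32, 10, 16

backbone = nn.Linear(in_dim, feat_dim)          # stand-in feature extractor
head = nn.Linear(feat_dim, num_classes)         # student classification head
teacher = GatedTeacherEnsemble(in_dim, num_classes)
opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(B, in_dim)
y = torch.randint(0, num_classes, (B,))         # possibly noisy labels
x_clean = torch.randn(8, in_dim)
y_clean = torch.randint(0, num_classes, (8,))   # small trusted subset

# Reference gradient direction from the trusted batch.
f_c = backbone(x_clean)
z_c = head(f_c)
g_trusted = head_gradients(
    F.cross_entropy(z_c, y_clean, reduction="none"), f_c, z_c
).mean(dim=0).detach()

# One KD step with influence-based rejection of suspected noisy samples.
feats = backbone(x)
logits = head(feats)
with torch.no_grad():
    t_logits = teacher(x)
loss_vec = kd_per_sample_loss(logits, t_logits, y)
scores = head_gradients(loss_vec, feats, logits).detach() @ g_trusted
keep = scores >= 0                              # negative influence => treat as noisy

opt.zero_grad()
if keep.any():                                  # skip the step if all samples rejected
    loss_vec[keep].mean().backward()
    opt.step()
print(f"kept {int(keep.sum())}/{B} samples this step")
```

Because the filtering only adds a per-sample scoring pass on the classification head, its overhead per step is small, which matches the abstract's claim that the removal can be bolted onto existing distillation schemes with little extra training effort.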
