Abstract

Knowledge distillation is an effective way to transfer knowledge from a pre-trained teacher model to a student model. Co-distillation, as an online variant of distillation, further accelerates the training process and opens a new way to explore the "dark knowledge" by training n models in parallel. In this paper, we explore "divergent examples", examples on which the classifiers make different predictions and which thus induce the "dark knowledge", and we propose a novel approach named Adversarial Co-distillation Networks (ACNs) to enhance the "dark knowledge" by generating extra divergent examples. Note that we do not use any extra dataset; only the standard training set is used to train the entire framework. ACNs are end-to-end frameworks composed of two parts: an adversarial phase consisting of Generative Adversarial Networks (GANs) to generate the divergent examples and a co-distillation phase consisting of multiple classifiers to learn the divergent examples. These two phases are trained in an iterative and adversarial way. To guarantee the quality of the divergent examples and the stability of ACNs, we further design a "Weakly Residual Connection" module and a "Restricted Adversarial Search" module to assist in the training process. Extensive experiments with various deep architectures on different datasets demonstrate the effectiveness of our approach.
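
The sketch below illustrates the two-phase idea described in the abstract: a generator is trained to produce divergent examples that maximize the disagreement between two peer classifiers, while the classifiers co-distill on both real data and the generated examples. This is only a minimal, hedged illustration in PyTorch-style pseudocode; the module names, losses, and hyperparameters are assumptions and do not reproduce the authors' exact formulation (e.g., the "Weakly Residual Connection" and "Restricted Adversarial Search" modules are omitted).

```python
# Minimal sketch of an adversarial co-distillation step (illustrative only,
# not the authors' implementation). net_a, net_b are peer classifiers and
# generator maps noise to images; all names here are assumptions.
import torch
import torch.nn.functional as F

def divergence(p_logits, q_logits):
    # Symmetric KL divergence between the two classifiers' predictions.
    p = F.log_softmax(p_logits, dim=1)
    q = F.log_softmax(q_logits, dim=1)
    return 0.5 * (F.kl_div(p, q.exp(), reduction="batchmean")
                  + F.kl_div(q, p.exp(), reduction="batchmean"))

def train_step(x, y, net_a, net_b, generator, opt_nets, opt_gen, z_dim=100):
    # --- Adversarial phase: the generator seeks "divergent examples",
    # i.e. inputs on which the two peers disagree.
    z = torch.randn(x.size(0), z_dim, device=x.device)
    x_div = generator(z)
    gen_loss = -divergence(net_a(x_div), net_b(x_div))
    opt_gen.zero_grad()
    gen_loss.backward()
    opt_gen.step()

    # --- Co-distillation phase: the peers fit the labels on real data and
    # mimic each other on both real data and the (detached) divergent examples.
    x_div = x_div.detach()
    logits_a, logits_b = net_a(x), net_b(x)
    ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
    mutual = divergence(logits_a, logits_b) + divergence(net_a(x_div), net_b(x_div))
    net_loss = ce + mutual
    opt_nets.zero_grad()
    net_loss.backward()
    opt_nets.step()
    return gen_loss.item(), net_loss.item()
```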
