Abstract

Knowledge Distillation (KD) is one of the most popular and effective techniques for model compression and knowledge transfer. However, most existing KD approaches rely heavily on labeled training data, which is often unavailable due to privacy concerns. Data-free KD therefore focuses on restoring the training data with Generative Adversarial Networks (GANs), either by catering to the pre-trained teacher or by fooling the student. In this paper we introduce Adversarial Variational Knowledge Distillation (AVKD), a framework that formulates the restoration process as a Variational Autoencoder (VAE). Unlike vanilla VAEs, AVKD is specified by a pre-trained teacher model \(p(y|x)\) of the visible labels \(y\) given the latent \(x\), a prior \(p(x)\) over the latent variables, and an approximate generative model \(q(x|y)\). In practice, we treat the prior \(p(x)\) as an alternative unlabeled data distribution taken from other related domains. Similar to Adversarial Variational Bayes (AVB), we estimate the KL-divergence term between \(p(x)\) and \(q(x|y)\) by introducing a discriminator network. Although the original training data are unavailable, we argue that prior data drawn from related domains can easily be obtained, enabling efficient knowledge distillation. Extensive experiments show that our method outperforms state-of-the-art algorithms in the absence of the original training data, with performance approaching the case where the original training data are provided.
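To make the formulation concrete, the objective implied by the abstract can be sketched as follows; this is a reading of the description above in the standard AVB style, not an equation quoted from the paper, and the authors' exact notation may differ. Here the pre-trained teacher \(p(y|x)\) plays the role of the decoder, \(q(x|y)\) is the generative model conditioned on a label \(y\), and \(p(x)\) is the unlabeled prior data distribution from related domains:

\[
\mathcal{L}(q) \;=\; \mathbb{E}_{q(x|y)}\big[\log p(y|x)\big] \;-\; \mathrm{KL}\big(q(x|y)\,\|\,p(x)\big).
\]

As in AVB, the intractable KL term is estimated adversarially with a discriminator \(T(x,y)\) trained to distinguish samples of \(q(x|y)\) from samples of the prior \(p(x)\); at the discriminator's optimum,

\[
\mathrm{KL}\big(q(x|y)\,\|\,p(x)\big) \;=\; \mathbb{E}_{q(x|y)}\big[T^{*}(x,y)\big],
\qquad
T^{*}(x,y) \;=\; \log q(x|y) - \log p(x).
\]

The restored samples \(x \sim q(x|y)\) can then be used for standard teacher-student distillation in place of the unavailable original training data.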
