Abstract

Knowledge distillation has been used successfully to compress a large neural network (teacher) into a smaller neural network (student) by transferring the knowledge of the teacher network using its original training dataset. However, the original training dataset is often not reusable in real-world applications. To address this issue, data-free knowledge distillation, i.e., knowledge distillation in the absence of the original training dataset, has been studied. However, existing methods are limited to classification problems and cannot be directly applied to regression problems. In this study, we propose a novel data-free knowledge distillation method that is applicable to regression problems. Given a teacher network, we adopt a generator network to transfer the knowledge in the teacher network to a student network. We simultaneously train the generator and student networks in an adversarial manner: the generator network is trained to create synthetic data on which the teacher and student networks make different predictions, while the student network is trained to mimic the teacher network’s predictions. We demonstrate the effectiveness of the proposed method on benchmark datasets. Our results show that the student network emulates the prediction ability of the teacher network with little performance loss.
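The following is a minimal sketch of the adversarial training loop the abstract describes, written in a PyTorch style. All network architectures, dimensions, losses, and hyperparameters are illustrative assumptions, not the paper's exact configuration: the generator is updated to maximize the teacher–student prediction gap on synthetic inputs, and the student is updated to minimize that gap.

```python
# Sketch only: names and hyperparameters are assumed, not taken from the paper.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

x_dim, z_dim = 8, 16           # assumed input and latent dimensions
teacher = mlp(x_dim, 1)        # in practice a pretrained regressor; frozen here
student = mlp(x_dim, 1, hidden=16)
generator = mlp(z_dim, x_dim)  # maps random noise to synthetic inputs

teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(1000):
    z = torch.randn(128, z_dim)

    # Generator step: create inputs where the teacher and student disagree
    # (maximize the prediction gap, i.e., minimize its negative).
    x_syn = generator(z)
    gap = nn.functional.mse_loss(student(x_syn), teacher(x_syn))
    opt_g.zero_grad()
    (-gap).backward()
    opt_g.step()

    # Student step: mimic the teacher's predictions on the generated inputs.
    x_syn = generator(z).detach()
    loss_s = nn.functional.mse_loss(student(x_syn), teacher(x_syn))
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
```

Alternating the two updates in this way plays the same role as the generator/discriminator alternation in a GAN, with the teacher–student prediction gap acting as the adversarial signal.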
