Abstract

Existing knowledge distillation (KD) methods normally fix the weights of the teacher network and use its knowledge to guide the training of the student network non-interactively; this is therefore called static knowledge distillation (SKD). SKD is widely used for model compression on homologous data and for knowledge transfer on heterogeneous data. However, a teacher network with fixed weights limits what the student network can learn from it. It is desirable for the teacher network itself to be continuously optimized so that it dynamically promotes the learning ability of the student network. To overcome this limitation, we propose a novel dynamic knowledge distillation (DKD) method, in which the teacher network and the student network learn from each other interactively. Importantly, we analyze the effectiveness of DKD mathematically (see Eq. 4) and address one crucial issue caused by the continuous change of the teacher network during dynamic distillation by designing a valid loss function. We verify the practicality of DKD through extensive experiments on various visual tasks. For model compression, we conduct experiments on image classification and object detection; for knowledge transfer, video-based human action recognition is chosen for analysis. The experimental results on benchmark datasets (i.e., ILSVRC2012, COCO2017, HMDB51, UCF101) demonstrate that the proposed DKD improves performance on these visual tasks by a large margin. The source code is publicly available online.

