Adaptive Teacher Finetune: Towards high-performance knowledge distillation through adaptive fine-tuning

Zhenyan Hou, Wenxuan Fan

doi:10.1088/1742-6596/1982/1/012084

Open Access

Abstract

Knowledge distillation is a widely used method for transferring knowledge from a large model to a small one. Traditional methods use a pre-trained large model to supervise the training of a small model, an approach called Offline Knowledge Distillation. However, the structural gap between teacher and student limits its performance. Online Knowledge Distillation later retrained the teacher and student networks from scratch, and having the two networks teach each other greatly improved performance. Yet little work has explored the difference between the two approaches. In this paper, we first point out that the essential difference between Offline and Online Knowledge Distillation is whether the weights of the teacher and student networks undergo a process of mutual adaptation. If the teacher and student networks are trained jointly and then used to perform Offline Knowledge Distillation, there is no obvious difference in final performance, regardless of whether the distillation itself is carried out jointly. This shows that teacher-student adaptation is important for knowledge distillation. We then propose Adaptive Teacher Finetune (ATF), which adapts the teacher model to the student network by fine-tuning the teacher with information from the student model during Offline Knowledge Distillation. With a normalized logit distribution and alpha-divergence, ATF achieves performance gains that clearly exceed those of existing Offline and Online Knowledge Distillation methods. Extensive experiments on CIFAR and ImageNet support the analysis and conclusions above. With the newly introduced ATF, we obtain state-of-the-art performance for ResNet-18 on ImageNet.
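The abstract names three ingredients without giving formulas: a distillation loss between teacher and student outputs, a "normalized logit distribution", and alpha-divergence. The Python sketch below shows one way these pieces could fit together. It assumes that "normalized" means z-score standardization of each logit vector and uses one common parameterization of alpha-divergence, D_alpha(p || q) = (sum_i p_i^alpha q_i^(1 - alpha) - 1) / (alpha (alpha - 1)). The helper names (`normalize_logits`, `alpha_divergence`, `distillation_loss`) and the default `alpha` and `temperature` values are hypothetical illustrations, not the authors' code. The sketch also omits the teacher fine-tuning step itself, which the abstract does not specify.

```python
import math
from typing import List, Sequence


def normalize_logits(logits: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize logits to zero mean and unit variance.

    ASSUMPTION: this z-score step is one plausible reading of the abstract's
    "normalized logit distribution"; the paper's exact normalization may differ.
    """
    n = len(logits)
    mean = sum(logits) / n
    std = math.sqrt(sum((z - mean) ** 2 for z in logits) / n) + eps
    return [(z - mean) / std for z in logits]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max avoids overflow."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); zero-probability entries of p contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def alpha_divergence(p: Sequence[float], q: Sequence[float], alpha: float,
                     eps: float = 1e-12) -> float:
    """D_alpha(p || q) = (sum_i p_i^alpha * q_i^(1 - alpha) - 1) / (alpha * (alpha - 1)).

    One common parameterization for normalized distributions. It tends to
    KL(p || q) as alpha -> 1 and to KL(q || p) as alpha -> 0, so those two
    points are handled explicitly. Probabilities are clamped to eps so that
    a negative exponent never divides by zero.
    """
    if math.isclose(alpha, 1.0):
        return kl_divergence(p, q, eps)
    if math.isclose(alpha, 0.0, abs_tol=1e-12):
        return kl_divergence(q, p, eps)
    s = sum(max(pi, eps) ** alpha * max(qi, eps) ** (1.0 - alpha)
            for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha * (alpha - 1.0))


def distillation_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
                      alpha: float = 0.5, temperature: float = 1.0,
                      normalize: bool = True) -> float:
    """HYPOTHETICAL per-sample loss: D_alpha(teacher || student) over optionally
    normalized, temperature-scaled distributions. alpha, temperature and the
    divergence direction are illustrative choices, not values from the paper."""
    if normalize:
        teacher_logits = normalize_logits(teacher_logits)
        student_logits = normalize_logits(student_logits)
    p = softmax(teacher_logits, temperature)  # teacher distribution (target)
    q = softmax(student_logits, temperature)  # student distribution
    return alpha_divergence(p, q, alpha)


if __name__ == "__main__":
    teacher = [3.0, 1.0, -2.0]
    student = [2.5, 0.5, -1.0]
    for a in (0.0, 0.5, 1.0, 2.0):
        print(f"alpha={a}: {distillation_loss(teacher, student, alpha=a):.6f}")
```

Standardizing both logit vectors makes the loss invariant to shifting or positively rescaling either model's logits, so a teacher whose logits are merely larger in magnitude than the student's is not penalized for that alone; whether this is the paper's motivation for normalizing is an assumption. Setting `alpha` close to 1 recovers the familiar KL(teacher || student) distillation objective, while other values change how strongly mismatches on low-probability classes are weighted.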

Journal: Journal of Physics: Conference Series
Publication Date: Jul 1, 2021
Citations: 1
License type: cc-by

Similar Papers

Attention and feature transfer based knowledge distillation
Guoliang Yang ... Hao Yang
Scientific Reports | VOL. 13
26 Oct 2023

Shared Knowledge Distillation Network for Object Detection
Zhen Guo ... Pengzhou Zhang
Electronics | VOL. 13
22 Apr 2024

Low-light image enhancement with knowledge distillation
Ziwen Li ... Jinpu Zhang
Neurocomputing | VOL. 518
11 Nov 2022

Adversarial Metric Knowledge Distillation
Zihe Dong ... Xin Sun
27 Nov 2020