What Role Does Data Augmentation Play in Knowledge Distillation?

Wei Li, Wei Huan, Shitong Shao, Ziming Qiu, Zhihao Zhu, Weiyan Liu

doi:10.1007/978-3-031-26284-5_31

Abstract

Knowledge distillation is an effective way to transfer knowledge from a large model to a small model, and it can significantly improve the performance of the small model. In recent years, some contrastive learning-based knowledge distillation methods (e.g., SSKD and HSAKD) have achieved excellent performance by utilizing data augmentation. However, the value of data augmentation has been overlooked by researchers in knowledge distillation, and no work has analyzed its role in detail. To fill this gap, we analyze the effect of data augmentation on knowledge distillation from multiple perspectives. In particular, we demonstrate the following properties of data augmentation: (a) data augmentation can effectively help knowledge distillation work even if the teacher model has no information about the augmented samples, and our proposed diverse and rich Joint Data Augmentation (JDA) is more effective than rotation alone in knowledge distillation; (b) using diverse and rich augmented samples to assist the teacher model in training can improve the teacher's performance, but not the performance of the student model; (c) the student model can achieve excellent performance when the proportion of augmented samples is within a suitable range; (d) data augmentation enables knowledge distillation to work better in a few-shot scenario; (e) data augmentation is seamlessly compatible with some knowledge distillation methods and can potentially further improve their performance. Motivated by the above analysis, we propose a method named Cosine Confidence Distillation (CCD) to transfer the augmented samples’ knowledge more reasonably. CCD achieves better performance than the latest SOTA method, HSAKD, with fewer storage requirements on CIFAR-100 and ImageNet-1k. Our code is released at https://github.com/liwei-group/CCD.
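As a rough illustration of the setting the abstract describes, the sketch below computes a standard temperature-softened distillation loss (the classic KL-divergence formulation) and averages it over an original sample and its augmented views. It is not the paper's JDA or CCD method, whose details are not given in the abstract; the function names, the temperature value, and the toy augmentation are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    This is the generic distillation loss, not the CCD loss from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


def kd_loss_with_augmentation(student_fn, teacher_fn, sample, augment_fns,
                              temperature=4.0):
    """Average the distillation loss over the sample and its augmented views.

    student_fn / teacher_fn are hypothetical callables mapping a sample to
    logits; augment_fns is a hypothetical list of augmentation callables.
    """
    views = [sample] + [augment(sample) for augment in augment_fns]
    losses = [
        kd_loss(student_fn(view), teacher_fn(view), temperature)
        for view in views
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy stand-ins: the "models" are simple functions of a 2-element input.
    student = lambda x: [x[0], x[1]]
    teacher = lambda x: [2 * x[0], 2 * x[1]]
    flip = lambda x: x[::-1]  # toy augmentation: reverse the input
    print(kd_loss_with_augmentation(student, teacher, [1.0, 3.0], [flip]))
```

Here the teacher is queried on augmented views it may never have seen during training, which loosely mirrors property (a) in the abstract. How the views are weighted or filtered is exactly where a method such as CCD would differ from this uniform average.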

Similar Papers

Multi-perspective analysis on data augmentation in knowledge distillation
Wei Li ... Aiguo Song
Neurocomputing | VOL. 583
05 Mar 2024

Attention Based Data Augmentation for Knowledge Distillation with Few Data
Shengzhao Tian ... Duanbing Chen
Journal of Physics: Conference Series | VOL. 2171
01 Jan 2021

Discretization and decoupled knowledge distillation for arbitrary oriented object detection
Cheng Chen ... Hongwei Ding
Digital Signal Processing | VOL. 150
17 Apr 2024

Maximizing discrimination capability of knowledge distillation with energy function
Seonghak Kim ... Daeshik Kim
Knowledge-Based Systems | VOL. 296
08 May 2024
