Difficulty level-based knowledge distillation

Gyeongdo Ham, Yucheol Cho, Jae-Hyeok Lee, Minchan Kang, Gyuwon Choi, Daeshik Kim

doi:10.1016/j.neucom.2024.128375

Abstract

Knowledge distillation (KD) enables a simple model (student model) to perform comparably to a complex model (teacher model) by transferring knowledge from a pre-trained teacher. Existing soft-label distillation methods often apply a fixed temperature in the softmax function to prevent overconfidence during distillation. However, this approach can suppress important ‘dark knowledge’ about non-target classes in difficult samples while over-smoothing the confidence values of easier samples. To address this issue, we propose a novel approach called difficulty level-based knowledge distillation (DLKD), which considers the difficulty level of each sample and distills refined knowledge with high or low confidence depending on the sample’s complexity. Our method calculates the difficulty level as the Euclidean distance between the predictions of the teacher model and those of the pruned teacher model. Experimental results demonstrate that DLKD outperforms state-of-the-art methods on challenging samples, including those with noisy labels or augmented data, and achieves superior image classification results on the CIFAR-100, FGVR, and ImageNet datasets.
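The abstract describes the difficulty score only at a high level, so the following is a minimal plain-Python sketch of one possible reading, not the authors' implementation. It assumes that the difficulty of a sample is the Euclidean distance between the softmax outputs of the teacher and of a pruned copy of the teacher (how the pruned teacher is built is not modeled; its logits are simply passed in), and that this score is mapped to a per-sample temperature for a standard Hinton-style KD term. The linear difficulty-to-temperature mapping, the bounds `t_min`/`t_max`, and every function name here are hypothetical.

```python
import math
from typing import List, Sequence

# Largest possible Euclidean distance between two probability vectors
# (reached by two different one-hot vectors).
MAX_PROB_DISTANCE = math.sqrt(2.0)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def difficulty_level(teacher_logits: Sequence[float], pruned_logits: Sequence[float]) -> float:
    """Euclidean distance between teacher and pruned-teacher predictions (probabilities)."""
    p = softmax(teacher_logits)
    q = softmax(pruned_logits)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def difficulty_to_temperature(difficulty: float, t_min: float = 1.0, t_max: float = 8.0) -> float:
    """Hypothetical linear mapping: harder samples get a higher (softer) temperature."""
    # Assumption: normalize by the maximum possible distance so the ratio lies in [0, 1].
    ratio = min(difficulty / MAX_PROB_DISTANCE, 1.0)
    return t_min + ratio * (t_max - t_min)


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float], temperature: float) -> float:
    """Hinton-style KD term: T^2 * KL(teacher || student) on temperature-softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl


def difficulty_aware_kd_loss(student_batch, teacher_batch, pruned_batch,
                             t_min: float = 1.0, t_max: float = 8.0) -> float:
    """Hypothetical batch loss: each sample is distilled at its own difficulty-based temperature."""
    losses = []
    for s, t, p in zip(student_batch, teacher_batch, pruned_batch):
        temp = difficulty_to_temperature(difficulty_level(t, p), t_min, t_max)
        losses.append(kd_loss(s, t, temp))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy logits: the first sample is "easy" (teacher and pruned teacher agree),
    # the second is "hard" (pruning changes the prediction noticeably).
    teacher = [[4.0, 1.0, 0.5], [1.2, 1.0, 0.9]]
    pruned = [[3.9, 1.1, 0.5], [0.2, 1.5, 1.4]]
    student = [[2.0, 1.0, 0.8], [0.5, 0.7, 0.6]]
    for t, p in zip(teacher, pruned):
        d = difficulty_level(t, p)
        print(f"difficulty={d:.3f}  temperature={difficulty_to_temperature(d):.2f}")
    print(f"batch loss={difficulty_aware_kd_loss(student, teacher, pruned):.4f}")
```

Mapping harder samples to a higher temperature follows the motivation stated in the abstract: softening difficult samples exposes more dark knowledge about non-target classes, while a lower temperature keeps the confidence of easy samples sharp. Dividing by the square root of 2 keeps the ratio in [0, 1] without relying on batch statistics; the paper may normalize or map the score differently.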

More From: Neurocomputing

Similar Papers

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj ... Ali Ghodsi
01 Jan 2020

Multi-perspective analysis on data augmentation in knowledge distillation
Wei Li ... Aiguo Song
Neurocomputing | VOL. 583
05 Mar 2024

What Role Does Data Augmentation Play in Knowledge Distillation?
Wei Li ... Weiyan Liu
01 Jan 2023

Iterative Learning with Open-set Noisy Labels
Yisen Wang ... Hongyuan Zha
01 Jun 2018
