Abstract

Most logit-based knowledge distillation methods transfer soft labels from the teacher model to the student model via a Kullback-Leibler divergence computed on softmax, an exponential normalization function. However, the exponential nature of softmax tends to prioritize the largest class (the target class) while suppressing the smaller ones (non-target classes), leading to an oversight of the non-target classes' significance. To address this issue, we propose Non-Target-Class-Enhanced Knowledge Distillation (NTCE-KD), which amplifies the role of non-target classes in terms of both magnitude and diversity. Specifically, we present a magnitude-enhanced Kullback-Leibler (MKL) divergence that multi-shrinks the target class to enhance the impact of non-target classes in terms of magnitude. Additionally, to enrich the diversity of non-target classes, we introduce a diversity-based data augmentation (DDA) strategy, further enhancing overall performance. Extensive experiments on the CIFAR-100 and ImageNet-1k datasets demonstrate that non-target classes are of great significance and that our method achieves state-of-the-art performance across a wide range of teacher-student pairs.
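
To make the core idea concrete, the following is a minimal, illustrative sketch of a distillation loss in which the teacher's target-class logit is shrunk before the softmax, so that non-target classes receive more probability mass. The abstract does not specify the exact form of the MKL divergence; the scalar `shrink` factor and the function name `magnitude_enhanced_kl` are hypothetical choices for demonstration only, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def magnitude_enhanced_kl(student_logits, teacher_logits, target, T=4.0, shrink=0.5):
    """Illustrative KD loss: shrink the teacher's target-class logit before
    the softmax, then compute a standard temperature-scaled KL divergence.
    `shrink` is an assumed scaling factor; NTCE-KD's actual rule may differ."""
    teacher_logits = teacher_logits.clone()
    # Shrink the logit of the ground-truth (target) class for each sample,
    # which increases the relative probability of non-target classes.
    idx = torch.arange(teacher_logits.size(0))
    teacher_logits[idx, target] = teacher_logits[idx, target] * shrink
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # Temperature-scaled KL divergence between the modified teacher
    # distribution and the student distribution.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)

# Example usage with random logits: a batch of 8 samples over 100 classes.
if __name__ == "__main__":
    s = torch.randn(8, 100)
    t = torch.randn(8, 100)
    y = torch.randint(0, 100, (8,))
    print(magnitude_enhanced_kl(s, t, y).item())
```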
