Abstract

Knowledge distillation (KD) is one of the most effective neural network light-weighting techniques when training data is available. However, KD is seldom applicable in environments where access to the training data is difficult or impossible. To solve this problem, complete zero-shot KD (C-ZSKD) based on adversarial learning has recently been proposed, but the so-called biased sample generation problem limits its performance. To overcome this limitation, this paper proposes a novel C-ZSKD algorithm that utilizes a label-free adversarial perturbation. The proposed adversarial perturbation derives a squared-gradient-norm-style constraint by using the convolution of probability distributions and a second-order Taylor series approximation. The constraint serves to increase the variance of the adversarial sample distribution, which allows the student model to learn the decision boundary of the teacher model more accurately without labeled data. By analyzing the distribution of adversarial samples in the embedding space, this paper also provides insight into the characteristics of adversarial samples that are effective for adversarial learning-based C-ZSKD.
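As a rough illustration only (not the paper's exact formulation), the following Python sketch shows one way a label-free adversarial perturbation with an added squared gradient-norm term could be realized. The teacher/student interfaces, the KL-based disagreement measure, the step size epsilon, and the weight lam are assumptions introduced for illustration.

    import torch
    import torch.nn.functional as F

    def label_free_adversarial_perturbation(x, teacher, student,
                                            epsilon=0.03, lam=0.1):
        # Illustrative sketch, not the paper's exact objective: perturb x to
        # maximize teacher-student disagreement (no ground-truth labels needed),
        # with an assumed squared gradient-norm term that pushes the adversarial
        # samples to spread out (i.e., increases their variance) around x.
        x_adv = x.clone().detach().requires_grad_(True)

        t_logits = teacher(x_adv)
        s_logits = student(x_adv)
        disagreement = F.kl_div(F.log_softmax(s_logits, dim=1),
                                F.softmax(t_logits, dim=1),
                                reduction='batchmean')

        # Gradient of the disagreement w.r.t. the input, kept in the graph so
        # the squared-norm term itself can be differentiated.
        grad = torch.autograd.grad(disagreement, x_adv, create_graph=True)[0]
        grad_norm_sq = grad.flatten(1).pow(2).sum(dim=1).mean()

        objective = disagreement + lam * grad_norm_sq

        # Single FGSM-style ascent step on the combined objective.
        step = torch.autograd.grad(objective, x_adv)[0]
        return (x_adv + epsilon * step.sign()).detach()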

Highlights

  • With the advent of effective solutions [1], [2] to the gradient vanishing problem, deep neural networks that provide high recognition performance have been developed rapidly

  • To solve the inherent biased sample generation problem of adversarial learning (AL)-based complete zero-shot KD (C-ZSKD), we propose a method to increase the variance of the adversarial sample distribution by using the convolution of probability distributions and a Taylor series approximation

  • By analyzing the distribution of adversarial samples in the embedding space, this paper provides an insight into the characteristics of adversarial samples that are useful for AL-based C-ZSKD


Summary

Introduction

With the advent of effective solutions [1], [2] to the gradient vanishing problem, deep neural networks that provide high recognition performance have been developed rapidly. Hinton et al. first introduced the concept of knowledge distillation (KD) to effectively lighten neural networks [3]. KD is a technique that transfers knowledge from a large network performing a similar task to a relatively small network. KD allows the small network to overcome the limitations of training on its own and to approach the performance of the large network. Conventional KD techniques implicitly assume that training data is always available.

A. COMPLETE ZERO-SHOT KNOWLEDGE DISTILLATION

Unlike G-ZSKD, C-ZSKD is a highly scalable method because it operates even in environments where access to training data is completely blocked. Assuming that the label of each class follows a Dirichlet distribution D, the concentration parameter of D was derived from the weights W. Pseudo labels y were sampled from D, and the corresponding pseudo image x∗ was generated according to Eq. (1).
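Since Eq. (1) itself is not reproduced above, the following Python sketch illustrates the Dirichlet-based pseudo-sample generation this paragraph describes: concentration parameters are derived from the final-layer weights W, a pseudo soft label y is sampled from the resulting Dirichlet distribution, and a pseudo image x∗ is optimized so that the teacher's softened prediction matches y. The similarity-based form of the concentration, the temperature tau, and the optimizer settings are illustrative assumptions rather than the paper's exact choices.

    import torch
    import torch.nn.functional as F

    def generate_pseudo_sample(teacher, W, image_shape=(1, 3, 32, 32),
                               beta=1.0, tau=20.0, steps=1500, lr=0.01):
        # Hedged sketch of Dirichlet-based pseudo-sample generation.
        # W: final-layer weight matrix of the teacher (num_classes x feature_dim).

        # Assumed form of the concentration parameter: a softmax over cosine
        # similarities between the class weight vectors, scaled by beta.
        W_norm = F.normalize(W, dim=1)
        similarity = W_norm @ W_norm.t()                    # (C, C) class similarity
        k = torch.randint(similarity.size(0), (1,)).item()  # pick a target class
        alpha = F.softmax(beta * similarity[k], dim=0)      # concentration of D

        # Sample a pseudo soft label y from the Dirichlet distribution D.
        y = torch.distributions.Dirichlet(alpha).sample()

        # Optimize a pseudo image x* so the teacher's softened output matches y
        # (soft-label cross-entropy; requires PyTorch >= 1.10).
        x = torch.randn(image_shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.cross_entropy(teacher(x) / tau, y.unsqueeze(0))
            loss.backward()
            opt.step()
        return x.detach(), y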
