Progressively Knowledge Distillation via Re-parameterizing Diffusion Reverse Process

Xufeng Yao, Fanbin Lu, Bei Yu, Yuechen Zhang, Xinyun Zhang, Wenqian Zhao

doi:10.1609/aaai.v38i15.29579

Abstract

Knowledge distillation aims to transfer knowledge from a teacher model to a student model by aligning their distributions. Feature-level distillation often uses the L2 distance or one of its variants as the loss function, based on the assumption that outputs follow normal distributions. This poses a significant challenge when the distribution gap is substantial, since this loss function ignores the variance term. To address the problem, we propose to decompose the transfer objective into small parts and optimize it progressively. This process is inspired by diffusion models, in which the noise distribution is mapped to the target distribution step by step. However, directly employing diffusion models is impractical in the distillation scenario because of their heavy reverse process. To overcome this challenge, we adopt the structural re-parameterization technique to generate multiple student features that approximate the teacher features sequentially. The multiple student features are combined linearly at inference time without extra cost. We present extensive experiments on various transfer scenarios, such as CNN-to-CNN and Transformer-to-CNN, that validate the effectiveness of our approach.
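The abstract states that the multiple student features are combined linearly at inference time without extra cost. The following is a minimal sketch of why a linear combination of branches can be folded into one layer, assuming each branch is a plain linear map (weight matrix plus bias). The abstract does not specify the paper's actual branch design, so the helper names (`matvec`, `merge_branches`, `multi_branch_forward`), the branch count, and the mixing coefficients are all hypothetical and chosen only for illustration.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def merge_branches(weights, biases, coeffs):
    """Fold K linear branches, mixed by scalar coefficients, into one (W, b)."""
    rows, cols = len(weights[0]), len(weights[0][0])
    W = [[sum(c * Wk[i][j] for c, Wk in zip(coeffs, weights))
          for j in range(cols)] for i in range(rows)]
    b = [sum(c * bk[i] for c, bk in zip(coeffs, biases)) for i in range(rows)]
    return W, b

def multi_branch_forward(weights, biases, coeffs, x):
    """Evaluate every branch separately, then mix the outputs linearly."""
    outs = [[y + bi for y, bi in zip(matvec(Wk, x), bk)]
            for Wk, bk in zip(weights, biases)]
    return [sum(c * o[i] for c, o in zip(coeffs, outs))
            for i in range(len(outs[0]))]

# Hypothetical setup: 3 branches mapping 4 inputs to 2 outputs.
rng = random.Random(0)
K, d_in, d_out = 3, 4, 2
weights = [[[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
           for _ in range(K)]
biases = [[rng.uniform(-1, 1) for _ in range(d_out)] for _ in range(K)]
coeffs = [0.5, 0.3, 0.2]  # assumed mixing weights, not taken from the paper
x = [rng.uniform(-1, 1) for _ in range(d_in)]

# Training-time view: K branches. Inference-time view: one merged layer.
W, b = merge_branches(weights, biases, coeffs)
merged = [y + bi for y, bi in zip(matvec(W, x), b)]
branched = multi_branch_forward(weights, biases, coeffs, x)
max_diff = max(abs(m - r) for m, r in zip(merged, branched))
```

Because every branch is linear in this sketch, the merged layer reproduces the multi-branch output up to floating-point rounding while keeping the parameter count and compute of a single layer, which is the general idea behind structural re-parameterization.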

Published in: Proceedings of the AAAI Conference on Artificial Intelligence

