EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Dong Chen, Fachao Zhang, Jian Tang, Zhengping Che, Ning Liu, Rui Ma, Yichen Zhu, Yi Chang, Xiaofeng Mou

doi:10.1609/aaai.v38i10.29004

Abstract

Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. The recent work "Prune, then Distill" shows that a pruned, student-friendly teacher network can improve KD performance. However, the conventional teacher-student pipeline requires cumbersome pre-training of the teacher and complicated compression steps, which makes pruning with KD inefficient. Recent compression techniques therefore emphasize the efficiency of the compression process itself, not only the size of the resulting model. Early pruning has a significantly lower computational cost than conventional pruning methods because it does not require a large pre-trained model. Likewise, self-distillation (SD), a special case of KD, is more efficient because it requires no pre-training and no selection of a student-teacher pair. This motivates us to combine early pruning with SD for efficient model compression. In this work, we propose a framework named Early Pruning with Self-Distillation (EPSD), which identifies and preserves distillable weights in early pruning for a given SD task. EPSD combines early pruning and self-distillation in a two-step process that maintains the trainability of the pruned network. Rather than simply stacking pruning and SD, EPSD makes the pruned network favor SD by keeping more distillable weights before training, so that the pruned network distills better. We demonstrate through visual and quantitative analyses that EPSD improves the training of pruned networks. Our evaluation covers diverse benchmarks (CIFAR-10/100, Tiny-ImageNet, full ImageNet, CUB-200-2011, and Pascal VOC), on which EPSD outperforms advanced pruning and SD techniques.
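The two-step idea in the abstract (score weights against a self-distillation loss before training, prune, then train the pruned network with SD) can be illustrated with a toy sketch. The abstract does not give the scoring rule or the SD variant, so everything specific here is an assumption made for illustration: a linear softmax classifier, an SD loss of the form CE(y, p) + alpha * KL(q || p) in which q is the model's own prediction on a second noisy view of the input, and a SNIP-style saliency |w * dL/dw|. All function names are hypothetical and are not taken from the paper.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sd_grad(W, x, x_aug, y, alpha=1.0):
    """Gradient of CE(y, p) + alpha * KL(q || p) with respect to W.
    p = softmax(W x); q = softmax(W x_aug) is the model's own output on a
    second view, treated as a constant (assumed SD variant, not the paper's)."""
    p = softmax(logits(W, x))
    q = softmax(logits(W, x_aug))
    gz = [(pk - (1.0 if k == y else 0.0)) + alpha * (pk - qk)
          for k, (pk, qk) in enumerate(zip(p, q))]
    return [[g * xi for xi in x] for g in gz]

def batch_grad(W, data, alpha=1.0):
    """Average the SD-loss gradient over all (x, x_aug, y) samples."""
    K, D = len(W), len(W[0])
    G = [[0.0] * D for _ in range(K)]
    for x, x_aug, y in data:
        g = sd_grad(W, x, x_aug, y, alpha)
        for k in range(K):
            for d in range(D):
                G[k][d] += g[k][d] / len(data)
    return G

def early_prune_mask(W, data, sparsity, alpha=1.0):
    """Step 1: score untrained weights by |w * dL_SD/dw| (assumed SNIP-style
    saliency) and keep the top (1 - sparsity) fraction globally."""
    G = batch_grad(W, data, alpha)
    scores = [(abs(W[k][d] * G[k][d]), k, d)
              for k in range(len(W)) for d in range(len(W[0]))]
    scores.sort(reverse=True)
    n_keep = len(scores) - int(sparsity * len(scores))
    mask = [[0] * len(W[0]) for _ in W]
    for _, k, d in scores[:n_keep]:
        mask[k][d] = 1
    return mask

def train_masked(W, mask, data, steps=50, lr=0.1, alpha=1.0):
    """Step 2: plain gradient descent on the same SD loss; pruned weights
    are held at zero by re-applying the mask after every update."""
    W = [[w * m for w, m in zip(rw, rm)] for rw, rm in zip(W, mask)]
    for _ in range(steps):
        G = batch_grad(W, data, alpha)
        W = [[(w - lr * g) * m for w, g, m in zip(rw, rg, rm)]
             for rw, rg, rm in zip(W, G, mask)]
    return W

def make_toy_data(n=32, dim=8, classes=3, seed=0):
    """Synthetic samples: feature y is shifted for class y; x_aug is a noisy copy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(classes)
        x = [rng.gauss(0.0, 1.0) + (2.0 if d == y else 0.0) for d in range(dim)]
        x_aug = [v + rng.gauss(0.0, 0.1) for v in x]
        data.append((x, x_aug, y))
    return data

rng = random.Random(1)
data = make_toy_data()
W0 = [[rng.gauss(0.0, 0.5) for _ in range(8)] for _ in range(3)]
mask = early_prune_mask(W0, data, sparsity=0.5)  # prune before any training
W = train_masked(W0, mask, data)                 # then train with the SD loss
```

The point of the sketch is only the ordering: the mask is computed once from the untrained weights using the same SD objective that later drives training, so no pre-trained teacher or dense pre-training pass is involved. A real implementation would use a deep network and whichever SD method and saliency criterion the paper actually specifies.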


More From: Proceedings of the AAAI Conference on Artificial Intelligence

Similar Papers

HKDP: A Hybrid Approach On Knowledge Distillation and Pruning for Neural Network Compression
Che Hongle ... Wen Quan
17 Dec 2021

Learning Slimming SAR Ship Object Detector Through Network Pruning and Knowledge Distillation
Shiqi Chen ... Wei Wang
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | VOL. 14
15 Dec 2020

Multi-target Knowledge Distillation via Student Self-reflection
Jianping Gou ... Yibing Zhan
International Journal of Computer Vision | VOL. 131
25 Apr 2023

The Influence Mechanism of Knowledge Network Allocation Mechanism on Knowledge Distillation of High-Tech Enterprises.
Jianlin Yuan ... Qilei Jiang
Computational Intelligence and Neuroscience | VOL. 2022
25 Apr 2022
