Abstract
Deploying deep learning models on resource-constrained (edge) devices is challenging due to their high computational demands and large model sizes. Early-exit neural networks are one approach to making deep learning models more efficient for such devices by reducing computational cost and latency. However, even with early exits, model size can remain an obstacle to deployment on edge devices. To address this problem, we propose a section-wise compression technique for early-exit neural networks with intermediate classifiers. Our approach divides the model into a few sections and applies weight-clustering-based compression with a different setting for each section, preventing accuracy loss at the intermediate classifiers. We show that knowledge distillation can be used in the retraining phase to transfer knowledge from the uncompressed to the compressed sections and to accelerate recovery from the performance drop introduced by the weight clustering stages. Evaluation on the CIFAR10 and CIFAR100 datasets with ResNet and WideResNet architectures demonstrates that the proposed technique compresses an early-exit neural network at a high compression ratio with minimal impact on the accuracy of the intermediate classifiers. The method achieves compression ratios of more than 36 and 22 times for ResNet18 with three shallow classifiers on CIFAR10 and CIFAR100, respectively, with an ensemble accuracy loss of less than 1%. When the shallow classifiers are removed from the early-exit model, the resulting static model reaches compression ratios of up to 64 and 52 times for ResNet18 and WideResNet50, respectively, on CIFAR10 with an accuracy loss of less than 2.5%.
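The abstract only outlines the method, but its two core ingredients, per-section weight clustering and distillation-guided retraining, can be sketched concretely. The following is a minimal, hypothetical Python/PyTorch sketch, not the paper's implementation: cluster_weights, compress_section, kd_loss, and the per-section cluster counts in cluster_config are illustrative assumptions.

import torch
import torch.nn.functional as F


def cluster_weights(weight: torch.Tensor, n_clusters: int, iters: int = 20) -> torch.Tensor:
    """Quantize a weight tensor to n_clusters shared values via 1-D k-means."""
    flat = weight.detach().flatten()
    # Initialize centroids uniformly over the weight range.
    centroids = torch.linspace(flat.min().item(), flat.max().item(),
                               n_clusters, device=flat.device)
    for _ in range(iters):
        # Assign each weight to its nearest centroid, then recompute centroids.
        assign = (flat.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
        for k in range(n_clusters):
            members = flat[assign == k]
            if members.numel() > 0:
                centroids[k] = members.mean()
    assign = (flat.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
    return centroids[assign].view_as(weight)


def compress_section(section: torch.nn.Module, n_clusters: int) -> None:
    """Replace every conv/linear weight in one model section with its clustered version."""
    with torch.no_grad():
        for module in section.modules():
            if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
                module.weight.copy_(cluster_weights(module.weight, n_clusters))


def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.9):
    """Retraining loss: soft targets from the uncompressed teacher plus hard labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Illustrative per-section settings (assumed, not from the paper): sections
# feeding the shallow early-exit classifiers keep more clusters so that the
# intermediate classifiers lose less accuracy.
cluster_config = {"section1": 32, "section2": 16, "section3": 8}

In this sketch, each section would be clustered in turn with its own cluster count from cluster_config, then retrained with kd_loss using the uncompressed model's logits as the teacher before the next section is compressed.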