Abstract

Pruning is a popular method for shrinking over-parameterized networks: it efficiently reduces a network's parameter count and computational cost while retaining nearly the same high accuracy as the original network. However, general weight-pruning algorithms can only remove parameters within the original network structure; they cannot reduce the width or depth of the pruned network. Knowledge distillation, by contrast, can compress the network structure itself, but it cannot make further modifications to the distilled network. To reduce the network structure further, we propose HKDP, a hybrid model compression algorithm that combines knowledge distillation with network pruning to significantly reduce the overall size of the network while maintaining substantial accuracy. The approach inherits the advantages of both techniques, achieving a 10 times higher compression rate and 2 percent higher accuracy than either algorithm alone. Concretely, we first apply a stage-wise knowledge distillation algorithm that quickly and efficiently shrinks the original model structure; we then apply a Stochastic Gradient Descent (SGD) based pruning method and introduce the concept of global sparsity, which allows us to customize the compression rate of the model. Our experiments on CIFAR-10 and MNIST show that our hybrid algorithm achieves higher model accuracy and compression ratios than competing network compression algorithms.
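The full text is not included here, so as a rough illustration of the two components the abstract names, below is a minimal PyTorch sketch written under our own assumptions: the function names, temperature `T`, blending weight `alpha`, and one-shot magnitude thresholding are illustrative choices, not the paper's method (the paper's pruning is SGD-based and its distillation stage-wise, which this sketch does not reproduce). It shows only the two basic building blocks: a standard soft-target distillation loss, and pruning against a single magnitude threshold computed over all layers at once, i.e. a global sparsity target rather than a per-layer one.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard soft-target KD loss blended with hard-label cross-entropy.
    T and alpha are illustrative hyperparameters, not values from the paper."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to be comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def apply_global_sparsity(model, sparsity=0.9):
    """Zero out the smallest-magnitude weights across ALL layers so that the
    given fraction of weights is pruned network-wide (global sparsity),
    instead of pruning each layer to the same ratio independently."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * all_weights.numel())
    threshold = all_weights.kthvalue(k).values  # single global threshold
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:  # prune weight tensors, skip biases / norms
                p.mul_((p.abs() > threshold).to(p.dtype))
```

A global sparsity target of this kind is what makes the overall compression rate directly tunable: raising `sparsity` prunes more of the network as a whole, while letting heavily redundant layers absorb more of the pruning than sensitive ones.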
