Abstract
Deep Neural Networks (DNNs) currently solve many real-life problems with excellent accuracy. However, designing a compact neural network and training it from scratch faces two challenges. First, since in many problems the datasets are relatively small, the model starts to overfit and has low validation accuracy. Second, training from scratch requires substantial computational resources. Hence, many developers use transfer learning, starting from a standard model such as VGGNet with pre-trained weights, where the pre-trained model has been trained on a similar problem of high complexity. For example, for the image classification problem, one can use VGG16, ResNet, AlexNet, or GoogLeNet; these models are pre-trained on the ImageNet dataset, which contains millions of images across 1000 classes. Such pre-trained models are enormous, and their computational cost during inference is huge, making them unusable in many real-life situations where the model must be deployed on resource-constrained devices. Thus, much work is devoted to compressing standard pre-trained models to achieve the required accuracy at minimum computational cost. There are two types of pruning techniques: (i) unstructured pruning, a parameter-based approach that prunes individual parameters, and (ii) structured pruning, where we prune a set of parameters that performs a specific operation, such as activation neurons or convolution operations. This paper focuses on structured pruning, as it directly results in compression and faster execution. There are two strategies for structured pruning: (i) the saliency-based approach, where we compute the impact of parameters on the output and remove the parameters with minimum impact, and (ii) the similarity-based approach, where we find redundant features and remove one of each redundant pair so that pruning causes minimal change in the output. In this paper, we combine both approaches: in the initial iterations we prune based on saliency, and in later iterations we prune based on the similarity-aware approach. We observe that this combined approach leads to better pruning results.
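To make the two structured-pruning criteria concrete, the following is a minimal PyTorch sketch, not the paper's exact implementation: it scores convolution filters by L1-norm saliency, identifies redundant filters via pairwise cosine similarity, and switches from the saliency criterion to the similarity criterion after a fixed number of iterations. The function names and the `switch_at` threshold are illustrative assumptions.

```python
# Minimal sketch of combined saliency- and similarity-based structured
# pruning for a single Conv2d layer (assumes groups=1).
import torch
import torch.nn as nn
import torch.nn.functional as F

def filter_saliency(conv: nn.Conv2d) -> torch.Tensor:
    """Score each output filter by the L1 norm of its weights."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def most_redundant_filter(conv: nn.Conv2d) -> int:
    """Return the index of the filter most similar to another filter."""
    w = conv.weight.detach().flatten(1)   # (out_channels, rest)
    w = F.normalize(w, dim=1)             # unit-length rows
    sim = w @ w.t()                       # pairwise cosine similarity
    sim.fill_diagonal_(-1.0)              # ignore self-similarity
    return int(sim.max(dim=1).values.argmax())

def prune_one_filter(conv: nn.Conv2d, idx: int) -> nn.Conv2d:
    """Rebuild the layer without filter `idx` (structured pruning)."""
    keep = [i for i in range(conv.out_channels) if i != idx]
    pruned = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

def prune_step(conv: nn.Conv2d, iteration: int,
               switch_at: int = 5) -> nn.Conv2d:
    """Early iterations remove the least-salient filter; later
    iterations remove the most redundant (similarity-aware) filter."""
    if iteration < switch_at:
        idx = int(filter_saliency(conv).argmin())
    else:
        idx = most_redundant_filter(conv)
    return prune_one_filter(conv, idx)
```

In a full network, each pruning step would also shrink the next layer's input channels to match and would typically be followed by fine-tuning; the sketch omits both for brevity.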