Abstract

Pruning techniques for neural networks are applied to achieve strong model compression while maintaining accuracy. Common pruning approaches rely on a single granularity (e.g., weights, channels, or layers) and thereby miss valuable optimization potential: this limitation can leave behind a sequence of obsolete layers that retain only a small number of channels or highly sparse weights. In this paper, we present a novel pruning approach to address this issue: a Multi-Grain Pruning (MGP) framework that optimizes neural network architectures from coarse to fine across up to four granularities. Besides the traditional pruning granularities, a new granularity is introduced on so-called blocks, which consist of multiple layers. By combining multiple pruning granularities, models can be optimized further than with any single granularity alone. We evaluate the proposed framework with VGG-19 on CIFAR-10 and CIFAR-100, ResNet-56 on CIFAR-10, and ResNet-50 on ImageNet. With VGG-19 on CIFAR-10, our technique achieves model compression rates from 31.9x up to 185.3x with an accuracy drop between 0.08% and 1.73%.
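To make the coarse-to-fine idea concrete, the following is a minimal, illustrative sketch of multi-granularity pruning over a toy network, not the paper's actual MGP algorithm. The toy network structure, the L1-magnitude saliency scores, the helper names, and all thresholds and ratios are assumptions chosen only for demonstration.

```python
import numpy as np

# Hypothetical toy "network": a list of blocks, each block a list of
# layer weight matrices with shape (out_channels, in_channels).
rng = np.random.default_rng(0)
network = [
    [rng.normal(size=(8, 4)), rng.normal(size=(8, 8))],   # block 0
    [rng.normal(size=(8, 8)), rng.normal(size=(8, 8))],   # block 1
]

def l1_score(w):
    """Mean absolute weight magnitude, used here as a simple saliency proxy."""
    return np.abs(w).mean()

def prune_coarse_to_fine(net, block_thr=0.7, layer_thr=0.7,
                         channel_ratio=0.25, weight_ratio=0.5):
    """Apply pruning from coarse to fine: blocks -> layers -> channels -> weights."""
    pruned = []
    for block in net:
        # 1) Block granularity: drop an entire multi-layer block if its saliency is low.
        if l1_score(np.concatenate([w.ravel() for w in block])) < block_thr:
            continue
        kept_layers = []
        for w in block:
            # 2) Layer granularity: drop a whole layer if its saliency is low.
            if l1_score(w) < layer_thr:
                continue
            w = w.copy()
            # 3) Channel granularity: zero the least salient output channels.
            ch_scores = np.abs(w).mean(axis=1)
            n_drop = int(channel_ratio * w.shape[0])
            w[np.argsort(ch_scores)[:n_drop], :] = 0.0
            # 4) Weight granularity: zero the smallest remaining individual weights.
            thr = np.quantile(np.abs(w[w != 0]), weight_ratio)
            w[np.abs(w) < thr] = 0.0
            kept_layers.append(w)
        pruned.append(kept_layers)
    return pruned

compressed = prune_coarse_to_fine(network)
print("remaining blocks:", len(compressed))
```

In this sketch, coarser decisions (removing blocks or layers) are made before finer ones (zeroing channels or weights), mirroring the coarse-to-fine ordering described in the abstract; the real framework would instead use learned or data-driven importance criteria and fine-tuning between stages.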
