Abstract

Pruning techniques for neural networks are applied to achieve strong model compression while maintaining accuracy. Common pruning approaches rely on a single granularity (e.g., weights, channels, or layers) and thereby miss valuable optimization potential: this limitation can leave behind a sequence of obsolete layers that retain only a small number of channels or highly sparse weights. In this paper, we present a novel pruning approach to address this issue: a Multi-Grain Pruning (MGP) framework that optimizes neural network architectures from coarse to fine across up to four granularities. Besides the traditional pruning granularities, a new granularity is introduced on so-called blocks, which consist of multiple layers. By combining multiple pruning granularities, models can be optimized further than with any single granularity alone. We evaluate the proposed framework with VGG-19 on CIFAR-10 and CIFAR-100, ResNet-56 on CIFAR-10, and ResNet-50 on ImageNet. With VGG-19 on CIFAR-10, our technique achieves model compression rates from 31.9x up to 185.3x with an accuracy drop between 0.08% and 1.73%.
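To make the coarse-to-fine idea concrete, the following is a minimal, illustrative sketch of multi-granularity pruning over a toy network, not the paper's actual MGP algorithm. The toy network structure, the L1-magnitude saliency scores, the helper names, and all thresholds and ratios are assumptions chosen only for demonstration.

```python
import numpy as np

# Hypothetical toy "network": a list of blocks, each block a list of
# layer weight matrices with shape (out_channels, in_channels).
rng = np.random.default_rng(0)
network = [
    [rng.normal(size=(8, 4)), rng.normal(size=(8, 8))],   # block 0
    [rng.normal(size=(8, 8)), rng.normal(size=(8, 8))],   # block 1
]

def l1_score(w):
    """Mean absolute weight magnitude, used here as a simple saliency proxy."""
    return np.abs(w).mean()

def prune_coarse_to_fine(net, block_thr=0.7, layer_thr=0.7,
                         channel_ratio=0.25, weight_ratio=0.5):
    """Apply pruning from coarse to fine: blocks -> layers -> channels -> weights."""
    pruned = []
    for block in net:
        # 1) Block granularity: drop an entire multi-layer block if its saliency is low.
        if l1_score(np.concatenate([w.ravel() for w in block])) < block_thr:
            continue
        kept_layers = []
        for w in block:
            # 2) Layer granularity: drop a whole layer if its saliency is low.
            if l1_score(w) < layer_thr:
                continue
            w = w.copy()
            # 3) Channel granularity: zero the least salient output channels.
            ch_scores = np.abs(w).mean(axis=1)
            n_drop = int(channel_ratio * w.shape[0])
            w[np.argsort(ch_scores)[:n_drop], :] = 0.0
            # 4) Weight granularity: zero the smallest remaining individual weights.
            thr = np.quantile(np.abs(w[w != 0]), weight_ratio)
            w[np.abs(w) < thr] = 0.0
            kept_layers.append(w)
        pruned.append(kept_layers)
    return pruned

compressed = prune_coarse_to_fine(network)
print("remaining blocks:", len(compressed))
```

In this sketch, coarser decisions (removing blocks or layers) are made before finer ones (zeroing channels or weights), mirroring the coarse-to-fine ordering described in the abstract; the real framework would instead use learned or data-driven importance criteria and fine-tuning between stages.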
