Abstract

Pruning is a useful technique for reducing the memory consumption and floating-point operations (FLOPs) of deep convolutional neural network (CNN) models. However, even at moderate pruning rates, current structured pruning approaches often cause considerable drops in accuracy. Moreover, existing approaches typically treat pruning rates as hyperparameters, neglecting the differing sensitivities of individual convolution layers. In this study, we propose a novel sensitivity-based channel pruning method that uses second-order sensitivity as its criterion. The essential idea is to prune insensitive filters while retaining sensitive ones. We quantify a filter's sensitivity as the sum of the sensitivities of all its weights, rather than the magnitude-based metrics commonly used in the literature. In addition, a layer-sensitivity measure based on the Hessian eigenvalues of each layer is used to automatically select the most appropriate pruning rate for each layer. Experiments on a variety of modern CNN architectures show that the method achieves substantially higher pruning rates at a small cost in accuracy, reducing FLOPs by more than 60% on CIFAR-10. Notably, on ImageNet, pruning ResNet50 reduces FLOPs by 56.3% while losing only 0.92% accuracy.
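To make the second-order criterion concrete, the sketch below shows one plausible way to score filters: estimate the Hessian diagonal of the loss with respect to each convolution weight (here via a Hutchinson estimator), form an OBD-style second-order Taylor score per weight, and sum the scores over each output filter. The function name, the Hutchinson estimator, and the 0.5 * H_ii * w_i^2 score are illustrative assumptions; the abstract does not give the paper's exact formulas.

```python
import torch
import torch.nn as nn

def filter_sensitivities(model, loss_fn, data, target, n_samples=8):
    """Hedged sketch: per-filter second-order sensitivity for every Conv2d layer.

    Uses a Hutchinson estimator for the Hessian diagonal and an OBD-style
    score 0.5 * H_ii * w_i^2 per weight, summed over each output filter.
    This is an assumed instantiation, not the paper's exact criterion.
    """
    model.zero_grad()
    loss = loss_fn(model(data), target)
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    params = [m.weight for m in convs]
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Hutchinson estimate of the Hessian diagonal: E[z * (Hz)] with Rademacher z
    hess_diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        gz = sum((g * z).sum() for g, z in zip(grads, zs))
        hvps = torch.autograd.grad(gz, params, retain_graph=True)
        for h, hv, z in zip(hess_diag, hvps, zs):
            h += hv * z / n_samples

    scores = {}
    for m, p, h in zip(convs, params, hess_diag):
        per_weight = 0.5 * h * p.detach() ** 2        # second-order Taylor term per weight
        scores[m] = per_weight.sum(dim=(1, 2, 3))     # aggregate over each output filter
    return scores  # low score -> insensitive filter -> pruning candidate
```

Filters with the smallest aggregated scores in each layer would then be removed, with the number removed per layer governed by the layer-wise rate derived from that layer's Hessian eigenvalues.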
