Neural network pruning offers a promising route to deploying neural networks on embedded or mobile devices with limited resources. Although current structured strategies place no constraints on the underlying hardware architecture during forward inference, their decline in classification accuracy at typical pruning rates exceeds acceptable limits. This motivates a technique that achieves a high pruning rate with a small loss in accuracy while retaining the hardware generality of structured pruning. In this paper, we propose a new pruning method, KEP (Kernel Elements Pruning), which compresses deep convolutional neural networks by evaluating the significance of the elements in each kernel plane and removing the unimportant ones. The method applies a controllable regularization penalty, guided by a prior-knowledge mask, that constrains unimportant elements and yields a compact model. For forward inference, we introduce a sparse convolution operation that replaces the standard sliding window and eliminates wasted multiplications by zero, and we verify its effectiveness for further deployment on FPGAs. Extensive experiments demonstrate the effectiveness of KEP on two datasets: CIFAR-10 and ImageNet. Specifically, while introducing only a small number of indices for the non-zero weights, KEP improves significantly over the latest structured methods in parameter and floating-point operation (FLOPs) reduction, and performs well on large datasets.
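The abstract does not spell out the penalty's exact form. The following is a minimal PyTorch sketch of the masked-regularization idea, not the paper's implementation: the function name `masked_l2_penalty`, the L2 form of the penalty, and the magnitude-based construction of the prior-knowledge mask are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def masked_l2_penalty(conv: nn.Conv2d, mask: torch.Tensor,
                      strength: float = 1e-3) -> torch.Tensor:
    """Penalty that drives masked kernel elements toward zero.

    `mask` has the same shape as conv.weight; 1 marks elements judged
    unimportant (penalized), 0 marks elements to keep. Hypothetical sketch,
    not the paper's exact formulation.
    """
    return strength * torch.sum((conv.weight * mask) ** 2)

conv = nn.Conv2d(16, 32, kernel_size=3)
# Illustrative prior: penalize the half of the elements with smaller magnitude.
mask = (conv.weight.abs() < conv.weight.abs().median()).float()
penalty = masked_l2_penalty(conv, mask)  # added to the task loss in training
penalty.backward()
```

During fine-tuning, `strength` would control how hard penalized elements are pushed toward zero, after which they can be removed outright to obtain the compact model.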
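The sparse convolution itself can be sketched in a few lines: instead of sliding a dense window over the input, precompute the coordinates of the surviving (non-zero) kernel elements and accumulate only their contributions. This single-channel, stride-1, no-padding NumPy sketch (the function name `sparse_conv2d_single` is hypothetical) illustrates the idea and is not the paper's FPGA kernel.

```python
import numpy as np

def sparse_conv2d_single(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Direct convolution over one channel that skips zero kernel elements."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    rows, cols = np.nonzero(w)  # indices of the surviving weights
    for r, c in zip(rows, cols):
        # Each non-zero weight contributes one shifted slice of the input.
        out += w[r, c] * x[r:r + oh, c:c + ow]
    return out

# A 3x3 kernel with 7 of 9 elements pruned touches only 2 input slices.
x = np.arange(25, dtype=np.float32).reshape(5, 5)
w = np.array([[0, 0, 1], [0, 2, 0], [0, 0, 0]], dtype=np.float32)
print(sparse_conv2d_single(x, w))
```

The cost scales with the number of surviving weights rather than the full kernel size, which is what makes per-element pruning pay off at inference time.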