Filter pruning has achieved remarkable success in reducing memory consumption and speeding up inference for convolutional neural networks (CNNs). Some prior works, such as heuristic methods, attempt to search for suitable sparse structures during the pruning process, which can be expensive and time-consuming. In this paper, an efficient cross-layer importance evaluation (CIE) method is proposed to automatically calculate the proportional relationships among convolutional layers. First, each layer is pruned separately via grid sampling to obtain the model accuracy at every sampling point. Then, contribution matrices are built to describe the importance of each layer to model accuracy. Finally, a binary search algorithm is used to find the optimal sparse structure under a target pruning ratio. Extensive experiments on multiple representative image classification tasks demonstrate that the proposed method achieves better compression performance at little additional time cost compared to existing pruning algorithms. For instance, it reduces FLOPs by more than 50% for ResNet50 with only small losses of 0.93% and 0.43% in top-1 and top-5 accuracy, respectively. At the cost of only 0.24% accuracy loss, the parameters of the pruned VGG19 model are compressed by 27.23× and throughput increases by 2.46×. Overall, CIE benefits the deployment of CNN models on edge devices in terms of both efficiency and accuracy.
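The binary-search step mentioned above can be sketched as follows. This is a hypothetical illustration, not the authors' exact formulation: it assumes the contribution matrices reduce to one scalar importance score per layer, and that each layer's pruning ratio is set in inverse proportion to its importance (capped so some filters always survive). All function and variable names are illustrative assumptions.

```python
# Hypothetical sketch: binary-search a global importance threshold so that
# the induced per-layer pruning ratios meet a target FLOPs reduction.
# `importance` (one assumed scalar per layer) and `flops` (per-layer FLOP
# counts) stand in for the paper's contribution matrices.

def pruned_flops_ratio(importance, flops, threshold):
    """Fraction of total FLOPs removed when each layer is pruned in
    inverse proportion to its importance, gated by the threshold."""
    total = sum(flops)
    removed = 0.0
    for imp, f in zip(importance, flops):
        # Less important layers (small imp) get larger pruning ratios;
        # the 0.9 cap keeps at least 10% of each layer's filters.
        ratio = min(max(threshold - imp, 0.0), 0.9)
        removed += ratio * f
    return removed / total

def search_threshold(importance, flops, target, iters=50):
    """Binary search: pruned_flops_ratio is monotone in the threshold,
    so bisect until the removed-FLOPs fraction matches the target."""
    lo, hi = 0.0, 1.0 + max(importance)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pruned_flops_ratio(importance, flops, mid) < target:
            lo = mid  # too little pruned: raise the threshold
        else:
            hi = mid  # enough pruned: lower the threshold
    return (lo + hi) / 2
```

Because the removed-FLOPs fraction grows monotonically with the threshold, bisection converges to the target sparsity without enumerating per-layer ratio combinations, which is what makes this step cheap relative to heuristic structure search.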