Abstract

Deep neural network models, especially CNNs, are used in a wide range of fields, but their high computational requirements limit deployment on resource-constrained embedded devices. Pruning techniques reduce a model's computational requirements by removing redundant structures from CNNs. Most existing static pruning methods prune pre-trained models with a globally uniform pruning rate and require fine-tuning to recover accuracy after pruning, which incurs high training costs; moreover, a globally uniform pruning rate is sub-optimal. Dynamic pruning methods prune during training and use auxiliary modules to compute channel saliency scores, but they do not exploit these modules to assist network training. We propose an adaptive structured continuous sparse network pruning method based on the attention mechanism that prunes the original network during training. The attention-based channel similarity calculation module computes channel saliency scores while refining features to assist network training, and the adaptive continuous sparse control module gradually discretizes the saliency scores and assigns each layer's pruning rate according to the preset overall pruning-rate target. The pruned model is obtained directly after training, with no additional fine-tuning required. We validate the proposed method on CIFAR-10 and the large-scale ImageNet dataset using networks with different architectures, and it outperforms comparable pruning methods at various pruning rates. On CIFAR-10, our method reduces the FLOPs of VGG-16 by 34.4% while increasing top-1 accuracy by 0.19%. On ImageNet, it reduces the FLOPs of ResNet-34 by 51.5% while top-1 accuracy decreases by only 0.89%.
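To make the idea of attention-derived channel saliency with gradual discretization concrete, the following is a minimal sketch, not the authors' implementation: an SE-style channel attention block (a common assumption for "attention-based" channel scoring) that refines features and emits per-channel saliency scores, with a temperature-annealed sigmoid standing in for the adaptive continuous sparse control that pushes scores toward a near-binary mask over training. All module and parameter names here are hypothetical.

```python
# Hypothetical sketch of attention-based channel saliency with gradual
# discretization; not the paper's actual code.
import torch
import torch.nn as nn

class ChannelSaliency(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze spatial dims
        self.fc = nn.Sequential(                       # excitation MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x, temperature=1.0):
        b, c, _, _ = x.shape
        logits = self.fc(self.pool(x).view(b, c))      # per-channel saliency logits
        # A sigmoid sharpened by a decreasing temperature: scores drift toward
        # {0, 1}, approximating a hard channel mask by the end of training.
        scores = torch.sigmoid(logits / temperature)
        return x * scores.view(b, c, 1, 1), scores     # refined features + saliency

# Usage: anneal the temperature during training so soft scores become near-binary.
layer = ChannelSaliency(channels=64)
feats = torch.randn(8, 64, 32, 32)
for t in torch.linspace(1.0, 0.05, steps=5):
    refined, scores = layer(feats, temperature=float(t))
# Channels whose final scores fall below a threshold chosen to satisfy each
# layer's pruning-rate budget would then be removed after training.
```

In this sketch the attention module contributes to the forward pass (feature refinement) while its outputs double as saliency scores, which mirrors the dual role described in the abstract; how the per-layer pruning rates are actually allocated to meet the global target is specific to the paper and not reproduced here.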
