Abstract

The strong performance of deep learning is well established. As research has advanced, however, neural networks have grown more complex and harder to deploy on resource-constrained devices. A series of model compression algorithms has made artificial intelligence on the edge possible. Among them, structured model pruning is widely used because of its versatility: it removes relatively unimportant structures from the network itself to reduce model size. However, previous pruning work still suffers from inaccurate evaluation of candidate sub-networks, pruning rates chosen empirically, and inefficient retraining. We therefore propose Combine-Net, an accurate, objective, and efficient pruning algorithm that introduces Adaptive BN to eliminate evaluation errors, the Kneedle algorithm to determine the pruning rate objectively, and knowledge distillation to improve retraining efficiency. Results show that, without loss of precision, Combine-Net achieves 95% parameter compression and 83% computation compression for VGG16 on CIFAR10, and 71% parameter compression and 41% computation compression for ResNet50 on CIFAR100. Experiments on different datasets and models demonstrate that Combine-Net efficiently compresses both the parameters and the computation of neural networks.
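To make the pruning-rate selection concrete, below is a minimal sketch of locating a knee point on an accuracy-versus-pruning-rate curve, assuming such a curve has already been measured. It uses the open-source `kneed` implementation of the Kneedle algorithm; all curve values are illustrative placeholders, not the paper's measurements.

```python
# Minimal sketch: choosing a pruning rate with the Kneedle algorithm.
# The accuracy values below are illustrative placeholders, not results
# from the Combine-Net paper.
from kneed import KneeLocator

pruning_rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Hypothetical evaluated accuracy of each pruned sub-network.
accuracies = [0.93, 0.93, 0.92, 0.92, 0.91, 0.90, 0.87, 0.80, 0.62]

# Accuracy stays nearly flat and then drops sharply as the pruning rate
# grows; Kneedle locates that transition (the "knee") objectively.
knee = KneeLocator(pruning_rates, accuracies,
                   curve="concave", direction="decreasing").knee
print(f"Selected pruning rate: {knee}")
```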

Highlights

  • As Internet of Things (IoT) technology grows in popularity, diverse sensors are emerging that carry massive amounts of raw data

  • Deep learning models cannot readily be deployed on resource-constrained devices or run smoothly in applications with stringent Quality of Experience (QoE) requirements

  • Experiments with VGG16 on CIFAR10 showed that, after pruning at a 95% rate, the accuracy of the sub-network corrected by the Adaptive Batch Normalization (BN) operation was about 40% higher than that of the uncorrected sub-network, better reflecting the sub-network's true performance (see the sketch after this list)
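The following is a minimal sketch of the Adaptive BN idea as it is commonly realized in PyTorch: a pruned sub-network's BN running statistics are reset and re-estimated on a few training batches before the sub-network is evaluated, so the statistics match the pruned architecture. Names such as `subnet` and `train_loader` are illustrative assumptions.

```python
import torch
import torch.nn as nn

def adaptive_bn(subnet: nn.Module, train_loader, num_batches: int = 50):
    """Re-estimate the BN running statistics of a pruned sub-network.

    Sketch: resets every BatchNorm layer, then runs a few forward
    passes in train mode (no gradient updates) so the running
    mean/variance reflect the pruned architecture before evaluation.
    """
    for m in subnet.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.reset_running_stats()
            m.momentum = None  # use a cumulative moving average
    subnet.train()  # BN updates its running stats only in train mode
    with torch.no_grad():  # weights are untouched; only BN stats change
        for i, (x, _) in enumerate(train_loader):
            if i >= num_batches:
                break
            subnet(x)
    subnet.eval()
    return subnet
```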



Introduction

As Internet of Things (IoT) technology grows in popularity, diverse sensors are emerging that carry massive amounts of raw data, and efficiently extracting useful knowledge from so much raw data has become a challenge. To achieve better results, deep learning models usually have to go wider and deeper, which incurs high computational costs in terms of storage, memory, latency, and energy. As a result, these models cannot readily be deployed on resource-constrained devices or run smoothly in applications with stringent Quality of Experience (QoE) requirements. Compressing a computationally intensive model is a potential way to bring ubiquitous deep learning to resource-constrained devices and to applications under harsh QoE conditions. Among model compression methods, pruning requires much less expertise, can be applied to pre-trained models, and keeps accuracy loss bounded through retraining. These merits make pruning an attractive choice for model compression.
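As an illustration of the structured pruning described above, here is a minimal sketch that ranks the output filters of a convolutional layer by the L1 norm of their weights and keeps only the most important ones. The L1-norm criterion is a commonly used stand-in adopted here as an assumption; this excerpt does not specify the paper's own importance measure.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Structured pruning sketch: drop whole output filters of a conv layer.

    Filters are ranked by the L1 norm of their weights (an assumed,
    commonly used importance criterion) and the lowest-ranked ones are
    discarded, shrinking the layer itself rather than zeroing weights.
    """
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # Importance of each output filter: sum of absolute weight values.
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep_idx = torch.argsort(importance, descending=True)[:n_keep]

    pruned = nn.Conv2d(conv.in_channels, n_keep,
                       kernel_size=conv.kernel_size, stride=conv.stride,
                       padding=conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned

# Usage: replace a layer with its pruned version (the next layer's
# input channels must be reduced accordingly; omitted for brevity).
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
print(prune_conv_filters(layer, keep_ratio=0.5))  # 64 filters remain
```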

