Abstract

A critical problem in large neural networks is over-parameterization: the large number of weight parameters limits their use on edge devices because of prohibitive computational power and memory/storage requirements. To make neural networks practical on edge devices and in real-time industrial applications, they need to be compressed in advance. Since edge devices cannot train networks or access trained networks when internet resources are scarce, preloading smaller networks is essential. Various works in the literature have shown that redundant branches of a fully connected network can be pruned strategically without significantly sacrificing performance. However, the majority of these methodologies require substantial computational resources because they integrate weight training via the back-propagation algorithm into the network compression process. In this work, we focus on optimizing the network structure so that performance is preserved despite aggressive pruning. The structure optimization is performed using the simulated annealing (SA) algorithm alone, without utilizing back-propagation for branch weight training. As a heuristic method for non-convex optimization, simulated annealing provides a near-globally optimal solution to this NP-hard problem for a given percentage of pruned branches. Our simulation results show that simulated annealing can significantly reduce the complexity of a fully connected network while maintaining its performance, without the help of back-propagation.
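
To make the approach concrete, the following is a minimal sketch of simulated-annealing-based pruning of a one-hidden-layer fully connected network with frozen pre-trained weights, assuming a mask swap as the neighborhood move; the function names (evaluate_loss, sa_prune), the mean-squared-error evaluation metric, and the geometric cooling schedule are illustrative assumptions rather than the paper's implementation.

    import numpy as np

    def evaluate_loss(masks, weights, X, y):
        """Forward pass of a one-hidden-layer network with masked (pruned) weights."""
        W1, W2 = weights
        h = np.maximum(0, X @ (W1 * masks[0]))           # ReLU hidden layer
        preds = h @ (W2 * masks[1])
        return np.mean((preds - y) ** 2)                  # stand-in evaluation metric

    def sa_prune(weights, X, y, prune_frac=0.5, T=1.0, alpha=0.99, steps=1000, seed=0):
        """Search over pruning masks with simulated annealing; weights stay frozen."""
        rng = np.random.default_rng(seed)
        masks = []
        for W in weights:                                 # random initial mask per layer
            m = np.ones(W.size)
            m[: int(prune_frac * W.size)] = 0.0
            rng.shuffle(m)
            masks.append(m.reshape(W.shape))
        cur = evaluate_loss(masks, weights, X, y)
        for _ in range(steps):
            li = rng.integers(len(masks))                 # pick a layer at random
            flat = masks[li].ravel()
            off = np.flatnonzero(flat == 0)               # pruned edges
            on = np.flatnonzero(flat == 1)                # kept edges
            i, j = rng.choice(off), rng.choice(on)
            flat[i], flat[j] = 1.0, 0.0                   # neighbor state: swap one edge pair
            new = evaluate_loss(masks, weights, X, y)
            # Metropolis rule: accept improvements; accept worse states with prob exp(-dE/T)
            if new < cur or rng.random() < np.exp(-(new - cur) / T):
                cur = new
            else:
                flat[i], flat[j] = 0.0, 1.0               # reject: undo the swap
            T *= alpha                                    # geometric cooling
        return masks, cur

    # Example usage with random data and random "pre-trained" weights (illustration only)
    rng = np.random.default_rng(1)
    weights = [rng.standard_normal((20, 32)), rng.standard_normal((32, 1))]
    X, y = rng.standard_normal((64, 20)), rng.standard_normal((64, 1))
    masks, loss = sa_prune(weights, X, y, prune_frac=0.7)

Only forward evaluations of the masked network are needed while the mask is searched; no gradients or back-propagation are computed at any step.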

Highlights

  • The successful development of artificial intelligence methods fueled by deep learning has had a significant impact on multiple industries

  • The recent mainstream strategies can be classified into four main approaches: (i) methods that reduce storage requirements by reducing the number of bits used to represent the branch weights, which indirectly reduces the computational load as well, such as quantization [9]; (ii) methods that reduce the computational load by decomposing layers or simplifying activation functions, as in [10]; (iii) methods that replace bigger networks with smaller networks that provide similar results for a chosen sub-task, as in knowledge distillation [11]; and (iv) methods that reduce the number of parameters by increasing the sparsity of the neural network, as in pruning [12,13]

  • Another important point regarding the achievement of low-complexity neural networks through pruning alone lies in the observation that, among the training steps of deep learners mentioned above, back-propagation and gradient updates consume the most computing resources, yet most existing methods still include these two steps in the pruning process


Summary

Introduction

The successful development of artificial intelligence methods fueled by deep learning has had a significant impact on multiple industries. The highly nonlinear nature of neural networks makes it difficult to judge the importance of a branch from its weight alone. Another important point regarding the achievement of low-complexity neural networks through pruning alone lies in the observation that, among the training steps of deep learners mentioned above, back-propagation and gradient updates consume the most computing resources, yet most existing methods still include these two steps in the pruning process. In order to reduce the total size of a model while doing inference on a mobile electronic device, various references found in the survey [21] focus on pruning the weight parameters directly instead of the gradients. These pruning techniques can be element-wise, vector-wise, or block-wise, corresponding to unstructured or structured pruning. The SA-based procedure is governed by a small set of hyperparameters: a coefficient to control the accept–reject rate, the temperature in SA, the coefficient to decrease the temperature, and a dataset for network training and testing.
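
As a hedged illustration of the pruning granularities mentioned above (element-wise versus block-wise, i.e., unstructured versus structured), the sketch below builds magnitude-based masks for a single weight matrix; the helper names and the 4x4 block size are assumptions for illustration, not code from the paper or the survey [21].

    import numpy as np

    def elementwise_mask(W, frac):
        """Unstructured (element-wise) pruning: drop the smallest-magnitude weights."""
        k = int(frac * W.size)
        thresh = np.sort(np.abs(W), axis=None)[k]
        return (np.abs(W) >= thresh).astype(W.dtype)

    def blockwise_mask(W, frac, block=4):
        """Structured (block-wise) pruning: score whole tiles and drop the weakest ones."""
        rows, cols = W.shape[0] // block, W.shape[1] // block
        tiles = W[:rows * block, :cols * block].reshape(rows, block, cols, block)
        scores = np.abs(tiles).sum(axis=(1, 3))           # one magnitude score per tile
        thresh = np.sort(scores, axis=None)[int(frac * scores.size)]
        mask = np.ones_like(W)
        for r in range(rows):
            for c in range(cols):
                if scores[r, c] < thresh:
                    mask[r * block:(r + 1) * block, c * block:(c + 1) * block] = 0
        return mask

    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 16))
    print(elementwise_mask(W, 0.5).mean())                # fraction of weights kept
    print(blockwise_mask(W, 0.5, block=4).mean())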

Related Works
Threshold Pruning Techniques
Gradient Pruning Techniques
Other Annealing Applications
Other Heuristic Algorithms
Network Optimization Using Simulated Annealing
Choice of State Neighborhood Structure
Acceptance–Rejection Ratio
Convergence
Cooling Scheme and Hyperparameters
Selection of Weight Parameters
Permutation after Edge Pruning
Experimental Study
Visualization to the Selection of Weight Parameters
Performance Trend under Different Pruning Scales
Time Complexity of the SA-Based Pruning Process
Findings
Discussion