Abstract

A critical problem in large neural networks is over-parameterization: the large number of weight parameters limits their use on edge devices because of prohibitive computational power and memory/storage requirements. To make neural networks practical on edge devices and in real-time industrial applications, they need to be compressed in advance. Since edge devices cannot train networks or access trained networks when internet resources are scarce, preloading smaller networks is essential. Various works in the literature have shown that redundant branches of a fully connected network can be pruned strategically without significantly sacrificing performance. However, the majority of these methodologies require substantial computational resources because they integrate weight training via the back-propagation algorithm into the network compression process. In this work, we focus on optimizing the network structure so that performance is preserved despite aggressive pruning. The structure optimization is performed using the simulated annealing (SA) algorithm alone, without utilizing back-propagation for branch weight training. As a heuristic method for non-convex optimization, simulated annealing provides a near-globally optimal solution to this NP-hard problem for a given percentage of pruned branches. Our simulation results show that simulated annealing can significantly reduce the complexity of a fully connected network while maintaining its performance, without the help of back-propagation.
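
To make the approach concrete, the following is a minimal sketch of simulated-annealing-based pruning of a one-hidden-layer fully connected network with frozen pre-trained weights, assuming a mask swap as the neighborhood move; the function names (evaluate_loss, sa_prune), the mean-squared-error evaluation metric, and the geometric cooling schedule are illustrative assumptions rather than the paper's implementation.

    import numpy as np

    def evaluate_loss(masks, weights, X, y):
        """Forward pass of a one-hidden-layer network with masked (pruned) weights."""
        W1, W2 = weights
        h = np.maximum(0, X @ (W1 * masks[0]))           # ReLU hidden layer
        preds = h @ (W2 * masks[1])
        return np.mean((preds - y) ** 2)                  # stand-in evaluation metric

    def sa_prune(weights, X, y, prune_frac=0.5, T=1.0, alpha=0.99, steps=1000, seed=0):
        """Search over pruning masks with simulated annealing; weights stay frozen."""
        rng = np.random.default_rng(seed)
        masks = []
        for W in weights:                                 # random initial mask per layer
            m = np.ones(W.size)
            m[: int(prune_frac * W.size)] = 0.0
            rng.shuffle(m)
            masks.append(m.reshape(W.shape))
        cur = evaluate_loss(masks, weights, X, y)
        for _ in range(steps):
            li = rng.integers(len(masks))                 # pick a layer at random
            flat = masks[li].ravel()
            off = np.flatnonzero(flat == 0)               # pruned edges
            on = np.flatnonzero(flat == 1)                # kept edges
            i, j = rng.choice(off), rng.choice(on)
            flat[i], flat[j] = 1.0, 0.0                   # neighbor state: swap one edge pair
            new = evaluate_loss(masks, weights, X, y)
            # Metropolis rule: accept improvements; accept worse states with prob exp(-dE/T)
            if new < cur or rng.random() < np.exp(-(new - cur) / T):
                cur = new
            else:
                flat[i], flat[j] = 0.0, 1.0               # reject: undo the swap
            T *= alpha                                    # geometric cooling
        return masks, cur

    # Example usage with random data and random "pre-trained" weights (illustration only)
    rng = np.random.default_rng(1)
    weights = [rng.standard_normal((20, 32)), rng.standard_normal((32, 1))]
    X, y = rng.standard_normal((64, 20)), rng.standard_normal((64, 1))
    masks, loss = sa_prune(weights, X, y, prune_frac=0.7)

Only forward evaluations of the masked network are needed while the mask is searched; no gradients or back-propagation are computed at any step.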

Highlights

  • The successful development of artificial intelligence methods fueled by deep learning has had a significant impact on multiple industries

  • The recent mainstream strategies can be classified into four main approaches: (i) methods that reduce storage requirements by reducing the number of bits used to represent the branch weights, which indirectly reduces the computational load as well, such as quantization [9]; (ii) methods that reduce the computational load by decomposing layers or simplifying activation functions, as in [10]; (iii) methods that replace bigger networks with smaller networks that provide similar results for a chosen sub-task, as in knowledge distillation [11]; and (iv) methods that reduce the number of parameters by increasing the sparsity of the neural network, as in pruning [12,13]

  • Another important point regarding the achievement of low-complexity neural networks through pruning alone lies in the observation that, among the training steps of deep learners mentioned above, back-propagation and gradient updates consume the most computing resources, yet most existing methods still include these two steps in the pruning process


Summary

Introduction

The successful development of artificial intelligence methods fueled by deep learning has had a significant impact on multiple industries. The highly nonlinear nature of neural networks makes it difficult to judge the importance of a branch from its weight alone. Another important point regarding the achievement of low-complexity neural networks through pruning alone lies in the observation that, among the training steps of deep learners mentioned above, back-propagation and gradient updates consume the most computing resources, yet most existing methods still include these two steps in the pruning process. In order to reduce the total size of a model while doing inference on a mobile electronic device, various references found in the survey [21] focus on pruning the weight parameters directly instead of the gradients. These pruning techniques can be element-wise, vector-wise, or block-wise, corresponding to unstructured or structured pruning. The SA-based procedure is governed by a small set of hyperparameters: a coefficient to control the accept–reject rate, the temperature in SA, the coefficient to decrease the temperature, and a dataset for network training and testing.
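
As a hedged illustration of the pruning granularities mentioned above (element-wise versus block-wise, i.e., unstructured versus structured), the sketch below builds magnitude-based masks for a single weight matrix; the helper names and the 4x4 block size are assumptions for illustration, not code from the paper or the survey [21].

    import numpy as np

    def elementwise_mask(W, frac):
        """Unstructured (element-wise) pruning: drop the smallest-magnitude weights."""
        k = int(frac * W.size)
        thresh = np.sort(np.abs(W), axis=None)[k]
        return (np.abs(W) >= thresh).astype(W.dtype)

    def blockwise_mask(W, frac, block=4):
        """Structured (block-wise) pruning: score whole tiles and drop the weakest ones."""
        rows, cols = W.shape[0] // block, W.shape[1] // block
        tiles = W[:rows * block, :cols * block].reshape(rows, block, cols, block)
        scores = np.abs(tiles).sum(axis=(1, 3))           # one magnitude score per tile
        thresh = np.sort(scores, axis=None)[int(frac * scores.size)]
        mask = np.ones_like(W)
        for r in range(rows):
            for c in range(cols):
                if scores[r, c] < thresh:
                    mask[r * block:(r + 1) * block, c * block:(c + 1) * block] = 0
        return mask

    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 16))
    print(elementwise_mask(W, 0.5).mean())                # fraction of weights kept
    print(blockwise_mask(W, 0.5, block=4).mean())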

Related Works
Threshold Pruning Techniques
Gradient Pruning Techniques
Other Annealing Applications
Other Heuristic Algorithms
Network Optimization Using Simulated Annealing
Choice of State Neighborhood Structure
Acceptance–Rejection Ratio
Convergence
Cooling Scheme and Hyperparameters
Selection of Weight Parameters
Permutation after Edge Pruning
Experimental Study
Visualization to the Selection of Weight Parameters
Performance Trend under Different Pruning Scales
Time Complexity of the SA-Based Pruning Process
Findings
Discussion