Abstract
Over-parameterization of neural networks and the local optimality of the backpropagation algorithm are two major problems in deep learning. The conventional approach to reducing parameter redundancy is to prune connections with small weights. However, pruning addresses only parameter redundancy and provides no global optimality guarantee. In this paper, we replace backpropagation and combine the sparse network optimization problem with the network weight optimization problem using a non-convex optimization method, namely Simulated Annealing. This method trains the network while keeping the number of parameters under control. Unlike approaches that only update network parameters by gradient descent, our method simultaneously optimizes the topology of the sparse network. Backed by the global optimality guarantee of the Simulated Annealing solution, the sparse network optimized by our method outperforms one trained by backpropagation alone.
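To make the idea of jointly searching over weights and topology concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy masked linear model in place of a neural network, a fixed budget of active connections, a geometric cooling schedule, and a 20% chance of a topology move; the names `loss`, `anneal`, `budget`, and `cooling` are hypothetical and chosen only for illustration.

```python
import math
import random


def loss(weights, mask, data):
    """Mean squared error of a masked linear model y = sum(w_i * m_i * x_i)."""
    total = 0.0
    for x, y in data:
        pred = sum(w * m * xi for w, m, xi in zip(weights, mask, x))
        total += (pred - y) ** 2
    return total / len(data)


def anneal(data, n_inputs, budget, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over weights and a sparse mask (assumes 1 <= budget <= n_inputs)."""
    rng = random.Random(seed)
    # Start from a random topology with exactly `budget` active connections.
    active = set(rng.sample(range(n_inputs), budget))
    mask = [1 if i in active else 0 for i in range(n_inputs)]
    weights = [rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
    current = loss(weights, mask, data)
    best = (current, weights[:], mask[:])
    temp = t0
    for _ in range(steps):
        new_w, new_m = weights[:], mask[:]
        on = [i for i, m in enumerate(new_m) if m]
        off = [i for i, m in enumerate(new_m) if not m]
        if off and rng.random() < 0.2:
            # Topology move: swap one active connection for an inactive one,
            # so the number of active parameters stays equal to `budget`.
            i, j = rng.choice(on), rng.choice(off)
            new_m[i], new_m[j] = 0, 1
        else:
            # Weight move: perturb the weight of one active connection.
            i = rng.choice(on)
            new_w[i] += rng.gauss(0.0, 0.1)
        cand = loss(new_w, new_m, data)
        # Metropolis criterion: always accept improvements, and accept a
        # worse candidate with probability exp(-(cand - current) / temp).
        if cand <= current or rng.random() < math.exp((current - cand) / temp):
            weights, mask, current = new_w, new_m, cand
            if current < best[0]:
                best = (current, weights[:], mask[:])
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best
```

Calling `anneal(data, n_inputs=6, budget=2)` on a list of `(x, y)` pairs returns the best loss found together with the corresponding weights and mask. The swap move is what distinguishes this from a weight-only search: the mask can change during training, while the count of active connections never exceeds the budget.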