Abstract

Gradient-based methods are widely used in training neural networks and can be broadly categorized into first-order and second-order methods. Second-order methods have been shown to converge better than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training. Recent methods have been shown to speed up the convergence of the BFGS method using Nesterov's accelerated gradient and momentum terms. The symmetric rank-1 (SR1) quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the SR1 quasi-Newton method with Nesterov's gradient for training neural networks, and briefly discusses its convergence. The performance of the proposed method is evaluated on a function approximation problem and an image classification problem.
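For context, the following is a minimal LaTeX sketch of the standard (unaccelerated) SR1 update of the inverse-Hessian approximation and of Nesterov's gradient evaluation, as found in standard references; the paper's accelerated, limited-memory variant may differ in details such as scaling, safeguards against a vanishing denominator, and storage:

    s_k = w_{k+1} - w_k, \qquad y_k = \nabla E(w_{k+1}) - \nabla E(w_k)
    % Standard SR1 update of the inverse-Hessian approximation H_k:
    H_{k+1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^{\top}}{(s_k - H_k y_k)^{\top} y_k}
    % Nesterov's acceleration evaluates the gradient at a look-ahead point,
    % replacing \nabla E(w_k) by \nabla E(w_k + \mu v_k), with momentum coefficient \mu.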

Highlights

  • Neural networks have been shown to have great potential in several applications

  • We evaluate the performance of the proposed Nesterov-accelerated symmetric rank-1 quasi-Newton method in its limited-memory form (L-SR1-N) against conventional first-order and second-order methods

  • Since our aim is to investigate the effectiveness of Nesterov's acceleration on SR1, we focus on the performance comparison of oLBFGS-1, oLSR1 and the proposed oL-SR1-N and oL-MoSR1 methods


Introduction

Neural networks have been shown to have great potential in several applications, and there is thus great demand for large-scale algorithms that can train them effectively and efficiently. While several works focus on sophisticated update strategies for improving the performance of the optimization algorithm, others propose acceleration techniques such as incorporating momentum, Nesterov's acceleration or Anderson's acceleration. Most of the second-order quasi-Newton methods used in training neural networks are rank-2 update methods; rank-1 methods are not widely used since they do not perform as well as the rank-2 updates. We investigate whether Nesterov's acceleration can be applied to the rank-1 update methods of the quasi-Newton family to improve performance. Training a neural network is an iterative process in which the parameters are updated in order to minimize an objective function. The objective function E(w) under consideration is minimized by the iterative formula w_{k+1} = w_k + v_{k+1}, where k ∈ ℕ is the iteration count and v_{k+1} is the update vector, which is defined differently for each gradient algorithm. A simple sketch of this update loop is given below.
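The following Python sketch illustrates this generic iterative update, using gradient descent with Nesterov's momentum as the update vector v_{k+1}; the objective, step size and momentum coefficient here are illustrative placeholders, not the paper's experimental setup.

    import numpy as np

    def train(E_grad, w0, lr=0.01, mu=0.9, n_iters=100):
        """Generic iterative training loop: w_{k+1} = w_k + v_{k+1}.

        E_grad: function returning the gradient of the objective E at w
        w0:     initial parameter vector
        lr, mu: illustrative step size and momentum coefficient
        """
        w = np.asarray(w0, dtype=float)
        v = np.zeros_like(w)
        for k in range(n_iters):
            # Nesterov's accelerated gradient: evaluate the gradient at the
            # look-ahead point w_k + mu * v_k instead of at w_k.
            g = E_grad(w + mu * v)
            # Update vector v_{k+1}; quasi-Newton methods define it differently,
            # e.g. using a search direction -H_k g built from a Hessian approximation.
            v = mu * v - lr * g
            # Iterative formula w_{k+1} = w_k + v_{k+1}
            w = w + v
        return w

    # Example usage on a simple quadratic objective E(w) = ||w||^2 / 2
    w_star = train(lambda w: w, w0=np.array([1.0, -2.0]))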
