Abstract

The momentum method has become the workhorse of the deep learning community. To theoretically understand its success, researchers have devoted considerable effort to demystifying its convergence properties when optimizing neural networks. For convex problems, it is well known that the triple momentum (TM) method achieves the fastest theoretical convergence rate among all first-order methods. However, there are no theoretical convergence results for the TM method on the non-convex neural network training problem, let alone an acceleration guarantee. In this paper, we focus on the training dynamics of the TM method for a two-layer ReLU neural network. Inspired by the accurate characterization provided by high-resolution dynamical systems, we consider the high-resolution ordinary differential equation (ODE) of the TM method. Under the over-parameterization assumption, we show that the original non-convex optimization problem can be transformed into a strongly convex one. By applying an appropriate Lyapunov function, we prove that the TM method converges linearly to a global minimum. Compared with the heavy ball method and Nesterov's accelerated gradient method, our result provides the first guarantee of acceleration for the TM method in training neural networks. Empirical experiments validate the accelerated convergence of the TM method and the effect of over-parameterization.
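
For reference, the sketch below implements the standard triple momentum iteration of Van Scoy et al. for an L-smooth, mu-strongly convex objective; the quadratic test problem, parameter values, and function names are illustrative assumptions only and do not reproduce the paper's two-layer ReLU setting or its high-resolution ODE analysis.

```python
# Minimal sketch of the triple momentum (TM) update, applied to a strongly
# convex quadratic purely for illustration. The neural-network training
# problem studied in the paper is NOT reproduced here.
import numpy as np

def triple_momentum(grad, x0, L, mu, num_iters=200):
    """Run the TM method on an L-smooth, mu-strongly convex objective."""
    kappa = L / mu
    rho = 1.0 - 1.0 / np.sqrt(kappa)            # worst-case convergence rate
    alpha = (1.0 + rho) / L
    beta = rho ** 2 / (2.0 - rho)
    gamma = rho ** 2 / ((1.0 + rho) * (2.0 - rho))
    delta = rho ** 2 / (1.0 - rho ** 2)

    xi_prev, xi = x0.copy(), x0.copy()
    for _ in range(num_iters):
        y = (1.0 + gamma) * xi - gamma * xi_prev     # gradient query point
        xi_next = (1.0 + beta) * xi - beta * xi_prev - alpha * grad(y)
        xi_prev, xi = xi, xi_next
    return (1.0 + delta) * xi - delta * xi_prev      # output iterate

# Illustrative usage on f(x) = 0.5 * x^T A x with eigenvalues in [mu, L].
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, L, d = 1.0, 100.0, 20
    A = np.diag(np.linspace(mu, L, d))
    x = triple_momentum(lambda v: A @ v, rng.standard_normal(d), L, mu)
    print("distance to minimizer:", np.linalg.norm(x))
```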
