Abstract

Second-order optimization methods for training deep neural networks (DNNs) have attracted considerable research interest. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factor in the KFAC eigenbasis. In addition, another approximation to the natural gradient, Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), has been proposed, in which the Fisher information matrix (FIM) is approximated as a constant multiplied by the Kronecker product of two matrices, so that the trace is preserved by the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factor under the Kronecker-factored eigenbasis, but also adopts the new approximation and the effective damping technique used in TKFAC. We also discuss the differences and relationships among the related Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
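
To make the two ingredients concrete, the sketch below is a minimal NumPy illustration (not the authors' implementation) for a single fully-connected layer: a trace-matching constant for the Kronecker-factored Fisher approximation (the TKFAC idea) and a corrected re-scaling in the Kronecker-factored eigenbasis (the EKFAC idea). All variable names, shapes, and the exact way the constant, the corrected scales, and the damping are combined in the final update are illustrative assumptions based on the standard KFAC-family formulations, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 256, 20, 10           # batch size, layer input/output width

a = rng.standard_normal((n, d_in))     # layer inputs (activations)
g = rng.standard_normal((n, d_out))    # back-propagated output gradients

# Standard KFAC Kronecker factors of the layer's Fisher block: F ~= A (x) S
A = a.T @ a / n                        # second moment of the activations
S = g.T @ g / n                        # second moment of the gradients

# Per-sample weight gradients, shape (n, d_out, d_in)
dW = np.einsum('no,ni->noi', g, a)

# Trace restriction (TKFAC idea): choose a constant so that the trace of the
# Kronecker approximation matches the trace of the exact Fisher block,
# using tr(F) = E[||dW||_F^2] and tr(A (x) S) = tr(A) * tr(S).
tr_F = np.mean(np.sum(dW ** 2, axis=(1, 2)))
kappa = tr_F / (np.trace(A) * np.trace(S))

# Eigenvalue correction (EKFAC idea): keep the Kronecker-factored eigenbasis
# but re-estimate the diagonal scaling as the second moment of the gradient
# expressed in that basis.
_, U_A = np.linalg.eigh(A)
_, U_S = np.linalg.eigh(S)
dW_rot = np.einsum('oi,nij,jk->nok', U_S.T, dW, U_A)   # rotate each gradient
scales = np.mean(dW_rot ** 2, axis=0)                  # corrected re-scaling

# Illustrative preconditioned update for the mean gradient; the interaction of
# kappa, the corrected scales, and the damping is an assumption of this sketch.
G = dW.mean(axis=0)
damping = 1e-3
update = U_S @ ((U_S.T @ G @ U_A) / (kappa * scales + damping)) @ U_A.T
print(update.shape)                                    # (10, 20)
```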
