Analysis and comparison of two-level KFAC methods for training deep neural networks

Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia, Valérie Garès, Mounir Haddou, Quang Huy Tran

doi:10.1080/10556788.2024.2380684

Abstract

As a second-order method, Natural Gradient Descent (NGD) can accelerate the training of neural networks. However, because computing and inverting the Fisher Information Matrix (FIM) is prohibitively expensive in both time and memory, efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix in which each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate whether it is worthwhile to restore some low-frequency interactions between the layers by means of two-level methods. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate and economical in computation time.
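To make the block-diagonal idea concrete, here is a minimal NumPy sketch of a layer-wise preconditioned step in the spirit of KFAC (Kronecker-Factored Approximate Curvature). It is an illustration under stated assumptions, not the authors' implementation: the function names, the factor names `A` (input-activation covariance) and `G` (pre-activation-gradient covariance), and the damping value are all hypothetical, and the damping is added to each factor separately for simplicity. The sketch relies only on the standard identity (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for symmetric A, so each layer block is inverted through its two small factors and no cross-layer block is ever formed.

```python
import numpy as np


def kfac_layer_step(grad_W, A, G, damping=1e-3):
    """Precondition one layer's gradient using two small Kronecker factors.

    grad_W: (d_out, d_in) gradient of the loss w.r.t. the layer weights.
    A: (d_in, d_in) symmetric factor; G: (d_out, d_out) symmetric factor.
    Assumption: damping is added to each factor separately.
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    # grad_W @ inv(A_d), computed with a solve (valid because A_d is symmetric).
    right = np.linalg.solve(A_d, grad_W.T).T
    # inv(G_d) @ grad_W @ inv(A_d)
    return np.linalg.solve(G_d, right)


def dense_layer_step(grad_W, A, G, damping=1e-3):
    """Reference: build the full layer block A_d (x) G_d and solve directly."""
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    block = np.kron(A_d, G_d)
    vec = grad_W.flatten(order="F")  # column-stacking vec
    return np.linalg.solve(block, vec).reshape(grad_W.shape, order="F")


def block_diagonal_step(grads, factors, damping=1e-3):
    """Treat layers independently: off-diagonal (cross-layer) blocks are ignored."""
    return [kfac_layer_step(g, A, G, damping) for g, (A, G) in zip(grads, factors)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((3, 50))  # hypothetical layer inputs
    g = rng.standard_normal((2, 50))  # hypothetical back-propagated gradients
    A, G = a @ a.T / 50, g @ g.T / 50
    V = rng.standard_normal((2, 3))
    # The factored solve should agree with the dense per-layer solve.
    print(np.allclose(kfac_layer_step(V, A, G), dense_layer_step(V, A, G)))
```

The two-level corrections studied in the paper add a coarse-space term on top of such a block-diagonal step; that part is not sketched here because the abstract does not specify the coarse spaces.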


More From: Optimization Methods and Software

Similar Papers

Component-Wise Natural Gradient Descent - An Efficient Neural Network Optimization
Tran Van Sang ... Mhd Irvan
01 Nov 2022

Projective Fisher Information for Natural Gradient Descent
Piyush Kaul ... Brejesh Lall
IEEE Transactions on Artificial Intelligence | VOL. 4
01 Apr 2023

Eigenvalue-Corrected Natural Gradient Based on a New Approximation
Kaixin Gao ... Zidong Wang
Asia-Pacific Journal of Operational Research | VOL. 40
01 Feb 2023

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Kaixin Gao ... Xiaolei Liu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021
