Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Lin Zhang, Shaohuai Shi, Wei Wang, Bo Li

doi:10.1109/tcc.2022.3205918

Abstract

Second-order optimization methods, notably the D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, resulting in large computation and communication overheads as well as a high memory footprint. In this paper, we propose DP-KFAC, a novel distributed preconditioning scheme that distributes the KF construction tasks at different DNN layers to different workers. DP-KFAC not only retains the convergence property of the existing D-KFAC algorithms but also enables three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55×-1.65×, the communication cost by 2.79×-3.15×, and the memory footprint by 1.14×-1.47× in each second-order update compared to the state-of-the-art D-KFAC methods. Our code is available at https://github.com/lzhangbv/kfac_pytorch .
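The idea of assigning each layer's KF construction to a single worker can be illustrated with a small single-process sketch. This is not the authors' implementation: the round-robin assignment policy, the damping value, the function names, and the random data are all assumptions made for illustration. Only the per-layer preconditioning formula (G + λI)^-1 ∇W (A + λI)^-1 is the standard K-FAC form for a fully connected layer.

```python
import numpy as np

def assign_layers(num_layers, num_workers):
    """Map each layer to one worker (round-robin is an assumed policy)."""
    return {layer: layer % num_workers for layer in range(num_layers)}

def kronecker_factors(acts, grads):
    """Build the two Kronecker factors of one fully connected layer.

    acts:  (batch, d_in)  inputs to the layer
    grads: (batch, d_out) gradients w.r.t. the layer's outputs
    """
    n = acts.shape[0]
    A = acts.T @ acts / n      # (d_in, d_in)
    G = grads.T @ grads / n    # (d_out, d_out)
    return A, G

def precondition(grad_w, A, G, damping=1e-3):
    """Return (G + damping*I)^-1 @ grad_w @ (A + damping*I)^-1."""
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    left = np.linalg.solve(G_d, grad_w)       # G_d^-1 @ grad_w
    return np.linalg.solve(A_d.T, left.T).T   # left @ A_d^-1

def simulate_step(layer_shapes, num_workers, batch=8, seed=0):
    """Simulate one second-order update; each layer is handled by its owner only."""
    rng = np.random.default_rng(seed)
    owner = assign_layers(len(layer_shapes), num_workers)
    factors_built = {w: 0 for w in range(num_workers)}
    preconditioned = {}
    for layer, (d_out, d_in) in enumerate(layer_shapes):
        acts = rng.standard_normal((batch, d_in))    # stand-in activations
        grads = rng.standard_normal((batch, d_out))  # stand-in output gradients
        grad_w = grads.T @ acts / batch              # weight gradient, (d_out, d_in)
        A, G = kronecker_factors(acts, grads)        # built by the owner alone
        preconditioned[layer] = precondition(grad_w, A, G)
        factors_built[owner[layer]] += 1
    return owner, factors_built, preconditioned

# Hypothetical layer shapes given as (d_out, d_in).
shapes = [(4, 3), (5, 4), (2, 5), (3, 2), (6, 3)]
owner, factors_built, preconditioned = simulate_step(shapes, num_workers=2)
```

In this sketch each worker builds factors for only its own share of the layers, so no factor ever needs to be exchanged between workers. In a real multi-GPU run the workers would still have to share results with each other in some way; how DP-KFAC does that is described in the paper and the linked repository, not here.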

Journal: IEEE Transactions on Cloud Computing
Publication Date: Jul 1, 2023
Citations: 1

Similar Papers

Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges
Edgar Galvan ... Peter Mooney
IEEE Transactions on Artificial Intelligence | Vol. 2
04 May 2021

FFT-based Gradient Sparsification for the Distributed Training of Deep Neural Networks
Linnan Wang ... Wei Wu
23 Jun 2020

A Framework for Distributed Deep Neural Network Training with Heterogeneous Computing Platforms
Bontak Gu ... Arslan Munir
01 Dec 2019

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method
Yohei Tsuji ... Akira Naruse
05 Aug 2019
