Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling

Zhouyuan Huo, Bin Gu, Heng Huang

doi:10.1609/aaai.v35i9.16962

Abstract

Training deep neural networks with a large batch size has shown promising results and benefits many real-world applications. Warmup is one of the nontrivial techniques used to stabilize the convergence of large-batch training. However, warmup is an empirical method, and it is still unknown whether there is a better algorithm with theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We prove the convergence of our algorithm by introducing a new fine-grained analysis of gradient-based methods. Furthermore, the new analysis also helps to understand two other empirical tricks: layer-wise adaptive rate scaling and linear learning rate scaling. We conduct extensive experiments and demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and converges better than the state-of-the-art large-batch optimizer when training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset.
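For orientation, the three empirical tricks named in the abstract (linear learning rate scaling, gradual warmup, and layer-wise adaptive rate scaling) can be sketched in a few lines of Python. This is an illustrative sketch of the commonly used baseline forms only, not the CLARS algorithm, whose update rule is not given in the abstract. The function names, the reference batch size, and the simplified layer-wise ratio (weight norm over gradient norm, without trust coefficient or weight decay) are assumptions made for illustration.

```python
import math


def linear_scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: grow the learning rate in proportion to the batch size."""
    return base_lr * batch / base_batch


def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup: ramp linearly up to target_lr over the first warmup_steps steps."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps


def layerwise_lr(global_lr, weights, grads, eps=1e-12):
    """Simplified layer-wise rate (assumed form): global_lr * ||w|| / ||g|| for one layer."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    return global_lr * w_norm / (g_norm + eps)


# Hypothetical settings: a rate tuned at batch size 256, reused at batch size 8192.
target = linear_scaled_lr(base_lr=0.1, base_batch=256, batch=8192)
first_step = warmup_lr(target, step=0, warmup_steps=5)
layer_rate = layerwise_lr(1.0, weights=[3.0, 4.0], grads=[0.6, 0.8])
```

In this sketch the warmup schedule is a hand-chosen ramp applied to every layer equally, whereas the layer-wise rate adapts each layer's step to the ratio of its weight and gradient norms; the abstract positions CLARS as a theoretically grounded alternative to the former.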


Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 3

Similar Papers

A new perspective for understanding generalization gap of deep neural networks trained with large batch sizes
Oyebade K Oyedotun ... Konstantinos Papadopoulos
Applied intelligence (Dordrecht, Netherlands) | VOL. 53
24 Nov 2022

Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges
Edgar Galvan ... Peter Mooney
IEEE transactions on artificial intelligence | VOL. 2
04 May 2021

Efficient Dual Batch Size Deep Learning for Distributed Parameter Server Systems
Kuan-Wei Lu ... Jan-Jan Wu
01 Jun 2022

FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters
Forrest N Iandola ... Kurt Keutzer
01 Jun 2016
