Abstract

Efficiently training large-scale AI models has become a central topic in deep learning research. Mixed-precision training is an effective technique for speeding up training and reducing memory usage. Current automatic mixed-precision methods mainly use half precision (FP16) for the matrix operations in the forward and backward passes of the entire model while maintaining FP32 master copies of the weights for accumulation, which avoids rounding errors. However, this approach is not tuned for each layer individually, and because different layers exhibit different data patterns, it can lead to poor convergence when training large-scale models. This paper therefore proposes a layered mixed-precision training method that flexibly adjusts the training precision of each layer according to its contribution to the training result. With the layered mixed-precision method, a ResNet model achieves a 1.9× speedup over the baseline with a smaller accuracy loss. In addition, this paper combines the layered mixed-precision method with distributed training strategies. Combined with data-parallel training, the model achieves a 3.74× speedup on four Tesla V100 GPUs. The applicability of the layered mixed-precision method to model-parallel training is also verified: combined with optimized pipeline-parallel training, the model achieves a 3.26× speedup on three Tesla V100 GPUs.
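To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of per-layer precision selection in PyTorch. It assumes a hypothetical keep_fp32 set of layer indices, chosen by some sensitivity criterion, whose layers run in FP32 while all other layers run under autocast in FP16; FP32 master weights and loss scaling follow standard mixed-precision practice.

```python
import torch
import torch.nn as nn

class LayeredPrecisionModel(nn.Module):
    """Runs precision-sensitive layers in FP32 and the rest in FP16."""
    def __init__(self, layers, keep_fp32):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.keep_fp32 = set(keep_fp32)  # hypothetical per-layer precision choice

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i in self.keep_fp32:
                # precision-sensitive layer: disable autocast, compute in FP32
                with torch.autocast("cuda", enabled=False):
                    x = layer(x.float())
            else:
                # insensitive layer: FP16 matrix operations under autocast
                with torch.autocast("cuda", dtype=torch.float16):
                    x = layer(x)
        return x

model = LayeredPrecisionModel(
    [nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)],
    keep_fp32=[2],        # illustrative choice: keep the final layer in FP32
).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # FP32 master weights
scaler = torch.cuda.amp.GradScaler()
loss_fn = nn.CrossEntropyLoss()

def train_step(inputs, targets):
    opt.zero_grad(set_to_none=True)
    loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()  # scaled backward pass to avoid FP16 underflow
    scaler.step(opt)               # unscales gradients, updates FP32 weights
    scaler.update()
```

How the keep_fp32 set is determined (i.e., how each layer's contribution to the training result is measured) is the core of the paper's layered method and is not shown here; the sketch only illustrates the mechanism of assigning different precisions to different layers.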
