Abstract

Efficiently training large-scale AI models has become a central topic in deep learning research. Mixed-precision training is an effective technique for speeding up training and reducing memory usage. Current automatic mixed-precision methods mainly use half precision (FP16) for the matrix operations in the forward and backward passes of the entire model while maintaining FP32 master copies of the weights for accumulation, which avoids rounding errors. However, this approach is not tuned for each layer individually, and because different layers exhibit different data patterns, it can lead to poor convergence when training large-scale models. This paper therefore proposes a layered mixed-precision training method that flexibly adjusts the training precision of each layer according to its contribution to the training result. With the layered mixed-precision method, a ResNet model achieves a 1.9× speedup over the baseline with a smaller accuracy loss. In addition, this paper combines the layered mixed-precision method with distributed training strategies. Combined with data-parallel training, the model achieves a 3.74× speedup on four Tesla V100 GPUs. The applicability of the layered mixed-precision method to model-parallel training is also verified: combined with optimized pipeline-parallel training, the model achieves a 3.26× speedup on three Tesla V100 GPUs.
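To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of per-layer precision selection in PyTorch. It assumes a hypothetical keep_fp32 set of layer indices, chosen by some sensitivity criterion, whose layers run in FP32 while all other layers run under autocast in FP16; FP32 master weights and loss scaling follow standard mixed-precision practice.

```python
import torch
import torch.nn as nn

class LayeredPrecisionModel(nn.Module):
    """Runs precision-sensitive layers in FP32 and the rest in FP16."""
    def __init__(self, layers, keep_fp32):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.keep_fp32 = set(keep_fp32)  # hypothetical per-layer precision choice

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i in self.keep_fp32:
                # precision-sensitive layer: disable autocast, compute in FP32
                with torch.autocast("cuda", enabled=False):
                    x = layer(x.float())
            else:
                # insensitive layer: FP16 matrix operations under autocast
                with torch.autocast("cuda", dtype=torch.float16):
                    x = layer(x)
        return x

model = LayeredPrecisionModel(
    [nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)],
    keep_fp32=[2],        # illustrative choice: keep the final layer in FP32
).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # FP32 master weights
scaler = torch.cuda.amp.GradScaler()
loss_fn = nn.CrossEntropyLoss()

def train_step(inputs, targets):
    opt.zero_grad(set_to_none=True)
    loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()  # scaled backward pass to avoid FP16 underflow
    scaler.step(opt)               # unscales gradients, updates FP32 weights
    scaler.update()
```

How the keep_fp32 set is determined (i.e., how each layer's contribution to the training result is measured) is the core of the paper's layered method and is not shown here; the sketch only illustrates the mechanism of assigning different precisions to different layers.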
