Abstract

Training large-scale AI models quickly and efficiently has become a central concern in deep learning. Mixed-precision training is an effective technique for speeding up training and reducing memory usage. Current automatic mixed-precision methods run the matrix operations of the forward and backward passes of the entire model in half precision (FP16) while maintaining FP32 master copies of the weights to avoid rounding errors during accumulation. However, this approach applies the same precision to every layer even though different layers exhibit different data patterns, which can lead to poor convergence when training large-scale models. This paper therefore proposes a layered mixed-precision training method that flexibly adjusts the training precision of each layer according to its contribution to the training result. With the layered mixed-precision method, a ResNet model achieves a 1.9× speedup over the baseline with a lower accuracy loss. In addition, this paper combines the layered mixed-precision method with distributed training strategies. With data-parallel training, the model achieves a 3.74× speedup on four Tesla V100 GPUs. The applicability of the layered mixed-precision method to model-parallel training is also verified: with optimized pipeline-parallel training, the model achieves a 3.26× speedup on three Tesla V100 GPUs.
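For orientation, the sketch below illustrates the general idea described in the abstract, not the paper's implementation: standard automatic mixed precision (FP16 matrix operations with FP32 weights) using PyTorch's torch.cuda.amp utilities, plus a hypothetical FP32Layer wrapper showing how an individual layer could be pinned to a higher precision. The model, training data, and the FP32Layer wrapper are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FP32Layer(nn.Module):
    """Hypothetical wrapper: keep a precision-sensitive layer in FP32
    even while the surrounding model runs under autocast (FP16)."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):
        # Disable autocast locally so this layer's matmuls stay in FP32.
        with torch.cuda.amp.autocast(enabled=False):
            return self.layer(x.float())

# Toy model: most layers run in FP16 under autocast, one layer is pinned to FP32.
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    FP32Layer(nn.Linear(1024, 1024)), nn.ReLU(),  # this layer stays in FP32
    nn.Linear(1024, 10),
).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # weights are kept in FP32
scaler = torch.cuda.amp.GradScaler()                     # loss scaling guards against FP16 underflow
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Synthetic batch for illustration only.
    inputs = torch.randn(64, 1024, device="cuda")
    targets = torch.randint(0, 10, (64,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # forward matrix ops run in FP16
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()              # backward pass on the scaled loss
    scaler.step(optimizer)                     # unscale gradients, FP32 weight update
    scaler.update()
```

How a real layered scheme decides which layers receive which precision (e.g., based on each layer's contribution to the training result) is the subject of the paper itself; the wrapper above only shows the mechanism for overriding precision on a per-layer basis.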
