Angel-PTM: A Scalable and Economical Large-Scale Pre-Training System in Tencent

Fangcheng Fu, Xiaonan Nie, Xupeng Miao, Jinbao Xue, Yangyu Tao, Yi Liu, Dian Jiao, Bin Cui

doi:10.14778/3611540.3611564


Open Access

https://doi.org/10.14778/3611540.3611564


Abstract

Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially Transformer models. Many products and services at Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have adopted pre-trained models to benefit from their capabilities. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transformer models. Angel-PTM efficiently trains extremely large-scale models using hierarchical memory. Its key designs are fine-grained memory management via the Page abstraction and a unified scheduling method that coordinates computation, data movement, and communication. Furthermore, Angel-PTM supports extreme model scaling with SSD storage and implements a lock-free updating mechanism to address SSD I/O bottlenecks. Experimental results demonstrate that Angel-PTM outperforms existing systems by up to 114.8% in maximum model scale and by up to 88.9% in training throughput. Additionally, experiments on GPT3-175B and T5-MoE-1.2T models using hundreds of GPUs verify its strong scalability.

Full Text