DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Conglong Li, Minjia Zhang, Cheng Li, Xiaoxia Wu, Connor Holmes, Yuxiong He, Zhewei Yao

doi:10.1609/aaai.v38i16.29810

Abstract

Recent advances in deep learning models come at the price of formidable training cost. Increasing model size is one of the root causes, but a less-emphasized fact is that data scale is increasing at a similar speed as model scale, and training cost is proportional to both. Compared to the rapidly evolving model architecture, how to use the training data efficiently (especially for expensive foundation model pretraining) is both less explored and difficult to realize, owing to the lack of a convenient framework that focuses on data efficiency capabilities. To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, we propose and combine two data efficiency techniques: efficient data sampling via a general curriculum learning library, and efficient data routing via a novel random layerwise token dropping technique. For GPT-3 1.3B language model pretraining, our work achieves 12.5x less data/time/cost ($3.7K if rented on Azure) while still maintaining 95% of the model quality of the baseline with full data and cost ($46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also achieve the same model quality with up to 2x less data/time/cost, or achieve better model quality under the same data/time/cost. DeepSpeed Data Efficiency is easy to use and tune, which enabled us to apply it and verify its benefit on additional tasks, including GPT-3 MoE model pretraining and small-scale GPT-2/ViT finetuning.
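To make the two techniques named in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the DeepSpeed API and not the authors' implementation. All function names (`linear_pacing`, `curriculum_sample`, `random_layerwise_token_drop`) are hypothetical, and the specific choices are assumptions for illustration only: a linear difficulty schedule, a difficulty score supplied by the caller, and first and last layers that always process every token while middle layers process a random subset.

```python
import random


def linear_pacing(step, total_steps, start, end):
    """Difficulty threshold that grows linearly from start to end (assumed schedule)."""
    if total_steps <= 0:
        return end
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def curriculum_sample(examples, difficulty, threshold, batch_size, rng):
    """Draw a batch only from examples whose difficulty score is within the threshold."""
    eligible = [ex for ex in examples if difficulty(ex) <= threshold]
    return rng.sample(eligible, min(batch_size, len(eligible)))


def random_layerwise_token_drop(tokens, layers, keep, rng):
    """Run tokens through layers; each middle layer sees a fresh random subset.

    Tokens that are not selected bypass that layer unchanged. The first and
    last layers process every token (an assumption of this sketch).
    """
    hidden = list(tokens)
    last = len(layers) - 1
    for i, layer in enumerate(layers):
        if i == 0 or i == last:
            positions = range(len(hidden))
        else:
            positions = rng.sample(range(len(hidden)), min(keep, len(hidden)))
        for p in positions:
            hidden[p] = layer(hidden[p])
    return hidden


if __name__ == "__main__":
    rng = random.Random(0)
    # Sampling: treat sequence length as the difficulty score.
    lengths = [8, 16, 32, 64, 128]
    threshold = linear_pacing(step=50, total_steps=100, start=16, end=128)
    print(curriculum_sample(lengths, lambda n: n, threshold, 2, rng))
    # Routing: toy "layers" that add 1, so each value counts layers visited.
    layers = [lambda h: h + 1] * 4
    print(random_layerwise_token_drop([0] * 8, layers, keep=3, rng=rng))
```

In the routing toy, each output value counts how many layers that token passed through, so the compute saved by skipping middle layers is visible directly. A real model would operate on tensors and gather and scatter the selected positions instead of looping over them.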

Journal: Proceedings of the AAAI Conference on Artificial Intelligence	Publication Date: Mar 24, 2024
Citations: 4
