On Efficient Training of Large-Scale Deep Learning Models

Li Shen, Dacheng Tao, Yan Sun, Liang Ding, Xinmei Tian, Zhiyuan Yu

doi:10.1145/3700439

Abstract

Deep learning has made significant progress in recent years, particularly in computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. However, training such models is often unstable and places stringent demands on computational resources. As demands on computational capacity grow, numerous studies have explored efficient training, yet a comprehensive summary and guideline covering general acceleration techniques for training large-scale deep learning models is still lacking. In this survey, we present a detailed review of general techniques for training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) “data-centric,” including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of processing data samples; (2) “model-centric,” including acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which accelerate training by reducing the computation over parameters and providing better initialization; (3) “optimization-centric,” including the selection of learning rate, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which concern the training policy and improve the generality of large-scale models; (4) “budgeted training,” including distinctive acceleration methods for resource-constrained situations, e.g., a limit on the total number of iterations; and (5) “system-centric,” including efficient distributed frameworks and open-source libraries that provide adequate hardware support for implementing the above-mentioned acceleration algorithms. This taxonomy offers a comprehensive view of the general mechanisms within each component and their joint interaction. We further provide a detailed analysis and discussion of future work on general acceleration techniques, which could inspire the community to rethink and design novel efficient paradigms. Overall, we hope that this survey will serve as a valuable guideline for general efficient training.
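The abstract refers to a "fundamental update formulation" without stating it. As a rough illustration only, the sketch below assumes that formulation resembles plain mini-batch SGD and marks where four of the five perspectives would act on it. The function name `sgd_train`, the toy dataset, and all hyperparameter values are hypothetical and are not taken from the survey; the system-centric perspective (distributed frameworks and libraries) is not represented.

```python
import random


def sgd_train(data, w_init=0.0, lr=0.5, batch_size=4, max_steps=200, seed=0):
    """Fit y ~ w * x by mini-batch SGD on squared error (toy 1-D model)."""
    rng = random.Random(seed)
    w = w_init                                # model-centric: initialization
    for _ in range(max_steps):                # budgeted training: fixed iteration budget
        batch = rng.sample(data, batch_size)  # data-centric: which samples are drawn
        # Gradient of the batch-mean loss 0.5 * (w*x - y)^2 with respect to w.
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad                        # optimization-centric: learning rate, batch size
    return w


# Noiseless toy data generated from y = 3x, so the fitted weight should approach 3.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 11)]
w = sgd_train(data)
```

Each commented line corresponds to one knob the taxonomy groups techniques around; real large-scale training would replace the scalar weight with a network and the hand-written gradient with automatic differentiation.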
