Abstract

Training Deep Neural Network (DNN) models is a time-consuming process that requires immense amounts of data and computation. GPUs are therefore widely adopted to accelerate training. However, the delivered training performance rarely scales with the number of GPUs. The major reason is the large amount of data movement, which prevents the system from supplying the GPUs with the required data in a timely fashion. In this paper, we propose ScaleDNN, a framework that systematically and comprehensively investigates and optimizes data-parallel training on two types of multi-GPU systems (PCIe-based and NVLink-based). Specifically, ScaleDNN performs: i) CPU-centric input batch splitting, ii) mini-batch data pre-loading, and iii) model parameter compression to effectively a) reduce the data movement between the CPU and multiple GPUs, and b) hide the data movement overhead by overlapping data transfer with GPU computation. Our experimental results show that ScaleDNN saves up to 39.38% of execution time (17.96% on average) over modern data parallelism on a PCIe-based multi-GPU system. The corresponding execution time reduction on an NVLink-based multi-GPU system is up to 19.20%, with an average of 10.26%.
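To illustrate the kind of overlap between data transfer and GPU computation that the abstract describes, the following is a minimal PyTorch sketch of mini-batch pre-loading. It is not ScaleDNN's implementation: the model, synthetic dataset, and batch size are hypothetical placeholders, and only the general pattern (pinned host memory, a dedicated copy stream, and asynchronous host-to-device copies) is shown.

```python
# Minimal sketch (not ScaleDNN's code): mini-batch pre-loading in PyTorch.
# Host-to-device copies for the *next* mini-batch are issued on a dedicated
# CUDA stream while the GPU computes on the current one, hiding transfer cost.
# The model, dataset, and sizes below are hypothetical placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda")
model = nn.Linear(1024, 10).to(device)                # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Synthetic data; pinned host memory enables asynchronous (non_blocking) copies.
loader = DataLoader(TensorDataset(torch.randn(4096, 1024),
                                  torch.randint(0, 10, (4096,))),
                    batch_size=256, pin_memory=True)

copy_stream = torch.cuda.Stream()                     # stream used only for H2D copies

def preload(it):
    """Fetch the next mini-batch and launch its copy on the side stream."""
    try:
        x, y = next(it)
    except StopIteration:
        return None
    with torch.cuda.stream(copy_stream):
        return (x.to(device, non_blocking=True),
                y.to(device, non_blocking=True))

it = iter(loader)
nxt = preload(it)
while nxt is not None:
    # The compute stream must wait until the pre-loaded copy has finished.
    torch.cuda.current_stream().wait_stream(copy_stream)
    x, y = nxt
    # Mark the tensors as used on the compute stream so the caching allocator
    # does not reuse their memory before this training step finishes.
    x.record_stream(torch.cuda.current_stream())
    y.record_stream(torch.cuda.current_stream())
    nxt = preload(it)                                  # overlap next copy with this step
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```

The key point is that the copy of mini-batch i+1 is in flight on a separate CUDA stream while the GPU computes on mini-batch i, so the transfer latency is hidden behind computation rather than added to it.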
