Abstract

Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. Training a DNN model generally involves large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive. Distributed computing resources are commonly exploited to reduce training time. When heterogeneous computing resources, e.g., CPUs and GPUs of multiple types, are available for distributed training, scheduling the workload of each layer onto an appropriate computing resource becomes critical. To efficiently train a DNN model using heterogeneous computing resources, we propose a distributed framework, Heterogeneous Parameter Server (HeterPS), which consists of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method. Compared with existing frameworks, HeterPS has three advantages. First, HeterPS enables efficient training of diverse workloads on heterogeneous computing resources. Second, HeterPS exploits an RL-based method to schedule the workload of each layer to an appropriate computing resource so as to minimize monetary cost while satisfying throughput constraints. Third, HeterPS manages data storage and data communication among distributed computing resources. Extensive experiments show that HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller).
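
The abstract describes the scheduling problem but not the RL method itself. The following is a minimal, self-contained sketch of that problem under stated assumptions: assign each layer's workload to one of several heterogeneous resources so as to minimize monetary cost while meeting a throughput constraint. The resource names, prices, per-layer throughput figures, and the epsilon-greedy bandit policy below are all illustrative assumptions standing in for the paper's actual RL-based scheduler, not HeterPS's implementation.

```python
"""Hypothetical sketch of layer-to-resource scheduling under a
throughput constraint. NOT the HeterPS implementation: all names,
cost numbers, and the epsilon-greedy policy are illustrative."""
import random
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    price_per_hour: float   # assumed monetary cost of renting the resource
    layer_throughput: dict  # assumed samples/sec per layer type

RESOURCES = [
    Resource("cpu",      price_per_hour=0.5, layer_throughput={"sparse": 900.0, "dense": 150.0}),
    Resource("gpu_k40",  price_per_hour=2.0, layer_throughput={"sparse": 400.0, "dense": 800.0}),
    Resource("gpu_v100", price_per_hour=6.0, layer_throughput={"sparse": 500.0, "dense": 2500.0}),
]

LAYERS = ["sparse", "sparse", "dense", "dense", "dense"]  # toy layer pipeline
MIN_THROUGHPUT = 600.0  # samples/sec constraint (assumed)

def evaluate(plan):
    """Pipeline throughput is limited by the slowest stage;
    cost is summed over the distinct resources the plan uses."""
    throughput = min(RESOURCES[r].layer_throughput[l] for l, r in zip(LAYERS, plan))
    cost = sum(RESOURCES[r].price_per_hour for r in set(plan))
    # Reward: minimize cost, heavily penalize constraint violations.
    penalty = 0.0 if throughput >= MIN_THROUGHPUT else 1000.0
    return throughput, cost, -(cost + penalty)

# Epsilon-greedy search over per-layer assignments: a crude stand-in
# for the learned scheduling policy described in the abstract.
random.seed(0)
q = [[0.0] * len(RESOURCES) for _ in LAYERS]  # per-layer action values
counts = [[0] * len(RESOURCES) for _ in LAYERS]
eps = 0.3
best_plan, best_reward = None, float("-inf")
for episode in range(2000):
    plan = [random.randrange(len(RESOURCES)) if random.random() < eps
            else max(range(len(RESOURCES)), key=lambda a: q[i][a])
            for i in range(len(LAYERS))]
    _, _, reward = evaluate(plan)
    for i, a in enumerate(plan):  # incremental mean update of action values
        counts[i][a] += 1
        q[i][a] += (reward - q[i][a]) / counts[i][a]
    if reward > best_reward:
        best_plan, best_reward = plan, reward

tput, cost, _ = evaluate(best_plan)
print("assignment:", [RESOURCES[r].name for r in best_plan])
print(f"throughput={tput:.0f} samples/s, cost={cost:.2f} $/h")
```

With these toy numbers, the search tends to place sparse (IO-heavy) layers on the CPU and dense (compute-heavy) layers on a GPU, which mirrors the intuition behind scheduling heterogeneous workloads onto heterogeneous resources; the actual HeterPS method should be consulted in the paper.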
