Abstract
Deep neural networks have become one of the most popular techniques in many research and application areas, including computer vision and natural language processing. As the complexity of neural networks keeps increasing, the training process takes much longer and requires more computation resources. To speed up training, a centralized distributed training structure named the Parameter Server (PS) is widely used to assign training tasks to different workers/nodes. Most existing studies assume that all workers have the same computation resources. In a heterogeneous environment, however, fast workers (i.e., workers with more computation resources) complete their tasks earlier than slow workers, so the system does not fully utilize the resources of the fast workers. In this paper, we propose a PS model for heterogeneous workers/nodes, called H-PS, which fully utilizes the resources of each worker by dynamically scheduling tasks based on the workers' current status (e.g., available memory). By doing so, all workers complete their tasks at about the same time and stragglers (i.e., workers that fall behind the others) are avoided. In addition, a pipeline scheme is proposed to further improve worker effectiveness by keeping workers busy while parameters are transmitted between the PS and the workers. Moreover, a flexible quantization scheme is proposed to reduce the communication overhead between the PS and the workers. Finally, H-PS is implemented using containers, an emerging lightweight virtualization technology. Experimental results indicate that the proposed H-PS reduces the overall training time by 1.4x to 3.5x compared with existing methods.
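To make the pipeline idea concrete, the following is a minimal sketch of overlapping parameter exchange with local computation, so a worker keeps training while parameters travel between it and the PS. The function names (push_pull, compute_minibatch) and the sleep-based timings are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: keep the worker computing while parameters are in flight.
import threading
import time

def push_pull(params):
    """Stand-in for sending gradients to the PS and receiving fresh parameters."""
    time.sleep(0.05)            # simulated network latency
    return [p for p in params]

def compute_minibatch(params):
    """Stand-in for forward/backward propagation on one mini-batch."""
    time.sleep(0.02)
    return [0.01 for _ in params]   # dummy gradients

params = [0.0] * 4
for step in range(5):
    grads = compute_minibatch(params)

    # Start the communication in the background ...
    result = {}
    t = threading.Thread(target=lambda: result.update(p=push_pull(params)))
    t.start()

    # ... and keep the worker busy on another mini-batch (using slightly
    # stale parameters) instead of idling during the transfer.
    extra_grads = compute_minibatch(params)

    t.join()
    params = result["p"]
```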
Highlights
In recent years, deep neural networks have become one of the most popular techniques and have been successfully applied in many research and application fields, including computer vision, natural language processing, systems management, and the Internet of Things (IoT) [1], [2]
We propose a heterogeneous-aware parameter server model, which focuses on speeding up the training process of deep neural networks in a heterogeneous environment
To improve the training performance of the Parameter Server (PS) system, the proposed scheme is designed from three aspects: 1) Dynamically allocate workloads according to the current computing capacities of the workers; 2) Keep workers training during parameter communication to fully utilize the system resources; and 3) Apply flexible parameter quantization according to the change in accuracy during training to reduce the total amount of communication data
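As an illustration of the third aspect, the sketch below adapts the quantization bit width to how fast accuracy is changing: fewer bits when accuracy is still improving quickly, more bits when it plateaus. The thresholds and the choose_bits policy are assumptions made for this example, not the exact scheme from the paper.

```python
# Sketch: uniform quantization with an accuracy-driven bit width.
import numpy as np

def quantize(x, bits):
    """Uniformly quantize x to the given bit width; return (codes, offset, scale)."""
    levels = 2 ** bits - 1
    scale = (x.max() - x.min()) / max(levels, 1)
    codes = np.round((x - x.min()) / (scale + 1e-12)).astype(np.uint32)
    return codes, x.min(), scale

def dequantize(codes, offset, scale):
    return codes.astype(np.float32) * scale + offset

def choose_bits(acc_delta, current_bits):
    """Use fewer bits while accuracy improves quickly, more when it stagnates."""
    if acc_delta > 0.01:
        return max(current_bits - 1, 2)
    return min(current_bits + 2, 16)

grads = np.random.randn(1000).astype(np.float32)
bits = 8
codes, offset, scale = quantize(grads, bits)
restored = dequantize(codes, offset, scale)
print("bits:", bits, "max error:", float(np.abs(grads - restored).max()))
bits = choose_bits(acc_delta=0.002, current_bits=bits)   # accuracy plateaued -> raise precision
```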
Summary
Deep neural networks have become one of the most popular techniques and have been successfully applied in many research and application fields, including computer vision, natural language processing, systems management, and the Internet of Things (IoT) [1], [2]. In the Parameter Server architecture, the workers focus on training tasks such as forward and backward propagation. Distributed systems such as Spark [8], GraphX [9], and MLlib [10] assume that all machines are identical (i.e., they have the same configuration, including the same amount of memory); in other words, they train neural networks in a homogeneous environment. In practice, however, the systems may have different hardware configurations, such as different numbers of CPUs, varying memory sizes, and dynamically changing network bandwidths. If such systems/workers are assigned the same workloads, the workers with more available resources (denoted as fast workers) complete their tasks faster than those with fewer available resources (denoted as slow workers).
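A minimal sketch of the capacity-proportional allocation this motivates is shown below: mini-batch shares are assigned in proportion to each worker's measured capacity so that fast and slow workers finish at roughly the same time. The capacity values and worker names are made up for illustration and are not taken from the paper.

```python
# Sketch: split a global batch across workers in proportion to capacity.
def allocate(total_batch, capacities):
    total = sum(capacities.values())
    shares = {w: int(total_batch * c / total) for w, c in capacities.items()}
    # Give any rounding remainder to the fastest worker.
    leftover = total_batch - sum(shares.values())
    fastest = max(capacities, key=capacities.get)
    shares[fastest] += leftover
    return shares

capacities = {"worker-0": 4.0, "worker-1": 2.0, "worker-2": 1.0}  # relative speeds
print(allocate(700, capacities))  # {'worker-0': 400, 'worker-1': 200, 'worker-2': 100}
```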