Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning

Tanmoy Sen, Haiying Shen

doi:10.1109/icdcs54860.2022.00062

Abstract

With the emergence of edge devices and their local computation advantage over the cloud, distributed deep learning (DL) training on edge nodes has become promising. In a centralized scheduling method, the cluster head of a cluster of edge nodes schedules all the DL training jobs from the cluster nodes. Because the cluster head knows the loads of all the cluster nodes, it can avoid overloading them, but the head itself may become overloaded. To handle this problem, we first propose a multi-agent reinforcement learning (MARL) system that enables each edge node to schedule its own jobs using reinforcement learning (RL). However, without coordination between the nodes, action collisions may occur, in which multiple nodes schedule tasks to the same node and overload it. To avoid this problem, we propose a system called Shielded ReinfOrcement learning based DL training on Edges (SROLE). In SROLE, each edge node schedules its own jobs using multi-agent RL, and a shield deployed in a node checks for action collisions and provides alternative actions to avoid them. As a central shield node for the entire cluster may become a bottleneck, we further propose a decentralized shielding method, in which different shields are responsible for different regions in the cluster and coordinate to avoid action collisions on the region boundaries. Our container-based emulation experiments show that SROLE reduces training time by up to 59% with 29% lower median resource utilization and reduces the number of action collisions by up to 48% compared to multi-agent RL and centralized RL. Our real-device experiments show that SROLE reduces training time by up to 53% with 28% lower median resource utilization compared to multi-agent RL and centralized RL.
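
The shielding idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `shield`, the slot-based capacity model, and the "most free slots" fallback rule are all assumptions made for illustration. Each agent proposes a target node for its job; the shield accepts a proposal while the target still has room and otherwise substitutes an alternative node.

```python
def shield(proposed, load, capacity):
    """Return a collision-free assignment (hypothetical sketch, not SROLE itself).

    proposed: dict mapping agent -> node chosen by that agent's RL policy
    load:     dict mapping node -> number of jobs currently running
    capacity: dict mapping node -> maximum number of jobs the node can hold
    """
    projected = dict(load)  # load after the actions accepted so far
    safe = {}
    for agent in sorted(proposed):  # fixed order keeps the result deterministic
        target = proposed[agent]
        if projected[target] >= capacity[target]:
            # Action collision: the chosen node would be overloaded.
            candidates = [n for n in sorted(projected) if projected[n] < capacity[n]]
            if not candidates:
                safe[agent] = None  # no node has room; defer the job
                continue
            # Assumed fallback rule: pick the node with the most free slots.
            target = max(candidates, key=lambda n: capacity[n] - projected[n])
        projected[target] += 1
        safe[agent] = target
    return safe


# Agents "a" and "b" both pick n1, which has room for only one job.
load = {"n1": 0, "n2": 0, "n3": 0}
capacity = {"n1": 1, "n2": 1, "n3": 1}
proposed = {"a": "n1", "b": "n1", "c": "n3"}
result = shield(proposed, load, capacity)
# result == {"a": "n1", "b": "n2", "c": "n3"}
```

A single shield of this kind sees every proposed action, which is the bottleneck the abstract mentions; the decentralized variant would run one such check per region and coordinate only on nodes at region boundaries.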

Similar Papers

Review of the progress of communication-based multi-agent reinforcement learning
涵王 ... 扬俞
SCIENTIA SINICA Informationis | VOL. 52
01 May 2022

A multi-agent system integrating reinforcement learning, bidding and genetic algorithms
...
01 Dec 2003

Assured Deep Multi-Agent Reinforcement Learning for Safe Robotic Systems
Joshua Riley ... Radu Calinescu
01 Jan 2021

Deep Reinforcement Learning for User Association and Resource Allocation in Heterogeneous Cellular Networks
Nan Zhao ... Dusit Niyato
IEEE Transactions on Wireless Communications | VOL. 18
01 Nov 2019
