Abstract

Deep reinforcement learning (DRL) is a promising approach for developing control policies by learning how to perform tasks. Edge devices are increasingly expected to exploit DRL to control their actions and solve tasks autonomously in applications such as smart manufacturing and autonomous driving. However, the resource limitations of edge devices make it infeasible for them to train their policies from scratch. It is also impractical for such devices to use a policy with a large number of layers and parameters that was pre-trained by a centralized cloud infrastructure with high computational power. In this paper, we propose on-device DRL with distillation (OD3), a method to efficiently transfer distilled knowledge of how to behave for on-device DRL in resource-constrained edge computing systems. OD3 performs knowledge transfer and policy model compression simultaneously, in a single training process on the edge device, while respecting its limited resource budget. The novelty of our method lies in applying a knowledge distillation approach to DRL-based edge device control in integrated edge-cloud environments. We analyze the performance of the proposed method by implementing it on a commercial embedded system-on-module with limited hardware resources. The experimental results show that 1) edge policy training with the proposed method achieves near-cloud performance in terms of average reward, although the edge policy network is significantly smaller than the cloud policy network, and 2) the training time for edge policy training with our method is significantly shorter than for edge policy training from scratch.
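The core mechanism named in the abstract is policy distillation: a compact student network is trained to imitate the action distribution of a large pre-trained teacher. Below is a minimal sketch of that idea in PyTorch; the network sizes, state and action dimensions, and temperature are illustrative assumptions, not the architecture or hyperparameters used in the paper.

```python
# Minimal sketch of policy distillation, the mechanism OD3 builds on.
# All sizes and the temperature are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 8, 4  # hypothetical task dimensions

# Large teacher policy, pre-trained in the cloud.
teacher = nn.Sequential(
    nn.Linear(STATE_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),        # logits over discrete actions
)

# Small student policy sized for the edge device's resource budget.
student = nn.Sequential(
    nn.Linear(STATE_DIM, 32), nn.ReLU(),
    nn.Linear(32, N_ACTIONS),
)

def distillation_loss(states: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """KL divergence from the softened teacher action distribution to the
    student's, averaged over a batch of states."""
    with torch.no_grad():                       # the teacher stays frozen
        target = F.softmax(teacher(states) / temperature, dim=-1)
    log_probs = F.log_softmax(student(states) / temperature, dim=-1)
    return F.kl_div(log_probs, target, reduction="batchmean")

loss = distillation_loss(torch.randn(64, STATE_DIM))  # batch of observed states
loss.backward()                                       # gradients flow only to the student
```

Because the loss depends only on states and the two networks' outputs, the student can be trained on whatever states the edge device observes, without access to the teacher's original training data.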

Highlights

  • Today, the Internet of Things (IoT) is used for a wide range of industrial applications, including smart cities [1], autonomous transportation [2], urban surveillance [3], and intelligent manufacturing [4]

  • This paper investigates a method to efficiently transfer distilled knowledge of how to behave for edge device control using Deep Reinforcement Learning (DRL) in resource-constrained edge computing systems, where a large number of heterogeneous edge devices are connected through cloud infrastructure

  • We demonstrate the advantages of OD3 for resource-constrained edge computing systems

Summary

INTRODUCTION

The Internet of Things (IoT) is used for a wide range of industrial applications, including smart cities [1], autonomous transportation [2], urban surveillance [3], and intelligent manufacturing [4]. The available hardware resources of edge devices may not be sufficient even to execute inference tasks (i.e., action prediction). This is why it is hard for edge devices to handle pre-trained policies transferred from cloud systems, which tend to be large and deep, with dozens of hidden layers and millions of neurons. This paper investigates a method to efficiently transfer distilled knowledge of how to behave for edge device control using Deep Reinforcement Learning (DRL) in resource-constrained edge computing systems, where a large number of heterogeneous edge devices are connected through cloud infrastructure. Our method, on-device DRL with distillation (OD3), makes it possible to perform knowledge transfer and policy model compression simultaneously, in a single training process on the edge device, while respecting its limited resource budget.
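To make the phrase "a single training process" concrete, the following is a hedged sketch of what such an on-device loop could look like: the small student policy gathers experience from its own environment while every visited state is also labeled by the frozen cloud teacher, so distillation happens inside the rollout loop. The toy networks mirror the sketch above, and the Gym-style env interface, sizes, and greedy action selection are assumptions for illustration, not the paper's exact algorithm.

```python
# Hedged sketch of on-device training with distillation in one loop.
# The env is assumed to follow the classic Gym API: reset() -> state,
# step(action) -> (state, reward, done, info). Not the paper's algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))  # cloud policy (frozen)
student = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))    # edge policy (trained)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def train_on_device(env, episodes: int = 100, batch_size: int = 64) -> None:
    buffer = []
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            obs = torch.as_tensor(state, dtype=torch.float32)
            with torch.no_grad():
                action = torch.argmax(student(obs)).item()  # greedy for brevity
            state, _reward, done, _info = env.step(action)
            buffer.append(obs)
            if len(buffer) == batch_size:                   # distill on collected states
                batch = torch.stack(buffer)
                with torch.no_grad():
                    target = F.softmax(teacher(batch), dim=-1)
                log_probs = F.log_softmax(student(batch), dim=-1)
                loss = F.kl_div(log_probs, target, reduction="batchmean")
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                buffer.clear()
```

Note that only the small student ever runs a backward pass on the device; the teacher is queried for forward inference alone, which is what keeps the training loop within an edge device's resource budget.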

REINFORCEMENT LEARNING
KNOWLEDGE DISTILLATION
ALGORITHM DETAILS
EVALUATION