Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing

Shinan Song, Zhiyi Fang, Jingyan Jiang

doi:10.1016/j.ipm.2021.102850

Abstract

Edge computing has recently gained momentum as it provides computing services for mobile devices through high-speed networks. In edge computing system optimization, deep reinforcement learning (DRL) enhances the quality of service (QoS) and shortens the age of information (AoI). However, loosely coupled edge servers saturate a noisy data space for DRL exploration, and learning a reasonable solution is enormously costly. Most existing works assume that the edge is an exact observation system and harvests well-labeled data for the pretraining of DRL neural networks. However, this assumption stands in opposition to the motivation of driving DRL to explore unknown information, and it increases the scheduling and computing costs in large-scale dynamic systems. This article leverages DRL with a distillation module to improve learning efficiency for edge computing with partial observation. We formulate the deadline-aware offloading problem as a decentralized partially observable Markov decision process (Dec-POMDP) with distillation, called fast decentralized reinforcement distillation (Fast-DRD). Each edge server makes offloading decisions in accordance with its own observations and learning strategies in a decentralized manner. By defining trajectory observation history (TOH) distillation and trust distillation to avoid overfitting, Fast-DRD learns a suitable offloading model in a noisy, partially observed edge system and reduces the cost of communication among servers. Finally, experimental simulations are presented to evaluate and compare the effectiveness and complexity of Fast-DRD.

Full Text