Abstract

This article studies the joint optimization problem of computation offloading and resource allocation (JCORA) in mobile-edge computing (MEC). Deep reinforcement learning (DRL) is a natural fit for the dynamic JCORA problem, yet traditional DRL methods remain difficult to apply because they typically suffer from slow and unstable convergence during training. To this end, we propose a temporal attentional deterministic policy gradient (TADPG) method to tackle JCORA. Built on the deep deterministic policy gradient (DDPG), TADPG has two key features. First, a temporal feature extraction network consisting of a 1-D convolution (Conv1D) residual block and an attentional long short-term memory (LSTM) network is designed, which yields high-quality state representations and function approximation. Second, a rank-based prioritized experience replay (rPER) method is devised to accelerate and stabilize training convergence. Experimental results demonstrate that the decentralized TADPG-based mechanism achieves more efficient JCORA performance than the centralized one, and that TADPG outperforms several state-of-the-art DRL agents in terms of task completion time and energy consumption.
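
To make the first feature concrete, below is a minimal PyTorch sketch of a temporal feature extraction network of the kind the abstract describes: a Conv1D residual block feeding an attentional LSTM that pools its hidden states into a state embedding. The layer sizes and the additive-attention pooling are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a Conv1D-residual + attentional-LSTM feature extractor.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class Conv1DResidualBlock(nn.Module):
    """Two Conv1D layers with a skip connection over the input."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2  # keep the temporal length unchanged
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # residual connection


class AttentionalLSTM(nn.Module):
    """LSTM whose hidden states are pooled by additive attention."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features)
        outputs, _ = self.lstm(x)                        # (batch, time, hidden)
        weights = torch.softmax(self.score(outputs), 1)  # attention over time
        return (weights * outputs).sum(dim=1)            # (batch, hidden)


class TemporalFeatureExtractor(nn.Module):
    """Conv1D residual block followed by an attentional LSTM."""

    def __init__(self, channels: int, hidden_size: int):
        super().__init__()
        self.res_block = Conv1DResidualBlock(channels)
        self.att_lstm = AttentionalLSTM(channels, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> state embedding (batch, hidden)
        h = self.res_block(x)
        return self.att_lstm(h.transpose(1, 2))  # to (batch, time, channels)
```

An extractor like this would sit in front of both the actor and critic networks of DDPG, so the policy and value functions operate on a temporally aware state embedding rather than the raw observation sequence.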
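For the second feature, the sketch below illustrates rank-based prioritized experience replay: transitions are ranked by absolute TD error and sampled with probability proportional to (1/rank)^alpha, following the rank-based variant of prioritized experience replay (Schaul et al.). The flat-list buffer and the alpha/beta values are illustrative assumptions; the paper's exact data structure may differ.

```python
# Sketch of rank-based prioritized experience replay (rPER).
# The buffer layout and hyperparameters are illustrative assumptions.
import random
from dataclasses import dataclass, field


@dataclass
class RankPrioritizedReplay:
    capacity: int
    alpha: float = 0.7  # how strongly prioritization is applied
    beta: float = 0.5   # importance-sampling correction strength
    buffer: list = field(default_factory=list)  # [|td_error|, transition]

    def add(self, transition, td_error: float) -> None:
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # drop the oldest transition
        self.buffer.append([abs(td_error), transition])

    def sample(self, batch_size: int):
        # Sort by |TD error| so rank 1 has the largest error.
        self.buffer.sort(key=lambda e: e[0], reverse=True)
        n = len(self.buffer)
        priorities = [(1.0 / (rank + 1)) ** self.alpha for rank in range(n)]
        total = sum(priorities)
        probs = [p / total for p in priorities]
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.buffer[i][1] for i in idxs]
        return batch, idxs, weights

    def update(self, idxs, td_errors) -> None:
        # Refresh priorities after the learner recomputes TD errors.
        for i, err in zip(idxs, td_errors):
            self.buffer[i][0] = abs(err)
```

Compared with proportional prioritization, the rank-based scheme is insensitive to TD-error outliers, which is one reason it tends to stabilize convergence.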
