Deep Reinforcement Learning for Energy-Efficient Computation Offloading in Mobile-Edge Computing

Huan Zhou, Xiuhua Li, Xuxun Liu, Kai Jiang, Victor C. M. Leung

doi:10.1109/jiot.2021.3091142

Abstract

Mobile-edge computing (MEC) has emerged as a promising computing paradigm in the 5G architecture. It can augment the computation and energy resources of user equipment (UEs) by migrating workloads from UEs to nearby MEC servers. Although computation offloading and resource allocation in MEC have been studied with different optimization objectives, existing work mainly focuses on improving performance in quasi-static systems and seldom considers system conditions that vary over time. In this article, we investigate the joint optimization of computation offloading and resource allocation in a dynamic multiuser MEC system. Our objective is to minimize the energy consumption of the entire MEC system while accounting for the delay constraint and the uncertain resource requirements of heterogeneous computation tasks. We formulate the problem as a mixed-integer nonlinear programming (MINLP) problem and propose a value iteration-based reinforcement learning (RL) method, named Q-learning, to determine the joint policy of computation offloading and resource allocation. To avoid the curse of dimensionality, we further propose a double deep Q network (DDQN)-based method, which can efficiently approximate the value function of Q-learning. The simulation results demonstrate that the proposed methods significantly outperform the baseline methods in different scenarios, with the exception of the exhaustion method. In particular, the proposed DDQN-based method achieves performance very close to that of the exhaustion method and, when the number of UEs is 5, reduces energy consumption by an average of 20%, 35%, and 53% compared with the offloading decision, local first, and offloading first methods, respectively.
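
For readers unfamiliar with the two learning rules named in the abstract, the following is a minimal sketch of the standard tabular Q-learning update and the standard double DQN target. It is not the authors' implementation. The function names, the state and action encoding, the step size alpha, the discount factor gamma, and the use of a negative energy cost as the reward are all assumptions made for illustration.

```python
# Illustrative sketch only: textbook Q-learning and double DQN target,
# not the paper's code. All names and parameter values are hypothetical.

def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(state, action)."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = reward + gamma * max(q_table[next_state])
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


def double_dqn_target(reward, online_q_next, target_q_next,
                      gamma=0.9, done=False):
    """Compute the double DQN target for one transition.

    online_q_next and target_q_next are the next-state action values
    produced by the online and target networks, given here as plain lists.
    """
    if done:
        return reward
    # The online network selects the action ...
    best_action = max(range(len(online_q_next)), key=online_q_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * target_q_next[best_action]


# Two states, two actions (assumed here: 0 = compute locally, 1 = offload).
q = [[0.0, 0.0], [0.0, 0.0]]
# Assumed reward: the negative of the energy consumed in this step.
updated = q_learning_update(q, state=0, action=1, reward=-1.0,
                            next_state=1, alpha=0.5)
y = double_dqn_target(-1.0, online_q_next=[0.2, 0.8],
                      target_q_next=[0.5, 0.1])
```

In this toy call the online values pick action 1, so the target uses the target network's value for that action (0.1) rather than its maximum (0.5). Decoupling action selection from action evaluation in this way is what distinguishes the double DQN target from the plain DQN target.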

Full Text