Abstract

Wireless powered mobile edge computing (MEC) networks, in which wireless devices (WDs) can offload parts of computation-intensive tasks to remote servers and charge their built-in batteries over the air, have been envisaged as a promising technology to meet the ultra-low-power requirement and sustain the continuous operation of WDs. However, when multiple WDs coexist in the network, minimizing the total task delay is non-trivial because the optimization variables are intrinsically coupled. Moreover, the channels vary dynamically over time and the tasks are unpredictable, which further aggravates the difficulty of obtaining a closed-form solution. Although reinforcement learning (RL) has proven effective for such complex optimization problems, training the underlying neural networks remains time-consuming. This paper considers a challenging hybrid task-offloading scenario, in which offloading tasks can be partially executed locally and remotely in parallel, and each WD can adopt both active RF transmission and passive backscatter communication (BackCom) for remote task offloading. Furthermore, a game-combined multi-agent deep deterministic policy gradient (MADDPG) algorithm is proposed to minimize the total task delay while accounting for fairness among the WDs: a potential game handles the offloading decisions, while MADDPG handles the time scheduling and harvested-energy splitting. The potential game, which provably converges within a finite number of iterations, helps accelerate training and reduce computational complexity. Thanks to the ‘centralized training with decentralized execution’ paradigm, once well trained, each MADDPG agent can determine its own time scheduling and harvested-energy splitting independently, without sharing information with other agents. Together with the unilateral contention among WDs over the offloading decisions via the potential game, this yields a fully decentralized framework for the proposed algorithm. Numerical results demonstrate that the game-combined MADDPG algorithm achieves near-optimal performance compared with existing centralized approaches, and converges faster than comparable learning approaches without the game component.
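As a concrete, deliberately simplified illustration of the game layer described above, the sketch below shows how best-response dynamics in a potential game can settle the per-WD offloading decisions in a finite number of rounds. The mode set, the toy delay model, and all function names are assumptions made for this example and do not reproduce the paper's actual utility functions.

```python
import random

# Illustrative sketch only: a potential-game layer where each wireless
# device (WD) picks an offloading mode and WDs take turns best-responding
# until no one can unilaterally reduce its own delay. The mode set and the
# toy delay model below are assumptions, not the paper's formulation.

MODES = ["local", "rf", "backcom"]  # local CPU, active RF, passive BackCom

def delay(wd, mode, profile):
    """Toy per-WD delay: remote modes slow down as more WDs contend for
    the shared channel; the numbers are purely illustrative."""
    if mode == "local":
        return 1.0                       # assumed fixed local-computation delay
    contenders = sum(1 for m in profile if m != "local")
    base = 0.4 if mode == "rf" else 0.6  # BackCom assumed slower but battery-free
    return base * max(contenders, 1)

def best_response_dynamics(num_wds, max_rounds=100, seed=0):
    random.seed(seed)
    profile = [random.choice(MODES) for _ in range(num_wds)]
    for _ in range(max_rounds):
        changed = False
        for wd in range(num_wds):
            def cost(mode):
                trial = list(profile)
                trial[wd] = mode
                return delay(wd, mode, trial)
            best = min(MODES, key=cost)
            if cost(best) + 1e-9 < cost(profile[wd]):
                profile[wd] = best       # unilateral improvement step
                changed = True
        if not changed:                  # no WD wants to deviate: equilibrium
            return profile
    return profile

if __name__ == "__main__":
    print(best_response_dynamics(num_wds=5))
```

In the paper's full framework, each WD's continuous variables (time scheduling and harvested-energy splitting) would instead come from its trained MADDPG actor; the game layer above only settles the discrete offloading decisions.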
