Abstract

The explosion in the number of smartphones and wearable devices creates demanding achievable rate (AR) requirements, and device-to-device (D2D) communication has become a key technology for meeting them. However, the co-channel interference caused by spectrum reuse, together with low-delay requirements, limits the performance gains of D2D communication. In this paper, we consider the cases with and without a time delay constraint, and design a joint power control and resource allocation scheme based on deep reinforcement learning (DRL) to maximize the AR of cellular users (CUEs) and D2D users (DUEs). Specifically, D2D pairs reusing CUE spectrum are modeled as multiple agents; each agent independently selects spectrum resources and transmit power, without any prior information, to mitigate interference. Furthermore, we propose a distributed double deep Q-network algorithm with prioritized sampling (Pr-DDQN), which helps agents learn from the more informative transitions during experience replay. Simulation results indicate that the Pr-DDQN algorithm achieves a higher AR than existing DRL algorithms. In particular, the probability that agents select low transmit power grows as the remaining transmission time increases, which demonstrates that the agents successfully learn and perceive the implicit relationship imposed by the time delay constraint.
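As a rough illustration of the two ingredients the abstract names, a double-DQN update combined with proportional prioritized experience replay, the following PyTorch sketch shows how one agent's learner might be wired up. It is a minimal sketch only: the state design, action space sizes, network architecture, and all hyperparameters (N_RB, N_PW, STATE_DIM, learning rate, alpha) are assumptions, not details taken from the paper, and importance-sampling weights are omitted for brevity.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical problem sizes (illustrative, not from the paper): each D2D agent
# observes a local state (e.g. channel gains, interference, remaining
# transmission time) and picks a joint (resource block, power level) action.
N_RB, N_PW = 4, 3             # assumed numbers of resource blocks / power levels
STATE_DIM = 8                 # assumed local observation size
N_ACTIONS = N_RB * N_PW       # flattened joint spectrum-and-power action space

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

class PrioritizedReplay:
    """Proportional prioritized replay: sampling prob. ~ |TD error|^alpha."""
    def __init__(self, cap=10000, alpha=0.6):
        self.cap, self.alpha = cap, alpha
        self.data, self.prio = [], []

    def push(self, transition):
        if len(self.data) >= self.cap:   # drop the oldest transition
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples: max priority

    def sample(self, batch):
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_err):
        for i, e in zip(idx, td_err):
            self.prio[i] = float(abs(e)) + 1e-6  # keep priorities positive

q_online, q_target = make_qnet(), make_qnet()
q_target.load_state_dict(q_online.state_dict())
opt = torch.optim.Adam(q_online.parameters(), lr=1e-3)
buf, gamma = PrioritizedReplay(), 0.99

def train_step(batch=32):
    if len(buf.data) < batch:
        return
    idx, trs = buf.sample(batch)
    s, a, r, s2 = (torch.as_tensor(np.array(x), dtype=torch.float32)
                   for x in zip(*trs))
    # Double DQN target: the online net chooses the argmax action,
    # the target net evaluates it (reduces overestimation bias).
    with torch.no_grad():
        a2 = q_online(s2).argmax(1, keepdim=True)
        y = r + gamma * q_target(s2).gather(1, a2).squeeze(1)
    q_sa = q_online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    td = y - q_sa
    loss = (td ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    buf.update(idx, td.detach().numpy())  # refresh priorities with new TD errors
    # (q_target would be synced to q_online every few hundred steps.)
```

In such a setup, each agent would push (state, action_index, reward, next_state) transitions where the reward reflects the instantaneous achievable rate; the prioritized buffer then replays high-TD-error transitions more often, which is the "learning from more informative transitions" behavior the abstract attributes to Pr-DDQN.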
