Abstract

The explosion in the number of smartphones and wearable devices creates demanding achievable rate (AR) requirements, and device-to-device (D2D) communication has become a key technology for meeting them. However, the co-channel interference caused by spectrum reuse, together with low-delay requirements, limits the performance gains of D2D communication. In this paper, we consider the cases with and without a time delay constraint, and design a joint power control and resource allocation scheme based on deep reinforcement learning (DRL) to maximize the AR of cellular users (CUEs) and D2D users (DUEs). Specifically, D2D pairs reusing CUE spectrum are modeled as multiple agents; each agent independently selects spectrum resources and transmit power, without any prior information, to mitigate interference. Furthermore, we propose a distributed double deep Q-network algorithm with prioritized sampling (Pr-DDQN), which helps agents learn from the more informative transitions during experience replay. Simulation results indicate that the Pr-DDQN algorithm achieves a higher AR than existing DRL algorithms. In particular, the probability that agents select low transmit power grows as the remaining transmission time increases, which demonstrates that the agents successfully learn and perceive the implicit relationship imposed by the time delay constraint.
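As a rough illustration of the two ingredients the abstract names, a double-DQN update combined with proportional prioritized experience replay, the following PyTorch sketch shows how one agent's learner might be wired up. It is a minimal sketch only: the state design, action space sizes, network architecture, and all hyperparameters (N_RB, N_PW, STATE_DIM, learning rate, alpha) are assumptions, not details taken from the paper, and importance-sampling weights are omitted for brevity.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical problem sizes (illustrative, not from the paper): each D2D agent
# observes a local state (e.g. channel gains, interference, remaining
# transmission time) and picks a joint (resource block, power level) action.
N_RB, N_PW = 4, 3             # assumed numbers of resource blocks / power levels
STATE_DIM = 8                 # assumed local observation size
N_ACTIONS = N_RB * N_PW       # flattened joint spectrum-and-power action space

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

class PrioritizedReplay:
    """Proportional prioritized replay: sampling prob. ~ |TD error|^alpha."""
    def __init__(self, cap=10000, alpha=0.6):
        self.cap, self.alpha = cap, alpha
        self.data, self.prio = [], []

    def push(self, transition):
        if len(self.data) >= self.cap:   # drop the oldest transition
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples: max priority

    def sample(self, batch):
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_err):
        for i, e in zip(idx, td_err):
            self.prio[i] = float(abs(e)) + 1e-6  # keep priorities positive

q_online, q_target = make_qnet(), make_qnet()
q_target.load_state_dict(q_online.state_dict())
opt = torch.optim.Adam(q_online.parameters(), lr=1e-3)
buf, gamma = PrioritizedReplay(), 0.99

def train_step(batch=32):
    if len(buf.data) < batch:
        return
    idx, trs = buf.sample(batch)
    s, a, r, s2 = (torch.as_tensor(np.array(x), dtype=torch.float32)
                   for x in zip(*trs))
    # Double DQN target: the online net chooses the argmax action,
    # the target net evaluates it (reduces overestimation bias).
    with torch.no_grad():
        a2 = q_online(s2).argmax(1, keepdim=True)
        y = r + gamma * q_target(s2).gather(1, a2).squeeze(1)
    q_sa = q_online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    td = y - q_sa
    loss = (td ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    buf.update(idx, td.detach().numpy())  # refresh priorities with new TD errors
    # (q_target would be synced to q_online every few hundred steps.)
```

In such a setup, each agent would push (state, action_index, reward, next_state) transitions where the reward reflects the instantaneous achievable rate; the prioritized buffer then replays high-TD-error transitions more often, which is the "learning from more informative transitions" behavior the abstract attributes to Pr-DDQN.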
