This letter investigates a deep reinforcement learning (DRL)-based spectrum access scheme for device-to-device (D2D) communication underlaying cellular networks. Specifically, cellular users (CUEs) and D2D pairs attempt to access the time slots (TSs) of a shared spectrum, and TSs are dynamically scheduled to CUEs in different frames. Based on DRL theory, the D2D pairs can be regarded as a centralized agent that aims to learn an optimal spectrum access strategy to maximize the sum throughput without any prior information. In particular, the spectrum access behavior of the D2D pairs adapts to the locations of the CUEs, so that the communication quality of CUEs at the cell edge is preserved. A double deep Q-network (DDQN)-based D2D spectrum access (D⁴SA) algorithm is then proposed, which lets the D2D pairs learn whether to access the spectrum in each TS. Moreover, to ensure fairness of resource allocation among D2D pairs, we improve the proposed algorithm by incorporating fairness into the objective function. Simulation results show that the proposed algorithm achieves a sum throughput close to the theoretical upper bound, significantly outperforming a scheme based on base station cooperation.
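For context, the sketch below illustrates the double-DQN target computation at the core of such an agent: the online network selects the next action and the target network evaluates it, which mitigates Q-value overestimation. This is a minimal sketch only; the state dimension, network sizes, action set, and reward shaping here are hypothetical placeholders, as the letter's actual design is not specified in the abstract.

    # Minimal double-DQN target sketch (PyTorch); all sizes and the reward
    # are illustrative assumptions, not the letter's actual configuration.
    import torch
    import torch.nn as nn

    state_dim, n_actions, gamma = 8, 2, 0.95  # e.g., actions: access TS or stay idle

    online = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    target = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    target.load_state_dict(online.state_dict())  # target net starts as a copy

    def ddqn_target(reward, next_state, done):
        """Double-DQN target: online net selects the action,
        target net evaluates it, reducing overestimation bias."""
        with torch.no_grad():
            best_a = online(next_state).argmax(dim=1, keepdim=True)   # action selection
            q_next = target(next_state).gather(1, best_a).squeeze(1)  # action evaluation
        return reward + gamma * (1.0 - done) * q_next

    # Example usage with a dummy batch
    batch = 4
    r = torch.rand(batch)                  # e.g., per-slot throughput as reward
    s_next = torch.randn(batch, state_dim)
    d = torch.zeros(batch)                 # 1.0 marks terminal transitions
    print(ddqn_target(r, s_next, d))

In a fairness-aware variant such as the one the abstract describes, the per-slot reward would be reshaped to penalize unequal access among D2D pairs, leaving the target computation itself unchanged.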