In future wireless systems, particularly in the Beyond 5G (B5G) era, where high data rates and low latency are critical, device-to-device (D2D) communication is a pivotal enabling technology. In D2D communication, resource allocation plays a critical role in achieving higher throughput while managing interference and improving spectrum efficiency, energy efficiency, and system fairness. Conventional resource allocation methodologies struggle in dynamic and heterogeneous communication environments. To cope with the dynamic and unpredictable nature of the channel, we employ a distributed, iterative resource allocation technique based on reinforcement learning (RL) that enables the system to learn and adapt to the wireless environment autonomously. In this paper, we formulate a distributed Q-learning-based RL method, since its real-time learning capability, reduced communication overhead, and exploration efficiency make it better suited to dynamic D2D environments than other RL methods. Under the Q-learning method, D2D devices act as learning agents that strive to maximize cumulative rewards. Through interaction with the environment and continuous learning from feedback, these agents refine their real-time resource allocation decisions over time. Compared with state-of-the-art RL techniques, simulation results show that the proposed Q-learning method improves energy and spectrum efficiency, reduces latency, increases Jain's fairness index, and improves overall system throughput by about 6%–8%. The scalability factor is found to be 1.69, indicating that Q-learning scales well, as throughput does not drop abruptly as the number of devices increases.
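To make the agent formulation concrete, the following is a minimal sketch of distributed Q-learning for D2D resource allocation, in the spirit of the abstract but not the authors' exact formulation. The agent count, number of resource blocks, learning parameters, and the toy reward model (channel gain minus an interference penalty for colliding on the same resource block) are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: each D2D pair is an independent Q-learning agent choosing
# a resource block (RB). All constants below are illustrative assumptions.
NUM_AGENTS = 4      # D2D pairs acting as learning agents
NUM_RBS = 6         # resource blocks each agent can choose from
EPISODES = 2000
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

rng = np.random.default_rng(0)

# One Q-table per agent: a single (stateless) row of values over RB actions.
q_tables = [np.zeros(NUM_RBS) for _ in range(NUM_AGENTS)]

def reward(agent, action, all_actions):
    """Toy reward: a random channel gain minus a penalty for every other
    agent that selected the same RB (co-channel interference)."""
    gain = rng.uniform(0.5, 1.0)
    collisions = sum(1 for j, a in enumerate(all_actions)
                     if j != agent and a == action)
    return gain - 0.5 * collisions

for _ in range(EPISODES):
    # Each agent independently picks an RB with epsilon-greedy exploration.
    actions = [
        int(rng.integers(NUM_RBS)) if rng.random() < EPSILON
        else int(np.argmax(q))
        for q in q_tables
    ]
    # Each agent observes only its own reward and updates its own Q-value.
    for i, a in enumerate(actions):
        r = reward(i, a, actions)
        q_tables[i][a] += ALPHA * (r + GAMMA * np.max(q_tables[i]) - q_tables[i][a])

# After learning, agents tend to settle on distinct RBs, reducing interference.
print([int(np.argmax(q)) for q in q_tables])
```

In this sketch the agents never exchange Q-values; coordination emerges only through the reward feedback each agent receives, which reflects the distributed, low-overhead character the abstract attributes to the proposed method.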