Abstract
Efficient use of spectral resources is critical in wireless networks and has been studied extensively in recent years. Dynamic spectrum access (DSA) is one of the key techniques for utilizing spectral resources, and reinforcement learning (RL) for DSA has attracted great attention due to its excellent performance. However, because of the large state space in RL, obtaining the best solution to the spectrum access problem is often computationally expensive, and it is hard to balance the multiple objectives of the RL reward function. To tackle these problems, we explore deep reinforcement learning in a layered framework and propose a hierarchical deep Q-network (h-DQN) model for DSA. The proposed approach divides the original problem into separate sub-problems, each of which is solved by its own reinforcement learning agent. This partitioning simplifies each individual problem, enables modularity, and reduces the complexity of the whole optimization process in the multi-objective case. The performance of Q-learning for dynamic sensing (QADS), deep reinforcement learning for dynamic access (DRLDA), and the proposed h-DQN model is evaluated through simulations. The results show that h-DQN converges faster and achieves higher channel utilization than the two compared methods.
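To make the layered idea concrete, the following is a minimal sketch (in Python/PyTorch, not the authors' code) of a two-level h-DQN selector: a meta-controller network chooses an abstract goal (here, a hypothetical grouping of channels), and a controller network, conditioned on that goal, chooses a concrete channel within it. All dimensions, groupings, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def q_net(in_dim: int, out_dim: int) -> nn.Module:
    # Small MLP mapping a state (or state + goal) to Q-values.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

N_CHANNELS, N_GOALS = 8, 4       # hypothetical: 8 channels split into 4 groups
STATE_DIM = N_CHANNELS           # e.g., last observed idle/busy flag per channel

meta_q = q_net(STATE_DIM, N_GOALS)                          # picks a goal (channel group)
ctrl_q = q_net(STATE_DIM + N_GOALS, N_CHANNELS // N_GOALS)  # picks a channel within the goal

def select_channel(state: torch.Tensor) -> int:
    # Greedy two-level selection; epsilon-greedy exploration omitted for brevity.
    goal = meta_q(state).argmax().item()            # meta-controller: abstract decision
    goal_onehot = torch.eye(N_GOALS)[goal]          # encode the goal for the controller
    sub = ctrl_q(torch.cat([state, goal_onehot])).argmax().item()
    return goal * (N_CHANNELS // N_GOALS) + sub     # concrete channel index

print(select_channel(torch.zeros(STATE_DIM)))       # some channel index in [0, 7]
```

In training, the controller would be driven by an intrinsic reward for achieving the chosen goal, while the meta-controller receives the extrinsic (environment) reward; that separation of concerns is what lets each sub-problem stay small and modular.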
Highlights
In wireless networks, spectrum resources are becoming increasingly scarce due to the growing demand for wireless communication [1]
We propose a dynamic multi-channel sensing model based on the hierarchical deep Q-network (h-DQN), which addresses these issues by combining deep Q-networks with hierarchical levels of temporal abstraction
Inspired by the characteristics of intrinsic behavior, we propose dynamic multi-channel sensing by constructing a hierarchical deep Q-network (h-DQN) framework
Summary
Spectrum resources are becoming increasingly scarce due to the growing demand for wireless communication [1]. Prior work estimated the channel status of the primary user (PU) from historical spectrum sensing and decision information, and optimized the spectrum access strategy of the cognitive user to maximize the throughput of the cognitive network [10]. Researchers such as Das et al. [11] developed spectrum sensing based on collaborative Q-learning for secondary users (SUs) in self-organizing networks to access the primary channel. Users can only observe the state of the selected channel, while the states of the remaining channels are not fully visible; this results in an exponential growth of the state-space dimension in reinforcement learning. Each user maps its current state to spectrum access actions using a trained DQN that maximizes the objective function. DQN-based methods can effectively handle the high-dimensional state space and obtain good results, but the large state-space dimension still increases the computational cost and makes the problem difficult to solve
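For reference, the DQN mapping described above rests on the standard temporal-difference update: the network is trained so that Q(s, a) tracks r + gamma * max over a' of Q_target(s', a'). The sketch below shows one such gradient step in PyTorch; the state dimensions, action count, and variable names are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # One gradient step: fit Q(s, a) to r + gamma * max_a' Q_target(s', a').
    s, a, r, s_next = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for taken actions
    with torch.no_grad():                                  # target net is held fixed
        target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random transitions (8 channel flags as state, 8 access actions):
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8))
tgt = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8))
tgt.load_state_dict(net.state_dict())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
batch = (torch.rand(32, 8), torch.randint(0, 8, (32,)),
         torch.rand(32), torch.rand(32, 8))
print(dqn_update(net, tgt, opt, batch))
```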