Abstract

We consider the problem of dynamic spectrum access (DSA) in cognitive wireless networks consisting of primary users (PUs) and secondary users (SUs), where only partial observations are available at the SUs due to narrowband sensing and transmissions. The network operates in a time-slotted regime, in which the traffic patterns of the PUs are modeled as finite-memory Markov chains that are unknown to the SUs. Because observations are partial, both the channel sensing actions and the access actions affect the throughput. Focusing on the case of a single SU, our objective is to maximize the SU’s long-term throughput. To that end, we develop a novel algorithm that learns both access and sensing policies via deep Q-learning, dubbed Double Deep Q-network for Sensing and Access (DDQSA). To the best of our knowledge, this is the first work that jointly optimizes sensing and access policies for DSA via deep Q-learning. Next, we consider wireless networks whose access policy implements fixed channel-hopping dynamics, for which we analytically determine the optimal SU sensing and access policy and its associated throughput. We then demonstrate that the proposed DDQSA algorithm can achieve near-optimal performance for the considered network. Our results show that the proposed DDQSA algorithm learns a policy that implements both sensing and channel access, significantly outperforms existing approaches, and can achieve the optimal performance in certain scenarios.
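As a rough illustration of the double Q-learning idea that the algorithm's name refers to, the following minimal sketch computes bootstrapped targets in which one value estimate selects the next action and a second estimate scores it. It is not the authors' implementation: the function names, the discount factor, and the encoding of a joint action as a single index over (channel to sense, channel to access) pairs are all assumptions made for this example, and plain arrays stand in for the outputs of the deep networks.

```python
import numpy as np


def double_q_targets(rewards, q_online_next, q_target_next, gamma=0.9):
    """Double Q-learning targets for a batch of transitions.

    The online estimate picks the greedy next action and the target
    estimate evaluates it, which reduces the overestimation bias of
    plain Q-learning. gamma is an assumed discount factor.
    """
    rewards = np.asarray(rewards, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    # Action selection uses the online estimate only.
    best = np.argmax(q_online_next, axis=1)
    # Action evaluation uses the target estimate only.
    evaluated = q_target_next[np.arange(len(best)), best]
    return rewards + gamma * evaluated


def decode_action(index, n_channels):
    """Hypothetical joint-action encoding (an assumption, not from the paper):
    a flat index maps to (channel to sense, channel to access)."""
    return divmod(index, n_channels)


# Two transitions, two channels, hence four joint actions per state.
rewards = [1.0, 0.0]
q_online_next = [[0.2, 0.8, 0.1, 0.0], [0.5, 0.1, 0.3, 0.4]]
q_target_next = [[1.0, 0.5, 2.0, 0.0], [0.3, 0.9, 0.1, 0.2]]
targets = double_q_targets(rewards, q_online_next, q_target_next)
```

Flattening the sensing and access decisions into one action index is only one way to let a single Q-function cover both; the paper may structure its action space differently.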

