This paper addresses the rendezvous problem in asymmetric cognitive radio networks (CRNs) by proposing a novel reinforcement learning (RL)-based channel-hopping algorithm. Traditional methods such as the jump-stay (JS) algorithm, while effective, often incur a high time-to-rendezvous (TTR) in asymmetric scenarios, where secondary users (SUs) differ in the channels available to them. The proposed algorithm uses the actor-critic policy gradient method to learn channel selection strategies that adapt dynamically to the environment and minimize TTR. Extensive simulations demonstrate that it significantly reduces the expected TTR (ETTR) compared with the JS algorithm, particularly in asymmetric scenarios where M-sequence-based approaches are less effective. These results suggest that RL-based approaches not only offer robustness in asymmetric environments but also provide a promising alternative in more predictable settings.