Abstract

We propose a dynamic spectrum access scheme where secondary users cooperatively recommend "good" channels to each other and access channels accordingly. We formulate the problem as an average-reward Markov decision process. We show the existence of an optimal stationary spectrum access policy and explore its structural properties in two asymptotic cases. Since the action space of the Markov decision process is continuous, it is difficult to find the optimal policy by simply discretizing the action space and applying policy iteration, value iteration, or Q-learning. Instead, we propose a new algorithm based on the model reference adaptive search method and prove its convergence to the optimal policy. Numerical results show that the proposed algorithm achieves up to 18 and 100 percent performance improvement over the static channel recommendation scheme in homogeneous and heterogeneous channel environments, respectively, and is more robust to channel dynamics.
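To illustrate the kind of search the abstract refers to, the following is a minimal sketch of a model-reference-style adaptive search over a continuous policy parameter. It is not the paper's algorithm: the reward function, parameter range, and Gaussian sampling model here are all illustrative assumptions, standing in for the average reward of a stationary spectrum access policy.

```python
import numpy as np

# Hypothetical one-dimensional reward landscape, standing in for the
# average reward of a stationary access policy with a continuous
# action (e.g. a channel access probability in [0, 1]).
def average_reward(theta):
    return -(theta - 0.7) ** 2  # illustrative peak at theta = 0.7

def mras_search(reward_fn, iters=60, samples=200, elite_frac=0.1, seed=0):
    """Model-reference-style adaptive search sketch: sample candidate
    policies from a Gaussian reference distribution, keep the elite
    fraction by reward, and refit the Gaussian to the elites so the
    sampling distribution concentrates on the optimal policy."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.5, 0.3
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        candidates = np.clip(rng.normal(mu, sigma, samples), 0.0, 1.0)
        rewards = np.array([reward_fn(c) for c in candidates])
        elites = candidates[np.argsort(rewards)[-n_elite:]]
        mu, sigma = elites.mean(), elites.std() + 1e-6  # floor avoids collapse
    return mu

best = mras_search(average_reward)
```

The key feature, shared with the method the abstract names, is that no discretization of the continuous action space is needed: the sampling distribution itself adapts toward the optimum.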
