Abstract

In this paper we propose a novel Bayesian network-based model for analysing the convergence properties of reinforcement learning (RL) based dynamic spectrum access (DSA) algorithms. The model uses a minimum-complexity DSA problem for probabilistic analysis of the joint policy transitions of RL algorithms. A Monte Carlo simulation of a distributed Q-learning DSA algorithm shows that the proposed approach achieves remarkable accuracy in predicting the convergence behaviour of such algorithms. Furthermore, this behaviour can also be expressed in the form of an absorbing Markov chain derived from the novel Bayesian network model, a representation that enables further theoretical analysis of the convergence properties of RL-based DSA algorithms. The main benefit of the analysis tool presented in this paper is that it enables the design and theoretical evaluation of novel DSA schemes through extensions of the proposed Bayesian network model.
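
To illustrate the kind of theoretical analysis an absorbing Markov chain representation enables, the following is a minimal sketch using standard absorbing-chain results (fundamental matrix, expected time to absorption, absorption probabilities). The transition matrix below is purely hypothetical, standing in for joint-policy transition probabilities that the paper derives from its Bayesian network model; it is not the paper's actual model.

```python
import numpy as np

# Hypothetical joint-policy transition matrix for a minimal DSA scenario.
# States 0-2 are transient (e.g. conflicting or partially converged joint
# policies); state 3 is absorbing (a stable, collision-free joint policy).
# These probabilities are illustrative only, not taken from the paper.
P = np.array([
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.6, 0.1, 0.2],
    [0.2, 0.1, 0.4, 0.3],
    [0.0, 0.0, 0.0, 1.0],  # absorbing state: self-loop with probability 1
])

n_transient = 3
Q = P[:n_transient, :n_transient]   # transient-to-transient block
R = P[:n_transient, n_transient:]   # transient-to-absorbing block

# Fundamental matrix N = (I - Q)^{-1}: expected number of visits to each
# transient state before absorption.
N = np.linalg.inv(np.eye(n_transient) - Q)

# Expected number of steps until convergence (absorption), per start state.
t = N @ np.ones(n_transient)

# Probability of being absorbed into each absorbing state (here there is
# only one, so each row sums to 1: convergence is certain, only its speed
# depends on the starting joint policy).
B = N @ R

print("Expected steps to convergence:", t)
print("Absorption probabilities:", B.ravel())
```

Given such a chain, convergence speed and convergence probability of an RL-based DSA scheme can be read off directly from N, t and B, which is the sort of closed-form evaluation the abstract refers to.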
