In this paper, we address the problem of opportunistic spectrum access in infrastructure-less cognitive networks. Each secondary user (SU) transmitter may select one frequency channel at each transmission trial. We assume that there is no information exchange between SUs and that they have no knowledge of channel quality, channel availability, or the actions of other SUs; hence, each SU selfishly tries to select the best band on which to transmit. This problem is modeled as a multi-user restless Markov multi-armed bandit problem, in which multiple SUs collect a priori unknown rewards by selecting channels. The main contribution of the paper is an online learning policy for distributed SUs that takes into account not only the availability of a band but also a quality metric linked to the interference power from neighboring cells experienced on the sensed band. We prove that the proposed policy, named distributed restless QoS-UCB, achieves regret of at most logarithmic order, first for the single-user case and then for the multi-user case. Moreover, the achievable throughput and average bit error rate obtained with the proposed policy are evaluated and compared to those of well-known reinforcement learning algorithms.
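The abstract does not give the exact form of the QoS-UCB index. The Python sketch below illustrates one plausible shape of such a channel-selection policy: a UCB-style exploration bonus on the empirical availability of each channel, minus a penalty driven by the observed quality (interference) metric. The class name, parameters `alpha` and `beta`, and the specific way the quality penalty enters the index are assumptions made for illustration, not the authors' formula.

```python
import math

# Illustrative sketch of a UCB-style channel-selection index mixing an
# availability estimate with a quality penalty, in the spirit of the
# QoS-UCB policy summarized above. The exact weights, exploration
# constant, and quality normalization used in the paper are NOT
# reproduced here; everything below is an assumption for illustration.

class QosUcbSketch:
    def __init__(self, n_channels, alpha=2.0, beta=1.0):
        self.n = n_channels
        self.alpha = alpha                      # exploration weight (assumed)
        self.beta = beta                        # quality-penalty weight (assumed)
        self.pulls = [0] * n_channels           # times each channel was sensed
        self.avail_sum = [0.0] * n_channels     # observed availability (0/1)
        self.quality_sum = [0.0] * n_channels   # observed quality in [0, 1]
        self.t = 0                              # global time step

    def select(self):
        """Return the index of the channel to sense at this trial."""
        self.t += 1
        # Sense each channel at least once before trusting the index.
        for k in range(self.n):
            if self.pulls[k] == 0:
                return k

        def index(k):
            mean_avail = self.avail_sum[k] / self.pulls[k]
            mean_quality = self.quality_sum[k] / self.pulls[k]
            bonus = math.sqrt(self.alpha * math.log(self.t) / self.pulls[k])
            # Lower interference -> higher quality -> smaller penalty.
            penalty = self.beta * (1.0 - mean_quality) \
                * math.log(self.t) / self.pulls[k]
            return mean_avail + bonus - penalty

        return max(range(self.n), key=index)

    def update(self, k, available, quality):
        """Record the sensing outcome for channel k."""
        self.pulls[k] += 1
        self.avail_sum[k] += 1.0 if available else 0.0
        self.quality_sum[k] += quality
```

Under this (assumed) construction, a channel is attractive when it is frequently idle, lightly explored, and exhibits low interference; the logarithmic scaling of both the bonus and the penalty is the standard mechanism behind logarithmic-order regret guarantees for UCB-type policies.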