Abstract

This paper studies how to evaluate the rate of convergence to Nash equilibrium (NE) solutions during channel selection under incomplete information. The notion of regret is used to reflect the convergence rates of online algorithms. The process by which each secondary user (SU) selects an idle channel is modeled as a multi-channel bandit game, and the maximal averaged regret (MAR) is defined. Two existing online learning algorithms are used to obtain an NE for each SU, and their performance is evaluated through the MAR. When the multi-channel bandit game admits a pure-strategy NE, the MAR is finite. A cooperation mechanism is also required when calculating the MAR. Simulation results show that the MAR is finite and that the online algorithm with the faster convergence rate attains the smaller MAR.
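To make the evaluation concrete, here is a minimal Python sketch of how an averaged regret, and its maximum over SUs, could be computed. It assumes MAR is read as the largest time-averaged external regret among the secondary users; the paper's exact definition may differ, and `rewards_per_channel` and `chosen` are hypothetical logs introduced only for illustration:

```python
def averaged_regret(rewards_per_channel, chosen):
    """Time-averaged external regret of one SU.

    rewards_per_channel[t][m]: reward the SU would have earned on channel m
    at slot t (hypothetical full-information log, for illustration only).
    chosen[t]: channel the SU actually selected at slot t.
    """
    T = len(chosen)
    M = len(rewards_per_channel[0])
    # Cumulative reward of the best fixed channel in hindsight.
    best_fixed = max(sum(r[m] for r in rewards_per_channel) for m in range(M))
    earned = sum(rewards_per_channel[t][chosen[t]] for t in range(T))
    return (best_fixed - earned) / T

def maximal_averaged_regret(per_su_rewards, per_su_choices):
    # MAR under this reading: the worst averaged regret over all SUs.
    return max(averaged_regret(r, c)
               for r, c in zip(per_su_rewards, per_su_choices))
```

With a pure-strategy NE, each SU's averaged regret stays bounded as T grows, so the maximum over SUs is finite as well, which is the property the simulations check.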

Highlights

  • With the emergence of new wireless services and applications, the demand for spectrum increases

  • A cooperation mechanism is proposed: from n = 1 to N, each secondary user (SU) n selects a channel m from the channel set M according to a predetermined order, given the selection strategies of the other SUs

  • The performance of the online learning algorithms is evaluated through the maximal averaged regret


Summary

Introduction

With the emergence of new wireless services and applications, the demand for spectrum increases. Due to the stochastic nature of cognitive radio networks (CRNs), the real primary traffic distribution is not always stationary. Under this scenario, the channel set selection problem is analyzed in [4]. Although these works claim that an NE can be found under incomplete information, they require more information than our model because the potential functions must be given before searching for the NE. Moreover, these works do not consider the convergence rate of the online algorithm. In this paper, the channel selection problem with incomplete information is modeled as a multi-channel bandit game from the viewpoint of bandit models, and the MAR is used to evaluate the convergence rates of online algorithms.
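Since one of the learners in the outline below is based on Exp3, a minimal sketch of the standard Exp3 update (Auer et al.'s exponential-weight algorithm for exploration and exploitation) may help fix ideas. This is the textbook algorithm, not necessarily the paper's exact variant, and `reward_fn` is a hypothetical environment hook standing in for the channel payoff an SU would observe:

```python
import math
import random

def exp3(n_channels, n_rounds, reward_fn, gamma=0.1):
    """Standard Exp3 sketch for single-SU channel selection.

    reward_fn(t, channel) -> reward in [0, 1]; in the paper's setting this
    would reflect whether the chosen channel was idle and usable.
    """
    weights = [1.0] * n_channels
    total_reward = 0.0
    for t in range(n_rounds):
        w_sum = sum(weights)
        # Mix exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_channels
                 for w in weights]
        channel = random.choices(range(n_channels), weights=probs)[0]
        reward = reward_fn(t, channel)  # only the chosen arm is observed
        total_reward += reward
        # Importance-weighted estimate keeps the update unbiased
        # despite the bandit (partial) feedback.
        x_hat = reward / probs[channel]
        weights[channel] *= math.exp(gamma * x_hat / n_channels)
    return total_reward
```

The exploration parameter gamma trades off how quickly the weights concentrate against how much the learner keeps sampling all channels; a faster-concentrating learner is exactly the kind the simulations reward with a smaller MAR.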

System Model and Problem Formulation
Online Learning Scheme
Online Learning Algorithm Based on Exp3
Stochastic Learning Algorithm
Cooperation Mechanism
The Relationship between NE and MAR
Simulation Results
Conclusion