Abstract

In this paper, a multi-agent deep reinforcement learning method is adopted to realize cooperative spectrum sensing in cognitive radio networks. Each secondary user learns an efficient sensing strategy from the sensing results of a subset of selected spectra, so as to avoid interference to the primary users and to coordinate with the other secondary users. Deep reinforcement learning methods must balance exploration and exploitation during learning, which is why an upper confidence bound with a Hoeffding-style bonus is adopted in this paper to improve the efficiency of exploration. The simulation results verify that, compared with conventional reinforcement learning methods using $\varepsilon$-greedy exploration, the proposed algorithm achieves faster convergence and better reward performance.
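
As a rough illustration of the exploration rule named in the abstract, the sketch below contrasts a Hoeffding-style upper-confidence-bound (UCB-H) action selection with the $\varepsilon$-greedy baseline. This is a minimal Python sketch: the bonus form c*sqrt(log(t)/N(s,a)), the coefficient c, and the function names are illustrative assumptions, not the paper's exact expressions.

    import numpy as np

    def ucb_h_action(Q, N, state, t, c=1.0):
        # Pick the action maximizing the Q-value plus a Hoeffding-style bonus.
        # Q, N: |S| x |A| tables of Q-value estimates and visit counts.
        # The bonus shrinks as a state-action pair is visited more often,
        # so exploration concentrates on poorly sampled actions.
        bonus = c * np.sqrt(np.log(t + 1) / np.maximum(N[state], 1))
        return int(np.argmax(Q[state] + bonus))

    def eps_greedy_action(Q, state, eps=0.1, rng=None):
        # Baseline used for comparison: explore uniformly with probability eps.
        rng = np.random.default_rng() if rng is None else rng
        if rng.random() < eps:
            return int(rng.integers(Q.shape[1]))
        return int(np.argmax(Q[state]))

Unlike $\varepsilon$-greedy, which explores at a fixed rate regardless of what has already been learned, the UCB-H rule directs exploration toward under-sampled actions, which is the intuition behind the faster convergence reported in the simulations.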

Highlights

  • The rapid development of wireless communication has been accompanied by increasingly scarce spectrum resources

  • If the primary users (PUs) start to use their allocated spectra, the secondary users (SUs) need to vacate these spectra immediately; the SUs therefore have to sense the idle spectra that change with the PUs' activity, which makes spectrum sensing technology essential for establishing cognitive radio (CR) networks [4]

  • In the cooperative spectrum sensing algorithm based on reinforcement learning with an upper confidence bound with Hoeffding-style bonus (UCB-H), we focus on the first slot structure depicted in Section II and propose an algorithm that achieves superior cooperative spectrum sensing performance among distributed SUs


Summary

INTRODUCTION

The rapid development of wireless communication has been accompanied by increasingly scarce spectrum resources. Reference [17] implements a distributed Q-learning based spectrum sensing algorithm in which each SU regards the behavior of the other SUs as part of the environment. Q-learning must store the estimated values of the cumulative discounted reward (usually called Q-values) for every state-action pair. In the cooperative spectrum sensing algorithm based on reinforcement learning with UCB-H, we focus on the first slot structure depicted in Section II and propose an algorithm that achieves superior cooperative spectrum sensing performance among distributed SUs. We denote the sets of states and actions by S and A, whose sizes are |S| and |A|, respectively. In the cooperative spectrum sensing algorithm based on DQN with UCB-H, the numerous spectra in practical CR networks make the state space too large for a Q-table, so a deep Q-network (DQN) approximates the Q-values instead. The action with the largest Q-value is selected after Ts slots.
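
To make the description above concrete, here is a minimal tabular Python sketch of one SU's Q-learning loop with UCB-H exploration; the reward definition, state encoding, and hyperparameters (alpha, gamma, c) are hypothetical placeholders rather than the paper's settings. When the number of spectra is large, the Q-table below would be replaced by a DQN that outputs the |A| Q-values for a given state.

    import numpy as np

    class QLearningSU:
        # Hypothetical tabular Q-learning agent for one secondary user (SU).
        # It stores an |S| x |A| table of Q-values and visit counts, and
        # explores with a Hoeffding-style (UCB-H) bonus instead of epsilon-greedy.
        def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, c=1.0):
            self.Q = np.zeros((n_states, n_actions))   # Q-value estimates
            self.N = np.zeros((n_states, n_actions))   # visit counts N(s, a)
            self.alpha, self.gamma, self.c = alpha, gamma, c
            self.t = 0                                 # global slot counter

        def act(self, state):
            # UCB-H action selection: greedy with respect to Q plus a bonus
            # that decays as the state-action pair accumulates visits.
            self.t += 1
            bonus = self.c * np.sqrt(np.log(self.t + 1) / np.maximum(self.N[state], 1))
            return int(np.argmax(self.Q[state] + bonus))

        def update(self, state, action, reward, next_state):
            # One-step Q-learning update of the cumulative discounted reward
            # estimate for the visited state-action pair.
            self.N[state, action] += 1
            target = reward + self.gamma * np.max(self.Q[next_state])
            self.Q[state, action] += self.alpha * (target - self.Q[state, action])

After Ts training slots the exploration bonus is dropped and each SU simply selects the action (i.e., the spectrum to sense) with the largest Q-value, matching the action-selection rule quoted above.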

SIMULATION AND ANALYSIS
Findings
CONCLUSION