Abstract

We consider the problem of dynamic multichannel access for transmission maximization in multiuser wireless communication networks. The objective is to find, in a centralized manner and without any prior knowledge, a multiuser strategy that maximizes global channel utilization while keeping collisions low. Obtaining an optimal solution for centralized dynamic multichannel access is extremely difficult because of the large state and action spaces. To tackle this problem, we develop a centralized dynamic multichannel access framework based on a double deep recurrent Q-network. First, the centralized node maps the current state directly to channel assignment actions, which avoids the prohibitive computation of conventional reinforcement learning. Then, the centralized node can easily select multiple channels by maximizing the sum of value functions given by the trained neural network. Finally, the proposed method avoids collisions between secondary users through a centralized allocation policy.
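The last two steps of the abstract (selecting several channels by maximizing a sum of value functions, and avoiding collisions through centralized allocation) can be illustrated with a minimal sketch. It assumes, hypothetically, that the trained network outputs one value per channel and that the centralized node must give each SU a distinct channel; under those assumptions the sum of values is maximized by taking the highest-valued channels. The function name `assign_channels` and the per-channel value vector are illustrative, not the paper's interface.

```python
from typing import Dict, List, Sequence


def assign_channels(q_values: Sequence[float], su_ids: Sequence[int]) -> Dict[int, int]:
    """Hypothetical centralized allocation: one distinct channel per SU.

    Assumes q_values[c] is the learned value of transmitting on channel c.
    Picking the len(su_ids) channels with the largest values maximizes the
    sum of values subject to no two SUs sharing a channel (no collisions).
    """
    if len(su_ids) > len(q_values):
        raise ValueError("more SUs than channels: a collision-free assignment is impossible")
    # Rank channel indices by value, highest first.
    ranked: List[int] = sorted(range(len(q_values)), key=lambda c: q_values[c], reverse=True)
    # Hand the best remaining channel to each SU in turn.
    return {su: ch for su, ch in zip(su_ids, ranked)}


q = [0.2, 0.9, 0.5, 0.7]
print(assign_channels(q, [0, 1]))  # → {0: 1, 1: 3}
```

Because every SU receives a different channel index, collisions among SUs are excluded by construction; whether a chosen channel is actually idle is what the learned values are meant to predict.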

Highlights

  • With the rapid development of new-generation network technologies such as the mobile Internet and the Internet of Things (IoT), spectrum scarcity has become severe

  • We mainly focus on overlay dynamic spectrum access (DSA) models

  • In the proposed double deep recurrent Q-network (DDRQN) algorithm, the centralized node acts as an agent consisting of two neural networks: an online network and a target network
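The highlight above names the two networks of a double deep Q-network agent. As a minimal sketch of the standard double-DQN target (not necessarily the paper's exact update rule), the online network selects the next action and the target network evaluates it: y = r + γ · Q_target(s', argmax_a Q_online(s', a)). The function below assumes both networks' Q-values for the next states have already been computed for a batch; the recurrent (LSTM) layers of a DDRQN, the function name, and the discount γ = 0.9 are illustrative assumptions.

```python
import numpy as np


def double_dqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.9):
    """Double-DQN regression targets for a batch of transitions.

    rewards:        shape (B,)   immediate rewards r
    q_online_next:  shape (B, A) online-network values Q_online(s', .)
    q_target_next:  shape (B, A) target-network values Q_target(s', .)
    dones:          shape (B,)   1.0 if s' is terminal, else 0.0
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # The online network chooses the greedy next action ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... and the target network scores that action, which curbs overestimation.
    q_target_next = np.asarray(q_target_next, dtype=float)
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values


r = [1.0, 0.0]
q_on = np.array([[0.1, 0.8], [0.5, 0.2]])
q_tg = np.array([[0.3, 0.4], [0.6, 1.0]])
print(double_dqn_targets(r, q_on, q_tg, dones=[0.0, 1.0]))
```

In a typical training loop the online network is fitted to these targets, and the target network's weights are copied from the online network every fixed number of steps.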


Summary

Introduction

With the rapid development of new-generation network technologies such as the mobile Internet and the Internet of Things (IoT), spectrum scarcity has become severe. We consider an overlay DSA environment with multiple primary users (PUs), multiple secondary users (SUs), and a centralized node that can detect the states of all channels at the current time and allocate a channel to each SU for transmitting data during that time. This is a coordinated multichannel access problem over independent channels in a fully observable scenario. We assume that the centralized node has the cognitive ability to exploit time-domain holes in the channels and improve spectrum utilization efficiency in an unknown environment. For this purpose, reinforcement learning (RL), which formulates sequential decision making as a Markov decision process (MDP), is one potential solution owing to its good decision-making performance [3].
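The system model described in this paragraph can be made concrete with a toy simulator. This is a minimal sketch under assumptions that are not stated in the paper: each channel's PU occupancy is modeled as an independent two-state (idle/busy) Markov chain with hypothetical per-channel transition probabilities, the centralized node observes every channel's state (full observability), and an SU earns reward 1 when its assigned channel is idle in the slot and not shared with another SU. The class name `MarkovChannelEnv` and its parameters are illustrative.

```python
import numpy as np


class MarkovChannelEnv:
    """Toy overlay-DSA environment (illustrative assumptions, not the paper's model).

    Each channel is idle (1) or occupied by its PU (0) and evolves as an
    independent two-state Markov chain. The centralized node sees all states.
    """

    def __init__(self, p_stay_idle, p_stay_busy, seed=0):
        self.p_stay_idle = np.asarray(p_stay_idle, dtype=float)  # P(idle -> idle) per channel
        self.p_stay_busy = np.asarray(p_stay_busy, dtype=float)  # P(busy -> busy) per channel
        self.rng = np.random.default_rng(seed)
        self.state = np.ones(len(self.p_stay_idle), dtype=int)  # start with all channels idle

    def step(self, assignment):
        """assignment maps SU id -> channel index for the next slot.

        Channels first evolve one slot; each SU then gets reward 1 if its
        channel is idle and not shared with another SU, else 0.
        """
        u = self.rng.random(len(self.state))
        stay_prob = np.where(self.state == 1, self.p_stay_idle, self.p_stay_busy)
        # Keep the current state with probability stay_prob, otherwise flip it.
        self.state = np.where(u < stay_prob, self.state, 1 - self.state)
        channels = list(assignment.values())
        rewards = {}
        for su, ch in assignment.items():
            collided = channels.count(ch) > 1
            rewards[su] = int(self.state[ch] == 1 and not collided)
        return self.state.copy(), rewards


env = MarkovChannelEnv(p_stay_idle=[0.9, 0.8, 0.6], p_stay_busy=[0.7, 0.9, 0.5], seed=42)
obs, rewards = env.step({0: 0, 1: 2})
print(obs, rewards)
```

The returned observation (the full channel-state vector) is what a centralized agent would feed to its network; the collision check only matters for policies that are not already collision-free.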

Related Work
System Model and Problem Statement
Implementation of Q-Learning and Deep Reinforcement Learning
Online Learning
Simulation Results
Conclusion
