Abstract

This paper investigates the use of deep reinforcement learning (DRL) to solve the dynamic spectrum access problem. Specifically, we examine a scenario in which multiple discrete channels are shared by different types of nodes that cannot communicate with nodes of other types and have no a priori knowledge of the other nodes' behaviors. Each node's objective is to maximize its own long-term expected number of successful transmissions. The problem is formulated as a Markov Decision Process (MDP) with unknown system dynamics. To overcome the combined challenge of an unknown environment and a prohibitively large transition matrix, we apply two DRL approaches: the Deep Q-Network (DQN) and the Double Deep Q-Network (DDQN). We also introduce techniques to improve DQNs, including an eligibility trace, prior experience, and a "guess process". We first study the proposed DQN approach in a simple environment. Simulations show that both DQN and DDQN can effectively learn the communication patterns of different nodes and achieve near-optimal performance without prior knowledge. We then examine these techniques in a more complex environment and conclude that, although different implementation variations achieve different levels of performance, our proposed DQN nodes can still learn to avoid collisions and achieve near-optimal performance even in the more complex scenario.
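As a rough illustration of the setting described above (not the authors' implementation), the following sketch trains a DQN node to pick one of K shared channels against another node that hops channels in a fixed round-robin pattern, with reward 1 for a collision-free transmission. The channel count, history length, network sizes, and the toy environment are all illustrative assumptions, and PyTorch is assumed as the DRL framework.

# Minimal DQN sketch for multichannel access (illustrative only).
import random
from collections import deque

import torch
import torch.nn as nn

K = 4            # number of shared channels (assumed for illustration)
HIST = 8         # length of (action, success) history fed as the state
GAMMA = 0.9      # discount factor for long-term expected successes

class QNet(nn.Module):
    """Maps a history of (action, success) pairs to Q-values per channel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIST * 2, 64), nn.ReLU(),
            nn.Linear(64, K))

    def forward(self, x):
        return self.net(x)

def other_node_action(t):
    # Toy "other node": hops channels in a fixed round-robin pattern.
    return t % K

def step(t, action):
    # Reward 1 on a successful (collision-free) transmission, else 0.
    return 1.0 if action != other_node_action(t) else 0.0

qnet, target = QNet(), QNet()
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buf = deque(maxlen=10_000)            # experience replay buffer
state = torch.zeros(HIST * 2)         # initial empty history

for t in range(20_000):
    eps = max(0.05, 1.0 - t / 5_000)  # epsilon-greedy exploration
    if random.random() < eps:
        action = random.randrange(K)
    else:
        action = qnet(state).argmax().item()
    r = step(t, action)
    # Slide the history window: drop the oldest pair, append the newest.
    nxt = torch.cat([state[2:], torch.tensor([float(action), r])])
    buf.append((state, action, r, nxt))
    state = nxt

    if len(buf) >= 64:
        batch = random.sample(buf, 64)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        rew = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Standard DQN target: bootstrap from the target network.
            tgt = rew + GAMMA * target(s2).max(1).values
        loss = nn.functional.mse_loss(q, tgt)
        opt.zero_grad(); loss.backward(); opt.step()
    if t % 500 == 0:
        target.load_state_dict(qnet.state_dict())

The DDQN variant mentioned in the abstract would change only the target computation: the online network selects the next action and the target network evaluates it, e.g. tgt = rew + GAMMA * target(s2).gather(1, qnet(s2).argmax(1, keepdim=True)).squeeze(1). The eligibility trace, prior experience, and "guess process" improvements from the paper are not reproduced here.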
