Abstract

This paper investigates the use of deep reinforcement learning (DRL) to solve a fundamental problem in dynamic spectrum access. Specifically, we develop a technique that allows a secondary radio node to share the spectrum with multiple existing (primary) radio nodes under partial observations and with no a priori knowledge of the primary nodes' behaviors. There are multiple discrete channels shared by the radio nodes, and the secondary node cannot communicate with the primary nodes; instead, it must base its spectrum access decisions on partial observations of the spectrum at each time step. The secondary radio node's objective is to maximize its own long-term expected number of successful transmissions (i.e., the probability of full spectrum utilization) while minimizing collisions using limited feedback. The problem is formulated as a Partially-Observable Markov Decision Process (POMDP) with unknown system dynamics. To overcome the challenge of an unknown environment combined with partial observations, we apply a specific DRL approach: the Deep Recurrent Q-Network (DRQN). We examine two partial observation patterns along with full observations in order to gain insight into the impact of partial observations. Further, we compare the previously proposed deep Q-network (DQN) with the DRQN approach and demonstrate that the proposed DRQN approach can overcome the limitations of DQN-based solutions. We examine four specific scenarios that involve different types of primary nodes and show that the proposed DRQN technique can learn to avoid collisions and achieve near-optimal performance.
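The key idea behind the DRQN is to pair a recurrent layer with a Q-network so the secondary node can aggregate a history of partial spectrum observations before choosing a channel. Below is a minimal sketch of such an architecture in PyTorch; the channel count, observation encoding, layer sizes, and action set are illustrative assumptions for exposition, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Minimal DRQN sketch: an LSTM over a sequence of partial spectrum
    observations, followed by a linear head producing Q-values per action.
    All sizes are illustrative assumptions, not the paper's settings."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)    # embed the partial observation
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)  # aggregate history
        self.q_head = nn.Linear(hidden_dim, num_actions)  # Q-value per (channel or idle) action

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of partial observations
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)
        q_values = self.q_head(x)                         # (batch, time, num_actions)
        return q_values, hidden

# Hypothetical usage: 4 shared channels, partially sensed each step
# (encoded as an 8-dimensional observation); actions = transmit on one
# of the 4 channels or stay idle (5 actions).
net = DRQN(obs_dim=8, num_actions=5)
obs = torch.zeros(1, 10, 8)           # one episode of 10 time steps
q, h = net(obs)
action = q[:, -1].argmax(dim=-1)      # greedy action from the latest Q-estimate
```

The recurrent hidden state is what lets the agent compensate for partial observability: decisions depend on the accumulated observation history rather than on the current sensing result alone, which is the distinction the paper draws between the DRQN and a feed-forward DQN.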
