Abstract

Opportunistic spectrum access (OSA) is envisioned to support the spectrum demand of future-generation wireless networks. The majority of existing work has assumed independent primary channels and known network dynamics. In practice, however, the channels are usually correlated and the network dynamics are unknown a priori. This poses a great challenge to sensing-policy design for spectrum opportunity tracking, and the conventional partially observable Markov decision process (POMDP) formulation with model-based solutions is generally inapplicable. In this paper, we take a different approach and formulate the sensing-policy design as a time-series POMDP from a model-free perspective. To solve this time-series POMDP, we propose a novel Gaussian process reinforcement learning (GPRL) based solution, which achieves accurate channel selection and a fast learning rate. In essence, a GP is embedded in RL as a Q-function approximator to efficiently utilize past learning experience. A novel kernel function is first tailor-designed to measure the correlation of time-series spectrum data. Then, a covariance-based exploration strategy is developed to enable proactive exploration for better policy learning. Finally, for GPRL to adapt to multichannel sensing, we propose a novel action-trimming method to reduce the computational cost. Our simulation results show that the designed sensing policy outperforms existing ones and obtains near-optimal performance within a short learning phase.
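The core idea of GP-as-Q-function-approximator with covariance-based exploration can be illustrated with a minimal sketch. The code below is an assumption-laden toy (plain RBF kernel instead of the paper's tailored time-series kernel, UCB-style action scoring as one plausible form of covariance-based exploration; all class and function names are hypothetical), not the authors' implementation:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel over (state, action) feature vectors.
    A stand-in for the paper's tailored time-series kernel."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

class GPQApproximator:
    """Toy GP regressor over (belief-state, action) pairs used as a
    Q-function approximator, with covariance-based (UCB-style) exploration."""

    def __init__(self, noise=1e-2, beta=2.0):
        self.noise = noise  # observation-noise variance
        self.beta = beta    # exploration weight on posterior std
        self.X = self.y = self.K_inv = None

    def fit(self, X, y):
        # Standard GP regression: precompute (K + sigma^2 I)^-1.
        self.X, self.y = X, y
        K = rbf_kernel(X, X) + self.noise * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)

    def predict(self, Xs):
        # Posterior mean and variance at query points Xs.
        Ks = rbf_kernel(Xs, self.X)
        mean = Ks @ self.K_inv @ self.y
        var = 1.0 - np.sum((Ks @ self.K_inv) * Ks, axis=1)
        return mean, np.maximum(var, 0.0)

    def select_action(self, state, actions):
        # Covariance-based exploration: favour actions whose Q-estimate
        # is high OR whose posterior uncertainty is large.
        Xs = np.array([np.concatenate([state, a]) for a in actions])
        mean, var = self.predict(Xs)
        return int(np.argmax(mean + self.beta * np.sqrt(var)))
```

Under this sketch, channels never sensed before have large posterior variance, so the argmax naturally drives the agent to explore them early and then exploit once the Q-surface is learned.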
