In this paper, we study the channel access problem of vehicles in a cognitive radio vehicular network, where each vehicle opportunistically accesses the channel resources of the primary network in order to receive the necessary data packets within a time deadline. Given the access priority constraint and the limited bandwidth of the primary network, a smart channel connection scheme is indispensable to ensure adequate quality of service (QoS) at the vehicles' side. Due to the competitive nature of the vehicles, access control is formulated as a multi-agent access problem with an intrinsic challenge: each vehicle only partially observes the environment dynamics. Moreover, owing to the temporal usage profile of the primary network, the environment dynamics are also time-dependent, which makes the access control problem non-Markovian. Consequently, estimating the system states that drive a vehicle's decision making is very challenging. To address the issues arising from this non-Markovian problem, we propose a vehicle connection algorithm based on a deep recurrent Q-learning network. With a recurrent Long Short-Term Memory (LSTM) layer integrated into a deep Q-network, the time-correlated system states can be properly estimated, thereby improving the vehicles' channel access policy. In addition, we introduce novel reward quantities that improve network performance and the ability to adapt flexibly to unexplored scenarios. A new structure of the cumulative reward function is also presented to balance the performance trade-off between the cooperative and competitive objectives. Simulation results verify the advantage and stability of the proposed algorithm over benchmark schemes.
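For concreteness, the sketch below shows one plausible shape of such a recurrent Q-network in PyTorch: a feed-forward encoder for the per-step partial observation, an LSTM layer whose hidden state aggregates the observation history across time, and a linear head producing per-channel Q-values. All layer sizes, the observation dimension, and the names (`DRQN`, `obs_dim`, `num_actions`) are illustrative assumptions for the example, not the paper's actual configuration.

```python
# Minimal sketch of a deep recurrent Q-network (DRQN): a deep Q-network with
# an LSTM layer, so the agent can estimate time-correlated system states from
# a history of partial observations. Sizes and names are hypothetical.
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        # Feed-forward encoder for the per-step partial observation.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # LSTM layer carrying a belief over the non-Markovian channel
        # dynamics across decision steps.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Linear head mapping the recurrent state to per-channel Q-values.
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim) sequence of partial observations.
        x = self.encoder(obs_seq)
        x, hidden = self.lstm(x, hidden)
        return self.q_head(x), hidden  # Q-values: (batch, seq_len, num_actions)

# Greedy channel selection for one vehicle, carrying the hidden state forward
# between time slots so past observations inform the current decision.
net = DRQN(obs_dim=8, num_actions=4)
obs = torch.randn(1, 1, 8)                # one vehicle, one time step
q_values, h = net(obs)
action = q_values[0, -1].argmax().item()  # channel to access in this slot
```

In a training loop, such a network would typically be optimized with a standard Q-learning loss over sampled sequences (rather than single transitions), so that the LSTM state is unrolled consistently with the temporal correlation it is meant to capture.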