Multi-Agent Reinforcement Learning Resources Allocation Method Using Dueling Double Deep Q-Network in Vehicular Networks

Yuxin Ji, Haitao Zhao, Hikmet Sari, Fumiyuki Adachi, Haris Gacanin, Yu Wang, Guan Gui

doi:10.1109/tvt.2023.3275546

Abstract

Vehicle-to-vehicle (V2V) communication is high-frequency and periodic, and it involves group sending and group receiving. This leads to severe collisions over wireless resources and limits system capacity. Moreover, the rapid channel variations in high-mobility vehicular environments make it impossible for the base station to collect accurate instantaneous channel state information for centralized resource management. In addition, road and traffic safety impose very strict low-latency and high-reliability requirements on communication, and the transmitted information is valuable only within a limited range and a limited time window. For the Internet of Vehicles (IoV), a fundamental challenge is to achieve low-latency, high-reliability communication for short-range, real-time data exchange in a complex wireless propagation environment, while mitigating and avoiding inter-vehicle interference through appropriate spectrum allocation. To address these problems, this paper proposes a resource allocation (RA) method based on dueling double deep Q-network reinforcement learning (RL) with low-dimensional fingerprints and a soft-update architecture (D3QN-LS). A multi-agent model is built in a virtual urban environment with a Manhattan grid layout, in which each V2V link acts as an agent that reuses vehicle-to-infrastructure (V2I) spectrum resources. The agents work in concert: they interact with the environment, receive observations and rewards, and ultimately learn to improve power and spectrum allocation, providing users with a better entertainment experience and a safer driving environment. We also address two shortcomings of existing studies: first, the low upper limit on the volume of transmitted data; second, the artificial assumption that the amount of spectrum matches the number of V2V links, which does not hold for future applications in sections with spectrum shortages.
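The D3QN-LS components named above can be illustrated with a small NumPy sketch. It is not the paper's implementation: it assumes a one-hidden-layer network, hypothetical sizes (6 local observation features, 12 discrete actions standing in for sub-band and power-level combinations), and a fingerprint made of the exploration rate epsilon and normalized training progress, which is a common fingerprint choice assumed here rather than confirmed by the abstract. It shows the dueling aggregation Q = V + A - mean(A), the double-DQN target in which the online network selects the next action and the target network evaluates it, and the soft (Polyak) target update. Gradient steps and the replay buffer are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def init_params(obs_dim, hidden, n_actions):
    """Shared hidden layer feeding a value stream and an advantage stream."""
    p = {}
    p["W1"], p["b1"] = dense(obs_dim, hidden)
    p["Wv"], p["bv"] = dense(hidden, 1)          # value stream V(s)
    p["Wa"], p["ba"] = dense(hidden, n_actions)  # advantage stream A(s, a)
    return p


def dueling_q(p, obs):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    h = np.maximum(0.0, obs @ p["W1"] + p["b1"])  # ReLU hidden layer
    v = h @ p["Wv"] + p["bv"]                     # shape (batch, 1)
    a = h @ p["Wa"] + p["ba"]                     # shape (batch, n_actions)
    return v + a - a.mean(axis=1, keepdims=True)


def double_dqn_target(online, target, reward, next_obs, done, gamma=0.99):
    """Double DQN: the online net selects a*, the target net evaluates it."""
    a_star = dueling_q(online, next_obs).argmax(axis=1)
    q_eval = dueling_q(target, next_obs)[np.arange(len(a_star)), a_star]
    return reward + gamma * (1.0 - done) * q_eval


def soft_update(online, target, tau=0.01):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return {k: tau * online[k] + (1.0 - tau) * target[k] for k in target}


def with_fingerprint(local_obs, epsilon, episode, n_episodes):
    """Append a low-dimensional fingerprint (assumed: epsilon and training progress)."""
    return np.concatenate([local_obs, [epsilon, episode / n_episodes]])


# Usage with hypothetical sizes: 6 local observation features, 12 actions
obs = with_fingerprint(rng.normal(size=6), epsilon=0.1, episode=50, n_episodes=100)
online = init_params(obs_dim=obs.size, hidden=32, n_actions=12)
target = {k: v.copy() for k, v in online.items()}
batch = np.stack([obs, obs])
y = double_dqn_target(online, target, reward=np.array([1.0, 0.5]),
                      next_obs=batch, done=np.array([0.0, 1.0]))
target = soft_update(online, target, tau=0.01)
print(dueling_q(online, batch).shape, y.shape)  # (2, 12) (2,)
```

The soft update lets the target network track the online network gradually instead of being overwritten every fixed number of steps; tau = 0.01 is an illustrative value, not one reported in the abstract.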
To overcome these shortcomings, we increase the volume of transmitted data and add a scenario with relatively scarce spectrum resources, i.e., one in which the number of V2V links is significantly larger than the amount of available spectrum. Experimental results show that, with a proper training mechanism and a well-constructed reward function, multiple agents can cooperate effectively, further improving both the total capacity of the V2I links and the success rate of periodic safety message transmission.
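The two reported metrics can be computed per time step from link SINRs with the Shannon capacity C = W log2(1 + SINR). The sketch below also shows one possible shared reward that trades the V2I sum rate against V2V payload delivery. The weights lam_c and lam_v, the delivered_bonus, the 1 MHz bandwidth, the 1 ms step, the 8000-bit payload, and the SINR values are all hypothetical; this is not the paper's reward function or simulation setup.

```python
import numpy as np


def capacity_bps(bandwidth_hz, sinr_db):
    """Shannon capacity C = W * log2(1 + SINR), with SINR given in dB."""
    sinr = 10.0 ** (np.asarray(sinr_db, dtype=float) / 10.0)
    return bandwidth_hz * np.log2(1.0 + sinr)


def step(v2i_sinr_db, v2v_sinr_db, remaining_bits, bandwidth_hz=1e6, dt_s=1e-3,
         lam_c=0.1, lam_v=0.9, delivered_bonus=1.0):
    """One time step: drain V2V payloads and compute a hypothetical shared reward."""
    v2i_sum = capacity_bps(bandwidth_hz, v2i_sinr_db).sum()   # V2I sum rate, bit/s
    v2v_rate = capacity_bps(bandwidth_hz, v2v_sinr_db)        # per-link V2V rate, bit/s
    pending = remaining_bits > 0
    # Only links with payload left transmit; clip the backlog at zero
    remaining = np.where(pending, np.maximum(remaining_bits - v2v_rate * dt_s, 0.0), 0.0)
    # Hypothetical V2V term: rate in Mbit/s while pending, constant bonus once delivered
    v2v_term = np.where(pending, v2v_rate / 1e6, delivered_bonus)
    reward = lam_c * v2i_sum / 1e6 + lam_v * v2v_term.mean()
    return remaining, v2i_sum, reward


def success_rate(remaining_bits):
    """Fraction of V2V links whose entire payload was delivered."""
    return float(np.mean(remaining_bits <= 0))


# Usage: 4 V2I links and 4 V2V links over a hypothetical 100 ms budget (100 steps of 1 ms)
remaining = np.full(4, 8000.0)                 # hypothetical payload per V2V link, bits
v2i_sinr_db = np.full(4, 10.0)
v2v_sinr_db = np.array([20.0, 10.0, 0.0, -30.0])
total_reward = 0.0
for _ in range(100):
    remaining, v2i_sum, r = step(v2i_sinr_db, v2v_sinr_db, remaining)
    total_reward += r
print(success_rate(remaining))  # 0.75
```

Here the link at -30 dB drains only about 1.4 bits per step and misses the deadline, so three of the four V2V links succeed. Rewarding delivered links with a constant bonus rather than zero keeps agents from being penalized once their payload is complete.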

Full Text