In highly dynamic Internet of Vehicles scenarios, spectrum assigned to vehicle-to-infrastructure (V2I) links can be reused by multiple vehicle-to-vehicle (V2V) links to achieve efficient resource allocation. Because channel states vary rapidly in such environments, it is difficult for the base station to collect and manage instantaneous channel state information. To address this problem, we present a V2X spectrum access algorithm based on multi-agent deep reinforcement learning. The algorithm maximizes the throughput of V2I users subject to the latency and reliability constraints of V2V users, and uses experience gained from interacting with the communication environment to update the Q-network and improve the spectrum and power allocation strategy. Implicitly cooperating agents are trained with an improved DQN model that combines a dueling network architecture, long short-term memory (LSTM) layers, and a common reward. Hysteretic Q-learning and concurrent experience replay trajectories stabilize training and resolve the non-stationarity caused by multiple agents learning concurrently. Simulation results show that the proposed algorithm achieves a mean successful payload delivery rate of 95.89%, which is 16.48% higher than that of the random baseline algorithm. The proposed algorithm approaches the optimum, performing close to the centralized brute-force algorithm while offering a practical way to further reduce the signaling overhead of Internet of Vehicles communication systems.
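To make the described architecture concrete, the sketch below shows one plausible form of the per-agent Q-network the abstract outlines: an LSTM layer over a history of local observations feeding a dueling head that splits into state-value and advantage streams. This is a minimal illustration, not the authors' implementation; the class name DuelingLSTMQNet and all sizes (OBS_DIM, HIDDEN, N_ACTIONS) are assumed for the example.

```python
# Minimal sketch (assumed, not the paper's code) of a per-agent dueling
# LSTM Q-network of the kind the abstract describes.
import torch
import torch.nn as nn

OBS_DIM = 16    # assumed local-observation size (e.g., CSI, queue, latency budget)
HIDDEN = 64     # assumed hidden width
N_ACTIONS = 20  # assumed joint sub-band x power-level action count

class DuelingLSTMQNet(nn.Module):
    def __init__(self, obs_dim=OBS_DIM, hidden=HIDDEN, n_actions=N_ACTIONS):
        super().__init__()
        self.fc = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Dueling heads: scalar state value V(s) and per-action advantage A(s, a)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim) sequence of local observations
        x = torch.relu(self.fc(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        v = self.value(x)          # (batch, time, 1)
        a = self.advantage(x)      # (batch, time, n_actions)
        # Standard dueling combination: Q = V + (A - mean(A))
        q = v + a - a.mean(dim=-1, keepdim=True)
        return q, hidden_state

# Usage: one network per V2V agent; each picks the Q-greedy joint
# sub-band/power action from its latest observation step.
net = DuelingLSTMQNet()
q_values, h = net(torch.randn(1, 10, OBS_DIM))  # 10-step observation history
action = q_values[0, -1].argmax().item()
```

The LSTM carries information across time steps to compensate for partial, delayed channel observations, while the dueling decomposition lets the network learn the value of a channel state independently of the action chosen; per the abstract, training would additionally rely on hysteretic updates and concurrent experience replay trajectories, which are not shown here.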