Abstract

In this paper, we investigate the joint sub-channel and power allocation problem for cellular vehicle-to-everything (V2X) communications, where multiple vehicle-to-infrastructure (V2I) users share spectrum resources with vehicle-to-vehicle (V2V) users. In particular, a novel channel state information (CSI)-independent decentralized algorithm based on multi-agent reinforcement learning (MARL) is proposed to maximize the sum throughput of the V2I links while meeting the latency and reliability requirements of the V2V links. Specifically, we train implicitly collaborative agents with individual double dueling deep recurrent Q-networks (D3RQN) and a carefully designed common reward, through which each agent optimizes its policy individually based solely on local CSI-independent observations. To handle the non-stationarity induced by concurrent multi-agent learning, we incorporate hysteretic Q-learning and concurrent experience replay trajectories (CERT) to stabilize the training process. In addition, we incorporate the approximate regretted reward (ARR) to alleviate the unstable reward estimation caused by shifting environment dynamics. Simulation results demonstrate that the proposed algorithm outperforms the baselines and achieves performance close to that of the centralized brute-force method. Furthermore, the proposed CSI-independent design performs comparably to its CSI-involved counterpart, which sheds light on the potential to further reduce the signalling overhead of machine learning-based vehicular communication systems.
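For context, the core idea of hysteretic Q-learning referenced above is to apply a smaller learning rate to negative temporal-difference errors than to positive ones, so that an agent does not over-penalize actions whose poor outcomes may stem from teammates' concurrent exploration. The sketch below illustrates this principle in a minimal tabular form; the learning rates and variable names are illustrative assumptions, not the paper's implementation, which applies the same idea within deep recurrent Q-networks.

```python
import numpy as np

def hysteretic_q_update(Q, s, a, r, s_next, alpha=0.1, beta=0.01, gamma=0.95):
    """One hysteretic Q-learning step (illustrative, tabular form).

    Positive TD errors are applied with the full learning rate alpha, while
    negative TD errors use the smaller rate beta (beta < alpha), keeping the
    agent optimistic and more robust to non-stationarity caused by other
    agents learning at the same time.
    """
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    lr = alpha if td_error >= 0 else beta
    Q[s, a] += lr * td_error
    return Q
```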
