A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints

Nan Geng, Mingwei Xu, Vaneet Aggarwal, Tian Lan, Chenyi Liu, Qinbo Bai, Yuan Yang

doi:10.1109/tvt.2023.3235946

Abstract

Providing provable performance guarantees in vehicular network routing is crucial to ensure the safe and timely delivery of information in an environment characterized by high mobility, dynamic network conditions, and frequent topology changes. While Reinforcement Learning (RL) has shown great promise in network routing, existing RL-based solutions typically support decision-making under either peak constraints or average constraints, but not both. For network routing in intelligent transportation applications, such as advanced vehicle control and safety, both peak constraints (e.g., maximum latency or minimum bandwidth guarantees) and average constraints (e.g., average transmit power or data rate constraints) must be satisfied. In this paper, we propose a holistic framework for RL-based vehicular network routing that optimizes routing decisions under both average and peak constraints. The routing problem is modeled as a Constrained Markov Decision Process (CMDP) and recast as an optimization based on Constraint Satisfaction Problems (CSPs). We prove that the optimal policy of a given CSP can be learned by an extended Q-learning algorithm while satisfying both peak and average latency constraints. To improve the scalability of the framework, we further develop a decentralized implementation using a cluster-based learning structure. When the proposed RL algorithm is applied to vehicular network routing under both peak and average latency constraints, simulation results show that it achieves much higher rewards than heuristic baselines, with over 40% improvement in average transmission rate, while incurring zero violations of both the peak and average constraints.
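The abstract distinguishes peak constraints, which must hold at every decision, from average constraints, which only need to hold in expectation. The following minimal sketch makes that distinction concrete. It is not the paper's CSP-based extended Q-learning. Instead it uses a common alternative: tabular Q-learning with action masking for a per-hop peak latency bound, plus a Lagrangian (primal-dual) penalty for an average end-to-end latency bound. The toy topology, the link rates and latencies, the bounds `PEAK_LATENCY` and `AVG_LATENCY`, and the helpers `allowed_actions`, `train` and `greedy_path` are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy topology: (u, v) -> (transmission rate, link latency).
LINKS = {
    (0, 1): (5.0, 2.0),
    (0, 2): (3.0, 1.0),
    (0, 3): (9.0, 6.0),  # high rate, but violates the assumed per-hop peak bound
    (1, 3): (5.0, 2.0),
    (2, 3): (2.0, 1.0),
}
SOURCE, DEST = 0, 3
PEAK_LATENCY = 4.0  # assumed per-hop peak bound (must hold at every step)
AVG_LATENCY = 3.0   # assumed bound on mean end-to-end latency (holds on average)


def allowed_actions(node):
    """Next hops whose link latency satisfies the peak bound (action masking)."""
    return [v for (u, v), (_, lat) in LINKS.items()
            if u == node and lat <= PEAK_LATENCY]


def train(episodes=2000, alpha=0.1, gamma=1.0, eps=0.1, eta=0.01, seed=0):
    """Q-learning on a penalised reward, with dual ascent on the multiplier."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(node, next_hop)], defaults to 0.0
    lam = 0.0               # Lagrange multiplier for the average-latency bound
    for _ in range(episodes):
        node, ep_latency = SOURCE, 0.0
        while node != DEST:
            acts = allowed_actions(node)  # masked actions never get explored
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda v: q[(node, v)])
            rate, lat = LINKS[(node, a)]
            ep_latency += lat
            r = rate - lam * lat  # penalised reward: rate minus weighted latency
            nxt = 0.0 if a == DEST else max(q[(a, v)] for v in allowed_actions(a))
            q[(node, a)] += alpha * (r + gamma * nxt - q[(node, a)])
            node = a
        # Raise lambda when an episode exceeds the average bound, lower it otherwise.
        lam = max(0.0, lam + eta * (ep_latency - AVG_LATENCY))
    return q, lam


def greedy_path(q):
    """Follow the highest-valued allowed next hop from SOURCE to DEST."""
    node, path = SOURCE, [SOURCE]
    while node != DEST:
        node = max(allowed_actions(node), key=lambda v: q[(node, v)])
        path.append(node)
    return path


if __name__ == "__main__":
    q, lam = train()
    print(greedy_path(q), round(lam, 2))
```

Masking makes the peak bound hold by construction. The multiplier only pushes the *average* latency toward `AVG_LATENCY` over the course of training, so a single deterministic greedy path extracted afterwards may still exceed that bound. Primal-dual methods of this kind typically converge to a randomized mixture of routes rather than to one path.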

Full Text