Abstract

Reinforcement learning (RL) based traffic signal control for large-scale traffic grids is challenging due to the curse of dimensionality. In particular, searching for an optimal policy in a huge joint action space is impractical, even with approximate Q-functions. Heuristic self-organizing algorithms, on the other hand, can achieve efficient decentralized control, but most of them make little effort to optimize for real-time traffic. This paper proposes a new regional RL algorithm that adaptively forms local cooperation regions and then learns the optimal control policy for each region separately. Specifically, we maintain a set of learning parameters that capture the control patterns of regions at different scales. At each time step, we first decompose the large-scale traffic grid into disjoint sub-regions based on the real-time traffic condition. We then apply approximate Q-learning to learn a centralized control policy within each sub-region, updating the corresponding learning parameters from traffic observations. Numerical experiments demonstrate that our regional RL algorithm is computationally efficient and functionally adaptive, and that it outperforms typical heuristic decentralized algorithms.
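To make the two-step pipeline concrete, below is a minimal, hypothetical Python sketch of one control step: decompose the grid into disjoint sub-regions from the current congestion snapshot, then run one semi-gradient approximate Q-learning update on the parameter set shared by all regions of that scale. The decomposition rule, feature map, reward, and every name here (decompose, features, REGION_SCALES, GRID_SIZE, and so on) are assumptions for illustration only; the paper's actual design is not reproduced.

```python
import numpy as np

GRID_SIZE = 4            # 4x4 grid of intersections (assumed)
N_ACTIONS = 2            # signal phases per intersection (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# One linear-Q parameter matrix per region scale, standing in for the
# "set of learning parameters" that captures control patterns of
# regions at different scales.
REGION_SCALES = [1, 2]   # region side lengths considered (assumed)
N_FEATURES = 8
theta = {s: np.zeros((N_ACTIONS ** (s * s), N_FEATURES)) for s in REGION_SCALES}

def decompose(congestion):
    """Partition the grid into disjoint square sub-regions, greedily
    growing 2x2 regions around congested cells (a stand-in for the
    paper's adaptive, traffic-dependent decomposition)."""
    regions, used = [], set()
    for i in range(GRID_SIZE):
        for j in range(GRID_SIZE):
            if (i, j) in used:
                continue
            s = 2 if (congestion[i, j] > 0.5 and i + 1 < GRID_SIZE
                      and j + 1 < GRID_SIZE
                      and all((a, b) not in used
                              for a in (i, i + 1) for b in (j, j + 1))) else 1
            cells = [(a, b) for a in range(i, i + s) for b in range(j, j + s)]
            used.update(cells)
            regions.append((s, cells))
    return regions

def features(congestion, cells):
    """Fixed-length feature vector for a region (padded by repetition)."""
    x = np.array([congestion[c] for c in cells], dtype=float)
    return np.resize(x, N_FEATURES)

def q_values(scale, x):
    # One approximate Q-value per joint action available in the region.
    return theta[scale] @ x

# One simulated control step over a random traffic snapshot.
rng = np.random.default_rng(0)
congestion = rng.random((GRID_SIZE, GRID_SIZE))
for scale, cells in decompose(congestion):
    x = features(congestion, cells)
    q = q_values(scale, x)
    a = rng.integers(len(q)) if rng.random() < EPS else int(np.argmax(q))
    # Stand-in transition and reward: negative mean congestion in the region.
    next_congestion = np.clip(
        congestion + rng.normal(0, 0.05, congestion.shape), 0, 1)
    r = -float(np.mean([next_congestion[c] for c in cells]))
    x_next = features(next_congestion, cells)
    td_target = r + GAMMA * np.max(q_values(scale, x_next))
    # Semi-gradient Q-learning update on the scale's shared parameters.
    theta[scale][a] += ALPHA * (td_target - q[a]) * x
```

Because parameters are keyed by region scale rather than by region location, every sub-region of a given size updates the same matrix, which is one plausible reading of how a single parameter set can be reused as the decomposition changes from step to step.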
