Abstract

We develop reinforcement learning (RL) boundary controllers to mitigate stop-and-go traffic congestion on a freeway segment. The traffic dynamics of the freeway segment are governed by a macroscopic Aw-Rascle-Zhang (ARZ) model, a 2×2 system of quasi-linear partial differential equations (PDEs) for traffic density and velocity. Boundary stabilization of the linearized ARZ PDE model has been solved by PDE backstepping, guaranteeing regulation of the traffic state to uniform density and velocity in the spatial L² norm and ensuring that traffic oscillations are suppressed. Collocated proportional (P) and proportional-integral (PI) controllers also provide stability guarantees for admissible ranges of control gains and are always applicable as model-free control options, with gains tuned by trial and error or by model-free optimization. Although these approaches are mathematically elegant, the stabilization result holds only locally and is sensitive to changes in the model parameters. Therefore, we reformulate the PDE boundary control problem as an RL problem that pursues stabilization without knowledge of the system dynamics, simply by observing the state values. Proximal policy optimization (PPO), a neural network-based policy gradient algorithm, is employed to obtain RL controllers by interacting with a numerical simulator of the ARZ PDE. The stabilization-inspired RL state-feedback boundary controllers are compared and evaluated against the rigorously stabilizing controllers in two cases: 1) a system with perfect knowledge of the traffic flow dynamics and 2) a system with only partial knowledge. We obtain RL controllers that nearly recover the performance of the backstepping, P, and PI controllers under perfect knowledge and outperform them in some cases under partial knowledge. It must be noted, however, that the RL controllers are obtained through roughly one thousand episodes of iterative training on a simulation model; this training cannot be performed in a collision-free fashion in real traffic, nor is its convergence guaranteed. Thus, we demonstrate that the RL approach has learning (i.e., adaptation) potential for traffic PDE systems under uncertain and changing conditions, but RL is neither a simple nor a fully safe substitute for model-based control in real traffic systems.
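For reference, the ARZ model mentioned above is commonly written in the following form. This is the standard formulation from the traffic-flow literature rather than an excerpt from this paper, and the pressure function p(ρ), equilibrium velocity V(ρ), and relaxation time τ are generic symbols used here for illustration:

\[
\begin{aligned}
\partial_t \rho + \partial_x (\rho v) &= 0, \\
\partial_t \bigl(v + p(\rho)\bigr) + v\,\partial_x \bigl(v + p(\rho)\bigr) &= \frac{V(\rho) - v}{\tau},
\end{aligned}
\]

where ρ(x, t) and v(x, t) denote traffic density and velocity along the segment. Boundary control (for instance, ramp metering at one end of the segment) enters through the boundary conditions of this 2×2 quasi-linear system.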
