Proximal policy optimization learning based control of congested freeway traffic

Zhiguang Feng, Yueying Wang, Jie Qi, Nailong Wu, Anqi Pan, Huaicheng Yan, Shurong Mo

doi:10.1002/oca.3068

Abstract

In this paper, a delay-compensating feedback controller based on reinforcement learning is proposed. It uses the proximal policy optimization (PPO) scheme to adjust the time gap of adaptive cruise control (ACC) vehicle agents in congested traffic. The freeway traffic flow is described by the Aw–Rasle–Zhang (ARZ) model, a two‐by‐two system of nonlinear first‐order partial differential equations (PDEs). Unlike the backstepping delay-compensation control [23], the PPO controller proposed in this paper feeds back only three signals: the current traffic flow velocity, the current traffic flow density, and the control input from the previous step. Because the traffic flow dynamics are difficult to express mathematically, the gains of these three feedback terms are learned from the interaction between the PPO agent and a digital simulator of the traffic system. The Lyapunov, backstepping, and PPO controllers are compared in numerical simulations. The results show that, for the traffic system without delay, PPO control outperforms Lyapunov control in both convergence rate and control effort. For the unstable traffic system with input delay, the PPO controller performs comparably to the backstepping controller. In addition, PPO is more robust than the backstepping controller when a model parameter is subject to Gaussian noise.
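
For reference, the ARZ model is commonly written in the traffic-control literature in the form below, where ρ is density, v is velocity, p is the traffic pressure, V is the equilibrium velocity, τ is the relaxation time, v_m and ρ_m are the maximum speed and density, and γ > 0. These symbols and this particular form are assumptions taken from that standard literature, not from the abstract; in particular, the abstract does not state how the ACC time gap enters the model (for example through τ or V), so the paper's exact variant may differ.

```latex
\begin{aligned}
\rho_t + (\rho v)_x &= 0, \\
\bigl(v + p(\rho)\bigr)_t + v\,\bigl(v + p(\rho)\bigr)_x &= \frac{V(\rho) - v}{\tau},
\end{aligned}
\qquad
p(\rho) = \frac{v_m}{\rho_m^{\gamma}}\,\rho^{\gamma}, \quad
V(\rho) = v_m - p(\rho), \quad x \in [0, L],\ t \ge 0.
```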

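The controller structure described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian policy whose mean is a linear combination of the three feedback signals (velocity deviation, density deviation, previous input), so that the hypothetical gains `k` are the parameters PPO tunes, and it uses PPO's standard clipped surrogate objective. The names `policy_mean`, `gaussian_log_prob`, and `ppo_clip_objective`, the use of deviations from an equilibrium, and all numbers in the demo are illustrative.

```python
import numpy as np

# Hypothetical sketch: the policy mean is a linear feedback of the three signals
# named in the abstract. k = (k_v, k_rho, k_u) are illustrative gains that PPO
# would tune; using deviations from an equilibrium is an assumption.
def policy_mean(k, v_dev, rho_dev, u_prev):
    return k[0] * v_dev + k[1] * rho_dev + k[2] * u_prev

def gaussian_log_prob(a, mean, std):
    # Log-density of a univariate normal N(mean, std^2) evaluated at a.
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    # Standard PPO clipped surrogate (to be maximized): sample mean of
    # min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r = pi_new / pi_old.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k_old = np.array([-0.5, 0.3, 0.8])  # illustrative gains, not from the paper
    k_new = k_old + np.array([0.05, 0.0, -0.02])  # a candidate policy update
    std = 0.1  # fixed exploration noise (assumption)
    # Synthetic batch of (v_dev, rho_dev, u_prev) states and advantages, demo only.
    states = rng.normal(size=(8, 3))
    adv = rng.normal(size=8)
    mean_old = policy_mean(k_old, *states.T)
    actions = mean_old + std * rng.normal(size=8)  # actions sampled from old policy
    logp_old = gaussian_log_prob(actions, mean_old, std)
    logp_new = gaussian_log_prob(actions, policy_mean(k_new, *states.T), std)
    print(ppo_clip_objective(logp_new, logp_old, adv))
```

One plausible reading of why the previous input is fed back: under an input delay, the most recent command has not yet acted on the traffic, so including it gives the policy some information about the input still in transit. The clipping keeps each gain update close to the policy that collected the data, which is the main design choice distinguishing PPO from a plain policy-gradient step.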