Abstract

In the future, mixed traffic flow will include two types of vehicles: connected autonomous vehicles (CAVs) and human-driven vehicles (HDVs). CAVs are emerging as a disruptive alternative to the traditional transportation system: they share real-time data with one another and with roadside units (RSUs) for network management. Reinforcement learning (RL) is a promising approach to traffic signal management in complex urban areas because it can leverage the information gathered from CAVs. In particular, coordinating signal control across many intersections is a central challenge for multi-agent reinforcement learning (MARL). Following this vision, we propose an approach that combines an actor–critic-based multi-agent deep deterministic policy gradient (MADDPG) model with a rerouting technique (RT) to improve traffic performance in vehicular networks. The algorithm overcomes the inherent non-stationarity of Q-learning and the high variance of policy gradient (PG) methods. Following the centralized-training, decentralized-execution paradigm, the MADDPG model assigns one actor and one critic to each agent. The actor network selects actions from local information only, while the critic network is trained with extra information, including the states and actions of the other agents. Through this centralized learning process, agents learn to coordinate with one another, diminishing the influence of a non-stationary environment. Unlike previous studies, we not only manage the traffic light system but also account for the effect of vehicle platooning on throughput. Experimental results show that our model outperforms baseline models in terms of traffic performance across different scenarios.
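The centralized-training, decentralized-execution structure described above can be illustrated with a minimal sketch. This is not the paper's implementation: all dimensions, layer shapes, and function names below are made-up assumptions chosen only to show that each actor consumes local observations while each critic scores the joint observations and actions of all agents.

```python
import numpy as np

# Hypothetical dimensions for illustration: 3 intersections (agents),
# each observing a 4-dim local state and emitting a 2-dim action.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2
rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Randomly initialized weights standing in for a trained network."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1

# One actor per agent: maps the agent's LOCAL observation to its action.
actors = [linear(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]

# One centralized critic per agent: scores the JOINT observations and
# actions of all agents (the extra information used during training).
joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
critics = [linear(joint_dim, 1) for _ in range(N_AGENTS)]

def act(agent_id, local_obs):
    """Decentralized execution: only the agent's own observation is used."""
    return np.tanh(local_obs @ actors[agent_id])

def q_value(agent_id, all_obs, all_actions):
    """Centralized training signal: the critic sees every agent's
    observation and action, which stabilizes learning under
    the non-stationarity introduced by co-adapting agents."""
    joint = np.concatenate([*all_obs, *all_actions])
    return float(joint @ critics[agent_id])

# One forward pass: each agent acts locally; critic 0 scores the joint state.
obs = [rng.standard_normal(OBS_DIM) for _ in range(N_AGENTS)]
actions = [act(i, obs[i]) for i in range(N_AGENTS)]
q = q_value(0, obs, actions)
```

At execution time only `act` is needed at each intersection; `q_value` exists solely to compute the training gradient, which is why the extra global information does not have to be available once the policies are deployed.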
