Multi-agent reinforcement learning framework for autonomous traffic signal control in smart cities
Introduction: Increasing urbanization across the world necessitates efficient traffic management, especially in emerging economies. This paper presents an intelligent framework aimed at enhancing traffic signal management within complex road networks through the design and evaluation of a multi-agent reinforcement learning (MARL) framework.

Methods: The research explored how reinforcement learning (RL) algorithms can be employed to optimize traffic flow, reduce bottlenecks, and enhance overall transportation safety and efficiency. Specifically, the work designed and simulated a typical traffic environment (an intersection); defined and implemented a multi-agent system (MAS); developed a MARL model for traffic management within the simulated environment, leveraging actor-critic and deep Q-network (DQN) strategies for learning and coordination; and evaluated the resulting model. Novel approaches for decentralized decision-making and dynamic resource allocation were developed to enable real-time adaptation to changing traffic conditions and emergent situations. Performance evaluation using metrics such as waiting time, queue length, and congestion was carried out on the SUMO (Simulation of Urban MObility) platform across various traffic scenarios.

Results and Discussion: The simulations showed improvements in queue management and traffic flow of 64.5% and 70.0%, respectively, with the proposed model's performance improving over the training episodes. The learned RL policy outperformed the baseline policy, indicating that the model learned across episodes, and the MARL-based approach proved superior for decentralized traffic control in both scalability and adaptability. The proposed solution supports real-time decision-making, reduces traffic congestion, and improves the efficiency of the urban transportation system.
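The abstract names actor-critic and DQN strategies for the agents but gives no implementation detail; the sketch below is a minimal, hypothetical per-intersection DQN agent in Python (PyTorch). The state layout (per-lane queue lengths plus the current phase), network sizes, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Minimal per-intersection DQN agent (an illustrative sketch, not the
# paper's implementation; state layout and hyperparameters are assumed).
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_state, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class DQNAgent:
    def __init__(self, n_state=9, n_actions=4, gamma=0.95, lr=1e-3, eps=0.1):
        # State: e.g. 8 per-lane queue lengths plus the current phase index.
        # Actions: e.g. 4 candidate signal phases to activate next.
        self.q = QNet(n_state, n_actions)
        self.target = QNet(n_state, n_actions)
        self.target.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=50_000)
        self.gamma, self.n_actions, self.eps = gamma, n_actions, eps

    def act(self, state):
        if random.random() < self.eps:        # epsilon-greedy exploration
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(torch.tensor(state, dtype=torch.float32)).argmax())

    def remember(self, s, a, r, s2):
        self.buffer.append((s, a, r, s2))

    def learn(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2 = zip(*random.sample(self.buffer, batch_size))
        s = torch.tensor(s, dtype=torch.float32)
        s2 = torch.tensor(s2, dtype=torch.float32)
        a = torch.tensor(a).unsqueeze(1)
        r = torch.tensor(r, dtype=torch.float32)
        q_sa = self.q(s).gather(1, a).squeeze(1)
        with torch.no_grad():                 # bootstrapped one-step target
            target = r + self.gamma * self.target(s2).max(1).values
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad(); loss.backward(); self.opt.step()

    def sync_target(self):                    # call periodically during training
        self.target.load_state_dict(self.q.state_dict())
```

In a SUMO setup, the reward passed to `remember` would typically be read through TraCI, e.g. the negative change in cumulative waiting time, with one such agent per signalized intersection.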
- Array, Jul 1, 2021. doi:10.1016/j.array.2021.100057
- Procedia - Social and Behavioral Sciences, Dec 1, 2013. doi:10.1016/j.sbspro.2013.11.170
- Sustainability, Feb 14, 2023. doi:10.3390/su15043479
- Procedia Computer Science, Jan 1, 2023. doi:10.1016/j.procs.2023.03.016
- IET Intelligent Transport Systems, Sep 1, 2012. doi:10.1049/iet-its.2011.0123
- Engineering Applications of Artificial Intelligence, Feb 27, 2024. doi:10.1016/j.engappai.2024.108147
- Sustainability, Sep 9, 2020. doi:10.3390/su12187394
- Journal of Advanced Transportation, Oct 1, 2016. doi:10.1002/atr.1392
- Cities, Jun 18, 2022. doi:10.1016/j.cities.2022.103794
- Research Article: Deep reinforcement learning for traffic signal control under disturbances: A case study on Sunway city, Malaysia. Future Generation Computer Systems, Apr 9, 2020. doi:10.1016/j.future.2020.03.065
- Research Article: IEEE Transactions on Fuzzy Systems, Feb 1, 2023. doi:10.1109/tfuzz.2022.3214001
In multi-agent reinforcement learning (RL), multilayer fully connected neural networks are used for value-function approximation, which makes large-scale or continuous-space problems tractable. However, such networks easily fall into local optima and overfit under partially observed environments, because each agent lacks information beyond its observation field that is key to decision making. Even when communication is allowed, the information received over the communication channel carries large noise, due to the observations of other agents, and strong uncertainty, if an agent's policy is used as the communication message. To tackle this problem, a two-stream fused fuzzy deep neural network (2s-FDNN) was proposed to reduce the uncertainty and noise of information in the communication channel: a parallel structure in which a fuzzy inference module reduces the uncertainty of the information and a deep neural module reduces its noise. Fuzzy MA2C then integrates 2s-FDNN into multi-agent deep RL to handle uncertain communication information, improving robustness and generalization under partially observed environments. The methods are evaluated empirically in two large-scale traffic signal control environments using the Simulation of Urban MObility (SUMO) simulator, and the results demonstrate superior performance against existing RL algorithms.
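The abstract describes the two-stream fusion only at block-diagram level; the following is one plausible reading, a sketch assuming Gaussian membership functions for the fuzzy inference stream and fusion by concatenation. Layer sizes and module names are invented for illustration, not taken from the paper.

```python
# Hypothetical two-stream message encoder in the spirit of 2s-FDNN:
# a fuzzy stream (Gaussian memberships) and a dense stream, fused by concat.
import torch
import torch.nn as nn

class TwoStreamFused(nn.Module):
    def __init__(self, msg_dim, n_rules=8, hidden=64, out_dim=32):
        super().__init__()
        # Fuzzy stream: learnable Gaussian membership centers/widths per rule.
        self.centers = nn.Parameter(torch.randn(n_rules, msg_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, msg_dim))
        # Deep stream: ordinary fully connected layers.
        self.dense = nn.Sequential(nn.Linear(msg_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.fuse = nn.Linear(n_rules + hidden, out_dim)

    def forward(self, msg):                      # msg: (batch, msg_dim)
        diff = msg.unsqueeze(1) - self.centers   # (batch, n_rules, msg_dim)
        sigma = self.log_sigma.exp()
        # Rule firing strength: product of per-dimension Gaussian memberships.
        firing = torch.exp(-0.5 * (diff / sigma).pow(2)).prod(dim=-1)
        firing = firing / (firing.sum(dim=1, keepdim=True) + 1e-8)  # normalize
        return self.fuse(torch.cat([firing, self.dense(msg)], dim=1))
```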
- Research Article: Computers, Materials & Continua, Jan 1, 2022. doi:10.32604/cmc.2022.022952
This paper investigates the use of a multi-agent deep Q-network (MADQN) to address the curse-of-dimensionality issue that arises in the traditional multi-agent reinforcement learning (MARL) approach. The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions, particularly rainfall. MADQN is based on the deep Q-network (DQN), an integration of traditional reinforcement learning (RL) and the newly emerging deep learning (DL) approaches. MADQN enables traffic light controllers to learn, exchange knowledge with neighboring agents, and select optimal joint actions in a collaborative manner. A case study based on a real traffic network is conducted as part of a sustainable urban city project in the Sunway City of Kuala Lumpur in Malaysia. An investigation is also performed on a grid traffic network (GTN) to confirm that the proposed scheme is effective in a traditional traffic network. The scheme is evaluated using two simulation tools, Matlab and Simulation of Urban Mobility (SUMO), and the simulations show that the cumulative delay of vehicles can be reduced by up to 30%.
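The abstract says the controllers "exchange knowledge with neighboring agents" but does not specify the mechanism; one simple hypothetical instantiation, sketched below, is periodic soft blending of Q-network parameters between adjacent intersections. The mixing rate and the `agents`/`adjacency` structures are assumptions.

```python
# Hypothetical neighbor knowledge exchange: each agent softly blends its
# Q-network weights with those of adjacent intersections (mixing rate beta).
import torch

def exchange_knowledge(agents, adjacency, beta=0.1):
    """agents: list of objects with a .q torch.nn.Module;
    adjacency: dict mapping agent index -> list of neighbor indices."""
    # Snapshot all weights first so the exchange is symmetric.
    snapshots = [{k: v.clone() for k, v in a.q.state_dict().items()} for a in agents]
    for i, agent in enumerate(agents):
        neighbors = adjacency.get(i, [])
        if not neighbors:
            continue
        blended = {}
        for k, v in snapshots[i].items():
            neigh_mean = torch.stack([snapshots[j][k] for j in neighbors]).mean(0)
            blended[k] = (1 - beta) * v + beta * neigh_mean
        agent.q.load_state_dict(blended)
```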
- Research Article: IEEE Transactions on Intelligent Transportation Systems, Jul 1, 2022. doi:10.1109/tits.2021.3105426
With the development of communication technology and the artificial intelligence of things (AIoT), transportation systems have become much smarter than ever before; however, the volume of vehicles and traffic flows has rapidly increased. Optimizing and improving urban traffic signal control is a potential way to relieve traffic congestion. In general, traffic signal control is a sequential decision process that fits the characteristics of reinforcement learning, in which an agent constantly interacts with its environment and optimizes its behavior in accordance with the feedback it receives. This paper proposes multi-agent reinforcement learning for traffic signals (MARL4TS) to support the control and deployment of traffic signals. First, information on traffic flows and multiple intersections is formalized as the input environment for reinforcement learning. Second, a new reward function is designed to continuously select the most appropriate control strategy during multi-agent learning. Finally, the supporting tool Simulation of Urban MObility (SUMO) is used to simulate the proposed traffic signal control process and compare it with other methods. The experimental results show that the proposed MARL4TS method is superior to the baselines; in particular, it reduces vehicle delay.
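MARL4TS's reward function itself is not given in the abstract; as a stand-in, the sketch below shows a common shape for signal-control rewards, penalizing queue length and accumulated waiting time at an agent's incoming lanes. The weights are arbitrary assumptions, not the paper's values.

```python
# Illustrative signal-control reward (not MARL4TS's actual function):
# penalize halted vehicles and accumulated waiting time at the agent's
# incoming lanes, so less congestion yields a higher (less negative) reward.
def step_reward(queue_lengths, waiting_times, w_queue=0.5, w_wait=0.5):
    """queue_lengths: halted vehicles per incoming lane;
    waiting_times: cumulative waiting seconds per incoming lane."""
    return -(w_queue * sum(queue_lengths) + w_wait * sum(waiting_times))
```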
- Research Article: Cooperative Traffic Signal Control based on Biased ReLU Neural Network Approximation. IFAC PapersOnLine, Jan 1, 2023. doi:10.1016/j.ifacol.2023.10.205
- Research Article: Electronics, Jan 2, 2024. doi:10.3390/electronics13010198
The optimization and control of traffic signals is very important for logistics and transportation: it not only improves the operational efficiency and safety of road traffic, but also aligns with the intelligent, green, and sustainable development of modern cities. To improve the effectiveness of traffic signal control, this paper proposes a traffic signal optimization method based on deep reinforcement learning and the Simulation of Urban Mobility (SUMO) software for urban traffic scenarios. An intersection training scenario was established using the SUMO microscopic traffic simulator, with maximum vehicle queue length and vehicle queue time selected as performance evaluation indicators. To better match the real environment, the experiments use a Weibull distribution to simulate vehicle generation. Since deep reinforcement learning combines perceptual and decision-making capabilities, the study first proposes a traffic signal control model based on the Deep Q Network (DQN) algorithm, accounting for the realism and complexity of traffic intersections, and trains it in the training scenario. The G-DQN (Grouping-DQN) algorithm is then proposed to address two problems: state definitions in existing studies that cannot accurately represent traffic conditions, and the slow convergence of neural networks. Finally, the performance of the G-DQN model was compared with the original DQN model and an Advantage Actor-Critic (A2C) model, and the experimental results show that the improved algorithm improves all of the main indicators.
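The abstract notes that vehicle generation follows a Weibull distribution; a minimal sketch of that idea is shown below, sampling Weibull departure times and writing them into a SUMO route file. The shape parameter, horizon, and edge/route identifiers are assumptions for illustration.

```python
# Illustrative Weibull-distributed vehicle generation for a SUMO scenario
# (shape, horizon, and route/edge ids are assumptions, not the paper's).
import numpy as np

def weibull_departures(n_vehicles=1000, shape=2.0, horizon_s=3600, seed=0):
    rng = np.random.default_rng(seed)
    # Weibull samples rescaled onto the simulation horizon, so arrivals
    # ramp up and then thin out rather than being uniform; SUMO requires
    # vehicles sorted by departure time, hence the sort.
    t = rng.weibull(shape, n_vehicles)
    return np.sort(t / t.max() * horizon_s)

def write_routes(times, path="routes.rou.xml"):
    with open(path, "w") as f:
        f.write("<routes>\n")
        f.write('  <route id="r0" edges="in out"/>\n')  # hypothetical edges
        for i, t in enumerate(times):
            f.write(f'  <vehicle id="v{i}" route="r0" depart="{t:.1f}"/>\n')
        f.write("</routes>\n")
```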
- Conference Article: Jul 1, 2020. doi:10.1109/ijcnn48605.2020.9206820
Finding the optimal control strategy for traffic signals, especially multi-intersection traffic signals, is still a difficult task, and the application of reinforcement learning (RL) algorithms to this problem is greatly limited by the partially observable and nonstationary environment. This paper studies how to eliminate these environmental effects through communication among agents. The proposed method, Information Exchange Deep Q-Network (IEDQN), has a learned communication protocol that makes each local agent pay unbalanced and asymmetric attention to other agents' information. Beyond the protocol, each agent can abstract local information from its own history data for interaction, so communication does not depend on instantaneous information and is robust to potential communication delays. In particular, by alleviating the effects of partial observation, experience replay recovers good performance. IEDQN is evaluated via simulation experiments on a traffic grid in the Simulation of Urban MObility (SUMO), where it outperforms comparative multi-agent RL (MARL) methods in both efficiency and effectiveness.
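IEDQN's "unbalanced and asymmetric attention" to other agents' information suggests a learned attention weighting over neighbor messages; the sketch below is a generic scaled dot-product version of that idea, with all dimensions assumed rather than taken from the paper.

```python
# Hypothetical attention-weighted aggregation of neighbor messages, in the
# spirit of IEDQN's asymmetric attention (details assumed, not the paper's).
import torch
import torch.nn as nn

class MessageAttention(nn.Module):
    def __init__(self, obs_dim, msg_dim, key_dim=32):
        super().__init__()
        self.query = nn.Linear(obs_dim, key_dim)   # from the agent's own history
        self.key = nn.Linear(msg_dim, key_dim)     # from each neighbor message
        self.value = nn.Linear(msg_dim, key_dim)

    def forward(self, own_obs, neighbor_msgs):
        # own_obs: (batch, obs_dim); neighbor_msgs: (batch, n_neighbors, msg_dim)
        q = self.query(own_obs).unsqueeze(1)                     # (b, 1, k)
        k, v = self.key(neighbor_msgs), self.value(neighbor_msgs)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5            # (b, n)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)    # per-agent, asymmetric
        return (weights * v).sum(1)                              # fused message
```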
- Research Article: Sensors (Basel, Switzerland), Feb 21, 2023. doi:10.3390/s23052373
Intelligent traffic management systems have become one of the main applications of Intelligent Transportation Systems (ITS), and there is growing interest in Reinforcement Learning (RL) based control methods in ITS applications such as autonomous driving and traffic management. Deep learning helps approximate substantially complex nonlinear functions from complicated data sets and tackle complex control issues. This paper proposes an approach based on Multi-Agent Reinforcement Learning (MARL) and smart routing to improve the flow of autonomous vehicles on road networks. Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), two recently proposed MARL techniques, are evaluated with smart routing for traffic signal optimization to determine their potential. The framework offered by non-Markov decision processes is investigated, enabling a more in-depth understanding of the algorithms, and a critical analysis examines the robustness and effectiveness of the method. The method's efficacy and reliability are demonstrated through simulations in SUMO, a software modeling tool for traffic simulations, on a road network containing seven intersections. The findings show that MA2C, when trained on pseudo-random vehicle flows, is a viable methodology that outperforms competing techniques.
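MA2C and IA2C are both members of the advantage actor-critic family; for orientation, the sketch below shows a generic single-agent A2C loss (shared body, policy and value heads). It is not the authors' code, and the network shape is an assumption.

```python
# Minimal advantage actor-critic loss, illustrating the A2C family behind
# MA2C/IA2C (a generic sketch, not the paper's implementation).
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_state, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_state, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_actions)   # policy logits
        self.v = nn.Linear(hidden, 1)            # state-value baseline

    def forward(self, s):
        h = self.body(s)
        return self.pi(h), self.v(h).squeeze(-1)

def a2c_loss(model, states, actions, returns, value_coef=0.5):
    """states: (b, n_state) float; actions: (b,) long; returns: (b,) float."""
    logits, values = model(states)
    logp = torch.log_softmax(logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
    advantage = returns - values.detach()        # advantage = return - baseline
    policy_loss = -(logp * advantage).mean()
    value_loss = nn.functional.mse_loss(values, returns)
    return policy_loss + value_coef * value_loss
```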
- Research Article: Advanced Functional Materials, Jun 18, 2025. doi:10.1002/adfm.202509144
Real-time vehicle detection and adaptive traffic signal timing control are essential for intelligent traffic management systems aimed at improving road utilization and enhancing traffic safety. However, conventional traffic monitoring equipment often requires additional power sources and complex deployment. This study therefore proposes an intelligent speed bump (ISB) that integrates a multi-functional triboelectric nanogenerator (M-TENG) with a bidirectional gear transmission structure and a contact-sliding-separation mode. This design achieves self-powered vehicle flow perception and provides sensing signals for the dynamic regulation of real-time traffic flow, thereby alleviating traffic congestion. Furthermore, the process of vehicle passage through the ISB and data collection is simulated on the Simulation of Urban Mobility (SUMO) platform, and traffic signal control is optimized using the Deep Q-Network (DQN) algorithm. Experimental results show that the proposed ISB-based intelligent traffic management system reduces average vehicle waiting time and queue length by 97.7% and 71.4%, respectively, significantly alleviating traffic congestion. This study overcomes the dependency of intelligent traffic integration devices on external power sources, providing new insights for the sustainable development of intelligent transportation systems.
- Conference Article: Jan 23, 2021. doi:10.5121/csit.2021.110102
Designing efficient transportation systems is crucial to save time and money for drivers and for the economy as a whole, and traffic signals are among the most important components of traffic systems. Currently, most traffic signal systems are configured using fixed timing plans based on limited vehicle count data. Past research has introduced and designed intelligent traffic signals; however, machine learning and deep learning have only recently been used in systems that aim to optimize signal timing in order to reduce travel time. Reinforcement learning (RL), a very promising field in artificial intelligence, is a data-driven method that has shown promising results in optimizing traffic signal timing plans to reduce traffic congestion. However, model-based and centralized methods are impractical here due to the high-dimensional state-action space of complex urban traffic networks. In this paper, a model-free approach is used to optimize signal timing for complicated multiple four-phase signalized intersections. We propose a multi-agent deep reinforcement learning framework that aims to optimize traffic flow using data from within each signalized intersection and from other intersections, in what is called Multi-Agent Reinforcement Learning (MARL). The proposed model combines state-of-the-art techniques such as the Double Deep Q-Network and Hindsight Experience Replay (HER), using HER to let the framework learn quickly in sparse-reward settings. We tested and evaluated our proposed model via the Simulation of Urban MObility (SUMO) simulator, and our results show that the proposed method is effective in reducing congestion in both peak and off-peak times.
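Hindsight Experience Replay is the less common ingredient here; the sketch below shows the generic HER relabeling step, in which transitions are re-stored with goals achieved later in the same episode so that a sparse reward becomes dense. The `(achieved, goal)` encoding and `reward_fn` are placeholders, since the paper's goal representation is not given in the abstract.

```python
# Generic hindsight experience replay (HER) relabeling, not the paper's code:
# each transition is also stored under goals actually reached later in the
# episode, so the agent still receives informative reward signals.
import random
from collections import deque

class HERBuffer:
    def __init__(self, reward_fn, capacity=100_000, k_hindsight=4):
        self.buf = deque(maxlen=capacity)
        self.reward_fn = reward_fn      # reward_fn(achieved, goal) -> float
        self.k = k_hindsight

    def store_episode(self, episode):
        """episode: list of (state, action, next_state, achieved, goal)."""
        for i, (s, a, s2, achieved, goal) in enumerate(episode):
            self.buf.append((s, a, self.reward_fn(achieved, goal), s2, goal))
            # Relabel with up to k goals achieved later in the same episode.
            future = episode[i:]
            for _, _, _, future_achieved, _ in random.sample(future, min(self.k, len(future))):
                r = self.reward_fn(achieved, future_achieved)
                self.buf.append((s, a, r, s2, future_achieved))

    def sample(self, n):
        return random.sample(self.buf, min(n, len(self.buf)))
```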
- Research Article: SCIENTIA SINICA Informationis, May 1, 2022. doi:10.1360/ssi-2020-0180
Reinforcement learning (RL) technology has been successfully applied to various continuous decision environments over decades of development. Nowadays, RL is attracting more attention, even being touted as one of the approaches closest to general artificial intelligence. However, real-world problems often involve multiple intelligent agents interacting with each other, so we focus on multi-agent reinforcement learning (MARL) to deal with such multi-agent systems in practice. In the past decade, the combination of multi-agent systems and RL has become increasingly close, gradually forming and enriching the research field of MARL. Reviewing the studies on MARL, we find that researchers mainly solve MARL problems from three perspectives: learning frameworks, joint action learning, and communication-based MARL. This paper focuses on the communication perspective: we first state the reasons for choosing communication-based MARL, and then survey the existing studies that fall into this category but differ in nature. We hope this article can serve as a reference for developing MARL methods that solve practical problems for the national welfare.
- Research Article: Journal of Circuits, Systems and Computers, Sep 30, 2024. doi:10.1142/s021812662550046x
Day-to-day mobility among the population has increased with economic growth, and smart cities are being equipped with advanced technologies to support modern life, in which intelligent transportation is a key focus. Conventional traffic signal control systems are fixed-time: they split the signal cycle into predetermined intervals and operate inefficiently, resulting in long wait times, wasted fuel, and increased carbon emissions. This study introduces a novel technique for traffic light management that reduces the uncertainties in the system: a dynamic and intelligent traffic light adaptive optimal control system (DITLAOCS), which modifies traffic signal durations at run time using real-time traffic data as input. DITLAOCS executes in one of three modes: fairness mode (FM), priority mode (PM), and emergent mode (EM). In fairness mode, all vehicles are prioritized equally, while in priority mode vehicles in different categories receive varying priority levels; emergency vehicles receive the highest priority. A fuzzy inference method based on traffic data is used to choose one of the three modes (FM, PM, or EM), and the model uses deep reinforcement learning to switch traffic lights among three phases (red, green, and yellow). DITLAOCS was evaluated and simulated on the Shaanxi city map in China using Simulation of Urban MObility (SUMO), an open-source simulator, and the simulation results illustrate its efficiency compared to other cutting-edge algorithms on several performance measures.
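The fuzzy inference that selects among FM, PM, and EM is described only in outline; the sketch below is a toy version using triangular memberships over a single congestion measure. The thresholds, membership shapes, and single-input design are assumptions, not the paper's rule base.

```python
# Hypothetical mode selector in the spirit of DITLAOCS's fuzzy inference:
# triangular memberships over a congestion level pick FM, PM, or EM.
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def select_mode(congestion, emergency_present):
    """congestion in [0, 1]; returns 'FM', 'PM', or 'EM'."""
    if emergency_present:                      # emergency vehicles get top priority
        return "EM"
    mu_low = tri(congestion, -0.5, 0.0, 0.5)   # light traffic -> fairness mode
    mu_high = tri(congestion, 0.4, 1.0, 1.6)   # heavy traffic -> priority mode
    return "FM" if mu_low >= mu_high else "PM"
```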
- Research Article: International Journal of Intelligent Transportation Systems Research, Aug 12, 2022. doi:10.1007/s13177-022-00321-5
Intelligent traffic lights in smart cities can optimally reduce traffic congestion. In this study, reinforcement learning is employed to train the control agent of a traffic light in a simulator of urban mobility. In contrast to existing works, a policy-based deep reinforcement learning method, Proximal Policy Optimization (PPO), is utilized rather than value-based methods such as Deep Q Network (DQN) and Double DQN (DDQN). First, the optimal policy obtained from PPO is compared to those from DQN and DDQN, and the PPO policy is found to perform better than the others. Next, instead of fixed-interval traffic light phases, light phases with variable time intervals are adopted, which yield a better policy for passing the traffic flow. Then, the effects of environment and action disturbances are studied to demonstrate that the learning-based controller is robust. Finally, unbalanced traffic flows are considered, and the intelligent traffic light is found to perform moderately well in unbalanced scenarios even though it learns the optimal policy from balanced scenarios only.
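The key difference from DQN/DDQN is PPO's clipped policy-gradient objective; for reference, a generic sketch of that loss follows (networks and data pipeline omitted). This is the standard PPO-clip formulation, not code from the paper.

```python
# PPO's clipped surrogate objective: limit how far the new policy can move
# from the behavior policy on each update (generic sketch).
import torch

def ppo_policy_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """logp_new/logp_old: log-probs of the taken actions under the current
    and behavior policies; advantage: estimated advantages; all shape (batch,)."""
    ratio = torch.exp(logp_new - logp_old)            # importance ratio
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()      # maximize the surrogate
```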
- Book Chapter: Jan 1, 2022. doi:10.1007/978-981-19-1653-3_14
Traffic jams are common as a result of the heavy traffic caused by the vast number of cars on the road. Although traffic congestion is prevalent nowadays, enhancing the effectiveness of traffic signal control for effective traffic management remains an important goal. The goals of a cooperative intelligent traffic management scheme are to increase transportation movement and decrease the average wait time of every vehicle, with each signal striving for more efficient journey flow. Throughout the process, signals build a cooperative strategy, as well as constraints for adjacent signals, to maximize their individual benefits. While most current traffic management schemes rely on simple heuristics, a more effective traffic controller can be explored using multi-agent reinforcement learning, in which each agent is in charge of only one traffic light. The traffic controller model may be influenced by a number of variables, making the best feasible result difficult to learn. Agents in earlier methods chose only the most favorable nearby actions without cooperating in their activities, and traffic light controllers were not trained to analyze previous data, leaving them unable to account for unpredictable shifts in traffic flow. Here, a traffic controller model using reinforcement learning obtains fine-grained timing rules by appropriately describing real-time features of the real-world traffic scenario, and this research broadens the scope of the technique to include explicit cooperation between adjacent traffic lights. The proposed real-time traffic controller prototype can successfully follow traffic signal scheduling guidelines. The model learns and sets up the ideal actions by expanding the vehicle's traffic value, which includes delay time, the number of vehicles halted at a signal, and newly incoming vehicles. The experimental results show a significant improvement in traffic management, proving that the proposed prototype is smart enough to provide real-time dynamic traffic management.

Keywords: Cooperative learning; Multi-agent systems; Smart traffic signal control; Reinforcement learning
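The chapter's central idea is explicit cooperation between adjacent lights; one minimal hypothetical way to express it is to mix each light's own delay-based reward with a share of its neighbors', as sketched below. The weighting is an assumption, not the chapter's formulation.

```python
# Illustrative cooperative reward: each light optimizes its own reward plus a
# discounted share of its neighbors', nudging adjacent signals to cooperate.
def cooperative_reward(own_reward, neighbor_rewards, neighbor_weight=0.3):
    """own_reward: this light's delay/queue-based reward;
    neighbor_rewards: list of the adjacent lights' rewards this step."""
    if not neighbor_rewards:
        return own_reward
    return own_reward + neighbor_weight * sum(neighbor_rewards) / len(neighbor_rewards)
```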
- Conference Article: Feb 26, 2021. doi:10.1109/iccceee49695.2021.9429641
Traffic flow optimization remains an active line of research despite the wealth of literature written on the topic. A major challenge is the high dimensionality of the input information available for controlling the traffic light agents in each scenario, meaning the traffic data continuously sampled by traffic cameras and detectors. Prior work has focused on controlling the traffic light cycle while taking the street plan as given, but controlling a traffic light cycle for a street plan that does not match the population's demand distribution will not end traffic congestion completely. Because new streets cannot always be built while population demand continuously changes, the remaining lever is the street plan itself. This study therefore proposes controlling the directions of streets (one-way or two-way) to match the new transportation demands of the ever-changing population in an area, a task well suited to deep reinforcement learning. Deep reinforcement learning combines the generalization of reinforcement learning to new scenarios with deep learning's ability to handle large input spaces and converge to minima. Since the action space in this study is discrete (street directions), Deep Q-Networks (DQN) were chosen, and several experiments were performed on four different SUMO (Simulation of Urban Mobility) simulation networks.
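Since this paper's action space is street directions rather than signal phases, a DQN needs a discrete encoding of direction assignments; the sketch below shows one hypothetical encoding (three options per street, flattened into a single index). Exhaustive enumeration is only tractable for a handful of controllable streets, which is why such formulations typically restrict the set of streets considered.

```python
# Hypothetical encoding of street-direction decisions as a discrete DQN
# action space: each controllable street is one-way A->B, one-way B->A,
# or two-way, giving 3**n joint configurations for n streets.
from itertools import product

DIRECTIONS = ("one_way_ab", "one_way_ba", "two_way")

def action_space(n_streets):
    """Enumerate all joint direction assignments (tractable for small n)."""
    return list(product(DIRECTIONS, repeat=n_streets))

def decode(action_index, n_streets):
    """Map a flat DQN action index back to per-street directions."""
    dirs = []
    for _ in range(n_streets):
        action_index, d = divmod(action_index, len(DIRECTIONS))
        dirs.append(DIRECTIONS[d])
    return dirs
```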