Continuous residual reinforcement learning for traffic signal control optimization

  • Abstract
  • References
  • Citations
  • Similar Papers
Abstract

Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to this challenge is to control traffic signals with continuous reinforcement learning. Although continuous reinforcement learning methods have been successful in traffic signal control, they may become unstable and fail to converge to near-optimal solutions. We develop adaptive traffic signal controllers based on continuous residual reinforcement learning (CRL-TSC) that are more stable. The effect of three feature functions is empirically investigated in a microscopic traffic simulation. Furthermore, the effects of departing streets, larger action sets, and the use of the spatial distribution of vehicles on the performance of CRL-TSCs are assessed. The results show that the best CRL-TSC setup reduces average travel time by 15% compared with an optimized fixed-time controller.
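
For orientation, the sketch below illustrates the general idea behind residual reinforcement learning with linear function approximation, which underlies the stability claim above: the update direction also differentiates through the bootstrapped target, trading some learning speed for convergence stability. This is a minimal sketch of the generic technique under assumed interfaces (feature vectors `phi`, learning rate `alpha`, mixing weight `beta`), not the authors' CRL-TSC implementation.

```python
import numpy as np

# Minimal sketch of a residual Q-update with linear function approximation,
# Q(s, a) = w . phi(s, a). The feature layout, learning rate, and mixing
# weight are illustrative assumptions, not the paper's configuration.

def residual_q_update(w, phi_sa, phi_next, reward, gamma=0.95,
                      alpha=0.01, beta=0.5):
    """Return updated weights after one residual update.

    beta = 0 gives the ordinary semi-gradient TD update, which can
    diverge under function approximation; beta = 1 gives the pure
    residual-gradient update, which descends the mean-squared Bellman
    error and is stable but slower. Intermediate values trade the two.
    """
    td_error = reward + gamma * w @ phi_next - w @ phi_sa
    # The residual term also differentiates through the bootstrapped
    # target Q(s', a'), which is what stabilizes the update.
    direction = phi_sa - beta * gamma * phi_next
    return w + alpha * td_error * direction

# Toy usage with 3 random features.
rng = np.random.default_rng(0)
w = np.zeros(3)
w = residual_q_update(w, rng.random(3), rng.random(3), reward=-4.0)
```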

References (showing 10 of 30 papers)
  • Cited by 165
  • 10.1109/itsc.2011.6083114
Traffic light control in non-stationary environments based on multi agent Q-learning
  • Oct 1, 2011
  • Monireh Abdoos + 2 more

  • Cited by 446
  • 10.1007/978-3-642-27645-3_1
Reinforcement Learning and Markov Decision Processes
  • Jan 1, 2012
  • Martijn Van Otterlo + 1 more

  • Open Access
  • Cited by 60901
  • 10.1016/s0019-9958(65)90241-x
Fuzzy sets
  • Jun 1, 1965
  • Information and Control
  • L.A. Zadeh

  • Cited by 343
  • 10.4324/9781315618159
The Geography of Transport Systems
  • Dec 19, 2016
  • Jean-Paul Rodrigue + 2 more

  • Cited by 1759
  • 10.1201/9781420050646.ptb6
Neural Networks
  • Jan 1, 1996
  • Christopher Bishop

  • Cited by 1024
  • 10.1016/0191-2615(86)90012-3
A model for the structure of lane-changing decisions
  • Oct 1, 1986
  • Transportation Research Part B: Methodological
  • P.G. Gipps

  • Cited by 76
  • 10.1016/j.eswa.2014.06.022
Type-2 fuzzy multi-intersection traffic signal control with differential evolution optimization
  • Jun 17, 2014
  • Expert Systems with Applications
  • Yunrui Bi + 4 more

  • Cited by 32
  • 10.1002/atr.1205
Stepwise genetic fuzzy logic signal control under mixed traffic conditions
  • Sep 10, 2012
  • Journal of Advanced Transportation
  • Yu‐Chiun Chiou + 1 more

  • Cited by 256
  • 10.1109/tits.2010.2091408
Reinforcement Learning With Function Approximation for Traffic Signal Control
  • Jun 1, 2011
  • IEEE Transactions on Intelligent Transportation Systems
  • Prashanth La + 1 more

  • Cited by 453
  • 10.1109/tits.2013.2255286
Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto
  • Sep 1, 2013
  • IEEE Transactions on Intelligent Transportation Systems
  • Samah El-Tantawy + 2 more

Citations (showing 10 of 15 papers)
  • Open Access
  • Research Article
  • Cited by 50
  • 10.1016/j.aei.2018.08.002
Traffic signal optimization through discrete and continuous reinforcement learning with robustness analysis in downtown Tehran
  • Oct 1, 2018
  • Advanced Engineering Informatics
  • Mohammad Aslani + 3 more

  • Book Chapter
  • Cited by 6
  • 10.4018/978-1-7998-5175-2.ch009
Traffic Signal Control for a Single Intersection-Based Intelligent Transportation System
  • Jan 1, 2020
  • Nouha Rida + 2 more

Traffic optimization at an intersection using real-time traffic information is an important focus of research on intelligent transportation systems. Several studies have proposed adaptive traffic light control, which concentrates on determining the green light duration and the sequence of phases for each cycle in accordance with the real-time traffic detected. In order to minimize the waiting time at the intersection, the authors propose an intelligent traffic light that uses information collected by a wireless sensor network installed in the road. The proposed algorithm is essentially based on two parameters: the waiting time in each lane and the length of its queue. The simulations show that the algorithm, applied to a network of intersections, significantly improves the average waiting time, queue length, fuel consumption, and CO2 emissions.
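
As a rough illustration of the two-parameter rule described above (not the authors' actual algorithm), the following sketch ranks lanes by a weighted combination of waiting time and queue length; the weights and the lane-data layout are assumptions.

```python
# Hedged sketch: pick the next green lane by scoring each lane on its
# waiting time and queue length. Weights and data layout are illustrative.

def next_green_lane(lanes, w_wait=0.5, w_queue=0.5):
    """lanes: list of dicts with 'id', 'waiting_time' (s), 'queue_len' (veh)."""
    def score(lane):
        return w_wait * lane["waiting_time"] + w_queue * lane["queue_len"]
    return max(lanes, key=score)["id"]

lanes = [
    {"id": "north", "waiting_time": 42.0, "queue_len": 7},
    {"id": "east",  "waiting_time": 15.0, "queue_len": 12},
]
print(next_green_lane(lanes))  # -> "north" with equal weights
```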

  • Open Access
  • Research Article
  • Cited by 12
  • 10.3390/electronics10192363
Traffic Signal Control System Based on Intelligent Transportation System and Reinforcement Learning
  • Sep 28, 2021
  • Electronics
  • Julián Hurtado-Gómez + 4 more

Traffic congestion has several causes, including insufficient road capacity, unrestricted demand and improper scheduling of traffic signal phases. A great variety of efforts have been made to properly program such phases. Some of them are based on traditional transportation assumptions, and others are adaptive, allowing the system to learn the control law (signal program) from data obtained from different sources. Reinforcement Learning (RL) is a technique commonly used in previous research. However, properly determining the states and the reward is key to obtain good results and to have a real chance to implement it. This paper proposes and implements a traffic signal control system (TSCS), detailing its development stages: (a) Intelligent Transportation System (ITS) architecture design for the TSCS; (b) design and development of a system prototype, including an RL algorithm to minimize the vehicle queue at intersections, and detection and calculation of such queues by adapting a computer vision algorithm; and (c) design and development of system tests to validate operation of the algorithms and the system prototype. Results include the development of the tests for each module (vehicle queue measurement and RL algorithm) and real-time integration tests. Finally, the article presents a system simulation in the context of a medium-sized city in a developing country, showing that the proposed system allowed reduction of vehicle queues by 29%, of waiting time by 50%, and of lost time by 50%, when compared to fixed phase times in traffic signals.
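
The following is a minimal sketch of the kind of queue-minimizing RL update the abstract describes, assuming a tabular Q-learning agent whose reward is the negative total queue length; the state encoding, action set, and hyperparameters are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

# Illustrative only: tabular Q-learning with a queue-based reward, in the
# spirit of the TSCS described above. States, actions, and epsilon are assumed.

Q = defaultdict(float)
ACTIONS = ["keep_phase", "switch_phase"]

def choose_action(state, eps=0.1):
    # Epsilon-greedy action selection over the two phase actions.
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, queue_lengths, next_state, alpha=0.1, gamma=0.9):
    reward = -sum(queue_lengths)  # fewer queued vehicles is better
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```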

  • Open Access
  • Research Article
  • Cited by 10
  • 10.1155/2020/6489027
Optimizing the Junction-Tree-Based Reinforcement Learning Algorithm for Network-Wide Signal Coordination
  • Feb 21, 2020
  • Journal of Advanced Transportation
  • Yi Zhao + 3 more

This study develops three measures to optimize the junction-tree-based reinforcement learning (RL) algorithm, which will be used for network-wide signal coordination. The first measure is to optimize the frequency of running the junction-tree algorithm (JTA) and the intersection status division. The second one is to optimize the JTA information transmission mode. The third one is to optimize the operation of a single intersection. A test network and three test groups are built to analyze the optimization effect. Group 1 is the control group, group 2 adopts the optimizations for the basic parameters and the information transmission mode, and group 3 adopts optimizations for the operation of a single intersection. Environments with different congestion levels are also tested. Results show that optimizations of the basic parameters and the information transmission mode can improve the system efficiency and the flexibility of the green light, and optimizing the operation of a single intersection can improve the efficiency of both the system and the individual intersection. By applying the proposed optimizations to the existing JTA-based RL algorithm, network-wide signal coordination can perform better.

  • Open Access
  • Research Article
  • Cited by 133
  • 10.1016/j.eswa.2022.116830
Reinforcement learning in urban network traffic signal control: A systematic literature review
  • Mar 17, 2022
  • Expert Systems with Applications
  • Mohammad Noaeen + 7 more

Improvement of traffic signal control (TSC) efficiency has been found to lead to improved urban transportation and enhanced quality of life. Recently, the use of reinforcement learning (RL) in various areas of TSC has gained significant traction; thus, we conducted a systematic, comprehensive, and reproducible literature review to dissect all existing research that applied RL in the network-level TSC domain, referred to as RL-NTSC for brevity. The review targeted only network-level articles that tested the proposed methods in networks with two or more intersections. It covers 160 peer-reviewed articles from 30 countries published from 1994 to March 2020. The goal of this study is to provide the research community with statistical and conceptual knowledge, summarize existing evidence, characterize RL applications in NTSC domains, explore all applied methods and major first events in the defined scope, and identify areas for further research based on the research problems explored in current work. We analyzed the data extracted from the included articles in the following seven categories: (i) publication and authors' data, (ii) method identification and analysis, (iii) environment attributes and traffic simulation, (iv) application domains of RL-NTSC, (v) major first events of RL-NTSC and authors' key statements, (vi) code availability, and (vii) evaluation. This paper provides a comprehensive view of the past 26 years of research on applying RL to NTSC. It also reveals the role of advancing deep learning methods in the revival of the research area, the rise of non-commercial microscopic traffic simulators, a lack of interaction between traffic and transportation engineering practitioners and researchers, and a lack of proposals for testbeds that could bring different communities together around common goals.

  • Open Access
  • Research Article
  • Cited by 16
  • 10.3390/app112210688
Traffic Signal Optimization for Multiple Intersections Based on Reinforcement Learning
  • Nov 12, 2021
  • Applied Sciences
  • Jaun Gu + 5 more

In order to deal with dynamic traffic flow, adaptive traffic signal controls using reinforcement learning are being studied. However, most related studies consider only mathematical optimization and are therefore difficult to apply in the field. In this study, we propose a reinforcement-learning-based signal optimization model with constraints. The proposed model maintains the sequence of typical signal phases and respects the minimum green time. The model was trained using Simulation of Urban MObility (SUMO), a microscopic traffic simulator, and was evaluated in a virtual environment resembling a real road with multiple connected intersections. The performance of the proposed model was analyzed by comparing delay and number of stops against a reinforcement learning model without constraints and a fixed-time model. During the peak hour, the proposed model reduced the delay from 3 min 15 s to 2 min 15 s and the number of stops from 11 to 4.7 compared to the fixed-time model.
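
A minimal sketch of the constraint idea described above, assuming a fixed phase sequence and per-phase minimum green times; the phase names and durations are placeholders, not the paper's configuration.

```python
# Hedged sketch: the agent may only extend the current phase or advance to
# the next phase in a fixed sequence, and switching is masked out until the
# minimum green time has elapsed. Phase order and durations are assumptions.

PHASE_SEQUENCE = ["NS_green", "NS_yellow", "EW_green", "EW_yellow"]
MIN_GREEN = {"NS_green": 10, "EW_green": 10}  # seconds

def allowed_actions(phase, elapsed):
    actions = ["extend"]
    # Advancing is only allowed once the minimum green time is satisfied.
    if elapsed >= MIN_GREEN.get(phase, 0):
        actions.append("advance")
    return actions

def next_phase(phase):
    i = PHASE_SEQUENCE.index(phase)
    return PHASE_SEQUENCE[(i + 1) % len(PHASE_SEQUENCE)]  # fixed order only
```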

  • Research Article
  • Cited by 125
  • 10.1111/mice.12558
Deep reinforcement learning for long‐term pavement maintenance planning
  • May 20, 2020
  • Computer-Aided Civil and Infrastructure Engineering
  • Linyi Yao + 3 more

Inappropriate maintenance and rehabilitation strategies cause many problems, such as maintenance budget waste and ineffective pavement distress treatments. A method based on a machine learning algorithm, deep reinforcement learning (DRL), was developed in this research to learn better maintenance strategies that maximize long-term cost-effectiveness in maintenance decision-making through trial and error. In this method, each single-lane pavement segment can receive a different treatment, and the long-term maintenance cost-effectiveness of the entire section is treated as the optimization goal. In the DRL algorithm, states are embodied by 42 parameters involving the pavement structures and materials, traffic loads, maintenance records, pavement conditions, and so forth. Specific treatments, as well as do-nothing, are the actions. The reward is defined as the increased or decreased cost-effectiveness after taking the corresponding action. Two expressways, the Ningchang and Zhenli expressways, were selected for a case study. The results show that the DRL model is capable of learning a better strategy to improve long-term maintenance cost-effectiveness. By implementing the optimized maintenance strategies produced by the developed model, pavement conditions can be kept within an acceptable range.

  • Open Access
  • Research Article
  • Cited by 57
  • 10.3390/s20010137
Cooperative Traffic Signal Control with Traffic Flow Prediction in Multi-Intersection.
  • Dec 24, 2019
  • Sensors
  • Daeho Kim + 1 more

As traffic congestion in cities becomes serious, intelligent traffic signal control has been actively studied. Deep Q-Network (DQN), a representative deep reinforcement learning algorithm, has been applied to various domains, from fully observable game environments to traffic signal control. Owing to DQN's effective performance, learning speeds have improved and various DQN extensions have been introduced. However, most traffic signal control research has been performed at a single intersection, and because of the use of virtual simulators, there are limitations in accounting for variables that affect actual traffic conditions. In this paper, we propose cooperative traffic signal control with traffic flow prediction (TFP-CTSC) for multiple intersections. A traffic flow prediction model predicts the future traffic state and considers the variables that affect actual traffic conditions. In addition, for cooperative traffic signal control across intersections, each intersection is modeled as an agent, and each agent is trained to take the best action by receiving traffic states from the road environment. To deal with multiple intersections efficiently, agents share their traffic information with adjacent intersections. In the experiment, TFP-CTSC is compared with existing traffic signal control algorithms in a 4 × 4 intersection environment. We verify our traffic flow prediction and cooperative method.
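
As a hedged illustration of the neighbor-sharing idea (not the TFP-CTSC implementation), the sketch below concatenates an agent's local traffic state with the states received from adjacent intersections to form the DQN input; the observation layout and grid topology are assumptions.

```python
import numpy as np

# Illustrative sketch: each intersection agent augments its local
# observation with the states of adjacent intersections before feeding
# the result to its DQN. Layout and topology are assumptions.

def joint_observation(local_state, neighbor_states):
    """Concatenate local and neighboring traffic states into one DQN input."""
    return np.concatenate([local_state] + list(neighbor_states))

# In a 4x4 grid, an interior agent sees itself plus 4 neighbors.
local = np.array([3.0, 7.0, 0.0, 1.0])       # e.g. queues per approach
neighbors = [np.zeros(4) for _ in range(4)]
obs = joint_observation(local, neighbors)     # shape (20,)
```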

  • Research Article
  • Cited by 56
  • 10.1016/j.eswa.2021.114580
Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory
  • Jan 19, 2021
  • Expert Systems with Applications
  • Monireh Abdoos + 1 more

  • Open Access
  • Research Article
  • Cited by 3
  • 10.4018/ijssmet.290330
A Collaborative Road Traffic Regulation Approach Using a Wireless Sensor Network
  • Nov 26, 2021
  • International Journal of Service Science, Management, Engineering, and Technology
  • Nouha Rida + 1 more

In this paper, we detail and evaluate a coordinated approach to determining the sequence and duration of green lights at several intersections as part of an intelligent transportation system. We present the architecture of a wireless sensor network used to track variations at adjacent intersections. Our algorithm exploits the collected data to determine the sequence of green lights based on three objectives: (i) reduce the length of queues in the intersection, (ii) prioritize sending vehicle flows to intersections with lower traffic density than the most congested, and (iii) synchronize traffic signals between adjacent intersections to create green waves. Traffic was simulated with the SUMO traffic simulator; the results show that our solution reacts to traffic changes and reduces waiting time compared to isolated control strategies.

Similar Papers
  • Research Article
  • Cited by 50
  • 10.1016/j.aei.2018.08.002
Traffic signal optimization through discrete and continuous reinforcement learning with robustness analysis in downtown Tehran
  • Oct 1, 2018
  • Advanced Engineering Informatics
  • Mohammad Aslani + 3 more

  • Research Article
  • Cited by 2
  • 10.3303/cet1972016
The Real-time Traffic Signal Control System for the Minimum Emission using Reinforcement Learning in V2X Environment
  • Jan 31, 2019
  • Chemical engineering transactions
  • Joo-Young Kim + 3 more

As population and vehicle ownership increase, emission of pollutants is also increasing. The transportation sector accounted for about 21% of GHG emissions in 2015 (OECD), which may be caused by frequent stop-and-go phenomena or vehicle delay at signalized intersections. Generally, these could be minimised by driving at a constant speed or decreasing delay times through efficient traffic signal control. Meanwhile, with the advent of V2X (Vehicle-to-Everything) technology, researchers have tried to decrease vehicle delay and eliminate unnecessary stop-and-go at urban signalized intersections. In particular, under traditional pre-timed traffic signal control, even autonomous vehicles would be unable to exhibit their maximum performance. Thus, a traffic signal control system that optimizes signalized traffic flow based on real-time vehicle information could affect not only traffic flow but also environmental aspects. In this research, on the premise of a V2X environment, changes in traffic flow and emissions are analysed based on microscopic traffic information. Specifically, a reinforcement learning model is constructed based on deep learning, which learns real-time traffic information and displays the optimal traffic signal. The performance of the system was analysed with the microscopic traffic simulator Vissim. The proposed system is expected to contribute to analysing traffic flow and environmental effects, and to building green smart cities with the advent of autonomous vehicle operation in a future V2X environment.

  • Research Article
  • 10.1609/aaai.v39i28.35251
On-Policy Algorithms for Continual Reinforcement Learning (Student Abstract)
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Tadeusz Dziarmaga + 3 more

Continual reinforcement learning (CRL) is the study of optimal strategies for maximizing rewards in sequential environments that change over time. This is particularly crucial in domains such as robotics, where the operational environment is inherently dynamic and subject to continual change. Nevertheless, research in this area has thus far concentrated on off-policy algorithms with replay buffers that are capable of amortizing the impact of distribution shifts. Such an approach is not feasible with on-policy reinforcement learning algorithms that learn solely from the data obtained from the current policy. In this paper, we examine the performance of proximal policy optimization (PPO), a prevalent on-policy reinforcement learning (RL) algorithm, in a classical CRL benchmark. Our findings suggest that the current methods are suboptimal in terms of average performance. Nevertheless, they demonstrate encouraging competitive outcomes with respect to forward transfer and forgetting metrics. This highlights the need for further research into continual on-policy reinforcement learning. The source code is available at https://github.com/Teddy298/continualworld-ppo.

  • Book Chapter
  • Cited by 7
  • 10.1007/978-3-319-56994-9_44
Large-Scale Traffic Grid Signal Control Using Decentralized Fuzzy Reinforcement Learning
  • Aug 20, 2017
  • Tian Tan + 3 more

With the rise of rapid urbanization around the world, a majority of countries have experienced a significant increase in traffic congestion. The negative impacts of this change have resulted in a number of serious and adverse effects, not only regarding the quality of daily life at an individual level but also for nations’ economic growth. Thus, the importance of traffic congestion management is well recognized. Adaptive real-time traffic signal control is effective for traffic congestion management. In particular, adaptive control with reinforcement learning (RL) is a promising technique that has recently been introduced in the field to better manage traffic congestion. Traditionally, most studies on traffic signal control have used centralized reinforcement learning, whose computation inefficiency prevents it from being employed for large traffic networks. In this paper, we propose a computationally cost-effective distributed algorithm, namely, a decentralized fuzzy reinforcement learning approach, to deal with problems related to the exponentially growing number of possible states and actions in RL models for a large-scale traffic signal control network. More specifically, the traffic density at each intersection is first mapped to four different fuzzy sets (i.e., low, medium, high, and extremely high). Next, two different kinds of algorithms, greedy and neighborhood approximate Q-learning (NAQL), are adaptively selected, based on the real-time, fuzzified congestion levels. To further reduce computational costs and the number of state-action pairs in the RL model, coordination and communication between the intersections are confined within a single neighborhood, i.e., the controlled intersection with its immediate neighbor intersections, for the NAQL algorithm. Finally, we conduct several numerical experiments to verify the efficiency and effectiveness of our approach. The results demonstrate that the decentralized fuzzy reinforcement learning algorithm achieves comparable results when measured against traditional heuristic-based algorithms. In addition, the decentralized fuzzy RL algorithm generates more adaptive control rules for the underlying dynamics of large-scale traffic networks. Thus, the proposed approach sheds new light on how to provide further improvements to a networked traffic signal control system for real-time traffic congestion.
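
The sketch below illustrates the fuzzification step described above, mapping a normalized traffic density to the four congestion levels with triangular membership functions; the breakpoints are assumptions, not the paper's calibrated values.

```python
# Illustrative sketch: map traffic density to the four fuzzy congestion
# levels (low, medium, high, extremely high) with triangular membership
# functions. All breakpoints are assumed for demonstration.

def triangular(x, a, b, c):
    """Triangular membership with peak at b and support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_density(density):
    return {
        "low":            triangular(density, -0.1, 0.0, 0.4),
        "medium":         triangular(density, 0.2, 0.45, 0.7),
        "high":           triangular(density, 0.5, 0.7, 0.9),
        "extremely_high": triangular(density, 0.75, 1.0, 1.1),
    }

print(fuzzify_density(0.6))  # partial membership in "medium" and "high"
```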

  • Conference Article
  • Cited by 2
  • 10.29007/t895
Chula-SSS: Developmental Framework for Signal Actuated Logics on SUMO Platform in Over-saturated Sathorn Road Network Scenario
  • Jun 25, 2018
  • Chaodit Aswakul + 4 more

In this paper, the Chula-Sathorn SUMO Simulator (Chula-SSS) is proposed as an educational tool for traffic police and traffic engineers. The tool supports our framework to develop actuated traffic signal control logics in order to resolve urban traffic congestion. The framework design aims to incorporate the tacit traffic control expertise of human operators by extracting and extending human-level intelligence in logically actuating traffic signal controls. In this regard, a new software package has been developed for the microscopic-mobility simulation capability of the SUMO (Simulation of Urban MObility) platform. Using SUMO's TraCI, our package implements a graphical user interface (GUI) of the actual traffic light signal control panel recently introduced in Bangkok (Thailand) for traffic police deployment in Chulalongkorn University's Sathorn Model project, under the umbrella of the Sustainable Mobility Project 2.0 of the World Business Council for Sustainable Development (WBCSD). The traffic light signal control panel GUI modules communicate with SUMO via TraCI in real time, both to retrieve the raw traffic sensor data emulated within SUMO and to send the desired traffic light signal phase entered manually via the GUI by the module users. Each user can play the role of a traffic police officer in charge of actuating the traffic light signal at a controllable intersection. To demonstrate this framework, Chula-SSS has been implemented with the calibrated SUMO dataset of the Sathorn Road network area, one of the most critical areas in Bangkok due to its immense traffic volume with daily recurring bottlenecks and network deadlocks. The simulation comprises 2375 intersection nodes, 4517 edges, and 10 main signalised intersections. The datasets provided with Chula-SSS cover both the morning and evening rush-hour periods, each with over 55,000 simulated vehicles, based on comprehensive traffic data collection and SUMO mobility model calibration. It is hoped that the framework and software package developed herein will be useful not only for the Thailand case but also readily extensible to developing and least-developed countries where traffic signal control relies on human operation and is not yet fully automated by an area traffic controller. In those cases, the proposed framework is expected to be an enabling technology for human operators to practice, learn, and evolve their traffic signal control strategies systematically.
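
For orientation, the following is a minimal sketch of the retrieve-then-actuate loop that such TraCI-based modules perform against SUMO; the configuration file, detector ID, traffic-light ID, and switching rule are placeholders, not part of the Chula-SSS dataset.

```python
import traci  # SUMO's TraCI Python client

# Hedged sketch of a TraCI control loop: read emulated sensor data from
# SUMO, then push back a signal phase, as an operator would via the GUI.
# Config path, detector ID, and traffic-light ID are placeholders.

traci.start(["sumo", "-c", "sathorn.sumocfg"])
while traci.simulation.getMinExpectedNumber() > 0:
    traci.simulationStep()
    # Retrieve the raw traffic sensor data emulated within SUMO ...
    queued = traci.lanearea.getLastStepVehicleNumber("det_sathorn_north")
    # ... and send the desired traffic light signal phase back.
    if queued > 10:
        traci.trafficlight.setPhase("tls_sathorn", 0)
traci.close()
```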

  • Research Article
  • Cited by 6
  • 10.1177/1729881420911491
Continuous reinforcement learning to adapt multi-objective optimization online for robot motion
  • Mar 1, 2020
  • International Journal of Advanced Robotic Systems
  • Kai Zhang + 3 more

This article introduces a continuous reinforcement learning framework to enable online adaptation of multi-objective optimization functions for guiding a mobile robot to move in changing dynamic environments. The robot with this framework can continuously learn from multiple or changing environments where it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real robot experiments over various, dynamically varied task environments. The results show the efficacy of online continuous reinforcement learning for quick adaptation to different, unknown, and dynamic environments.

  • Conference Article
  • Cited by 4
  • 10.1109/icdmw58026.2022.00011
Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning
  • Nov 1, 2022
  • Yanan Xiao + 5 more

Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is most accurate when the agent can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network. Actions taken by the agent change the environment, which in turn forces the agent's mode to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time. Along these lines, we propose streaming traffic flow prediction based on continuous reinforcement learning model (ST-CRL), a kind of predictive model based on reinforcement learning and continuous learning, and an analytical algorithm based on KL divergence that cleverly incorporates long-term novel patterns into model induction. Second, we introduce a prioritized experience replay strategy to consolidate and aggregate previously learned core knowledge into the model. The proposed model is able to continuously learn and predict as the traffic flow network expands and evolves over time. Extensive experiments show that the algorithm has great potential in predicting long-term streaming media networks, while achieving data privacy protection to a certain extent.

  • Research Article
  • 10.3390/en17235876
Exploring the Preference for Discrete over Continuous Reinforcement Learning in Energy Storage Arbitrage
  • Nov 22, 2024
  • Energies
  • Jaeik Jeong + 2 more

In recent research addressing energy arbitrage with energy storage systems (ESSs), discrete reinforcement learning (RL) has often been employed, while the underlying reasons for this preference have not been explicitly clarified. This paper aims to elucidate why discrete RL tends to be more suitable than continuous RL for energy arbitrage problems. When using continuous RL, the charging and discharging actions determined by the agent often exceed the physical limits of the ESS, necessitating clipping to the boundary values. This introduces a critical issue where the learned actions become stuck at the state of charge (SoC) boundaries, hindering effective learning. Although recent advancements in constrained RL offer potential solutions, their application often results in overly conservative policies, preventing the full utilization of ESS capabilities. In contrast, discrete RL, while lacking in granular control, successfully avoids these two key challenges, as demonstrated by simulation results showing superior performance. Additionally, it was found that, due to its characteristics, discrete RL more easily drives the ESS towards fully charged or fully discharged states, thereby increasing the utilization of the storage system. Our findings provide a solid justification for the prevalent use of discrete RL in recent studies involving energy arbitrage with ESSs, offering new insights into the strategic selection of RL methods in this domain. Looking ahead, improving performance will require further advancements in continuous RL methods. This study provides valuable direction for future research in continuous RL, highlighting the challenges and potential strategies to overcome them to fully exploit ESS capabilities.
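
A small sketch of the boundary effect the abstract describes, assuming a one-step normalized SoC model: the continuous action is clipped to the power and SoC limits, so all requests beyond the boundary map to the same executed action. The limits and update rule are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: a continuous policy's raw charge/discharge action is
# clipped to the ESS power and SoC limits, so out-of-range requests all
# collapse to the boundary ("stuck at the SoC boundary"). All limits and
# the one-step SoC model are illustrative assumptions.

P_MAX = 1.0        # max charge/discharge power (normalized units per step)
CAPACITY = 4.0     # energy capacity in the same units

def apply_action(soc, raw_action):
    """Clip the requested power so the state of charge stays within [0, 1]."""
    power = np.clip(raw_action, -P_MAX, P_MAX)
    # Further clip so the resulting SoC remains feasible this step.
    power = np.clip(power, -soc * CAPACITY, (1.0 - soc) * CAPACITY)
    return soc + power / CAPACITY, power

soc, executed = apply_action(0.95, raw_action=2.3)
# The agent asked for 2.3 but only 0.2 is feasible; it observes the clipped
# outcome, which is the learning pathology discrete action sets avoid.
```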

  • Conference Article
  • Cited by 2
  • 10.1109/cec.2019.8790300
Signal Control of Urban Traffic Network Based on Multi-Agent Architecture and Fireworks Algorithm
  • Jun 1, 2019
  • Zhimin Qiao + 3 more

The application of multi-agent technology to urban traffic network control gives traffic signal control the ability to adjust adaptively. As traffic flow on the road changes, it can adjust important parameters such as the offset, green ratio, and public cycle of signal lights in real time, which can effectively reduce traffic congestion and improve the vehicle capacity of the urban traffic network. In this paper, a three-level multi-agent control framework is used. The objective functions are optimization models for green-ratio delay, public cycle, and offset delay. The fireworks algorithm is used to solve the resulting optimization problem. The simulation results show that adaptive traffic network signal control can significantly reduce the total delay time of the traffic network and improve road utilization under continuously changing traffic flow. Compared with traditional fixed-time traffic signal control, adaptive traffic network signal control has great advantages and overcomes the disadvantages of traditional traffic signal control. At the same time, the fireworks algorithm shows significant performance in solving the traffic network optimization model.

  • Research Article
  • Cited by 338
  • 10.1109/tits.2006.874716
Neural Networks for Real-Time Traffic Signal Control
  • Sep 1, 2006
  • IEEE Transactions on Intelligent Transportation Systems
  • D Srinivasan + 2 more

Real-time traffic signal control is an integral part of the urban traffic control system, and providing effective real-time traffic signal control for a large complex traffic network is an extremely challenging distributed control problem. This paper adopts the multiagent system approach to develop distributed unsupervised traffic responsive signal control models, where each agent in the system is a local traffic signal controller for one intersection in the traffic network. The first multiagent system is developed using hybrid computational intelligent techniques. Each agent employs a multistage online learning process to update and adapt its knowledge base and decision-making mechanism. The second multiagent system is developed by integrating the simultaneous perturbation stochastic approximation theorem in fuzzy neural networks (NN). The problem of real-time traffic signal control is especially challenging if the agents are used for an infinite horizon problem, where online learning has to take place continuously once the agent-based traffic signal controllers are implemented into the traffic network. A comprehensive simulation model of a section of the Central Business District of Singapore has been developed using the PARAMICS microscopic simulation program. Simulation results show that the hybrid multiagent system provides significant improvement in traffic conditions when evaluated against an existing traffic signal control algorithm as well as the SPSA-NN-based multiagent system as the complexity of the simulation scenario increases. Using the hybrid NN-based multiagent system, the mean delay of each vehicle was reduced by 78% and the mean stoppage time by 85% compared to the existing traffic signal control algorithm. The promising results demonstrate the efficacy of the hybrid NN-based multiagent system in solving large-scale traffic signal control problems in a distributed manner.

  • Conference Article
  • Cited by 7
  • 10.1145/3319619.3322044
Towards continual reinforcement learning through evolutionary meta-learning
  • Jul 13, 2019
  • Djordje Grbic + 1 more

In continual learning, an agent is exposed to a changing environment, requiring it to adapt during execution time. While traditional reinforcement learning (RL) methods have shown impressive results in various domains, there has been less progress in addressing the challenge of continual learning. Current RL approaches do not allow the agent to adapt during execution but only during a dedicated training phase. Here we study the problem of continual learning in a 2D bipedal walker domain, in which the legs of the walker grow over its lifetime, requiring the agent to adapt. The introduced approach combines neuroevolution, to determine the starting weights of a deep neural network, and a version of deep reinforcement learning that is continually running during execution time. The proof-of-concept results show that the combined approach gives a better generalisation performance when compared to evolution or reinforcement learning alone. The hybridization of reinforcement learning and evolution opens up exciting new research directions for continually learning agents that can benefit from suitable priors determined by an evolutionary process.

  • Research Article
  • Cited by 1
  • 10.1063/5.0239718
Forced convection heat transfer control for cylinder via closed-loop continuous goal-oriented reinforcement learning
  • Nov 1, 2024
  • Physics of Fluids
  • Yangwei Liu + 3 more

Forced convection heat transfer control offers considerable engineering value. This study focuses on a two-dimensional rapid temperature control problem in a heat exchange system, where a cylindrical heat source is immersed in a narrow cavity. First, a closed-loop continuous deep reinforcement learning (DRL) framework based on the deep deterministic policy gradient (DDPG) algorithm is developed. This framework swiftly achieves the target temperature with a temperature variance of 0.0116, only 5.7% of that of discrete frameworks. Particle tracking technology is used to analyze the evolution of flow and heat transfer under different control strategies. Owing to the broader action space available for exploration, continuous algorithms inherently excel at delicate control tasks. Furthermore, to address the deficiency that traditional DRL-based active flow control (AFC) frameworks must be retrained each time the goal changes, costing substantial computational resources to develop strategies for varied goals, the goal information is embedded directly into the agent, and hindsight experience replay (HER) is employed to improve training stability and sample efficiency. A closed-loop continuous goal-oriented reinforcement learning (GoRL) framework based on the HER-DDPG algorithm is then proposed, for the first time, to perform real-time rapid temperature transition control and address multiple goals without retraining. Generalization tests show the proposed GoRL framework accomplishes multi-goal tasks with a temperature variance of 0.0121, only 5.8% of that of discrete frameworks, while consuming merely 11% of the computational resources compared with frameworks without goal-oriented capability. The GoRL framework greatly enhances the ability of AFC systems to handle multiple targets and time-varying goals.
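
As a rough sketch of the HER component mentioned above (not the authors' HER-DDPG code), the function below relabels an episode with the goal that was actually achieved and recomputes a sparse reward; the tuple layout, scalar goal, and tolerance are assumptions.

```python
# Hedged sketch of hindsight experience replay (HER): transitions from an
# episode are stored a second time with the goal replaced by the outcome
# actually achieved, turning failed episodes into useful training signal.
# The transition layout and the sparse reward rule are assumptions.

def her_relabel(episode, tol=0.01):
    """episode: list of (state, action, achieved_goal, original_goal)."""
    new_goal = episode[-1][2]  # e.g. the temperature actually reached
    relabeled = []
    for state, action, achieved, _ in episode:
        # Sparse reward: success once the achieved value matches the
        # relabeled goal within tolerance, otherwise a step penalty.
        reward = 0.0 if abs(achieved - new_goal) <= tol else -1.0
        relabeled.append((state, action, new_goal, reward))
    return relabeled
```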

  • Research Article
  • 10.3390/math13162542
Uncertainty-Aware Continual Reinforcement Learning via PPO with Graph Representation Learning
  • Aug 8, 2025
  • Mathematics
  • Dongjae Kim

Continual reinforcement learning (CRL) agents face significant challenges when encountering distributional shifts. This paper formalizes these shifts into two key scenarios, namely virtual drift (domain switches), where object semantics change (e.g., walls becoming lava), and concept drift (task switches), where the environment’s structure is reconfigured (e.g., moving from object navigation to a door key puzzle). This paper demonstrates that while conventional convolutional neural networks (CNNs) struggle to preserve relational knowledge during these transitions, graph convolutional networks (GCNs) can inherently mitigate catastrophic forgetting by encoding object interactions through explicit topological reasoning. A unified framework is proposed that integrates GCN-based state representation learning with a proximal policy optimization (PPO) agent. The GCN’s message-passing mechanism preserves invariant relational structures, which diminishes performance degradation during abrupt domain switches. Experiments conducted in procedurally generated MiniGrid environments show that the method significantly reduces catastrophic forgetting in domain switch scenarios. While showing comparable mean performance in task switch scenarios, our method demonstrates substantially lower performance variance (Levene’s test, p<1.0×10−10), indicating superior learning stability compared to CNN-based methods. By bridging graph representation learning with robust policy optimization in CRL, this research advances the stability of decision-making in dynamic environments and establishes GCNs as a principled alternative to CNNs for applications requiring stable, continual learning.

  • Book Chapter
  • 10.1007/978-3-031-06427-2_44
Avalanche RL: A Continual Reinforcement Learning Library
  • Jan 1, 2022
  • Nicoló Lucchesi + 3 more

Continual Reinforcement Learning (CRL) is a challenging setting where an agent learns to interact with an environment that is constantly changing over time (the stream of experiences). In this paper, we describe Avalanche RL, a library for Continual Reinforcement Learning which allows users to easily train agents on a continuous stream of tasks. Avalanche RL is based on PyTorch [23] and supports any OpenAI Gym [4] environment. Its design is based on Avalanche [16], one of the most popular continual learning libraries, which allows us to reuse a large number of continual learning strategies and improve the interaction between reinforcement learning and continual learning researchers. Additionally, we propose Continual Habitat-Lab, a novel benchmark and high-level library which enables the use of the photorealistic simulator Habitat-Sim [28] for CRL research. Overall, Avalanche RL attempts to unify continual reinforcement learning applications under a common framework, which we hope will foster the growth of the field. Keywords: Continual learning, Reinforcement learning, Reproducibility.

  • Research Article
  • Cited by 16
  • 10.1007/s10489-020-01786-1
SLER: Self-generated long-term experience replay for continual reinforcement learning
  • Aug 7, 2020
  • Applied Intelligence
  • Chunmao Li + 4 more

Deep reinforcement learning has achieved significant success in various domains. However, it still faces a huge challenge when learning multiple tasks in sequence. This is because interaction in a complex setting involves continual learning, which results in changes in data distributions over time. A continual learning system should ensure that the agent acquires new knowledge without forgetting the previous knowledge. However, catastrophic forgetting may occur as new experience can overwrite previous experience due to limited memory size. The dual experience replay algorithm, which retains previous experience, is widely applied to reduce forgetting, but it cannot be applied to scalable tasks when the memory size is constrained. To alleviate the constraint imposed by memory size, we propose a new continual reinforcement learning algorithm called Self-generated Long-term Experience Replay (SLER). Our method differs from the standard dual experience replay algorithm, which uses short-term experience replay to retain current-task experience and long-term experience replay to retain the experience of all past tasks to achieve continual learning. In this paper, we first train an environment sample model called Experience Replay Mode (ERM) to generate simulated state sequences of previous tasks for knowledge retention, and then combine the ERM with the experience of the new task to generate simulated experience for all previous tasks, alleviating forgetting. Our method can effectively decrease the memory size required in multi-task reinforcement learning. We show that our method performs better than the state-of-the-art deep learning method in StarCraft II and the GridWorld environments and achieves results comparable to the dual experience replay method, which retains the experience of all tasks.
