Continuous residual reinforcement learning for traffic signal control optimization
Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward way to address this challenge is to control traffic signals with continuous reinforcement learning. Although continuous reinforcement learning methods have been successful in traffic signal control, they may become unstable and fail to converge to near-optimal solutions. We develop more stable adaptive traffic signal controllers based on continuous residual reinforcement learning (CRL-TSC). The effect of three feature functions is empirically investigated in a microscopic traffic simulation. Furthermore, the effects of departing streets, additional actions, and the use of the spatial distribution of vehicles on the performance of CRL-TSCs are assessed. The results show that the best CRL-TSC setup reduces average travel time by 15% compared with an optimized fixed-time controller.
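The abstract does not reproduce the update rule, so the following is only a rough illustration of the residual-gradient idea that distinguishes residual from ordinary temporal-difference learning, under assumed linear function approximation with made-up features and parameters:

```python
# Minimal sketch of a residual-gradient value update in the spirit of
# the residual algorithms CRL-TSC builds on. The feature map, learning
# rate, and reward encoding are illustrative assumptions, not the
# paper's actual formulation.
import numpy as np

def residual_update(w, phi_s, phi_s_next, reward, gamma=0.95, alpha=0.01):
    """One residual-gradient step on the mean-squared Bellman error.

    w         : weight vector of the linear value approximator
    phi_s     : feature vector of the current traffic state
    phi_s_next: feature vector of the successor state
    """
    td_error = reward + gamma * phi_s_next @ w - phi_s @ w
    # The ordinary TD update differentiates only the current estimate;
    # the residual gradient also differentiates through the bootstrapped
    # target, which is what buys the improved stability.
    grad = td_error * (gamma * phi_s_next - phi_s)
    return w - alpha * grad

rng = np.random.default_rng(0)
w = np.zeros(8)
for _ in range(1000):
    phi_s, phi_s_next = rng.random(8), rng.random(8)
    w = residual_update(w, phi_s, phi_s_next, reward=-phi_s.sum())
```

The stability comes at the usual price: residual-gradient updates typically converge more slowly than plain TD, which is consistent with the paper framing stability as the trade-off of interest.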
- Research Article · Advanced Engineering Informatics · Oct 1, 2018 · DOI: 10.1016/j.aei.2018.08.002 · 50 citations
Traffic signal optimization through discrete and continuous reinforcement learning with robustness analysis in downtown Tehran
- Book Chapter · Jan 1, 2020 · DOI: 10.4018/978-1-7998-5175-2.ch009 · 6 citations
Traffic optimization at an intersection using real-time traffic information is an important focus of research into intelligent transportation systems. Several studies have proposed adaptive traffic light control, which concentrates on determining the green light duration and the sequence of phases for each cycle in accordance with the real-time traffic detected. To minimize waiting time at the intersection, the authors propose an intelligent traffic light that uses information collected by a wireless sensor network installed in the road. The proposed algorithm is essentially based on two parameters: the waiting time in each lane and the length of its queue. Simulations show that the algorithm, applied to a network of intersections, significantly reduces average waiting time, queue length, fuel consumption, and CO2 emissions.
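To make the two-parameter idea concrete, here is a hedged Python sketch: rank each lane by a weighted combination of accumulated waiting time and queue length and serve the most urgent lane next. The weights and the `Lane` structure are illustrative, not the authors' exact formulation.

```python
from dataclasses import dataclass

@dataclass
class Lane:
    name: str
    waiting_time_s: float  # summed waiting time of queued vehicles
    queue_len: int         # number of stopped vehicles

def next_green(lanes, w_time=1.0, w_queue=2.5):
    # Higher score = more urgent; that lane gets the next green.
    return max(lanes, key=lambda l: w_time * l.waiting_time_s
                                    + w_queue * l.queue_len)

lanes = [Lane("north", 120.0, 4), Lane("east", 45.0, 9), Lane("south", 200.0, 2)]
print(next_green(lanes).name)
```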
- Research Article · Electronics · Sep 28, 2021 · DOI: 10.3390/electronics10192363 · 12 citations
Traffic congestion has several causes, including insufficient road capacity, unrestricted demand and improper scheduling of traffic signal phases. A great variety of efforts have been made to properly program such phases. Some of them are based on traditional transportation assumptions, and others are adaptive, allowing the system to learn the control law (signal program) from data obtained from different sources. Reinforcement Learning (RL) is a technique commonly used in previous research. However, properly determining the states and the reward is key to obtain good results and to have a real chance to implement it. This paper proposes and implements a traffic signal control system (TSCS), detailing its development stages: (a) Intelligent Transportation System (ITS) architecture design for the TSCS; (b) design and development of a system prototype, including an RL algorithm to minimize the vehicle queue at intersections, and detection and calculation of such queues by adapting a computer vision algorithm; and (c) design and development of system tests to validate operation of the algorithms and the system prototype. Results include the development of the tests for each module (vehicle queue measurement and RL algorithm) and real-time integration tests. Finally, the article presents a system simulation in the context of a medium-sized city in a developing country, showing that the proposed system allowed reduction of vehicle queues by 29%, of waiting time by 50%, and of lost time by 50%, when compared to fixed phase times in traffic signals.
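A minimal tabular sketch of the control stage described above, assuming the reward is the negative vehicle queue at the intersection; the queue counter is a stub standing in for the paper's computer-vision module, and the state discretization and parameters are assumptions:

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]           # 0: keep current phase, 1: advance phase
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)

def measure_queue(state):   # stand-in for the vision-based queue counter
    return random.randint(0, 20)

def step(state, action):
    next_state = (state + action) % 4       # toy 4-phase cycle dynamics
    reward = -measure_queue(next_state)     # fewer queued vehicles = better
    return next_state, reward

state = 0
for _ in range(5000):
    action = (random.choice(ACTIONS) if random.random() < EPS
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state
```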
- Research Article · Journal of Advanced Transportation · Feb 21, 2020 · DOI: 10.1155/2020/6489027 · 10 citations
This study develops three measures to optimize the junction-tree-based reinforcement learning (RL) algorithm, which will be used for network-wide signal coordination. The first measure is to optimize the frequency of running the junction-tree algorithm (JTA) and the intersection status division. The second one is to optimize the JTA information transmission mode. The third one is to optimize the operation of a single intersection. A test network and three test groups are built to analyze the optimization effect. Group 1 is the control group, group 2 adopts the optimizations for the basic parameters and the information transmission mode, and group 3 adopts optimizations for the operation of a single intersection. Environments with different congestion levels are also tested. Results show that optimizations of the basic parameters and the information transmission mode can improve the system efficiency and the flexibility of the green light, and optimizing the operation of a single intersection can improve the efficiency of both the system and the individual intersection. By applying the proposed optimizations to the existing JTA-based RL algorithm, network-wide signal coordination can perform better.
- Research Article · Expert Systems with Applications · Mar 17, 2022 · DOI: 10.1016/j.eswa.2022.116830 · 133 citations
Improvement of traffic signal control (TSC) efficiency has been found to lead to improved urban transportation and enhanced quality of life. Recently, the use of reinforcement learning (RL) in various areas of TSC has gained significant traction; we therefore conducted a systematic, comprehensive, and reproducible literature review to dissect all existing research that applies RL in the network-level TSC domain, referred to as RL-NTSC for brevity. The review targeted only network-level articles that tested the proposed methods in networks with two or more intersections, and covers 160 peer-reviewed articles from 30 countries published from 1994 to March 2020. The goal of this study is to provide the research community with statistical and conceptual knowledge, summarize existing evidence, characterize RL applications in NTSC domains, explore all applied methods and major first events in the defined scope, and identify areas for further research based on the research problems explored in current work. We analyzed the data extracted from the included articles in seven categories: (i) publication and authors' data, (ii) method identification and analysis, (iii) environment attributes and traffic simulation, (iv) application domains of RL-NTSC, (v) major first events of RL-NTSC and authors' key statements, (vi) code availability, and (vii) evaluation. This paper provides a comprehensive view of the past 26 years of research on applying RL to NTSC. It also reveals the role of advancing deep learning methods in the revival of the research area, the rise of non-commercial microscopic traffic simulators, a lack of interaction between traffic and transportation engineering practitioners and researchers, and a lack of proposed testbeds that could bring different communities together around common goals.
- Research Article · Applied Sciences · Nov 12, 2021 · DOI: 10.3390/app112210688 · 16 citations
To deal with dynamic traffic flow, adaptive traffic signal control using reinforcement learning has been widely studied. However, most related studies consider only mathematical optimization and are therefore difficult to apply in the field. In this study, we propose a reinforcement learning-based signal optimization model with constraints. The proposed model maintains the sequence of typical signal phases and respects the minimum green time. The model was trained using Simulation of Urban MObility (SUMO), a microscopic traffic simulator, and evaluated in a virtual environment resembling a real road with multiple connected intersections. The performance of the proposed model was analyzed by comparing delay and number of stops against a reinforcement learning model without constraints and a fixed-time model. In a peak hour, the proposed model reduced delay from 3 min 15 s to 2 min 15 s and the number of stops from 11 to 4.7 compared to the fixed-time model.
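The two constraints named in the abstract lend themselves to simple action masking. A hedged sketch, with illustrative phase names and durations: the controller may only extend the current phase or advance to the next phase in the fixed sequence, and may never cut a phase short of its minimum green time.

```python
PHASES = ["NS_green", "NS_left", "EW_green", "EW_left"]  # fixed order
MIN_GREEN_S = 10

def legal_actions(phase_idx, elapsed_green_s):
    actions = {"extend": phase_idx}
    if elapsed_green_s >= MIN_GREEN_S:
        # Advancing is only legal once minimum green has been served,
        # and only to the next phase in the typical sequence.
        actions["advance"] = (phase_idx + 1) % len(PHASES)
    return actions

print(legal_actions(0, 4))    # {'extend': 0} -- must keep NS_green
print(legal_actions(0, 12))   # {'extend': 0, 'advance': 1}
```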
- Research Article · Computer-Aided Civil and Infrastructure Engineering · May 20, 2020 · DOI: 10.1111/mice.12558 · 125 citations
Inappropriate maintenance and rehabilitation strategies cause many problems, such as wasted maintenance budgets and ineffective pavement distress treatments. This research developed a method based on deep reinforcement learning (DRL) to learn maintenance strategies that maximize long-term cost-effectiveness in maintenance decision-making through trial and error. In this method, each single-lane pavement segment can receive a different treatment, and the long-term maintenance cost-effectiveness of the entire section is the optimization goal. In the DRL formulation, states comprise 42 parameters covering pavement structures and materials, traffic loads, maintenance records, pavement conditions, and so forth; the actions are specific treatments as well as do-nothing; and the reward is the increase or decrease in cost-effectiveness after taking the corresponding action. Two expressways, the Ningchang and Zhenli expressways, were selected for a case study. The results show that the DRL model is capable of learning a better strategy to improve long-term maintenance cost-effectiveness. By implementing the optimized maintenance strategies produced by the model, pavement conditions can be kept within an acceptable range.
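A minimal sketch of the MDP wiring the abstract describes, with placeholder numbers: the state packs the 42 segment descriptors, actions are treatments plus do-nothing, and the reward is the change in long-term cost-effectiveness after acting. The split of the 42 parameters across groups and the action names are assumptions.

```python
import numpy as np

N_STATE = 42
ACTIONS = ["do_nothing", "crack_sealing", "micro_surfacing", "thin_overlay"]

def make_state(structure, materials, traffic, history, condition):
    """Pack the 42 segment descriptors into one observation vector."""
    state = np.concatenate([structure, materials, traffic, history, condition])
    assert state.shape == (N_STATE,), "the paper reports exactly 42 parameters"
    return state

def reward(ce_before, ce_after):
    # Positive when the chosen treatment increased the long-term
    # cost-effectiveness of the section, negative otherwise.
    return ce_after - ce_before

# Placeholder split of the 42 parameters across the named groups.
state = make_state(np.zeros(10), np.zeros(8), np.zeros(6), np.zeros(8), np.zeros(10))
print(state.shape, reward(ce_before=0.42, ce_after=0.55))
```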
- Research Article · Sensors · Dec 24, 2019 · DOI: 10.3390/s20010137 · 57 citations
As traffic congestion in cities becomes serious, intelligent traffic signal control has been actively studied. Deep Q-Network (DQN), a representative deep reinforcement learning algorithm, has been applied to various domains, from fully observable game environments to traffic signal control. Owing to DQN's effectiveness, deep reinforcement learning has advanced rapidly and various DQN extensions have been introduced. However, most traffic signal control research has been performed at a single intersection, and because virtual simulators are used, variables that affect actual traffic conditions are often not taken into account. In this paper, we propose cooperative traffic signal control with traffic flow prediction (TFP-CTSC) for multiple intersections. A traffic flow prediction model predicts the future traffic state and accounts for the variables that affect actual traffic conditions. In addition, for cooperative traffic signal control across intersections, each intersection is modeled as an agent, and each agent is trained to take the best action by receiving traffic states from the road environment. To deal with multiple intersections efficiently, agents share their traffic information with adjacent intersections. In the experiment, TFP-CTSC is compared with existing traffic signal control algorithms in a 4 × 4 intersection environment, verifying both our traffic flow prediction and our cooperative method.
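A hedged sketch of the state construction this suggests: each intersection agent observes its own traffic state, a forecast from the prediction model, and the shared states of adjacent intersections. The predictor stub and vector sizes are assumptions, not the paper's architecture.

```python
import numpy as np

def predict_flow(history):          # stand-in for the traffic-flow model
    return history.mean(axis=0)     # naive average as a placeholder forecast

def agent_observation(local_state, history, neighbor_states):
    forecast = predict_flow(history)
    return np.concatenate([local_state, forecast, *neighbor_states])

local = np.random.rand(8)                          # 8 local detector readings
hist = np.random.rand(12, 8)                       # last 12 measurements
neighbors = [np.random.rand(8) for _ in range(4)]  # 4 adjacent intersections
obs = agent_observation(local, hist, neighbors)
print(obs.shape)                                   # (48,) -> input to the agent's DQN
```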
- Research Article · Expert Systems with Applications · Jan 19, 2021 · DOI: 10.1016/j.eswa.2021.114580 · 56 citations
Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory
- Research Article · International Journal of Service Science, Management, Engineering, and Technology · Nov 26, 2021 · DOI: 10.4018/ijssmet.290330 · 3 citations
In this paper, we detail and evaluate a coordinated approach to determining the sequence and duration of green lights at several intersections as part of an Intelligent Transportation System. We present the architecture of a wireless network used to track variations at adjacent intersections. Our algorithm exploits the collected data to determine the sequence of green lights based on three objectives: (i) reduce the length of queues at the intersection, (ii) prioritize sending vehicle flows to intersections with lower traffic density than the most congested ones, and (iii) synchronize traffic signals between adjacent intersections to create green waves. Traffic scenarios were simulated with the SUMO traffic simulator; the results show that our solution reacts to traffic changes and reduces waiting time compared to isolated control strategies.
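One way to picture the three objectives is as a single phase score; the sketch below is a hedged illustration with invented weights and a simplified offset model, not the authors' scoring rule.

```python
def phase_score(queue_len, downstream_density, offset_error_s,
                w1=1.0, w2=0.5, w3=0.2):
    return (w1 * queue_len               # (i) clear long queues first
            - w2 * downstream_density    # (ii) avoid feeding congested neighbors
            - w3 * abs(offset_error_s))  # (iii) stay in step with upstream greens

candidates = {"NS": phase_score(12, 0.3, 4.0), "EW": phase_score(7, 0.9, 0.0)}
print(max(candidates, key=candidates.get))
```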
- Research Article · Chemical Engineering Transactions · Jan 31, 2019 · DOI: 10.3303/cet1972016 · 2 citations
As population and vehicle ownership increase, pollutant emissions are also increasing. The transportation sector accounted for about 21% of GHG emissions in 2015 (OECD), much of which can be attributed to frequent stop-and-go driving and vehicle delay at signalized intersections. Generally, these could be minimized by driving at constant speed or by decreasing delay times with efficient traffic signal control. Researchers are therefore trying to decrease vehicle delay and eliminate unnecessary stop-and-go driving at urban signalized intersections with the advent of V2X (Vehicle-to-Everything) technology. In particular, under traditional pre-timed traffic signal control, even autonomous vehicles cannot exhibit their maximum performance. Thus, a traffic signal control system that optimizes signalized traffic flow based on real-time vehicle information could benefit both traffic flow and the environment. In this research, assuming a V2X environment, changes in traffic flow and emissions are analysed based on microscopic traffic information. Specifically, a deep-learning-based reinforcement learning model is constructed that learns from real-time traffic information and outputs the optimal traffic signal. The performance of the system was analysed with the microscopic traffic simulator Vissim. The proposed system is expected to contribute to analysing traffic flow and environmental effects, and to building green smart cities as autonomous vehicles begin operating in future V2X environments.
- Research Article · Proceedings of the AAAI Conference on Artificial Intelligence · Apr 11, 2025 · DOI: 10.1609/aaai.v39i28.35251
Continual reinforcement learning (CRL) is the study of optimal strategies for maximizing rewards in sequential environments that change over time. This is particularly crucial in domains such as robotics, where the operational environment is inherently dynamic and subject to continual change. Nevertheless, research in this area has thus far concentrated on off-policy algorithms with replay buffers that are capable of amortizing the impact of distribution shifts. Such an approach is not feasible with on-policy reinforcement learning algorithms that learn solely from the data obtained from the current policy. In this paper, we examine the performance of proximal policy optimization (PPO), a prevalent on-policy reinforcement learning (RL) algorithm, in a classical CRL benchmark. Our findings suggest that the current methods are suboptimal in terms of average performance. Nevertheless, they demonstrate encouraging competitive outcomes with respect to forward transfer and forgetting metrics. This highlights the need for further research into continual on-policy reinforcement learning. The source code is available at https://github.com/Teddy298/continualworld-ppo.
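For readers unfamiliar with the metrics named here, a small sketch under commonly used definitions (assumed, not quoted from the paper): forgetting is the drop from a task's best score during training to its final score after later tasks.

```python
def forgetting(per_task_scores, final_scores):
    """per_task_scores: score history per task; final_scores: score after
    the whole task sequence. Returns one forgetting value per task."""
    return [max(hist) - final
            for hist, final in zip(per_task_scores, final_scores)]

print(forgetting([[0.2, 0.8, 0.9], [0.1, 0.7]], [0.6, 0.7]))  # [0.3, 0.0]
```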
- Book Chapter · Aug 20, 2017 · DOI: 10.1007/978-3-319-56994-9_44 · 7 citations
With the rise of rapid urbanization around the world, a majority of countries have experienced a significant increase in traffic congestion, with serious adverse effects on both the quality of daily life and national economic growth. Thus, the importance of traffic congestion management is well recognized. Adaptive real-time traffic signal control is effective for traffic congestion management, and adaptive control with reinforcement learning (RL) is a promising technique recently introduced in the field. Traditionally, most studies on traffic signal control have used centralized reinforcement learning, whose computational inefficiency prevents it from being employed for large traffic networks. In this paper, we propose a computationally cost-effective distributed algorithm, namely a decentralized fuzzy reinforcement learning approach, to deal with the exponentially growing number of possible states and actions in RL models for a large-scale traffic signal control network. More specifically, the traffic density at each intersection is first mapped to four fuzzy sets (low, medium, high, and extremely high). Next, one of two algorithms, greedy or neighborhood approximate Q-learning (NAQL), is adaptively selected based on the real-time fuzzified congestion level. To further reduce computational costs and the number of state-action pairs in the RL model, coordination and communication between intersections are confined to a single neighborhood, i.e., the controlled intersection and its immediate neighbors, for the NAQL algorithm. Finally, we conduct several numerical experiments to verify the efficiency and effectiveness of our approach. The results demonstrate that the decentralized fuzzy reinforcement learning algorithm achieves results comparable to traditional heuristic-based algorithms while generating more adaptive control rules for the underlying dynamics of large-scale traffic networks. The proposed approach thus sheds new light on how to further improve networked traffic signal control systems for real-time congestion management.
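A hedged sketch of the fuzzification step: map a raw traffic density to memberships in the four fuzzy sets with triangular membership functions, then pick the control algorithm from the dominant set. The breakpoints are assumptions; the paper does not publish them here.

```python
def tri(x, a, b, c):
    # Triangular membership peaking at b, zero outside (a, c).
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(density):
    return {
        "low":            tri(density, -0.2, 0.0, 0.35),
        "medium":         tri(density, 0.15, 0.4, 0.65),
        "high":           tri(density, 0.45, 0.7, 0.9),
        "extremely_high": tri(density, 0.75, 1.0, 1.2),
    }

def pick_algorithm(density):
    memberships = fuzzify(density)
    level = max(memberships, key=memberships.get)
    # Cheap greedy control at low congestion; neighborhood approximate
    # Q-learning (NAQL) with neighbor coordination when congestion rises.
    return "greedy" if level in ("low", "medium") else "NAQL"

print(pick_algorithm(0.2), pick_algorithm(0.85))   # greedy NAQL
```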
- Conference Article · Jun 25, 2018 · DOI: 10.29007/t895 · 2 citations
In this paper, the Chula-Sathorn SUMO Simulator (Chula-SSS) is proposed as an educational tool for traffic police and traffic engineers. The tool supports our framework for developing actuated traffic signal control logics to resolve urban traffic congestion. The framework design aims to incorporate the tacit traffic control expertise of human operators by extracting and extending human-level intelligence in actuating traffic signal controls. In this regard, a new software package has been developed for the microscopic-mobility simulation capability of the SUMO (Simulation of Urban MObility) platform. Using SUMO's TraCI, our package implements a graphical user interface (GUI) of the actual traffic light signal control panel recently introduced in Bangkok (Thailand) for traffic police deployment in Chulalongkorn University's Sathorn Model project, under the umbrella of the Sustainable Mobility Project 2.0 of the World Business Council for Sustainable Development (WBCSD). The traffic light signal control panel GUI modules communicate with SUMO in real time via TraCI, both to retrieve the raw traffic sensor data emulated within SUMO and to send the desired traffic light signal phase entered manually via the GUI. Each user can play the role of a traffic police officer in charge of actuating the traffic light signal at a controllable intersection. To demonstrate this framework, Chula-SSS has been implemented with the calibrated SUMO dataset of the Sathorn Road network area, one of the most critical areas in Bangkok due to its immense traffic volume with daily recurring bottlenecks and network deadlocks. The simulation comprises 2375 intersection nodes, 4517 edges, and 10 main signalised intersections. The datasets provided with Chula-SSS cover both the morning and evening rush-hour periods, each with over 55,000 simulated vehicles, based on comprehensive traffic data collection and SUMO mobility model calibration. It is hoped that the framework and software package developed here will be useful not only for the Thailand case but also readily extensible to developing and least-developed countries where traffic signal control relies on human operation and is not yet fully automated by an area traffic controller. In those cases, the proposed framework is expected to be an enabling technology for human operators to practice, learn, and evolve their traffic signal control strategies systematically.
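The retrieve/send loop the panel wraps can be illustrated with a minimal TraCI script (requires a SUMO installation). The config path, detector ID, traffic-light ID, and switching rule are placeholders, not the Sathorn network files shipped with Chula-SSS.

```python
import traci

traci.start(["sumo", "-c", "sathorn.sumocfg"])  # or "sumo-gui" for the GUI build
try:
    for _ in range(3600):
        traci.simulationStep()
        # Retrieve raw sensor data emulated within SUMO.
        n_vehicles = traci.inductionloop.getLastStepVehicleNumber("det_sathorn_0")
        # Push back the phase a human operator picked on the control panel
        # (here replaced by a toy threshold rule).
        if n_vehicles > 10:
            traci.trafficlight.setPhase("tls_sathorn_0", 2)
finally:
    traci.close()
```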
- Research Article · International Journal of Advanced Robotic Systems · Mar 1, 2020 · DOI: 10.1177/1729881420911491 · 6 citations
This article introduces a continuous reinforcement learning framework to enable online adaptation of multi-objective optimization functions for guiding a mobile robot through changing dynamic environments. A robot with this framework can continuously learn from multiple or changing environments, where it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt its motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real robot experiments over various, dynamically varied task environments. The results show the efficacy of online continuous reinforcement learning for quick adaptation to different, unknown, and dynamic environments.
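A hedged PyTorch sketch of the network shape described (a recurrent Q-network whose LSTM consumes a window of trajectory features before a linear head scores candidate actions); the sizes and action set are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, feat_dim=16, hidden=64, n_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, traj_window):
        # traj_window: (batch, time, feat_dim) planned/executed trajectory features
        out, _ = self.lstm(traj_window)
        return self.head(out[:, -1])   # Q-value per candidate action

q = RecurrentQNet()
print(q(torch.randn(2, 10, 16)).shape)   # torch.Size([2, 5])
```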
- Conference Article · Nov 1, 2022 · DOI: 10.1109/icdmw58026.2022.00011 · 4 citations
Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As a city continues to build, parts of the transportation network are added or modified, so accurately predicting an expanding and evolving long-term streaming network is of great significance. To this end, we propose a new simulation-based criterion that teaches autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is modeled most accurately when the agent can perfectly simulate the sensor's activity pattern. We formulate the problem as a continuous reinforcement learning task, where the agent is the next-flow-value predictor, the action is the next time-series flow value at the sensor, and the environment state is a dynamically fused representation of the sensor and the transportation network. Actions taken by the agent change the environment, which in turn forces the agent's model to update, while the agent further explores changes in the dynamic traffic network, helping it predict its next visit more accurately. We therefore develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations that evolve over time. Along these lines, we propose streaming traffic flow prediction based on a continuous reinforcement learning model (ST-CRL), a predictive model based on reinforcement learning and continual learning, together with an analytical algorithm based on KL divergence that incorporates long-term novel patterns into model induction. Second, we introduce a prioritized experience replay strategy to consolidate and aggregate previously learned core knowledge into the model. The proposed model is able to continuously learn and predict as the traffic flow network expands and evolves over time. Extensive experiments show that the algorithm has great potential for predicting long-term streaming networks, while achieving a degree of data privacy protection.
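The KL-divergence component hinted at above can be pictured as a drift test: compare the histogram of recent flow values against a reference window and flag a novel long-term pattern when the divergence crosses a threshold. A sketch with assumed binning and threshold:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def novel_pattern(reference_flows, recent_flows, bins=20, threshold=0.5):
    lo = min(reference_flows.min(), recent_flows.min())
    hi = max(reference_flows.max(), recent_flows.max())
    p, _ = np.histogram(reference_flows, bins=bins, range=(lo, hi))
    q, _ = np.histogram(recent_flows, bins=bins, range=(lo, hi))
    return kl_divergence(p.astype(float), q.astype(float)) > threshold

rng = np.random.default_rng(1)
# A shifted flow distribution should be flagged as a novel pattern.
print(novel_pattern(rng.normal(100, 10, 500), rng.normal(160, 25, 500)))
```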
- Research Article · Energies · Nov 22, 2024 · DOI: 10.3390/en17235876
In recent research addressing energy arbitrage with energy storage systems (ESSs), discrete reinforcement learning (RL) has often been employed, while the underlying reasons for this preference have not been explicitly clarified. This paper aims to elucidate why discrete RL tends to be more suitable than continuous RL for energy arbitrage problems. When using continuous RL, the charging and discharging actions determined by the agent often exceed the physical limits of the ESS, necessitating clipping to the boundary values. This introduces a critical issue where the learned actions become stuck at the state of charge (SoC) boundaries, hindering effective learning. Although recent advancements in constrained RL offer potential solutions, their application often results in overly conservative policies, preventing the full utilization of ESS capabilities. In contrast, discrete RL, while lacking in granular control, successfully avoids these two key challenges, as demonstrated by simulation results showing superior performance. Additionally, it was found that, due to its characteristics, discrete RL more easily drives the ESS towards fully charged or fully discharged states, thereby increasing the utilization of the storage system. Our findings provide a solid justification for the prevalent use of discrete RL in recent studies involving energy arbitrage with ESSs, offering new insights into the strategic selection of RL methods in this domain. Looking ahead, improving performance will require further advancements in continuous RL methods. This study provides valuable direction for future research in continuous RL, highlighting the challenges and potential strategies to overcome them to fully exploit ESS capabilities.
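The failure mode this paper analyzes is easy to reproduce in a toy model: a continuous agent proposes a charge/discharge power, but the executable action is clipped to what the state of charge (SoC) physically allows, so proposals near the boundary collapse to the same effective action and the gradient signal dries up. Capacities and limits below are illustrative.

```python
def apply_action(soc, proposed_kw, capacity_kwh=100.0, p_max_kw=50.0, dt_h=1.0):
    # Clip to the power rating first, then to what the SoC can absorb/supply.
    p = max(-p_max_kw, min(p_max_kw, proposed_kw))
    headroom_kw = (capacity_kwh - soc) / dt_h   # room left to charge
    reserve_kw = soc / dt_h                     # energy left to discharge
    p = max(-reserve_kw, min(headroom_kw, p))
    return soc + p * dt_h, p

# Near full SoC, wildly different proposals yield identical outcomes.
for proposal in (10.0, 30.0, 50.0):
    print(apply_action(soc=95.0, proposed_kw=proposal))
```

A discrete action set ({charge, hold, discharge} at fixed powers) sidesteps this by never asking the learner to resolve differences the plant cannot express, which is the paper's explanation for discrete RL's edge.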
- Conference Article · Jun 1, 2019 · DOI: 10.1109/cec.2019.8790300 · 2 citations
The application of multi-agent technology in urban traffic network control gives traffic signal control the ability to adjust adaptively. As traffic flow on the road changes, it can adjust key parameters such as the offset, green ratio, and common cycle of signal lights in real time, which can effectively reduce traffic congestion and improve the vehicle capacity of the urban traffic network. In this paper, a three-level multi-agent control framework is used. The objective functions are optimization models of green-ratio delay, common cycle, and offset delay, and the fireworks algorithm is used to solve the resulting optimization problem. The simulation results show that adaptive traffic network signal control can significantly reduce the total delay time of the traffic network and improve road utilization as traffic flow continuously changes. Compared with traditional fixed-time signal control, adaptive traffic network signal control has clear advantages and overcomes the disadvantages of traditional traffic signal control. At the same time, the fireworks algorithm performs well in solving the optimization model of the traffic network.
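A hedged sketch of one fireworks-algorithm generation applied to a timing plan (offset, green ratio, cycle): each candidate spawns sparks in a radius that shrinks for fitter candidates, and the best individuals survive. The delay function is a stand-in for the paper's delay models, and the amplitude rule is a simplification.

```python
import random

def total_delay(plan):                       # placeholder objective
    offset, green_ratio, cycle = plan
    return (offset - 20) ** 2 + 500 * (green_ratio - 0.6) ** 2 + (cycle - 90) ** 2

def spark(plan, amplitude):
    return tuple(v + random.uniform(-amplitude, amplitude) for v in plan)

population = [(random.uniform(0, 60), random.uniform(0.3, 0.8),
               random.uniform(60, 120)) for _ in range(5)]
for _ in range(100):
    fitness = [total_delay(p) for p in population]
    best, worst = min(fitness), max(fitness)
    # Fitter plans explode with smaller amplitude (local refinement);
    # worse plans search more widely.
    sparks = [spark(p, 1.0 + 5.0 * (f - best) / (worst - best + 1e-9))
              for p, f in zip(population, fitness) for _ in range(4)]
    population = sorted(population + sparks, key=total_delay)[:5]
print(min(population, key=total_delay))
```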
- Research Article · IEEE Transactions on Intelligent Transportation Systems · Sep 1, 2006 · DOI: 10.1109/tits.2006.874716 · 338 citations
Real-time traffic signal control is an integral part of the urban traffic control system, and providing effective real-time traffic signal control for a large complex traffic network is an extremely challenging distributed control problem. This paper adopts the multiagent system approach to develop distributed unsupervised traffic-responsive signal control models, where each agent in the system is a local traffic signal controller for one intersection in the traffic network. The first multiagent system is developed using hybrid computational intelligence techniques: each agent employs a multistage online learning process to update and adapt its knowledge base and decision-making mechanism. The second multiagent system is developed by integrating the simultaneous perturbation stochastic approximation (SPSA) theorem into fuzzy neural networks (NN). The problem of real-time traffic signal control is especially challenging if the agents are used for an infinite-horizon problem, where online learning has to take place continuously once the agent-based traffic signal controllers are implemented in the traffic network. A comprehensive simulation model of a section of the Central Business District of Singapore was developed using the PARAMICS microscopic simulation program. Simulation results show that the hybrid multiagent system provides significant improvement in traffic conditions when evaluated against an existing traffic signal control algorithm as well as the SPSA-NN-based multiagent system as the complexity of the simulation scenario increases. Using the hybrid NN-based multiagent system, the mean delay of each vehicle was reduced by 78% and the mean stoppage time by 85% compared to the existing traffic signal control algorithm. The promising results demonstrate the efficacy of the hybrid NN-based multiagent system in solving large-scale traffic signal control problems in a distributed manner.
- Conference Article · Jul 13, 2019 · DOI: 10.1145/3319619.3322044 · 7 citations
In continual learning, an agent is exposed to a changing environment, requiring it to adapt during execution time. While traditional reinforcement learning (RL) methods have shown impressive results in various domains, there has been less progress in addressing the challenge of continual learning. Current RL approaches do not allow the agent to adapt during execution but only during a dedicated training phase. Here we study the problem of continual learning in a 2D bipedal walker domain, in which the legs of the walker grow over its lifetime, requiring the agent to adapt. The introduced approach combines neuroevolution, to determine the starting weights of a deep neural network, and a version of deep reinforcement learning that is continually running during execution time. The proof-of-concept results show that the combined approach gives a better generalisation performance when compared to evolution or reinforcement learning alone. The hybridization of reinforcement learning and evolution opens up exciting new research directions for continually learning agents that can benefit from suitable priors determined by an evolutionary process.
- Research Article · Physics of Fluids · Nov 1, 2024 · DOI: 10.1063/5.0239718 · 1 citation
Forced convection heat transfer control offers considerable engineering value. This study focuses on a two-dimensional rapid temperature control problem in a heat exchange system, where a cylindrical heat source is immersed in a narrow cavity. First, a closed-loop continuous deep reinforcement learning (DRL) framework based on the deep deterministic policy gradient (DDPG) algorithm is developed. This framework swiftly achieves the target temperature with a temperature variance of 0.0116, only 5.7% of that of discrete frameworks. Particle tracking technology is used to analyze the evolution of flow and heat transfer under different control strategies. Due to their broader action space for exploration, continuous algorithms inherently excel at delicate control tasks. Furthermore, to address the deficiency that traditional DRL-based active flow control (AFC) frameworks require retraining whenever the goal changes and expend substantial computational resources developing strategies for each goal, the goal information is embedded directly into the agent, and hindsight experience replay (HER) is employed to improve training stability and sample efficiency. A closed-loop continuous goal-oriented reinforcement learning (GoRL) framework based on the HER-DDPG algorithm is then proposed, the first to perform real-time rapid temperature transition control and address multiple goals without retraining. Generalization tests show the proposed GoRL framework accomplishes multi-goal tasks with a temperature variance of 0.0121, only 5.8% of that of discrete frameworks, while consuming merely 11% of the computational resources of frameworks without goal-oriented capability. The GoRL framework greatly enhances the ability of AFC systems to handle multiple targets and time-varying goals.
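A minimal sketch of the HER relabeling used here: after an episode aimed at one target temperature, transitions are re-stored as if a temperature actually achieved later in the episode had been the goal, turning failures into useful supervision. The sparse reward shape and episode layout are assumptions.

```python
import numpy as np

def reward(achieved_temp, goal_temp, tol=0.05):
    return 0.0 if abs(achieved_temp - goal_temp) < tol else -1.0

def her_relabel(episode, k=4, rng=np.random.default_rng(0)):
    """episode: list of (state, action, achieved_temp, goal_temp) tuples.
    Returns transitions with both the original and k hindsight goals."""
    relabeled = []
    for i, (s, a, achieved, goal) in enumerate(episode):
        relabeled.append((s, a, reward(achieved, goal), goal))
        # 'future' strategy: substitute goals achieved later in the episode.
        for j in rng.integers(i, len(episode), size=k):
            new_goal = episode[j][2]
            relabeled.append((s, a, reward(achieved, new_goal), new_goal))
    return relabeled
```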
- Research Article · Mathematics · Aug 8, 2025 · DOI: 10.3390/math13162542
Continual reinforcement learning (CRL) agents face significant challenges when encountering distributional shifts. This paper formalizes these shifts into two key scenarios: virtual drift (domain switches), where object semantics change (e.g., walls becoming lava), and concept drift (task switches), where the environment's structure is reconfigured (e.g., moving from object navigation to a door-key puzzle). This paper demonstrates that while conventional convolutional neural networks (CNNs) struggle to preserve relational knowledge during these transitions, graph convolutional networks (GCNs) can inherently mitigate catastrophic forgetting by encoding object interactions through explicit topological reasoning. A unified framework is proposed that integrates GCN-based state representation learning with a proximal policy optimization (PPO) agent. The GCN's message-passing mechanism preserves invariant relational structures, which diminishes performance degradation during abrupt domain switches. Experiments conducted in procedurally generated MiniGrid environments show that the method significantly reduces catastrophic forgetting in domain-switch scenarios. While showing comparable mean performance in task-switch scenarios, our method demonstrates substantially lower performance variance (Levene's test, p < 1.0 × 10⁻¹⁰), indicating superior learning stability compared to CNN-based methods. By bridging graph representation learning with robust policy optimization in CRL, this research advances the stability of decision-making in dynamic environments and establishes GCNs as a principled alternative to CNNs for applications requiring stable, continual learning.
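The relational encoding credited here can be illustrated with one graph-convolution step: neighbor features are aggregated through a normalized adjacency, so the object-interaction structure survives even when object semantics change. Dimensions and the random graph below are illustrative, not the paper's model.

```python
import torch

def gcn_layer(x, adj, weight):
    # x: (N, F) node features; adj: (N, N) adjacency with self-loops.
    deg = adj.sum(dim=1, keepdim=True)
    h = (adj / deg) @ x          # mean aggregation over neighbors
    return torch.relu(h @ weight)

n, f_in, f_out = 6, 8, 16
x = torch.randn(n, f_in)
adj = torch.eye(n) + torch.bernoulli(torch.full((n, n), 0.3))
adj = ((adj + adj.T) > 0).float()        # symmetric, self-loops kept
out = gcn_layer(x, adj, torch.randn(f_in, f_out))
print(out.shape)    # torch.Size([6, 16])
```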
- Book Chapter · Jan 1, 2022 · DOI: 10.1007/978-3-031-06427-2_44
Continual Reinforcement Learning (CRL) is a challenging setting where an agent learns to interact with an environment that is constantly changing over time (the stream of experiences). In this paper, we describe Avalanche RL, a library for continual reinforcement learning that allows users to easily train agents on a continuous stream of tasks. Avalanche RL is based on PyTorch [23] and supports any OpenAI Gym [4] environment. Its design is based on Avalanche [16], one of the most popular continual learning libraries, which allows us to reuse a large number of continual learning strategies and improves the interaction between reinforcement learning and continual learning researchers. Additionally, we propose Continual Habitat-Lab, a novel benchmark and high-level library that enables the use of the photorealistic simulator Habitat-Sim [28] for CRL research. Overall, Avalanche RL attempts to unify continual reinforcement learning applications under a common framework, which we hope will foster the growth of the field.
- Research Article · Applied Intelligence · Aug 7, 2020 · DOI: 10.1007/s10489-020-01786-1 · 16 citations
Deep reinforcement learning has achieved significant success in various domains. However, it still faces a huge challenge when learning multiple tasks in sequence, because interaction in a complex setting involves continual learning under data distributions that change over time. A continual learning system should ensure that the agent acquires new knowledge without forgetting previous knowledge. However, catastrophic forgetting may occur, as new experience can overwrite previous experience when memory size is limited. The dual experience replay algorithm, which retains previous experience, is widely applied to reduce forgetting, but it cannot be applied to scalable tasks when memory is constrained. To alleviate this memory constraint, we propose a new continual reinforcement learning algorithm called Self-generated Long-term Experience Replay (SLER). Our method differs from the standard dual experience replay algorithm, in which short-term experience replay retains the current task's experience and long-term experience replay retains the experience of all past tasks. We first train an environment sample model, called the Experience Replay Model (ERM), to generate simulated state sequences of previous tasks for knowledge retention, and then combine these with the experience of the new task to generate simulated experience for all previous tasks and alleviate forgetting. Our method can effectively decrease the memory requirement in multi-task reinforcement learning. We show that in the StarCraft II and GridWorld environments our method performs better than the state-of-the-art deep learning method and achieves results comparable to the dual experience replay method that retains the experience of all tasks.
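A sketch of the replay mix SLER describes: a learned environment model regenerates pseudo-experience for earlier tasks, which is blended with the short-term buffer of the current task so old skills are rehearsed without storing every past transition. The generator stub and mixing ratio are assumptions.

```python
import random

def erm_generate(task_id, n):     # stand-in for the trained ERM sampler
    return [(f"sim_state_task{task_id}", "sim_action", 0.0)] * n

def training_batch(short_term_buffer, past_task_ids, batch_size=64,
                   replay_fraction=0.5):
    n_old = int(batch_size * replay_fraction)
    old = [t for tid in past_task_ids
           for t in erm_generate(tid, n_old // max(len(past_task_ids), 1))]
    new = random.sample(short_term_buffer,
                        min(batch_size - len(old), len(short_term_buffer)))
    return old + new

buffer = [("state", "action", 1.0)] * 200   # current task's short-term buffer
print(len(training_batch(buffer, past_task_ids=[0, 1])))   # 64
```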