Reinforcement learning for residential energy storage management at the neighborhood scale: A multi-benchmark evaluation
- Research Article
48
- 10.1016/j.egyai.2022.100202
- Sep 11, 2022
- Energy and AI
Real-world challenges for multi-agent reinforcement learning in grid-interactive buildings
- Conference Article
4
- 10.23919/acc53348.2022.9867239
- Jun 8, 2022
District cooling energy plants (DCEPs) consisting of chillers, cooling towers, and thermal energy storage (TES) systems consume a considerable amount of electricity. Optimizing the scheduling of the TES and chillers to take advantage of time-varying electricity prices is a challenging optimal control problem. The classical method, model predictive control (MPC), requires solving a high-dimensional mixed-integer nonlinear program (MINLP) because of the on/off actuation of the chillers and the charge/discharge of the TES, which is computationally challenging. Reinforcement learning (RL) is an attractive alternative: the real-time control computation is a low-dimensional optimization problem that can be easily solved. However, the performance of an RL controller depends on many design choices. In this paper, we propose a Q-learning based RL controller for this problem. Numerical simulation results show that the proposed RL controller is able to reduce energy cost over a rule-based baseline controller by approximately 8%, comparable to savings reported in the literature with MPC for similar DCEPs. We describe the design choices in the RL controller, including basis functions, reward function shaping, and learning algorithm parameters. Compared to existing work on RL for DCEPs, the proposed controller is designed for continuous state and action spaces.
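To make the role of basis functions in such a controller concrete, the following is a minimal sketch of textbook semi-gradient Q-learning with a linear approximation Q(s, a) = w · φ(s, a). It is not the algorithm from the paper above: the `basis` features, the scalar "state of charge" state, the three-valued action set, and the step size and discount values are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

Features = Callable[[float, int], List[float]]


def basis(s: float, a: int) -> List[float]:
    """Hypothetical basis: bias, state of charge, action, and their product."""
    return [1.0, s, float(a), s * a]


def q_value(w: Sequence[float], phi: Features, s: float, a: int) -> float:
    """Linear action-value estimate Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))


def q_learning_step(
    w: Sequence[float],
    phi: Features,
    s: float,
    a: int,
    r: float,
    s_next: float,
    actions: Sequence[int],
    alpha: float = 0.1,   # step size (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
) -> List[float]:
    """One semi-gradient Q-learning update of the weight vector."""
    # Bootstrapped target uses the greedy value at the next state.
    target = r + gamma * max(q_value(w, phi, s_next, b) for b in actions)
    td_error = target - q_value(w, phi, s, a)
    # For a linear model, the gradient of Q with respect to w is phi(s, a).
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi(s, a))]


# Example: one update from zero weights after paying a cost of 2 (reward -2).
w0 = [0.0, 0.0, 0.0, 0.0]
w1 = q_learning_step(w0, basis, s=0.5, a=1, r=-2.0, s_next=0.6, actions=[-1, 0, 1])
```

With zero initial weights the temporal-difference error equals the reward, so the update simply moves each weight by `alpha * r` times its feature value.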
- Research Article
117
- 10.1016/j.applthermaleng.2023.120430
- Mar 27, 2023
- Applied Thermal Engineering
Comparison of reinforcement learning and model predictive control for building energy system optimization
- Research Article
7
- 10.1007/s41870-017-0074-z
- Jan 3, 2018
- International Journal of Information Technology
This work proposes Lyapunov-theory-based Fuzzy/Neural Reinforcement Learning (RL) controllers with guaranteed stability. We look at ways in which Lyapunov theory can be used to produce RL controllers whose control action is hybrid or Lyapunov-constrained, resulting in self-learning controllers that are optimal, effective, and stable. Fuzzy systems and neural networks are used as generic function approximators to handle the exponential rise in computational burden that arises when RL is extended to high-dimensional or continuous state-action spaces. We propose two distinct approaches: (i) hybridized fuzzy Lyapunov RL control, which combines the Fuzzy Q-Learning methodology with a Lyapunov setting, thereby guaranteeing stability, and (ii) Lyapunov-constrained Neural RL control, wherein the controller's action set is constrained to satisfy the Lyapunov stability condition. Incorporating a Lyapunov-theory-based element in the action generation mechanism of an RL-based controller guarantees stability. We implement our soft-computing-based Lyapunov RL on two benchmark nonlinear problems: (a) inverted pendulum and (b) cart-pole balancing. The results obtained, and the associated comparison with baseline Neural/Fuzzy Q-Learning based controllers, bring out the advantage of our Lyapunov RL based scheme.
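The idea of constraining an RL controller's action set with a Lyapunov condition can be sketched as follows. This is only an illustration of the general principle, not the fuzzy or neural scheme from the abstract above: the double-integrator `step` model, the quadratic candidate `lyapunov`, and the fallback rule are assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]


def lyapunov(x: State) -> float:
    """Assumed quadratic candidate V(x) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def step(x: State, a: float, dt: float = 0.1) -> State:
    """Hypothetical one-step model: position x1, velocity x2, input a."""
    return (x[0] + dt * x[1], x[1] + dt * a)


def safe_actions(
    x: State,
    actions: Sequence[float],
    model: Callable[[State, float], State] = step,
    V: Callable[[State], float] = lyapunov,
) -> List[float]:
    """Keep only actions whose predicted next state strictly lowers V."""
    v_now = V(x)
    return [a for a in actions if V(model(x, a)) < v_now]


def constrained_greedy(
    x: State,
    actions: Sequence[float],
    q_value: Callable[[State, float], float],
    model: Callable[[State, float], State] = step,
    V: Callable[[State], float] = lyapunov,
) -> float:
    """Greedy action with respect to q_value, restricted to the V-decreasing set.

    If no action decreases V, fall back to the action with the smallest
    predicted V (a design choice made for this sketch).
    """
    allowed = safe_actions(x, actions, model, V)
    if not allowed:
        return min(actions, key=lambda a: V(model(x, a)))
    return max(allowed, key=lambda a: q_value(x, a))


# Example: the learned values prefer a = -1, but that action would raise V.
x0 = (1.0, -0.5)
candidates = [-1.0, 0.0, 1.0]
chosen = constrained_greedy(x0, candidates, q_value=lambda x, a: -a)
```

Here the unconstrained greedy choice would be `-1.0`; the filter removes it because the predicted next state has a larger V, so the controller picks the best remaining action instead.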
- Research Article
- 10.1002/asjc.3800
- Jul 22, 2025
- Asian Journal of Control
This paper presents a risk‐aware safe reinforcement learning (RL) control design for stochastic discrete‐time linear systems. Rather than using a safety certifier to myopically intervene with the RL controller, a risk‐informed safe controller is learned alongside the RL controller, and the two controllers are combined. This approach has several advantages: (1) high‐confidence safety can be certified without relying on a high‐fidelity system model, using only the limited data available, (2) myopic interventions and convergence to an undesired equilibrium can be avoided by deciding on the contribution of two stabilizing controllers, and (3) highly efficient and computationally tractable solutions can be provided by optimizing over a scalar decision variable and linear programming polyhedral sets. To learn safe controllers with a large invariant set, piecewise affine controllers are learned instead of linear controllers. To this end, the closed‐loop system is first represented using collected data, a decision variable, and noise. The effect of the decision variable on the variance of the safety violation of the closed‐loop system is formalized. The decision variable is then designed such that the probability of safety violation for the learned closed‐loop system is minimized. It is shown that this control‐oriented approach reduces the data requirements and can also reduce the variance of safety violations. Finally, to integrate the safe and RL controllers, a new data‐driven interpolation technique is introduced. This method aims to preserve the RL agent's optimal behavior while ensuring its safety in noisy environments. The study concludes with a simulation example that validates the theoretical results.
- Research Article
1
- 10.1115/1.4064023
- Dec 4, 2023
- ASME Journal of Engineering for Sustainable Buildings and Cities
We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off of chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP, and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm that is based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed-integer solver for implementation, but only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid in comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
- Research Article
4
- 10.1109/access.2025.3548990
- Jan 1, 2025
- IEEE Access
Effective energy management in microgrids with renewable energy sources is crucial for maintaining system stability while minimizing operational costs. However, traditional Reinforcement Learning (RL) controllers often encounter challenges, including long training time and instability during the training process. This study introduces a novel approach that integrates Transfer Learning (TL) techniques with RL controllers to address these issues. By using synthetic datasets generated by advanced forecasting models, such as ResNet18+BiLSTM, the proposed method pre-trains RL agents, embedding domain knowledge to enhance performance. The results, based on one year of operational data, show that TL-enhanced RL controllers significantly reduce cumulative operation costs and system imbalance, achieving up to a 62.63% reduction in costs and an 80% improvement in balance compared to baseline models. Furthermore, the proposed method improves initial performance and shortens the training duration needed to reach operational thresholds. This approach demonstrates the potential of combining TL with RL to develop efficient, cost-effective solutions for real-time energy management in complex power systems.
- Conference Article
12
- 10.1109/icopesa56898.2023.10141314
- Feb 24, 2023
Various linear and nonlinear controllers have been developed to improve the dynamic performance of DC-DC converters. Most controllers can be designed only with an understanding of the mathematical model of the DC-DC converter, but the inherent nonlinear and time-varying characteristics of a DC-DC switching converter make precise modeling difficult, so model-based control design is complex and its control performance is limited. To overcome this problem, this paper proposes a reinforcement learning (RL) controller based on the twin-delayed deep deterministic policy gradient (TD3) algorithm. This controller does not need a model of the switching converter. The converter is treated as a black box: the policy approximation function (a policy neural network) is trained by constructing a Markov decision process that interacts with the black-box model in the control system, and it then outputs the optimal control action. The RL controller is developed on an actor-critic architecture, and a TD3 algorithm with higher learning efficiency is proposed to improve the control performance of the RL controller. The proposed TD3-based RL controller is compared with a traditional PI controller. The simulation results show that the RL controller has better dynamic performance at converter start-up and under load step changes.
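For readers unfamiliar with TD3, the critic target of the standard algorithm can be written as below. This is the generic textbook form, not the modified, higher-efficiency variant the abstract above refers to; the symbols (twin target critics $Q_{\theta'_1}, Q_{\theta'_2}$, target actor $\pi_{\phi'}$, noise scale $\sigma$, noise clip $c$, and action bounds $a_{\min}, a_{\max}$) are notation assumed here.

```latex
\begin{aligned}
\epsilon &\sim \operatorname{clip}\bigl(\mathcal{N}(0, \sigma^2),\, -c,\, c\bigr), \\
\tilde{a} &= \operatorname{clip}\bigl(\pi_{\phi'}(s') + \epsilon,\; a_{\min},\; a_{\max}\bigr), \\
y &= r + \gamma \min_{i \in \{1, 2\}} Q_{\theta'_i}(s', \tilde{a}).
\end{aligned}
```

Both critics regress toward the same target $y$, which takes the smaller of the two target-critic estimates to limit overestimation, and the actor and target networks are updated less frequently than the critics (the "delayed" part of the name).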
- Research Article
- 10.1088/1742-6596/3140/5/052005
- Nov 1, 2025
- Journal of Physics: Conference Series
This study examines how different driver travel patterns impact electric vehicle (EV) energy flexibility potential in Montreal, Quebec. Using Canadian Time Use Survey data, we identified three distinct driver travel patterns: Normal Work Hours, Extended Work Hours, and Non-Commuter. We implemented a decentralized reinforcement learning (RL) approach to control EV charging across ten households, aiming to minimize electricity consumption during peak hours. The RL controller was benchmarked against a rule-based controller (RBC) that charges EVs immediately upon connection. Results demonstrate that Non-Commuter patterns provided the greatest flexibility potential: the RL controller discharged 2204 kWh back to the grid across all ten households during peak periods, while the RBC consumed 2602 kWh during the same periods. These actions translated to 15% cost savings for the RL controller, as opposed to a 50% increase in cost with the RBC, for the Non-Commuter driver pattern. The RL controller significantly reduced electricity consumption during peak periods across all driver patterns while maintaining 97% departure state-of-charge levels, highlighting the significant energy flexibility potential. The findings provide valuable insights for grid operators and policymakers on how mobility patterns affect demand response potential and highlight the importance of time-varying electricity rates in incentivizing vehicle-to-grid participation.
- Research Article
1
- 10.1016/j.egyai.2026.100727
- May 1, 2026
- Energy and AI
Autonomous digital twin framework for gas turbine combined cycle control loops: Comparative study of proportional-integral control, reinforcement learning, and reinforcement learning with agents
- Dissertation
- 10.37099/mtu.dc.etdr/1306
- Jan 1, 2021
Wave energy has great potential but has a high levelized energy cost compared to other renewable energy sources (e.g., solar and wind). Improving the buoy control performance in the wave-to-wire energy conversion would be a straightforward way to increase the wave energy conversion efficiency and decrease the wave energy levelized cost. To improve the design of buoy control schemes, an assessment of state-of-the-art controls and a study of the power take-off (PTO) power loss model are needed. This dissertation starts with the basic dynamics of the wave energy converter (WEC) buoy and electrical PTO, and introduces the essential mechanics of composing the WEC wave-to-wire model. The details of the electrical machine control methodologies and the state-of-the-art buoy control schemes are also included to build the WEC wave-to-wire control framework. Based on the wave-to-wire dynamic model, a fast evaluation methodology for assessing energy extraction potential is introduced. The sea-state-output-power matrices are generated while considering various electrical PTO effects and constraints, so that electrical output power is obtained directly instead of by propagating dynamic models. Based on the fast evaluation methodology, 16 years of ground-truth ocean wave data are analyzed to solve energy storage system (ESS) sizing problems for offshore applications. To improve the reliability of the ESS design, a statistical study is applied as well. To further study the electrical PTO power loss model, the PTO dynamic model is incorporated into the WEC buoy dynamic model. Several state-of-the-art WEC buoy control schemes are applied to the device and their performance is assessed. When the PTO copper losses, operation constraints, and the PTO nonlinear power loss model are considered, the results show that the buoy control schemes are affected significantly by the actual PTO dynamics.
By studying the PTO operation efficiency, possible solutions for improving the WEC energy extraction performance are provided. The wave-to-wire control needs to be designed from a global point of view. In the last chapter, therefore, reinforcement learning (RL) control for the WEC wave-to-wire model is proposed, and the results are compared with model-based controls, showing that the RL control can achieve much higher output power with better power quality and is robust across various wave conditions. Based on the research results, a future study plan is also discussed.
- Research Article
35
- 10.3390/en14102933
- May 19, 2021
- Energies
Demand Response (DR) programs represent an effective way to optimally manage building energy demand while increasing Renewable Energy Sources (RES) integration and grid reliability, helping the decarbonization of the electricity sector. To fully exploit such opportunities, buildings are required to become sources of energy flexibility, adapting their energy demand to meet specific grid requirements. However, the energy flexibility of a single building is typically too small to be exploited in the flexibility market, highlighting the need to perform analysis at a multiple-building scale. This study explores the economic benefits of implementing a Reinforcement Learning (RL) control strategy for the participation of a cluster of commercial buildings in an incentive-based demand response program. To this end, optimized Rule-Based Control (RBC) strategies are compared with an RL controller. Moreover, a hybrid control strategy exploiting both RBC and RL is proposed. Results show that the RL algorithm outperforms the RBC in reducing the total energy cost, but it is less effective in fulfilling DR requirements. The hybrid controller reduces energy consumption and energy costs by 7% and 4%, respectively, compared to a manually optimized RBC, while fulfilling DR constraints during incentive-based events.
- Research Article
127
- 10.1145/3358230
- Oct 8, 2019
- ACM Transactions on Embedded Computing Systems
This paper proposes a new forward reachability analysis approach to verify the safety of cyber-physical systems (CPS) with reinforcement learning controllers. The foundation of our approach lies in two efficient reachability algorithms, one exact and one over-approximate, for neural network control systems using star sets, which are an efficient representation of polyhedra. Using these algorithms, we determine the initial conditions for which a safety-critical system with a neural network controller is safe by incrementally searching for a critical initial condition at which the safety of the system cannot be established. Our approach produces a tight over-approximation error and is computationally efficient, which allows it to be applied to practical CPS with learning-enabled components (LECs). We implement our approach in NNV, a recent verification tool for neural networks and neural network control systems, and evaluate its advantages and applicability by verifying the safety of a practical Advanced Emergency Braking System (AEBS) with a reinforcement learning (RL) controller trained using the deep deterministic policy gradient (DDPG) method. The experimental results show that our new reachability algorithms are much less conservative than existing polyhedra-based approaches. We successfully determine the entire region of initial conditions of the AEBS with the RL controller for which the safety of the system is guaranteed, while a polyhedra-based approach cannot prove the safety properties of the system.
- Research Article
8
- 10.3390/drones8110660
- Nov 9, 2024
- Drones
The literature has extensively discussed reinforcement learning (RL) for controlling rotorcraft drones during flight for traversal tasks. However, most studies lack adequate detail on the design of reward and punishment mechanisms, and there is limited exploration of the feasibility of applying reinforcement learning to actual flight control after simulation experiments. Consequently, this study focuses on reward and punishment design and state input for RL. The simulation environment is constructed using AirSim and Unreal Engine, with onboard camera footage serving as the state input for reinforcement learning. The research investigates three RL algorithms suitable for discrete action training. The Deep Q Network (DQN), Advantage Actor–Critic (A2C), and Proximal Policy Optimization (PPO) were each combined with three different reward and punishment design mechanisms for training and testing. The results indicate that employing the PPO algorithm with a continuous return method as the reward mechanism allows for effective convergence during training, achieving a target traversal rate of 71% in the testing environment. Furthermore, this study proposes integrating the YOLOv7-tiny object detection (OD) system to assess the applicability of reinforcement learning in real-world settings. After unifying the state inputs of the simulated and OD environments and replacing the original simulated image inputs with a maximum dual-target approach, the experimental simulation ultimately achieved a target traversal rate of 52%. In summary, this research formulates a logical framework for RL reward and punishment design, deployed with a real-time YOLO OD implementation, as a useful aid for related RL studies.
- Research Article
17
- 10.1016/j.enbuild.2023.112778
- Jan 7, 2023
- Energy and Buildings
Reinforcement learning control strategy for differential pressure setpoint in large-scale multi-source looped district cooling system