Reinforcement Learning and Stochastic Control
- Research Article
7
- 10.1007/s41870-017-0074-z
- Jan 3, 2018
- International Journal of Information Technology
This work proposes Lyapunov-theory-based Fuzzy/Neural Reinforcement Learning (RL) controllers with guaranteed stability. We examine ways in which Lyapunov theory can be used to produce RL controllers whose control action is hybrid or Lyapunov constrained, resulting in self-learning controllers that are optimal, effective, and stable. Fuzzy systems and neural networks are used as generic function approximators to handle the exponential rise in computational burden that arises when RL is extended to high-dimensional/continuous state-action spaces. We propose two distinct approaches: (i) hybridized fuzzy Lyapunov RL control, which combines the Fuzzy Q-Learning methodology with a Lyapunov setting, thereby guaranteeing stability, and (ii) Lyapunov-constrained neural RL control, wherein the controller's action set is constrained to satisfy a Lyapunov stability condition. Incorporating a Lyapunov-theory-based element in the action generation mechanism of an RL-based controller guarantees stability. We implement our soft-computing-based Lyapunov RL on two benchmark nonlinear problems: (a) the inverted pendulum and (b) cart-pole balancing. The results obtained, and the associated comparison with baseline Neural/Fuzzy Q-Learning controllers, bring out the advantage of our Lyapunov RL scheme.
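As a rough illustration of the Lyapunov-constrained action mechanism this abstract describes, the sketch below filters a candidate action set through a Lyapunov decrease condition. The quadratic Lyapunov function, the pendulum-like dynamics, and the fallback rule are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

P = np.diag([2.0, 1.0])            # assumed Lyapunov matrix for a 2-state plant

def V(x):
    return x @ P @ x               # candidate Lyapunov function V(x) = x^T P x

def next_state(x, u, dt=0.02):
    # hypothetical pendulum-like dynamics: state = [angle, angular velocity]
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (9.81 * np.sin(theta) + u)])

def lyapunov_constrained_action(x, candidate_actions):
    """Keep only actions that decrease V(x); fall back to steepest descent."""
    safe = [u for u in candidate_actions if V(next_state(x, u)) < V(x)]
    pool = safe if safe else list(candidate_actions)
    # a greedy RL policy would pick among `pool` by Q-value; the largest
    # Lyapunov decrease is used here as a placeholder selection rule
    return min(pool, key=lambda u: V(next_state(x, u)))

x = np.array([0.3, -0.1])
u = lyapunov_constrained_action(x, np.linspace(-5, 5, 21))
```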
- Research Article
109
- 10.1016/j.applthermaleng.2023.120430
- Mar 27, 2023
- Applied Thermal Engineering
Comparison of reinforcement learning and model predictive control for building energy system optimization
- Research Article
- 10.1002/asjc.3800
- Jul 22, 2025
- Asian Journal of Control
This paper presents a risk-aware safe reinforcement learning (RL) control design for stochastic discrete-time linear systems. Rather than using a safety certifier to myopically intervene with the RL controller, a risk-informed safe controller is learned alongside the RL controller, and the two are combined. Several advantages come with this approach: (1) high-confidence safety can be certified without relying on a high-fidelity system model, using only the limited data available; (2) myopic interventions and convergence to an undesired equilibrium can be avoided by deciding on the contribution of the two stabilizing controllers; and (3) highly efficient and computationally tractable solutions can be provided by optimizing over a scalar decision variable and linear-programming polyhedral sets. To learn safe controllers with a large invariant set, piecewise-affine controllers are learned instead of linear controllers. To this end, the closed-loop system is first represented using collected data, a decision variable, and noise. The effect of the decision variable on the variance of the safety violation of the closed-loop system is formalized. The decision variable is then designed such that the probability of safety violation for the learned closed-loop system is minimized. It is shown that this control-oriented approach reduces data requirements and can also reduce the variance of safety violations. Finally, to integrate the safe and RL controllers, a new data-driven interpolation technique is introduced. This method aims to preserve the RL agent's optimal behavior while ensuring its safety in noisy environments. The study concludes with a simulation example that validates the theoretical results.
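The interpolation idea can be pictured with a minimal sketch: blend the RL action and the safe action through a scalar weight, choosing the smallest weight on the safe controller that keeps the successor state inside a polyhedral safe set. The known linear dynamics and the grid search over the scalar are simplifying assumptions; the paper's construction is data-driven.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed linear dynamics x+ = A x + B u
B = np.array([[0.0], [0.1]])
H = np.vstack([np.eye(2), -np.eye(2)])    # polyhedral safe set H x <= h,
h = np.ones(4)                            # here |x_i| <= 1

def blend(x, u_rl, u_safe):
    """Smallest safe-controller weight that keeps the next state safe."""
    for lam in np.linspace(0.0, 1.0, 101):
        u = (1 - lam) * u_rl + lam * u_safe
        if np.all(H @ (A @ x + B @ u) <= h):
            return u, lam
    return u_safe, 1.0                    # fall back fully to the safe controller

x = np.array([0.8, 0.2])
u, lam = blend(x, u_rl=np.array([1.5]), u_safe=np.array([-0.5]))
```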
- Conference Article
12
- 10.1109/icopesa56898.2023.10141314
- Feb 24, 2023
Various linear and nonlinear controllers have been developed to improve the dynamic performance of DC-DC converters. Most controllers can only be designed from an accurate mathematical model of the DC-DC converter, but the inherent nonlinear and time-varying characteristics of the switching converter make precise modeling difficult, so model-based control design is complex and the achievable control performance is limited. To overcome this problem, this paper proposes a reinforcement learning (RL) controller based on the twin-delayed deep deterministic policy gradient (TD3) algorithm. This controller does not need a model of the switching converter: the converter is treated as a black box, the policy approximation function (a policy neural network) is trained by constructing a Markov decision process that interacts with this black box within the control system, and the optimal control action is output. The RL controller is built on an actor-critic architecture, and the TD3 algorithm is adopted for its higher learning efficiency to improve the controller's performance. The proposed TD3-based RL controller is compared with a traditional PI controller. Simulation results show that the RL controller has better dynamic performance at converter start-up and under load step changes.
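For reference, a minimal sketch of the TD3 target computation such a controller relies on: the two critics are minimized to curb overestimation, and the target action is smoothed with clipped noise. The network objects and hyperparameters are generic assumptions, not the paper's exact settings.

```python
import torch

def td3_target(critic1_t, critic2_t, actor_t, r, s_next, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Compute the TD3 bootstrap target for a batch of transitions."""
    with torch.no_grad():
        a_next = actor_t(s_next)
        # target policy smoothing: clipped Gaussian noise on the target action
        noise = (noise_std * torch.randn_like(a_next)).clamp(-noise_clip, noise_clip)
        a_next = (a_next + noise).clamp(-act_limit, act_limit)
        # clipped double-Q: take the minimum of the twin target critics
        q_next = torch.min(critic1_t(s_next, a_next),
                           critic2_t(s_next, a_next))
        return r + gamma * (1.0 - done) * q_next
```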
- Research Article
1
- 10.1115/1.4064023
- Dec 4, 2023
- ASME Journal of Engineering for Sustainable Buildings and Cities
We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off actuation of the chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP, and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed-integer solver for implementation, only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
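The least-squares policy iteration step behind such a Q-learning controller can be sketched as a closed-form solve of the projected Bellman equation. The feature matrices and stand-in data below are assumptions for illustration, not the paper's basis design.

```python
import numpy as np

def lspi_weights(Phi, Phi_next, rewards, gamma=0.99, reg=1e-6):
    """Solve the projected Bellman equation A w = b in closed form.

    Phi stacks basis features phi(s, a) for visited transitions;
    Phi_next uses the greedy next action under the current policy.
    """
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ rewards
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)

# usage with random stand-in data (500 transitions, 8 basis functions)
rng = np.random.default_rng(0)
Phi, Phi_next = rng.normal(size=(500, 8)), rng.normal(size=(500, 8))
w = lspi_weights(Phi, Phi_next, rng.normal(size=500))
```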
- Conference Article
1
- 10.1115/detc2017-67659
- Aug 6, 2017
The control of pure feedback systems, which are widely used but have a non-affine property, has always been an important and challenging problem. To achieve precise tracking control of pure feedback systems by improving the disturbance rejection ability of existing reinforcement learning algorithms, this paper proposes a reinforcement learning (RL) control strategy based on an extended state observer (ESO). In the proposed method, the extended state observer rejects the total disturbance and transforms the pure feedback system, expressed in an input-output predictor form to overcome the non-causal problem, into a cascade integral form. This allows the continuous reinforcement learning strategy of the actor-critic (AC) structure to operate without detailed model information, making it practically data-driven. Notably, to further improve the ability to track a changing reference trajectory, a novel curvature acceleration factor is proposed, which adjusts the learning speed of the reinforcement learning controller according to the curvature of the reference trajectory. The validity of the proposed algorithm is verified by simulation results.
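A minimal sketch of a linear extended state observer of the kind paired with such an actor-critic learner: an extra state estimates the total disturbance so the remaining dynamics look like a cascade of integrators. The bandwidth parameterization, gains, and timestep are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def eso_step(z, y, u, b0=1.0, omega_o=20.0, dt=0.01):
    """One Euler step of a 3rd-order ESO for a 2nd-order plant with output y.

    z = [output estimate, derivative estimate, total-disturbance estimate];
    gains follow the standard bandwidth parameterization with observer
    bandwidth omega_o.
    """
    beta1, beta2, beta3 = 3 * omega_o, 3 * omega_o**2, omega_o**3
    e = y - z[0]                               # output estimation error
    dz = np.array([z[1] + beta1 * e,
                   z[2] + beta2 * e + b0 * u,  # z[2] tracks the total disturbance
                   beta3 * e])
    return z + dt * dz

z = np.zeros(3)
z = eso_step(z, y=0.1, u=0.0)
```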
- Conference Article
4
- 10.23919/acc53348.2022.9867239
- Jun 8, 2022
District cooling energy plants (DCEPs) consisting of chillers, cooling towers, and thermal energy storage (TES) systems consume a considerable amount of electricity. Optimizing the scheduling of the TES and chillers to take advantage of time-varying electricity prices is a challenging optimal control problem. The classical method, model predictive control (MPC), requires solving a high-dimensional mixed-integer nonlinear program (MINLP) because of the on/off actuation of the chillers and the charge/discharge of the TES, which is computationally challenging. RL is an attractive alternative: the real-time control computation is a low-dimensional optimization problem that can be solved easily. However, the performance of an RL controller depends on many design choices. In this paper, we propose a Q-learning-based reinforcement learning (RL) controller for this problem. Numerical simulation results show that the proposed RL controller reduces energy cost over a rule-based baseline controller by approximately 8%, comparable to savings reported in the literature with MPC for similar DCEPs. We describe the design choices in the RL controller, including basis functions, reward function shaping, and learning algorithm parameters. Compared to existing work on RL for DCEPs, the proposed controller is designed for continuous state and action spaces.
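One plausible way to realize a continuous-action Q-function with basis functions, in the spirit of the design choices mentioned above, is to approximate Q linearly in polynomial features and maximize it over sampled candidate actions. The feature map and sampling scheme below are assumptions, not the paper's implementation.

```python
import numpy as np

def features(s, a):
    """Quadratic polynomial features over the joint state-action vector."""
    z = np.concatenate([s, a])
    return np.concatenate([[1.0], z, np.outer(z, z)[np.triu_indices(len(z))]])

def greedy_action(w, s, n_samples=256, act_low=-1.0, act_high=1.0, act_dim=2):
    """Approximate argmax_a Q(s, a) by scoring uniformly sampled actions."""
    cands = np.random.uniform(act_low, act_high, size=(n_samples, act_dim))
    q = np.array([features(s, a) @ w for a in cands])
    return cands[np.argmax(q)]

s, w = np.zeros(4), np.zeros(1 + 6 + 21)   # 6 linear terms, 21 quadratic terms
a = greedy_action(w, s)
```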
- Research Article
125
- 10.1145/3358230
- Oct 8, 2019
- ACM Transactions on Embedded Computing Systems
This paper proposes a new forward reachability analysis approach to verify the safety of cyber-physical systems (CPS) with reinforcement learning controllers. The foundation of our approach lies in two efficient, exact and over-approximate reachability algorithms for neural network control systems using star sets, an efficient representation of polyhedra. Using these algorithms, we determine the initial conditions for which a safety-critical system with a neural network controller is safe by incrementally searching for a critical initial condition at which the safety of the system can no longer be established. Our approach produces tight over-approximation error and is computationally efficient, which allows application to practical CPS with learning-enabled components (LECs). We implement our approach in NNV, a recent verification tool for neural networks and neural network control systems, and evaluate its advantages and applicability by verifying the safety of a practical Advanced Emergency Braking System (AEBS) with a reinforcement learning (RL) controller trained using the deep deterministic policy gradient (DDPG) method. The experimental results show that our new reachability algorithms are much less conservative than existing polyhedra-based approaches. We successfully determine the entire region of initial conditions of the AEBS with the RL controller for which the safety of the system is guaranteed, while a polyhedra-based approach cannot prove the safety properties of the system.
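The star-set representation at the core of this approach is standard in the reachability literature: a star is the set {c + Va : Pa <= q}, and an affine layer maps it exactly by transforming the center and basis. The sketch below shows that single step; the ReLU splitting the full algorithm also performs is omitted, and the example set is illustrative.

```python
import numpy as np

class Star:
    """Star set { c + V a : P a <= q } with center c and basis matrix V."""
    def __init__(self, c, V, P, q):
        self.c, self.V, self.P, self.q = c, V, P, q

    def affine(self, W, b):
        """Exact image of the star under the affine map x -> W x + b."""
        return Star(W @ self.c + b, W @ self.V, self.P, self.q)

# the unit box around the origin as a star: c = 0, V = I, |a_i| <= 1
n = 2
box = Star(np.zeros(n), np.eye(n),
           np.vstack([np.eye(n), -np.eye(n)]), np.ones(2 * n))
layer_out = box.affine(W=np.array([[1.0, 0.5], [0.0, 1.0]]), b=np.zeros(2))
```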
- Research Article
46
- 10.1016/j.compchemeng.2021.107630
- Dec 8, 2021
- Computers & Chemical Engineering
Safe chance constrained reinforcement learning for batch process control
- Research Article
21
- 10.1016/j.compchemeng.2023.108511
- Nov 23, 2023
- Computers & Chemical Engineering
A practically implementable reinforcement learning control approach by leveraging offset-free model predictive control
- Conference Article
204
- 10.1109/icuas.2019.8798254
- Jun 1, 2019
Contemporary autopilot systems for unmanned aerial vehicles (UAVs) are far more limited in their flight envelopes than experienced human pilots, restricting the conditions UAVs can operate in and the types of missions they can accomplish autonomously. This paper proposes a deep reinforcement learning (DRL) controller to handle the nonlinear attitude control problem, enabling extended flight envelopes for fixed-wing UAVs. A proof-of-concept controller using the proximal policy optimization (PPO) algorithm is developed and shown to be capable of stabilizing a fixed-wing UAV from a large set of initial conditions to reference roll, pitch, and airspeed values. The training process is outlined and key factors for its progression rate are considered, with the most important factor found to be limiting the number of variables in the observation vector while including values for several previous time steps of these variables. The trained reinforcement learning (RL) controller is compared to a proportional-integral-derivative (PID) controller and is found to converge in more cases than the PID controller, with comparable performance. Furthermore, the RL controller is shown to generalize well to unseen disturbances in the form of wind and turbulence, even in severe disturbance conditions.
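For context, the PPO clipped surrogate loss at the heart of such a controller is compact enough to sketch directly; the tensor shapes and clipping parameter here are generic assumptions.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (negated for gradient descent)."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # limit the policy step size
    return -torch.mean(torch.min(ratio * advantages, clipped * advantages))
```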
- Conference Article
2
- 10.1109/pesgm46819.2021.9637834
- Jul 26, 2021
In this paper, we propose a curriculum-trained reinforcement learning (RL) controller to facilitate distribution system critical load restoration (CLR), leveraging RL's fast online response and its outstanding optimal sequential control capability. Like many grid control problems, CLR is complicated by a large control action space and renewable uncertainty in a heavily constrained nonlinear environment with strong intertemporal dependency. The nature of the problem often causes the RL policy to converge to a poor-performing local optimum if learned directly. To overcome this, we design a two-stage curriculum in which the RL agent progressively learns generation control and load restoration decisions under different scenarios. Via curriculum learning, the trained RL controller is expected to achieve better control performance, with critical loads restored as rapidly and reliably as possible. Using the IEEE 13-bus test system, we illustrate the performance of the RL controller trained by the proposed curriculum-based method.
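The two-stage curriculum might be organized roughly as follows; `make_env`, `train`, and the scenario names are hypothetical stand-ins, since the abstract does not spell out this interface.

```python
# Sketch of a two-stage curriculum: the agent first learns generation
# control under easy scenarios, then load-restoration decisions under
# progressively harder ones, reusing the policy between lessons.
curriculum = [
    ("generation_control", ["low_uncertainty"]),
    ("load_restoration",   ["low_uncertainty", "high_renewable_variance"]),
]

def run_curriculum(agent, make_env, train):
    for stage, scenarios in curriculum:
        for scenario in scenarios:
            env = make_env(stage, scenario)   # harder lessons come later
            train(agent, env)                 # resume from the previous policy
    return agent
```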
- Conference Article
5
- 10.1109/iciea51954.2021.9516140
- Aug 1, 2021
Reinforcement learning (RL) has attracted great interest from researchers in recent years. RL performs as well as or better than humans in many fields, such as games and robot control. Although this technology is booming in computer science, it has not been widely applied in industrial process control; to date, proportional-integral-derivative (PID) control remains the dominant control method in industry. In this paper, we propose a combination of deep reinforcement learning (DRL) and PID control for better process control performance. The idea stems from the following observations: the transient performance of a PID controller is often not good enough to meet strict requirements or to handle complex signal-tracking tasks, while RL requires a well-designed reward function for training, which in practice must be tuned by trial and error, wasting computational power and time. By combining the two strategies, the PID controller's integral term improves the steady-state performance of RL control, while the trained RL agent improves the transient performance of the PID controller. Several case studies on a water tank system demonstrate the effectiveness of the combined PID + RL control strategy.
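A minimal sketch of the combined strategy, assuming the simplest composition in which the RL agent adds a corrective term on top of a discrete-time PID law; the gains and the additive composition rule are illustrative assumptions.

```python
class PIDPlusRL:
    """Discrete-time PID law plus an additive RL correction term."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.05, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def control(self, err, rl_action):
        self.integral += err * self.dt            # integral term removes
        deriv = (err - self.prev_err) / self.dt   # steady-state error
        self.prev_err = err
        u_pid = self.kp * err + self.ki * self.integral + self.kd * deriv
        return u_pid + rl_action                  # RL shapes the transient

ctrl = PIDPlusRL()
u = ctrl.control(err=0.5, rl_action=0.02)
```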
- Research Article
45
- 10.1016/j.egyai.2022.100202
- Sep 11, 2022
- Energy and AI
Real-world challenges for multi-agent reinforcement learning in grid-interactive buildings
- Book Chapter
- 10.1007/978-3-030-79092-9_10
- Jan 1, 2022
We present a class of reinforcement learning (RL) controllers that is of actor-critic type. We focus on the direct heuristic dynamic programming (dHDP) method. Over the past several years, new analysis and synthesis of the dHDP as a reinforcement learning controller as well as impressive applications of the dHDP have emerged. In this chapter we provide a summary on how the dHDP works, what analytical properties it possesses, and how it was applied and implemented in a wearable robot for the automatic tuning of the prosthesis control parameters with a human user in the loop.KeywordsReinforcement learningAdaptive/approximation dynamic programOptimal controlStochastic gradient descentAdaptive optimal controlDynamic programmingBellman optimalityDirect heuristic dynamic programmingRobotic prosthesis control