Abstract

Safety and stability considerations play a crucial role in the development of learning-based strategies for control design of safety-critical systems. Safe reinforcement learning (RL) approaches traditionally seek to learn control laws that are optimal with respect to system performance whilst ensuring system stability and safety. In this article, an off-policy safe RL approach is proposed for continuous-time nonlinear systems that are affine in control. Safety and stability are guaranteed during the initialization and exploration phases by adjusting the control input with the solution of a quadratic programming (QP) problem that combines an input-to-state stable control Lyapunov function (ISS-CLF) condition with a robust control barrier function (R-CBF) condition. Moreover, the safety of the learned policy is ensured by augmenting the cost function with a CBF, so that safety is maintained and performance is optimized simultaneously. Mathematically rigorous proofs are provided to establish the stability and safety guarantees, offering a sound theoretical foundation for the approach. To demonstrate the effectiveness of the algorithm, two examples are presented: engine surge and stall dynamics, and an unstable nonlinear system.
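
As a rough illustration of the kind of QP-based safety filter the abstract describes, the sketch below minimally adjusts an RL action so that a CLF (stability) condition and a robust CBF (safety) condition hold for a scalar control-affine system. It is written in Python with CVXPY; the dynamics f and g, the candidate Lyapunov function V, the barrier h, and the parameters alpha, gamma, and delta are illustrative placeholders, not the functions or conditions used in the article.

```python
import cvxpy as cp

# Hypothetical scalar control-affine system x_dot = f(x) + g(x) * u (for illustration only).
def f(x):  return -x + x**3        # drift term (example dynamics, not from the article)
def g(x):  return 1.0              # input gain

def V(x):  return 0.5 * x**2       # candidate CLF (example choice)
def dV(x): return x                # gradient of V
def h(x):  return 1.0 - x**2       # CBF: safe set {x : h(x) >= 0} (example choice)
def dh(x): return -2.0 * x         # gradient of h

def safety_filter(x, u_rl, alpha=1.0, gamma=1.0, delta=0.1):
    """Minimally adjust the RL action u_rl so that CLF and robust CBF
    conditions hold; a sketch of a CLF-CBF quadratic program."""
    u = cp.Variable()
    s = cp.Variable(nonneg=True)   # slack keeps the CLF condition feasible
    LfV, LgV = dV(x) * f(x), dV(x) * g(x)
    Lfh, Lgh = dh(x) * f(x), dh(x) * g(x)
    constraints = [
        LfV + LgV * u <= -gamma * V(x) + s,      # stability (CLF) condition, slackened
        Lfh + Lgh * u >= -alpha * h(x) + delta,  # robust CBF condition with margin delta
    ]
    cost = cp.Minimize(cp.sum_squares(u - u_rl) + 100.0 * s)
    cp.Problem(cost, constraints).solve()
    return float(u.value)

# Example usage: filter a (hypothetical) RL action at state x = 0.5.
print(safety_filter(x=0.5, u_rl=2.0))
```

In this sketch the CLF condition is softened with a slack variable while the CBF condition is kept hard, so that safety takes precedence over stability whenever the two conflict; this is a common design choice in CLF-CBF quadratic programs and is used here only to keep the toy problem feasible.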
