Abstract

Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. Safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems, with parametric uncertainties in the model, to learn approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
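
The specific barrier transformation is not reproduced in this summary; as a minimal sketch, the logarithmic barrier transformation commonly used in barrier-transformation-based reinforcement learning maps each constrained state component from an open interval (a, A), with a < 0 < A, onto the real line, so that the state-constrained problem can be solved as an unconstrained one in the transformed coordinates. The function names and constraint bounds below are illustrative assumptions, not details from the paper.

    import numpy as np

    def barrier(x, a, A):
        # Log-barrier transformation mapping the open interval (a, A),
        # with a < 0 < A, onto the real line; note barrier(0) = 0.
        return np.log((A * (a - x)) / (a * (A - x)))

    def barrier_inverse(s, a, A):
        # Inverse transformation mapping the real line back into (a, A).
        return a * A * (np.exp(s) - 1.0) / (a * np.exp(s) - A)

    # Example: a state confined to (-2, 5) is mapped to an unconstrained
    # coordinate; a policy learned in the transformed coordinate keeps the
    # original state strictly inside its constraint set.
    a, A = -2.0, 5.0
    x = np.array([-1.9, 0.0, 4.9])
    s = barrier(x, a, A)
    assert np.allclose(barrier_inverse(s, a, A), x)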

Highlights

  • Due to advantages such as repeatability, accuracy, and lack of physical fatigue, autonomous systems have been increasingly utilized to perform tasks that are dull, dirty, or dangerous

  • Sample efficiency in RL can be improved via model-based reinforcement learning (MBRL); however, MBRL methods are prone to failure due to inaccurate models [see, e.g., work on safe model-based reinforcement learning]

  • While the extrapolation states s_k are assumed to be constant in this analysis for ease of exposition, the analysis extends in a straightforward manner to time-varying extrapolation states that are confined to a compact neighborhood of the origin (a sketch of how such extrapolation states are used follows this list)
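
The extrapolation states mentioned above are typically used in model-based reinforcement learning to evaluate the Bellman (Hamilton-Jacobi-Bellman) error of an approximate value function at user-selected points, rather than only along the measured trajectory. The following is a minimal sketch of that idea under an assumed quadratic cost, a polynomial value-function basis, and placeholder identified dynamics; none of these choices are taken from the paper.

    import numpy as np

    def bellman_error(W, s, phi_grad, f_hat, g, Q, R):
        # HJB residual of the value approximation V(s) = W^T phi(s),
        # evaluated at an extrapolation state s using the identified
        # drift dynamics f_hat and known input dynamics g.
        grad_V = phi_grad(s).T @ W
        u = -0.5 * np.linalg.solve(R, g(s).T @ grad_V)   # approximate policy
        return s @ Q @ s + u @ R @ u + grad_V @ (f_hat(s) + g(s) @ u)

    # Quadratic basis phi(s) = [s1^2, s1*s2, s2^2]; phi_grad is its Jacobian.
    phi_grad = lambda s: np.array([[2 * s[0], 0.0],
                                   [s[1], s[0]],
                                   [0.0, 2 * s[1]]])
    f_hat = lambda s: np.array([s[1], -s[0]])   # placeholder identified model
    g = lambda s: np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    W = np.array([0.5, 0.1, 0.5])               # current critic weight estimate
    # Extrapolation states s_k on a fixed grid inside the compact set [-1, 1]^2.
    grid = [np.array([x1, x2]) for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
    errors = [bellman_error(W, s, phi_grad, f_hat, g, Q, R) for s in grid]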



Introduction

Due to advantages such as repeatability, accuracy, and lack of physical fatigue, autonomous systems have been increasingly utilized to perform tasks that are dull, dirty, or dangerous. Autonomy in safety-critical applications such as autonomous driving and unmanned flight relies on the ability to synthesize safe controllers. To improve robustness to parametric uncertainties and changing objectives and models, autonomous systems need the ability to simultaneously synthesize and execute control policies online and in real time. This paper concerns reinforcement learning (RL), which has been established as an effective tool for safe policy synthesis for both known and uncertain dynamical systems with finite state and action spaces [see, e.g., Sutton and Barto (1998); Doya (2000)]. RL typically requires a large number of iterations due to sample inefficiency [see, e.g., Sutton and Barto (1998)]. Sample efficiency in RL can be improved via model-based reinforcement learning (MBRL); however, MBRL methods are prone to failure due to inaccurate models [see, e.g., work on safe model-based reinforcement learning].
