Abstract

We investigate the problem of certifying the stability of reinforcement learning policies interconnected with nonlinear dynamical systems. We show that by regulating the partial gradients of policies, strong robust-stability guarantees can be obtained from a proposed semidefinite programming feasibility problem. The method certifies a large set of stabilizing controllers by exploiting problem-specific structure; furthermore, we analyze and establish its (non)conservatism. Empirical evaluations on two decentralized control tasks, multi-agent flight formation and power system frequency regulation, demonstrate that reinforcement learning agents achieve high performance within the stability-certified parameter space and exhibit stable learning behavior in the long run.
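
The certificate itself is a semidefinite feasibility problem. As a minimal sketch only, and not the paper's exact program, the snippet below checks a standard Lyapunov/S-procedure LMI with cvxpy: it certifies a continuous-time LTI plant in feedback with any policy pi satisfying pi(0) = 0 whose gradient norm is bounded by L. The plant matrices, the bound L, and the solver choice are illustrative assumptions.

```python
# Hedged sketch (NOT the paper's exact program): a Lyapunov/S-procedure
# LMI certifying  x' = A x + B u,  u = pi(x),  for every policy pi with
# pi(0) = 0 and gradient norm bounded by L.  All numbers are placeholders.
import numpy as np
import cvxpy as cp

def certify(A, B, L, eps=1e-6):
    n, m = B.shape
    P = cp.Variable((n, n), symmetric=True)   # Lyapunov matrix
    lam = cp.Variable(nonneg=True)            # S-procedure multiplier
    # Encodes dV/dt <= -eps * ||x||^2 under the quadratic constraint
    # ||u||^2 <= L^2 ||x||^2 implied by the gradient bound.
    M = cp.bmat([
        [A.T @ P + P @ A + lam * L**2 * np.eye(n), P @ B],
        [B.T @ P,                                  -lam * np.eye(m)],
    ])
    M = 0.5 * (M + M.T)  # enforce exact symmetry for the PSD constraint
    prob = cp.Problem(cp.Minimize(0),
                      [P >> eps * np.eye(n), M << -eps * np.eye(n + m)])
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate")

A = np.array([[-1.0, 0.5], [0.0, -2.0]])  # stable toy plant
B = np.array([[1.0], [1.0]])
print(certify(A, B, L=0.4))  # True => all such policies certified at once
```

If the LMI is feasible, every policy within the gradient bound is certified simultaneously, which is the sense in which regulating partial gradients yields a stability-certified parameter space.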

Highlights

  • Reinforcement learning (RL) aims at guiding an agent to perform a task as efficiently and skillfully as possible through interactions with the environment

  • We focus on system (16), where a linear time-invariant (LTI) system is interconnected with a fixed RL policy (a minimal closed-loop simulation sketch follows these highlights)

  • The present analysis aims at offering stability certificates when applying RL to dynamical systems, which is orthogonal to the line of research that aims at improving the performance of existing RL algorithms
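
The second highlight refers to an LTI plant in feedback with a fixed nonlinear policy. As a rough illustration only, the sketch below simulates such an interconnection in discrete time; the plant matrices, the tanh policy, and the horizon are illustrative assumptions, not the system (16) of the paper.

```python
# Minimal closed-loop sketch: a discrete-time LTI plant driven by a fixed
# nonlinear policy.  All numbers here are illustrative placeholders.
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # plant dynamics
B = np.array([[0.0], [0.1]])             # input matrix
K = np.array([[0.5, -0.3]])              # policy weights

def policy(x):
    # tanh keeps the policy's partial gradients bounded (by the rows of K)
    return np.tanh(K @ x)

x = np.array([[1.0], [-1.0]])            # initial state
for _ in range(50):
    x = A @ x + B @ policy(x)            # closed-loop update
print(np.linalg.norm(x))                 # near zero if the loop is stable
```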


Summary

INTRODUCTION

Reinforcement learning (RL) aims at guiding an agent to perform a task as efficiently and skillfully as possible through interactions with the environment. A main challenge in certifying the stability of learned policies is that robustness guarantees can be conservative due to coarse constraints on the nonlinearity, such as those based on Lipschitz constants [16], [17], leading to a limited search space for safe policies. To mitigate this issue, the integral quadratic constraint (IQC) framework proposed in [18] can be employed, which has been widely used to analyze the stability of large-scale complex systems, such as aircraft control [19]. However, existing techniques can be computationally intensive for deep neural networks, and establishing necessary conditions for robustness has been limited to only a few cases (e.g., block-diagonal structured uncertainty operators with bounded singular values [20]). To address this issue, we introduce a more informative quadratic constraint and analyze the necessity of the certificate criterion, extending our preliminary conference paper.
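
In practice, keeping the learned policy inside a certified gradient bound can be encouraged during training. As a hedged sketch of one such regularizer, not the paper's exact method, the snippet below adds a Jacobian-norm penalty to the policy objective in PyTorch; the network architecture, penalty weight, and bound L are illustrative assumptions.

```python
# Hedged sketch: penalize rows of the policy Jacobian d pi / d x whose
# norm exceeds a certified bound L.  Names and numbers are illustrative.
import torch

def jacobian_penalty(policy, x, L):
    """Positive whenever any row of d pi/d x exceeds the bound L."""
    x = x.clone().requires_grad_(True)
    u = policy(x)
    penalty = x.new_zeros(())
    for j in range(u.shape[1]):
        # Row j of the Jacobian, batched over the sampled states x.
        g, = torch.autograd.grad(u[:, j].sum(), x, create_graph=True)
        penalty = penalty + torch.relu(g.norm(dim=1) - L).pow(2).mean()
    return penalty

policy = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, 1))
states = torch.randn(32, 2)                             # sampled states
reg = 10.0 * jacobian_penalty(policy, states, L=0.4)    # add to RL loss
reg.backward()
```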

PROBLEM FORMULATION
EXTENSION TO NONLINEAR SYSTEMS WITH UNCERTAINTY
ANALYSIS OF CONSERVATISM OF THE STABILITY CERTIFICATE
REINFORCEMENT LEARNING WITH STABILITY REGULARIZATION
CASE STUDIES
MULTI-AGENT FLIGHT FORMATION
[Figure: agents A2–A4 over training iterations]
POWER SYSTEM FREQUENCY REGULATION
CONCLUSIONS
[Figure: generators G7–G10]
PROOF OF LEMMA 1
PROOF OF THEOREM 2
PROOF OF LEMMA 2
STATEMENT AND PROOF OF LEMMA 5
STATEMENT AND PROOF OF LEMMA 6

