Abstract
This paper considers the Linear Quadratic Regulator problem for linear systems with unknown dynamics, a central problem in data-driven control and reinforcement learning. We propose a method that uses data to directly return a controller without estimating a model of the system. Sufficient conditions are given under which this method returns a stabilizing controller with guaranteed relative error when the data used to design the controller are affected by noise. This method has low complexity as it only requires a finite number of samples of the system response to a sufficiently exciting input, and can be efficiently implemented as a semi-definite programme.
Highlights
Control theory is witnessing a renewed and growing interest in data-driven control
This paper considers the infinite-horizon Linear Quadratic Regulator (LQR) problem for linear time-invariant systems, which is one of the most studied problems in the control literature
Here P is the controllability Gramian of the closed-loop system (5), i.e., the unique solution to $(A + BK)P(A + BK)^\top - P + I = 0$ (7). In the time domain, this corresponds to the 2-norm of the output z when impulses are applied to the input channels, and it can be interpreted as the mean-square deviation of z when d is a white process with unit covariance, which is the classic stochastic LQR formulation
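As a sanity check of the time-domain interpretation, the sketch below computes P for a hypothetical two-state plant by iterating P ← (A + BK)P(A + BK)^⊤ + I, which converges to the solution of (7) whenever A + BK is Schur stable, and compares the resulting cost with a sum of squared impulse responses. The plant, the gain K, the weights Q and R, the assumption that d enters as x⁺ = (A + BK)x + d, and the choice z = [Q^{1/2}x; R^{1/2}u] (which gives the cost trace(QP) + trace(K^⊤RKP)) are all assumptions for illustration, not taken from this excerpt.

```python
# Minimal sketch (pure Python, hypothetical numbers): solve the Gramian equation (7)
# by fixed-point iteration and compare the cost with an impulse-response energy sum.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def gramian(Acl, iters=2000):
    """Iterate P <- Acl P Acl^T + I, i.e. P = sum_k Acl^k (Acl^k)^T.
    Converges to the unique solution of (7) when Acl is Schur stable."""
    P = eye(len(Acl))
    for _ in range(iters):
        P = add(matmul(matmul(Acl, P), transpose(Acl)), eye(len(Acl)))
    return P

# Hypothetical plant, gain and weights (not taken from the paper).
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
K = [[-0.5, -1.5]]   # A + BK has eigenvalues 0.95 and 0.9 (Schur stable)
Q = eye(2)
R = [[1.0]]

Acl = add(A, matmul(B, K))
P = gramian(Acl)
# Residual of (7): (A + BK) P (A + BK)^T - P + I, numerically ~0.
residual = add(sub(matmul(matmul(Acl, P), transpose(Acl)), P), eye(2))

# Assumed output z = [Q^(1/2) x; R^(1/2) u] with u = K x, so z^T z = x^T (Q + K^T R K) x
# and the cost is trace(Q P) + trace(K^T R K P) = trace((Q + K^T R K) P).
M = add(Q, matmul(matmul(transpose(K), R), K))
cost_gramian = trace(matmul(M, P))

# Time-domain check: a unit impulse on channel i of d sets x = e_i,
# after which x evolves as x <- (A + BK) x; accumulate sum_k z_k^T z_k.
cost_impulse = 0.0
for i in range(2):
    x = [[1.0 if r == i else 0.0] for r in range(2)]
    for _ in range(2000):
        cost_impulse += matmul(transpose(x), matmul(M, x))[0][0]
        x = matmul(Acl, x)

print(cost_gramian, cost_impulse)
```

Plain lists keep the example dependency-free; in practice a discrete Lyapunov solver such as SciPy's `solve_discrete_lyapunov(Acl, I)` would replace the fixed-point iteration.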
Summary
Control theory is witnessing a renewed and growing interest in data-driven (data-based) control. Starting from [Fiechter, 1997], a tremendous effort has been made towards establishing non-asymptotic properties of data-driven methods. This term refers to all those methods that aim at providing closed-loop stability and performance guarantees using only a finite number of data points. A strength of our method (and of direct methods in general) is a parsimonious use of such priors, which allows us to cope with situations where the noise has no convenient statistics. In such situations, indirect methods (at least those proposed for LQR) are instead much more difficult to pursue, since the identification (ID) step relies strongly on such statistics [Mania et al., 2019; Dean et al., 2019]. This result states that a (noise-free) system trajectory generated by a persistently exciting input is a data-based non-parametric system model.
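For the noise-free case, one common way to use such a trajectory as a model (shown here only as an illustration, not as the noisy-data method of this paper) is to stack the samples into matrices U0, X0 and X1, the last being the states shifted by one step. Since X1 = A X0 + B U0, if [U0; X0] has full row rank then any gain K can be written as [K; I] = [U0; X0] G_K, and X1 G_K = A + BK follows without knowing A or B. The sketch below uses a hypothetical plant only to generate data, takes T = n + m samples so that [U0; X0] is square, and checks the rank condition directly rather than verifying persistency of excitation.

```python
# Minimal noise-free sketch (pure Python, hypothetical plant): the data matrices
# X0, U0, X1 replace (A, B) when computing the closed-loop matrix A + BK.
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def solve(M, Y, tol=1e-12):
    """Solve M G = Y (M square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(Y[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < tol:
            raise ValueError("[U0; X0] is rank deficient: data not rich enough")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# "Unknown" plant: used ONLY to generate the data below.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
n, m = 2, 1
T = n + m                      # smallest T for which [U0; X0] can have full row rank

rng = random.Random(0)
x = [[1.0], [-1.0]]            # hypothetical initial state
xs, us = [x], []
for _ in range(T):
    u = [[rng.uniform(-1.0, 1.0)]]        # random (generically exciting) input
    x = add(matmul(A, x), matmul(B, u))   # noise-free update x+ = A x + B u
    us.append(u)
    xs.append(x)

# Data matrices: column k holds sample k = 0..T-1 (X1 is shifted by one step).
U0 = [[us[k][i][0] for k in range(T)] for i in range(m)]
X0 = [[xs[k][i][0] for k in range(T)] for i in range(n)]
X1 = [[xs[k + 1][i][0] for k in range(T)] for i in range(n)]

K = [[-0.5, -1.5]]                   # any gain; no model is used from here on
G_K = solve(U0 + X0, K + eye(n))     # row-stacking: [U0; X0] G_K = [K; I]
Acl_data = matmul(X1, G_K)           # X1 G_K = A + BK in the noise-free case

Acl_model = add(A, matmul(B, K))     # model-based value, only for comparison
```

With more than n + m samples, a right inverse of [U0; X0] (for example a pseudo-inverse) would replace the square solve; with noisy data the identity X1 G_K = A + BK no longer holds exactly, which is the situation the abstract refers to.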