Abstract

In spite of the lack of convexity, convergence and sample complexity properties were recently established for the random search method applied to the linear quadratic regulator (LQR) problem. Since policy gradient approaches require an initial stabilizing controller, we propose a model-free algorithm that searches over the set of state-feedback gains and returns a stabilizing controller in a finite number of iterations. Our algorithm involves a sequence of relaxed LQR problems for which the associated domains converge to the set of stabilizing controllers for the original continuous-time linear time-invariant system. Starting from a stabilizing controller for the relaxed problem, the proposed approach alternates between updating the controller via policy gradient iterations and decreasing the relaxation parameter in the LQR cost while preserving stability at all iterations. By properly tuning the relaxation parameter updates, we ensure that the cost values do not exceed a uniform threshold and establish computable bounds on the total number of iterations.
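The abstract describes the method only at a high level; the following is a minimal numerical sketch of that loop, not the paper's algorithm. It assumes a shift-type relaxation A − γI of the system matrix (one common way to construct relaxed LQR problems), a two-point random-search gradient estimate, and illustrative choices of A, B, Q, R, step size, threshold, and decrement schedule. For compactness the cost oracle solves a Lyapunov equation; a fully model-free implementation would instead estimate the cost from simulated trajectories.

```python
# Minimal sketch of a relaxed-LQR stabilization loop (illustrative, not the paper's algorithm).
import numpy as np
from scipy.linalg import solve_lyapunov

rng = np.random.default_rng(0)

# Hypothetical unstable system and LQR weights (placeholders, not from the paper).
A = np.array([[0.5, 1.0],
              [0.0, 0.3]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)
n, m = B.shape

def is_stabilizing(K, gamma):
    """K stabilizes the gamma-relaxed system if A - gamma*I - B@K is Hurwitz."""
    Acl = A - gamma * np.eye(n) - B @ K
    return np.max(np.linalg.eigvals(Acl).real) < 0.0

def cost(K, gamma):
    """Relaxed LQR cost J_gamma(K) = trace(P), with P from the closed-loop
    Lyapunov equation; infinite outside the stabilizing set. A model-free
    method would estimate this from trajectories instead."""
    if not is_stabilizing(K, gamma):
        return np.inf
    Acl = A - gamma * np.eye(n) - B @ K
    P = solve_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    return np.trace(P)

def grad_estimate(K, gamma, r=1e-3, samples=50):
    """Two-point zeroth-order (random search) gradient estimate."""
    g = np.zeros_like(K)
    for _ in range(samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        jp, jm = cost(K + r * U, gamma), cost(K - r * U, gamma)
        if np.isfinite(jp) and np.isfinite(jm):  # skip perturbations that leave the domain
            g += (jp - jm) / (2.0 * r) * U
    return (K.size / samples) * g

# Relaxation large enough that K = 0 stabilizes the shifted system A - gamma*I.
gamma = np.max(np.linalg.eigvals(A).real) + 1.0
K = np.zeros((m, n))
alpha, threshold = 1e-3, 50.0   # illustrative step size and uniform cost threshold

while True:
    # Policy gradient iterations on the relaxed problem until the cost drops
    # below the threshold (iteration cap for safety in this sketch).
    for _ in range(5000):
        if cost(K, gamma) <= threshold:
            break
        K_new = K - alpha * grad_estimate(K, gamma)
        # Reject steps that would leave the stabilizing set (a crude safeguard;
        # the paper instead bounds the step size to preserve stability).
        if is_stabilizing(K_new, gamma):
            K = K_new
    if gamma == 0.0:
        break
    # Decrease the relaxation, backtracking so that K remains stabilizing
    # (the paper derives an explicit safe decrement; we simply backtrack).
    step = 0.1
    while not is_stabilizing(K, max(gamma - step, 0.0)):
        step /= 2.0
    gamma = max(gamma - step, 0.0)

print("stabilizing gain K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

When the loop exits, gamma has reached zero while K stayed in the stabilizing set of each intermediate relaxed problem, so K stabilizes the original system; the threshold on the cost is what keeps a stability margin available for each decrement of gamma.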
