Linear Quadratic Control Using Model-Free Reinforcement Learning

Farnaz Adib Yaghmaie, Fredrik Gustafsson, Lennart Ljung

doi:10.1109/tac.2022.3145632

Open Access

https://doi.org/10.1109/tac.2022.3145632

Abstract

In this article, we consider the linear quadratic (LQ) control problem with process and measurement noise. We analyze the LQ problem in terms of the average cost and the structure of the value function. We assume that the dynamics of the linear system are unknown and that only noisy measurements of the state variable are available. Using these noisy measurements, we propose two model-free iterative algorithms to solve the LQ problem. The proposed algorithms are variants of the policy iteration routine in which the policy is greedy with respect to the average of all previous iterations. We rigorously analyze the properties of the proposed algorithms, including the stability of the generated controllers and convergence. We analyze the effect of measurement noise on the performance of the proposed algorithms, the classical off-policy routine, and the classical $Q$-learning routine. We also investigate a model-building approach, inspired by adaptive control, in which a model of the dynamical system is estimated and the optimal control problem is solved assuming that the estimated model is the true model. We use a benchmark to evaluate and compare our proposed algorithms with the classical off-policy, classical $Q$-learning, and policy gradient algorithms. We show that our model-building approach performs nearly identically to the analytical solution, and that our proposed policy iteration-based algorithms outperform the classical off-policy and classical $Q$-learning algorithms on this benchmark but do not outperform the model-building approach.
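
To make the model-building idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a scalar system x[k+1] = a*x[k] + b*u[k] + w[k] with process noise only (measurement noise, which the article also treats, is ignored here), estimates (a, b) by ordinary least squares, and then solves the LQ problem as if the estimate were the true model. All function names, the cost weights q and r, and the numerical values are hypothetical choices for illustration.

```python
import random


def simulate(a, b, n_steps, noise_std, rng):
    """Roll out x[k+1] = a*x[k] + b*u[k] + w[k] under random exploratory inputs."""
    xs, us, x = [], [], 0.0
    for _ in range(n_steps):
        u = rng.gauss(0.0, 1.0)  # exploratory input (assumption)
        xs.append(x)
        us.append(u)
        x = a * x + b * u + rng.gauss(0.0, noise_std)
    xs.append(x)
    return xs, us


def estimate_model(xs, us):
    """Least-squares estimate of (a, b) from x[k+1] ~ a*x[k] + b*u[k]."""
    x_now, x_next = xs[:-1], xs[1:]
    sxx = sum(x * x for x in x_now)
    sxu = sum(x * u for x, u in zip(x_now, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(x_now, x_next))
    suy = sum(u * y for u, y in zip(us, x_next))
    det = sxx * suu - sxu * sxu  # determinant of the 2x2 normal equations
    a_hat = (sxy * suu - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


def lqr_gain(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati equation; return k for u = -k*x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


# Hypothetical example values, chosen only for illustration.
rng = random.Random(0)
xs, us = simulate(a=0.9, b=0.5, n_steps=2000, noise_std=0.1, rng=rng)
a_hat, b_hat = estimate_model(xs, us)
k_hat = lqr_gain(a_hat, b_hat, q=1.0, r=1.0)   # gain from the estimated model
k_true = lqr_gain(0.9, 0.5, q=1.0, r=1.0)      # gain from the true model
```

In this sketch, comparing k_hat with k_true gives a rough sense of how close the certainty-equivalent controller is to the analytical one. With noisy state measurements the plain least-squares estimate would generally be biased, so this simplification should not be read as covering the setting analyzed in the article.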