Abstract

This paper focuses on the linear-quadratic optimal control problem for unknown stochastic-parameter linear systems using reinforcement learning methods. Based on the second moments of the random system matrices, a model-based value iteration algorithm is proposed to solve the problem, and its convergence is proved using the contraction mapping theorem. For the case where no information about the random system matrices is available, a normalized model-free value iteration algorithm is presented to learn the optimal control law by estimating the data at the next time step. In our model-free algorithm, the collected data are first normalized to reduce the error of the least squares method, and it is proved that the algorithm obtains an approximate optimal solution. Our algorithm is applicable to any distribution of the random system matrices and does not require an initial mean-square stabilizing control policy. Finally, an example illustrates that our algorithm converges to an approximate optimal control policy and that the normalization step can significantly reduce convergence errors.
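As a rough illustration of the model-based approach the abstract describes, the sketch below runs value iteration on the stochastic Riccati recursion for a system x_{k+1} = A_k x_k + B_k u_k with random A_k, B_k, using second moments estimated from samples. The system matrices, noise scale, weights, and iteration counts here are all hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 1

# Hypothetical stochastic-parameter system: A_k, B_k are i.i.d.
# perturbations around nominal matrices A0, B0 (illustrative values).
A0 = np.array([[1.0, 0.1],
               [0.0, 0.9]])
B0 = np.array([[0.0],
               [0.1]])

def sample_AB():
    A = A0 + 0.01 * rng.standard_normal((n, n))
    B = B0 + 0.01 * rng.standard_normal((n, m))
    return A, B

Q = np.eye(n)   # state cost weight
R = np.eye(m)   # input cost weight

# Monte Carlo samples used to approximate the second moments
# E[A' P A], E[A' P B], E[B' P B] of the random system matrices.
samples = [sample_AB() for _ in range(2000)]

# Value iteration on the stochastic Riccati map, starting from P = 0
# (no initial mean-square stabilizing policy is needed):
#   P <- Q + E[A'PA] - E[A'PB] (R + E[B'PB])^{-1} E[B'PA]
P = np.zeros((n, n))
for _ in range(200):
    EAPA = np.mean([A.T @ P @ A for A, B in samples], axis=0)
    EAPB = np.mean([A.T @ P @ B for A, B in samples], axis=0)
    EBPB = np.mean([B.T @ P @ B for A, B in samples], axis=0)
    P = Q + EAPA - EAPB @ np.linalg.solve(R + EBPB, EAPB.T)

# Feedback gain for u_k = -K x_k, computed from the converged P.
EAPB = np.mean([A.T @ P @ B for A, B in samples], axis=0)
EBPB = np.mean([B.T @ P @ B for A, B in samples], axis=0)
K = np.linalg.solve(R + EBPB, EAPB.T)
print(P)
print(K)
```

The model-free algorithm in the paper replaces the sampled second moments with quantities estimated from normalized state-input data via least squares; this sketch only shows the model-based fixed-point iteration.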
