Abstract Solving the stochastic linear quadratic (SLQ) optimal control problem generally requires full knowledge of the system dynamics. In this paper, a Q-learning iteration algorithm is adopted to solve the SLQ problem for model-free discrete-time systems. First, a condition for the well-posedness of the SLQ problem is given, and the stochastic problem is transformed into a deterministic one so that it can be solved. Second, in the iteration process of the Q-learning algorithm, the H matrix sequence and the control gain matrix sequence are obtained without knowledge of the system parameters, and a convergence proof for both sequences is given. Finally, two simulation examples are provided to illustrate the effectiveness of the Q-learning algorithm.
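As a rough illustration of the H matrix and control gain sequences mentioned above, the following sketch iterates them for a scalar system. It is not the paper's procedure: it assumes a hypothetical multiplicative-noise model x_{k+1} = a x_k + b u_k + (c x_k + d u_k) w_k with zero-mean, unit-variance noise w_k, a cost of E[sum of q x_k^2 + r u_k^2], and it uses the parameters a, b, c, d directly, whereas the model-free algorithm would estimate the H entries from measured data. The function name and the numerical values are likewise assumptions made for this example.

```python
def q_iteration_scalar(a, b, c, d, q, r, tol=1e-12, max_iter=10_000):
    """Iterate the Q-function kernel H and the gain k for a scalar SLQ problem.

    Assumed (hypothetical) model: x+ = a*x + b*u + (c*x + d*u)*w, with E[w] = 0
    and E[w^2] = 1. The Q-function is taken as
    Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2.
    """
    p = 0.0  # value-function kernel, V(x) = p*x^2, started from zero
    for i in range(max_iter):
        # Entries of H built from the current p; the expectation over w
        # replaces the stochastic problem with a deterministic recursion.
        h_xx = q + p * (a * a + c * c)
        h_xu = p * (a * b + c * d)
        h_uu = r + p * (b * b + d * d)

        # Minimizing Q over u gives the feedback u = k*x.
        k = -h_xu / h_uu

        # Updated value kernel (Schur complement of h_uu in H).
        p_next = h_xx - h_xu * h_xu / h_uu
        if abs(p_next - p) < tol:
            return p_next, k, i + 1
        p = p_next
    raise RuntimeError("iteration did not converge within max_iter steps")


# Example call with assumed parameter values.
p_star, k_star, n_iter = q_iteration_scalar(0.9, 1.0, 0.1, 0.1, 1.0, 1.0)
```

With these assumed values the closed loop can be checked for mean-square stability by confirming that (a + b*k_star)**2 + (c + d*k_star)**2 is below 1.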