This article investigates the data-based optimal tracking control problem for stochastic discrete-time linear systems. An average off-policy Q-learning algorithm is proposed to solve the optimal control problem under random disturbances. Compared with existing off-policy reinforcement learning (RL) algorithms, the proposed average off-policy Q-learning algorithm avoids the assumption that an initial stabilizing control is available. First, a pole placement strategy is used to design an initial stabilizing control for systems with unknown dynamics. Second, this initial stabilizing control is used to construct a data-based average off-policy Q-learning algorithm. The algorithm is then applied to the stochastic linear quadratic tracking (LQT) problem, and a convergence proof is provided. Finally, numerical examples show that the algorithm outperforms existing algorithms in simulation.
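To make the setting concrete, the following is a minimal sketch of a data-based off-policy Q-learning loop for a discrete-time LQT problem. Everything here is an illustrative assumption rather than the paper's method: the plant matrices, weights, reference generator, and the quadratic-plus-constant Q-function parameterization are invented for the example, the exploration noise stands in for a behavior policy, and the initial gain is simply zero (valid only because the toy plant is open-loop stable; the paper instead constructs an initial stabilizing gain via pole placement and uses an "average" formulation to handle the stochastic disturbance).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative stochastic LQT setup (assumed values, not from the paper) ---
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # plant, used only to simulate data
B = np.array([[0.0], [1.0]])
F = np.array([[1.0]])                     # reference generator r_{k+1} = F r_k
n, m, p = 2, 1, 1
nz = n + p                                # augmented state z = [x; r]
Qc, R, gamma = 1.0, 0.1, 0.95             # tracking/control weights, discount

def step(z, u):
    """One step of the augmented system with additive process noise."""
    x = A @ z[:n] + B @ u + 0.01 * rng.standard_normal(n)
    return np.concatenate([x, F @ z[n:]])

def cost(z, u):
    e = z[0:1] - z[n:]                    # tracking error on the first state
    return float(Qc * e @ e + R * u @ u)

def phi(z, u):
    """Upper-triangular quadratic features of [z; u], plus a constant term
    that absorbs the noise-induced offset in the discounted Q-function."""
    v = np.concatenate([z, u])
    quad = [v[i] * v[j] for i in range(nz + m) for j in range(i, nz + m)]
    return np.array(quad + [1.0])

def unpack_H(theta):
    """Rebuild the symmetric Q-function kernel H from the parameter vector."""
    H = np.zeros((nz + m, nz + m))
    k = 0
    for i in range(nz + m):
        for j in range(i, nz + m):
            H[i, j] = theta[k] if i == j else theta[k] / 2
            H[j, i] = H[i, j]
            k += 1
    return H

K = np.zeros((m, nz))                     # initial gain (toy plant is open-loop stable)
for it in range(10):                      # policy iteration sweeps
    Phi, y = [], []
    z = np.concatenate([rng.standard_normal(n), np.ones(p)])
    for k in range(400):
        u = -K @ z + 0.5 * rng.standard_normal(m)   # behavior policy with exploration
        z_next = step(z, u)
        u_next = -K @ z_next                         # target policy, evaluated off-policy
        # Bellman residual: Q(z,u) - gamma * Q(z', pi(z')) = c(z,u)
        Phi.append(phi(z, u) - gamma * phi(z_next, u_next))
        y.append(cost(z, u))
        z = z_next
    theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    H = unpack_H(theta[:-1] if False else theta[:len(theta) - 1] + 0)  # drop constant
    H = unpack_H(theta[:-1])
    Huu, Huz = H[nz:, nz:], H[nz:, :nz]
    K = np.linalg.solve(Huu, Huz)         # greedy improvement: u = -Huu^{-1} Huz z

print("learned tracking gain K:", K)
```

The least-squares step is a standard LSTDQ-style policy evaluation: because the features of the target-policy action u_next appear in the residual while the data were generated with exploratory actions, the evaluation is genuinely off-policy, which is the property the abstract's algorithm relies on.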