Abstract
In this paper, the optimal output tracking control problem for Markov jump systems is investigated, considering both the case of known transition probabilities and the case of completely unknown transition probabilities. Based on game theory and H∞ performance, a quadratic cost function is formulated, and a discount factor is introduced into it so that unstable systems can be tracked and the assumption that the noise energy is bounded can be removed. The game coupled algebraic Riccati equations and the corresponding controller are derived by dynamic programming. The stochastic stability of the tracking error system is then analyzed. Moreover, an iterative algorithm and a reinforcement learning-based algorithm are proposed to solve for the optimal tracking controller when the transition probabilities are known and completely unknown, respectively. Finally, numerical simulations on a DC motor are performed to validate the effectiveness of the proposed results.