This article addresses the optimal control of continuous-time linear Itô stochastic systems with Markovian jumps via an online policy iteration (PI) approach grounded in Q-learning. First, a model-dependent offline algorithm, structured according to traditional optimal control strategies, is designed to solve the algebraic Riccati equation (ARE). Employing Lyapunov theory, we rigorously establish the convergence of the offline PI algorithm and the admissibility of each iterative control law; to our knowledge, this article is the first to resolve these technical challenges. Subsequently, to overcome the limitations inherent in the offline algorithm, we introduce a novel online Q-learning algorithm tailored to Itô stochastic systems with Markovian jumps. The proposed Q-learning algorithm requires neither the transition probabilities nor the system matrices. We provide a thorough stability analysis of the closed-loop system. Finally, a simulation example, underpinned by the theorems established herein, demonstrates the effectiveness and applicability of the proposed algorithms.
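The offline, model-dependent step the abstract describes is in the spirit of Kleinman-style policy iteration, which solves the ARE by alternating policy evaluation (a Lyapunov equation) with policy improvement. The sketch below illustrates only the deterministic-LQR special case, omitting the diffusion and Markovian-jump couplings of the paper; the matrices `A`, `B`, `Q`, `R` and the stabilizing initial gain `K0` are hypothetical illustrations, not taken from the article.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def kleinman_pi(A, B, Q, R, K0, iters=50, tol=1e-9):
    """Model-based policy iteration for the deterministic ARE
        A'P + P A - P B R^{-1} B' P + Q = 0.
    K0 must stabilize A - B @ K0; each iterate then remains stabilizing
    and P converges monotonically to the stabilizing ARE solution."""
    K = K0
    P_prev = None
    for _ in range(iters):
        Ak = A - B @ K  # closed-loop matrix under the current policy
        # Policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K
```

The online Q-learning algorithm of the article replaces the policy-evaluation step with data-driven estimation, so that neither the system matrices nor the transition probabilities appear explicitly.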