Abstract

In the present study, a temporal difference (TD) learning algorithm is proposed to solve the optimal control problem for semi‐Markov jump linear systems (S‐MJLSs). The proposed scheme is transition‐probability‐free (TP‐free), so it can be applied when the transition probabilities of the embedded Markov chain are not known in advance. Coupled algebraic Riccati equations (CAREs), together with the analytical solution for the control gains, are derived using an S‐MJLS augmented with the maximum sojourn time; these results form the basis of the TD learning algorithm. It is proved that, provided the jumping modes and numbers of jumps observed online are sufficiently rich, the value function in the TD algorithm converges to the solutions of the CAREs. Finally, an example is presented to evaluate the learning capability of the TD algorithm and the effectiveness of the proposed control method.
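The abstract states that the TD value function converges to the solutions of coupled algebraic Riccati equations. As a point of reference only, the sketch below iterates the standard discrete‐time coupled AREs for an ordinary Markov jump linear system with a known transition matrix `T`. This rests on simplifying assumptions: it ignores the semi‐Markov sojourn‐time augmentation, it needs exactly the transition probabilities that the TP‐free scheme avoids, and it is a model‐based fixed‐point iteration rather than the TD algorithm itself. The function name `coupled_riccati_iteration` and its parameters are hypothetical.

```python
import numpy as np


def coupled_riccati_iteration(A, B, Q, R, T, iters=500, tol=1e-10):
    """Fixed-point (value) iteration on discrete-time coupled AREs.

    Illustrative sketch for a standard MJLS (not the S-MJLS of the paper).
    A, B, Q, R: lists of per-mode matrices.
    T: N x N transition matrix, T[i, j] = Prob(next mode j | current mode i).
    Returns per-mode cost matrices P_i and gains K_i for u = -K_i x.
    """
    N = len(A)
    n = A[0].shape[0]
    P = [np.zeros((n, n)) for _ in range(N)]
    K = [None] * N
    for _ in range(iters):
        # Expected next-step cost matrix for each mode: E_i = sum_j T[i, j] P_j
        E = [sum(T[i, j] * P[j] for j in range(N)) for i in range(N)]
        P_new = []
        for i in range(N):
            S = R[i] + B[i].T @ E[i] @ B[i]
            K[i] = np.linalg.solve(S, B[i].T @ E[i] @ A[i])
            # Coupled Riccati update for mode i
            Pi = Q[i] + A[i].T @ E[i] @ A[i] - A[i].T @ E[i] @ B[i] @ K[i]
            P_new.append(0.5 * (Pi + Pi.T))  # keep P_i symmetric
        diff = max(np.max(np.abs(P_new[i] - P[i])) for i in range(N))
        P = P_new
        if diff < tol:
            break
    return P, K


# Two-mode scalar example with illustrative (made-up) parameters
A = [np.array([[1.1]]), np.array([[0.8]])]
B = [np.array([[1.0]]), np.array([[0.5]])]
Q = [np.eye(1), np.eye(1)]
R = [np.eye(1), np.eye(1)]
T = np.array([[0.7, 0.3], [0.4, 0.6]])
P, K = coupled_riccati_iteration(A, B, Q, R, T)
print([p[0, 0] for p in P], [k[0, 0] for k in K])
```

With a single mode and `T = [[1.0]]`, the iteration reduces to the ordinary discrete‐time algebraic Riccati equation, which is a convenient sanity check.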
