Optimal control for semi‐Markov jump linear systems via TP‐free temporal difference (TD) learning

Yaogang Chen, Xiaoli Luan, Fei Liu, Jiwei Wen

doi:10.1002/rnc.5648

Abstract

In the present study, a temporal difference (TD) learning algorithm is proposed to solve the optimal control problem for semi‐Markov jump linear systems (S‐MJLSs). The proposed scheme is TP‐free, so it can be applied when the transition probabilities (TPs) of the embedded Markov chain are not known in advance. By augmenting the S‐MJLS with its maximum sojourn time, coupled algebraic Riccati equations (CAREs) are derived whose solutions give the control gains in analytical form; these CAREs form the basis of the TD learning algorithm. It is proved that, provided sufficiently rich jumping modes and numbers of jumps are observed online, the value function in the TD algorithm converges to the solutions of the CAREs. Finally, an example is given to evaluate the learning capability of the TD algorithm and the effectiveness of the proposed control method.
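To show why the TPs matter for the CAREs, here is a minimal sketch that is not the authors' algorithm. It assumes a plain discrete-time Markov (not semi‐Markov) jump linear system x_{k+1} = A_{r_k} x_k + B_{r_k} u_k with a known TP matrix `Pi` and quadratic stage cost x^T Q_i x + u^T R_i u. It runs value iteration on the standard coupled Riccati equations P_i = Q_i + A_i^T E_i(P) A_i - A_i^T E_i(P) B_i (R_i + B_i^T E_i(P) B_i)^{-1} B_i^T E_i(P) A_i, where E_i(P) = sum_j Pi[i, j] P_j. The TPs enter only through this coupling term, and that dependence is what a TP‐free scheme must avoid. The sojourn-time augmentation and the TD learning of the paper are not reproduced. The function names `coupled_riccati` and `care_residual` and all numerical values are hypothetical.

```python
import numpy as np


def coupled_riccati(A, B, Q, R, Pi, iters=5000, tol=1e-11):
    """Value iteration on the CAREs of an assumed discrete-time Markov jump
    linear system with KNOWN transition probabilities:
        x_{k+1} = A[r_k] x_k + B[r_k] u_k,  P(r_{k+1}=j | r_k=i) = Pi[i, j].
    Returns per-mode value matrices P[i] and gains K[i] for u = -K[i] x."""
    N = len(A)
    n = A[0].shape[0]
    P = [np.zeros((n, n)) for _ in range(N)]
    K = [None] * N
    for _ in range(iters):
        P_next = []
        for i in range(N):
            # Coupling term E_i(P): expected next-mode value matrix (needs TPs).
            E = sum(Pi[i, j] * P[j] for j in range(N))
            S = R[i] + B[i].T @ E @ B[i]
            K[i] = np.linalg.solve(S, B[i].T @ E @ A[i])
            Pn = Q[i] + A[i].T @ E @ A[i] - A[i].T @ E @ B[i] @ K[i]
            P_next.append(0.5 * (Pn + Pn.T))  # remove round-off asymmetry
        change = max(np.max(np.abs(Pn - Po)) for Pn, Po in zip(P_next, P))
        P = P_next
        if change < tol:
            break
    return P, K


def care_residual(A, B, Q, R, Pi, P):
    """Largest absolute entry of (right-hand side - P[i]) over all modes."""
    N = len(A)
    res = 0.0
    for i in range(N):
        E = sum(Pi[i, j] * P[j] for j in range(N))
        S = R[i] + B[i].T @ E @ B[i]
        gain = np.linalg.solve(S, B[i].T @ E @ A[i])
        rhs = Q[i] + A[i].T @ E @ A[i] - A[i].T @ E @ B[i] @ gain
        res = max(res, np.max(np.abs(rhs - P[i])))
    return res


# Hypothetical two-mode example; every number here is illustrative.
A = [np.array([[1.1, 0.2], [0.0, 0.9]]), np.array([[0.8, 0.0], [0.3, 1.05]])]
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
Q = [np.eye(2), np.eye(2)]
R = [np.eye(1), np.eye(1)]
Pi = np.array([[0.7, 0.3], [0.4, 0.6]])

P, K = coupled_riccati(A, B, Q, R, Pi)
print("CARE residual:", care_residual(A, B, Q, R, Pi, P))
for i, Ki in enumerate(K):
    print(f"mode {i}: K = {Ki.ravel()}")
```

Both modes are open-loop unstable in this example, but each can be stabilized by its own feedback and Q_i > 0. Under the standard stabilizability and detectability assumptions, the iteration started from P_i = 0 is therefore expected to converge to the mean-square stabilizing solution.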
