Abstract

This article investigates on-policy and off-policy Q-learning algorithms for time-varying linear discrete-time systems (DTSs) with completely unknown dynamics. To handle the time-varying description, the lifting method is employed to transform the original time-varying linear DTS into a time-invariant linear DTS; the lifted system, however, need not satisfy the conventional controllability condition, which affects the convergence of traditional Q-learning algorithms. Based on a theoretical analysis of the structure of the lifted time-invariant linear DTS, on-policy and off-policy algorithms are proposed whose convergence is guaranteed. Both Q-learning algorithms remain model-free, relying only on collected input-state data. In particular, the off-policy scheme achieves higher data efficiency because the collected data can be reused at every iteration. Finally, simulation results for a two-dimensional system and a spacecraft control system are presented to validate the effectiveness of the two proposed control schemes.
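As a rough illustration of the two ingredients mentioned in the abstract, lifting a periodic time-varying system into a time-invariant one and off-policy Q-learning that reuses a fixed batch of data at every iteration, the sketch below shows one common formulation. It assumes a p-periodic system, a standard quadratic (LQR-type) cost, and a stabilizing initial gain; the function names, the example system, and the specific least-squares parameterization are illustrative assumptions, not the article's exact algorithms.

```python
import numpy as np

def lift_periodic(A_list, B_list):
    """Lift a p-periodic system x_{k+1} = A_k x_k + B_k u_k over one period:
       x_{k+p} = A_lift x_k + B_lift [u_k; ...; u_{k+p-1}]  (time-invariant)."""
    A_lift = np.eye(A_list[0].shape[0])
    B_cols = []
    for i, (A, B) in enumerate(zip(A_list, B_list)):
        A_lift = A @ A_lift                       # A_{p-1} ... A_0
        col = B
        for Aj in A_list[i + 1:]:                 # A_{p-1} ... A_{i+1} B_i
            col = Aj @ col
        B_cols.append(col)
    return A_lift, np.hstack(B_cols)

def quad_features(z):
    """Features f(z) with z^T H z = f(z) @ svec(H) for symmetric H."""
    n = z.size
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unsvec(theta, n):
    """Rebuild the symmetric matrix H from its stacked upper triangle."""
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[idx]
            idx += 1
    return H

def offpolicy_q_iteration(data, Qx, Ru, K0, iters=15):
    """Off-policy policy iteration on Q_K(x,u) = [x;u]^T H [x;u].
       data: transitions (x, u, x_next) collected once under an exploratory
       behavior input and reused at every iteration (data efficiency).
       Requires a stabilizing initial gain K0 and persistently exciting data."""
    n, m = data[0][0].size, data[0][1].size
    K = K0.copy()
    for _ in range(iters):
        Phi, y = [], []
        for x, u, x_next in data:
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, K @ x_next])
            # Bellman equation: z^T H z - z'^T H z' = x^T Q x + u^T R u
            Phi.append(quad_features(z) - quad_features(z_next))
            y.append(x @ Qx @ x + u @ Ru @ u)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        H = unsvec(theta, n + m)
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy policy update
    return K, H

# Usage sketch on a hypothetical 2-periodic two-dimensional system.
A_list = [np.array([[0.9, 0.2], [0.0, 0.8]]), np.array([[1.0, 0.1], [0.1, 0.9]])]
B_list = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
A, B = lift_periodic(A_list, B_list)
rng = np.random.default_rng(0)
data, x = [], rng.standard_normal(2)
for _ in range(60):                               # exploratory behavior policy
    u = rng.standard_normal(B.shape[1])
    x_next = A @ x + B @ u
    data.append((x, u, x_next))
    x = x_next
K, H = offpolicy_q_iteration(data, np.eye(2), np.eye(B.shape[1]),
                             K0=np.zeros((B.shape[1], 2)))
print("learned lifted feedback gain:\n", K)
```

Because the dynamics are deterministic, the same Bellman identity holds along transitions generated by any behavior input, which is what lets the fixed data batch be reused across iterations; an on-policy variant would instead re-collect data under the current gain K at each step.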

