Abstract

<abstract><p>In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate system model. First, a delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Second, the SLQ optimal tracking problem is transformed into a deterministic one by a system transformation, and the corresponding Q-function for SLQ optimal tracking control is derived. Based on this, a Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example demonstrates the effectiveness of the proposed algorithm.</p></abstract>
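To make the idea of value-iteration Q-learning concrete, the following is a minimal sketch under strong simplifying assumptions, not the algorithm of the paper. It treats a scalar, deterministic, delay-free regulation problem with stage cost q·x² + r·u², so it omits the stochastic noise, the time delay, and the augmented tracking system described in the abstract. The names `solve3`, `q_learning_vi`, and `step`, and all numeric values, are hypothetical choices for illustration. The learner only sees sampled transitions (x, u, x_next) and never reads the system parameters directly.

```python
import random

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(n):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [mk - f * mc for mk, mc in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def q_learning_vi(step, q, r, iters=200, samples=20, seed=0):
    """Value-iteration Q-learning for a scalar LQ problem (illustrative sketch).

    Q_j(x, u) is parameterized as hxx*x^2 + 2*hxu*x*u + huu*u^2.
    `step(x, u)` is a black box returning the next state (assumed deterministic).
    """
    rng = random.Random(seed)
    # Collect transitions once; the model is used only through these samples.
    data = []
    for _ in range(samples):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((x, u, step(x, u)))

    hxx, hxu, huu = 0.0, 0.0, 0.0  # start from Q_0 = 0
    for _ in range(iters):
        # Value implied by the current Q: min over u of Q_j(x, u) = p * x^2.
        p = hxx - hxu * hxu / huu if huu > 0 else 0.0
        # Least-squares fit of Q_{j+1} to the targets cost + p * x_next^2.
        ata = [[0.0] * 3 for _ in range(3)]
        atb = [0.0] * 3
        for x, u, xn in data:
            phi = [x * x, 2 * x * u, u * u]
            y = q * x * x + r * u * u + p * xn * xn
            for i in range(3):
                atb[i] += phi[i] * y
                for j in range(3):
                    ata[i][j] += phi[i] * phi[j]
        hxx, hxu, huu = solve3(ata, atb)

    gain = hxu / huu  # greedy policy: u = -gain * x
    return (hxx, hxu, huu), gain

# Usage: an open-loop unstable scalar system, hidden from the learner behind `step`.
kernel, gain = q_learning_vi(lambda x, u: 1.1 * x + 0.5 * u, q=1.0, r=1.0)
print("learned Q kernel:", kernel)
print("learned feedback gain:", gain)
```

In this simplified setting, each iteration is one value-iteration update of the Q-function kernel, estimated from data by least squares. For this noise-free scalar case the learned gain should agree with the one obtained from the corresponding Riccati recursion, which is a convenient sanity check for the sketch.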
