Abstract
In this paper, the optimal output tracking control problem of discrete-time nonlinear systems is considered. First, an augmented system is derived and the tracking control problem is converted into a regulation problem with a discounted performance index, whose solution relies on the Bellman equation. Policy iteration and value iteration are two classical algorithms for solving the Bellman equation. An analysis of the two algorithms shows that policy iteration converges quickly but requires an initial admissible control policy, whereas value iteration avoids the requirement of an initial admissible control policy but converges slowly. To achieve a tradeoff between policy iteration and value iteration, a multistep heuristic dynamic programming (MsHDP) algorithm is proposed using a multistep policy evaluation scheme. The convergence of the MsHDP algorithm is proved by demonstrating that it converges to the solution of the Bellman equation. Subsequently, a neural network-based actor-critic structure is developed to implement the MsHDP algorithm. The effectiveness and advantages of the developed MsHDP method are validated through comparative simulation studies.
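As a rough illustration of the multistep policy evaluation idea (not the paper's neural-network actor-critic implementation), the sketch below applies the Bellman operator of the current greedy policy N times between successive policy improvements on a small, hypothetical tabular problem with nonnegative stage costs and a discount factor: N = 1 recovers value iteration, and large N approaches policy iteration. All names, dimensions, and data here are assumptions made for illustration only.

```python
import numpy as np

# Hedged sketch of multistep policy evaluation on a small, hypothetical
# tabular MDP. The paper treats continuous-state nonlinear tracking with
# an actor-critic structure; this toy example only illustrates how the
# lookahead depth N interpolates between value iteration (N = 1) and
# policy iteration (N -> infinity).

rng = np.random.default_rng(0)
S, A, gamma, N_steps = 6, 3, 0.9, 4   # states, actions, discount, lookahead

# Assumed data: random transition kernel P[a, s, s'] and cost r[s, a] >= 0.
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))

V = np.zeros(S)  # V0 = 0 with nonnegative costs gives monotone convergence
for k in range(200):
    # Policy improvement: greedy (cost-minimizing) policy w.r.t. current V.
    Q = r + gamma * np.einsum("ast,t->sa", P, V)
    pi = Q.argmin(axis=1)
    # Multistep policy evaluation: apply the Bellman operator of the fixed
    # policy pi for N_steps sweeps before the next improvement step.
    V_new = V.copy()
    for _ in range(N_steps):
        r_pi = r[np.arange(S), pi]          # stage cost under pi
        P_pi = P[pi, np.arange(S), :]       # row s is P[pi[s], s, :]
        V_new = r_pi + gamma * P_pi @ V_new
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print("converged value function:", np.round(V, 4))
```

With N_steps = 1 the loop reduces to a plain value-iteration sweep needing no admissible initial policy; increasing N_steps spends more computation per iteration on evaluating the current policy, which is the mechanism MsHDP exploits to accelerate convergence.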