Abstract

The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in Policy Iteration strategies. In such architectures the error is computed via the evolution of the Hamiltonian function (or, possibly, of its integral) along the trajectories of the closed-loop system. Herein the temporal difference error is instead obtained via two subsequent steps: first, the dynamics of the underlying costate variable in the Hamiltonian system are steered by means of a (virtual) control input in such a way that the stable invariant manifold becomes externally attractive. Then, the distance-from-invariance of the manifold, induced by approximate solutions, yields a natural candidate measure for the policy evaluation step. The policy improvement phase is then performed by means of standard gradient descent methods, which allow the weights of the underlying functional approximator to be updated correctly. The above architecture then yields an iterative (episodic) learning scheme based on a scalar, constant reward at each iteration, the value of which is insensitive to the length of the episode, as in the original spirit of Reinforcement Learning strategies for discrete-time systems. Finally, the theory is validated by means of a numerical simulation involving an automatic flight control problem.
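In standard Hamiltonian notation the construction can be sketched as follows (a minimal illustration; the approximator \(\pi_w\) and its weights \(w\) are notation introduced here, not taken from the paper). Let \(H(x,\lambda)\) denote the minimized Hamiltonian of the infinite-horizon problem, so that the associated Hamiltonian system reads
\[
\dot{x} = \nabla_{\lambda} H(x,\lambda), \qquad \dot{\lambda} = -\nabla_{x} H(x,\lambda),
\]
with the optimal costate lying on the stable invariant manifold \(\lambda = \pi(x)\). For an approximation \(\pi_w(x)\), the graph \(\{(x,\pi_w(x))\}\) is invariant under the flow precisely when the residual
\[
r_w(x) \;=\; \frac{\partial \pi_w}{\partial x}(x)\,\nabla_{\lambda} H\big(x,\pi_w(x)\big) \;+\; \nabla_{x} H\big(x,\pi_w(x)\big)
\]
vanishes identically; the magnitude of \(r_w\) is therefore the natural distance-from-invariance measure used in the policy evaluation step.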

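The resulting episodic scheme can be illustrated with a short numerical sketch. The Python snippet below is not the paper's algorithm but a minimal stand-in under stated assumptions: it uses a linear-quadratic toy problem (matrices A, B, Q, R chosen here purely for illustration), a linear costate approximator lam = W x, and finite-difference gradient descent on the mean squared invariance residual, which plays the role of the scalar, episode-length-insensitive reward.

import numpy as np

# Linear-quadratic toy problem (illustrative data, not from the paper):
# dynamics xdot = A x + B u, running cost 0.5 (x' Q x + u' R u).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

def hamiltonian_grads(x, lam):
    # Minimizing H = lam' (A x + B u) + 0.5 (x' Q x + u' R u) over u gives
    # u* = -R^{-1} B' lam; return grad_lam H and grad_x H evaluated at u*.
    u = -np.linalg.solve(R, B.T @ lam)
    grad_lam = A @ x + B @ u       # equals xdot
    grad_x = Q @ x + A.T @ lam     # equals -lamdot
    return grad_lam, grad_x

def invariance_residual(x, W):
    # Costate approximator lam = pi_W(x) = W x; the residual
    # (d pi_W/dx) xdot + grad_x H vanishes on the exact stable manifold.
    lam = W @ x
    grad_lam, grad_x = hamiltonian_grads(x, lam)
    return W @ grad_lam + grad_x

def evaluation_measure(W, samples):
    # Scalar "reward": mean squared distance-from-invariance over a fixed
    # batch of states, insensitive to any notion of episode length.
    return np.mean([np.sum(invariance_residual(x, W) ** 2) for x in samples])

# Policy improvement: finite-difference gradient descent on the weights W.
rng = np.random.default_rng(0)
samples = [rng.standard_normal(2) for _ in range(50)]
W = np.zeros((2, 2))
lr, eps = 1e-2, 1e-6
for _ in range(500):
    base = evaluation_measure(W, samples)
    grad = np.zeros_like(W)
    for i in range(2):
        for j in range(2):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (evaluation_measure(Wp, samples) - base) / eps
    W -= lr * grad

print("final evaluation measure:", evaluation_measure(W, samples))

For this linear-quadratic instance the residual vanishes exactly when W solves the algebraic Riccati equation, so driving the evaluation measure to zero recovers the optimal costate map; the nonlinear setting of the paper replaces the linear map W x with a general functional approximator.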