Abstract

This article considers the $H_{\infty}$ control problem for nonlinear systems with unknown dynamics and asymmetric saturating actuators. First, the $H_{\infty}$ control problem is converted into a zero-sum game by introducing a nonquadratic cost function. Then, to solve the Hamilton–Jacobi–Isaacs (HJI) equation arising in this zero-sum game, a simultaneous policy iteration (SPI) algorithm is developed within the adaptive dynamic programming framework. It is proved that the convergence of the SPI algorithm is, in essence, equivalent to that of the sequential policy iteration algorithm. To implement the SPI algorithm, a critic, an actor, and a perturbation neural network (NN) are constructed to approximate the cost function, the control policy, and the perturbation, respectively. The weights of the three NNs are determined simultaneously by the least-squares method combined with the Monte Carlo integration technique. A remarkable characteristic of the SPI algorithm is that arbitrary control policies and perturbations can be applied during learning, so knowledge of the system dynamics can be replaced by data collected along the system's trajectories in advance; moreover, the persistence-of-excitation condition is not required. Finally, simulations of two nonlinear examples are given to validate the proposed SPI algorithm.
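
For context, the zero-sum-game formulation referenced above commonly takes the following form (generic notation from the $H_{\infty}$/ADP literature, not verbatim from the paper). For dynamics $\dot{x} = f(x) + g(x)u + k(x)w$ with control $u$ and perturbation $w$, the value function and the associated HJI equation read

$$ V^{*}(x_0) = \min_{u}\max_{w} \int_{0}^{\infty} \big( Q(x) + U(u) - \gamma^{2}\|w\|^{2} \big)\, dt, $$

$$ 0 = Q(x) + U(u^{*}) - \gamma^{2}\|w^{*}\|^{2} + (\nabla V^{*})^{\top} \big( f(x) + g(x)u^{*} + k(x)w^{*} \big), $$

where $\gamma > 0$ is the attenuation level and $U(u)$ is the nonquadratic penalty encoding the actuator bounds; for a symmetric bound $\lambda$, a common choice in the literature is $U(u) = 2\int_{0}^{u} \lambda \tanh^{-1}(v/\lambda) R\, dv$, which the asymmetric case adapts by shifting the saturation limits.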
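
To make the data-driven character of the SPI scheme concrete, below is a minimal sketch of an off-policy simultaneous least-squares update for a scalar toy system. Everything here is an illustrative assumption rather than the paper's implementation: the dynamics, the polynomial features standing in for the three NNs, a quadratic running cost standing in for the nonquadratic one, and Riemann sums over short trajectory segments standing in for the Monte Carlo integration. The data are generated once under arbitrary exploratory inputs, matching the abstract's point that learning is off-policy.

    import numpy as np

    rng = np.random.default_rng(0)

    # "Unknown" scalar dynamics dx/dt = a*x + b*u + c*w, used only to
    # generate data; the learner below never reads a, b, c directly.
    a, b, c = -1.0, 1.0, 0.5
    gamma, Q, R = 2.0, 1.0, 1.0   # attenuation level, (quadratic) cost weights
    dt, steps = 1e-3, 50          # Euler step; Bellman interval T = steps*dt

    phi = lambda x: np.array([x**2, x**4])  # critic features:  V(x) ~ Wc @ phi(x)
    psi = lambda x: np.array([x, x**3])     # actor features:   u(x) ~ Wa @ psi(x)
    sig = lambda x: np.array([x, x**3])     # perturbation:     w(x) ~ Ww @ sig(x)

    # Collect off-policy data once, under arbitrary exploratory inputs --
    # behavior and target policies are decoupled, so no persistence of
    # excitation along a specific policy is needed.
    tuples = []
    for _ in range(200):
        x = rng.uniform(-2.0, 2.0)
        xs, us, ws = [x], [], []
        for _ in range(steps):
            u = rng.uniform(-1.0, 1.0)        # arbitrary behavior control
            w = 0.2 * rng.standard_normal()   # arbitrary behavior perturbation
            x += dt * (a * x + b * u + c * w)
            xs.append(x); us.append(u); ws.append(w)
        tuples.append((np.array(xs), np.array(us), np.array(ws)))

    # Simultaneous policy iteration: each pass solves ONE least-squares
    # problem whose unknowns stack the critic weights together with the
    # improved actor and perturbation weights.
    Wa, Ww = np.zeros(2), np.zeros(2)
    for it in range(10):
        A, rhs = [], []
        for xs, us, ws in tuples:
            xm = xs[:-1]                         # states at left endpoints
            ui, wi = Wa @ psi(xm), Ww @ sig(xm)  # current iterates along data
            row_c = phi(xs[-1]) - phi(xs[0])
            row_a = 2 * R * dt * (psi(xm) * (us - ui)).sum(axis=1)
            row_w = -2 * gamma**2 * dt * (sig(xm) * (ws - wi)).sum(axis=1)
            A.append(np.concatenate([row_c, row_a, row_w]))
            rhs.append(dt * np.sum(-Q * xm**2 - R * ui**2 + gamma**2 * wi**2))
        theta, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
        Wc, Wa, Ww = theta[:2], theta[2:4], theta[4:6]

    print("critic:", Wc, " actor:", Wa, " perturbation:", Ww)

The joint solve is what "simultaneous" refers to: the critic update and both policy improvements come out of a single regression over the stored data, rather than alternating evaluation and improvement steps. This toy offers no convergence guarantee of its own; that analysis, with NN approximators and the nonquadratic cost, is the subject of the paper.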
