Abstract

This article investigates the optimal control problem for nonlinear nonzero-sum differential games in the absence of initial admissible policies and in the presence of control constraints. An adaptive learning algorithm based on the policy iteration technique is developed to approximate the Nash equilibrium using real-time data. A two-player continuous-time system is used to present this approximation mechanism, which is implemented as a critic-actor architecture for each player. The constraint is incorporated into the optimization by introducing a nonquadratic value function, and the associated constrained Hamilton-Jacobi equation is derived. A critic neural network (NN) and an actor NN are utilized to learn the value function and the optimal control policy, respectively, through novel weight tuning laws. To guarantee stability during the learning phase, two stabilizing operators are designed for the two actors. The proposed algorithm is proved to converge as a Newton's iteration, and the stability of the closed-loop system is ensured by Lyapunov analysis. Finally, two simulation examples demonstrate the effectiveness of the proposed learning scheme under different constraint scenarios.
