Abstract

In this article, a novel online method based on neural networks (NN) is developed for multi-player non-zero-sum (NZS) differential games of nonlinear, partially unknown continuous-time (CT) systems with control constraints. The problem of multi-player NZS games with saturated actuators is analyzed in detail, and the unknown dynamics model is learned by an identifier NN. Unlike the standard identifier-actor-critic framework of adaptive dynamic programming (ADP), the proposed method uses only identifier networks and critic networks for all the players to solve the coupled Hamilton-Jacobi (HJ) equations of the multi-player NZS game, which simplifies the algorithm and saves computing resources. Moreover, a tuning law based on the gradient descent method is designed for each critic network. To remove the requirement for an initial stabilizing control, a novel stability term is designed to ensure system stability during the training phase of the critic NN. By means of the Lyapunov approach, it is proven that the system states, the critic network weight estimation errors, and the obtained control are all uniformly ultimately bounded (UUB). Finally, two numerical examples are simulated to illustrate the validity of the developed method for multi-player NZS games with control constraints.

Highlights

  • The theory of differential games has received increasing attention since it was first studied in [1].

  • In multi-player NZS games, the key is to obtain a set of optimal control policies, called a Nash equilibrium, under which each player minimizes its own performance function.

  • A tuning law with a novel stability term was developed for each critic neural network (NN), so that the stability of the closed-loop system was guaranteed during the NN training phase and the need for an initial stabilizing control was removed.

Summary

INTRODUCTION

The theory of differential games has received increasing attention since it was first studied in [1]. For unknown multi-input systems, a three-layer NN identifier, a reinforcement learning scheme, and NZS game theory were used together to solve the optimal tracking control problem [25]. Event-triggered mechanisms have been widely employed to save transmission bandwidth and computing resources [30], [31]. This technique was combined with ADP to obtain control schemes for every player in NZS games [32]. In [37], an integral reinforcement learning (IRL) method was used to find the optimal control policies for players in NZS games with saturated actuators. That method employed both actor and critic NNs, which complicated the algorithm and increased the computational burden.

Notation: ‖·‖ denotes the Euclidean norm of a vector or a matrix, and ∇(·) = ∂(·)/∂x denotes the gradient operator.
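To make the saturated-actuator setting concrete, the following minimal Python sketch shows how a bounded control can be computed from the gradient of a value function. It uses the tanh-based form that is common in the constrained-input ADP literature; this form is an assumption here and is not taken from this summary. The function name `saturated_control`, the scalar control weight `r`, and the saturation limit `bound` are hypothetical names introduced only for illustration, and a single scalar input channel is assumed.

```python
import math


def saturated_control(grad_v, g_col, r, bound):
    """Bounded control u = -bound * tanh(g^T grad_v / (2 * r * bound)).

    grad_v : list of floats, gradient of an (approximate) value function
    g_col  : list of floats, the input vector g(x) for one scalar control
    r      : positive scalar control weight (hypothetical)
    bound  : positive saturation limit, so that |u| <= bound
    """
    # Inner product g(x)^T * grad V(x)
    inner = sum(gi * dv for gi, dv in zip(g_col, grad_v))
    # tanh keeps the control inside [-bound, bound]
    return -bound * math.tanh(inner / (2.0 * r * bound))


# Small gradient: close to the unconstrained value -inner / (2 * r)
u_small = saturated_control([0.1, -0.2], [0.0, 1.0], r=1.0, bound=2.0)

# Large gradient: the control saturates at the limit
u_large = saturated_control([0.0, 500.0], [0.0, 1.0], r=1.0, bound=2.0)

print(u_small, u_large)
```

With a small gradient the result stays near the unconstrained control, while a large gradient drives the output to the saturation limit without exceeding it. In a critic-only scheme of the kind described here, `grad_v` would presumably come from the critic network's estimate rather than from a separate actor network.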

PROBLEM FORMULATION
STABILITY ANALYSIS
SIMULATIONS
CONCLUSION