Abstract

In this paper, a novel synchronous off-policy method is proposed to solve the multi-player zero-sum (ZS) game for systems whose dynamics are completely unknown, whose actuators are constrained, and whose disturbances are bounded. The cost functions are constructed with nonquadratic terms to reflect the input constraints. Integral reinforcement learning (IRL) is employed to solve the Hamilton–Jacobi–Bellman equation, so that knowledge of the system dynamics is no longer required. The obtained value function is proved to converge to the optimal game value, and the proposed algorithm is shown to be equivalent to the traditional policy iteration (PI) algorithm in solving the multi-player ZS game with constrained inputs. Three kinds of neural networks are utilized: a critic neural network (CNN) to approximate the cost function, action neural networks (ANNs) to approximate the control policies, and disturbance neural networks (DNNs) to approximate the disturbances. Finally, a simulation example is given to demonstrate the convergence and performance of the proposed algorithm.
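As context for the abstract, the sketch below illustrates the ingredients it names, under stated assumptions rather than the paper's actual code: the nonquadratic input penalty commonly used in constrained-input adaptive dynamic programming, W(u) = 2 ∫_0^u (λ tanh⁻¹(v/λ))ᵀ R dv; the resulting saturated policy u* = -λ tanh((1/(2λ)) R⁻¹ gᵀ∇V); a worst-case disturbance of the form d* = (1/(2γ²)) kᵀ∇V; and the IRL Bellman residual, which is measured along trajectories so that the drift dynamics never appear. The names LAM, GAMMA, phi, Wc, and the one-dimensional example are illustrative assumptions, not quantities from the paper.

```python
import numpy as np
from scipy.integrate import quad

LAM = 1.0     # actuator bound |u_i| <= LAM (illustrative value)
GAMMA = 5.0   # disturbance attenuation level gamma (illustrative value)

def input_penalty(u, R, lam=LAM):
    """Nonquadratic penalty W(u) = 2 * int_0^u (lam * atanh(v/lam))^T R dv,
    evaluated channel by channel for diagonal R via numerical quadrature."""
    u = np.atleast_1d(u)
    return sum(2.0 * R[i, i] *
               quad(lambda v: lam * np.arctanh(v / lam), 0.0, float(u[i]))[0]
               for i in range(u.size))

def constrained_policy(grad_V, g, R, lam=LAM):
    """Saturated control u* = -lam * tanh((1/(2*lam)) * R^{-1} g^T grad_V);
    the tanh keeps every channel inside [-lam, lam] by construction."""
    return -lam * np.tanh(np.linalg.solve(R, g.T @ grad_V) / (2.0 * lam))

def worst_disturbance(grad_V, k, gamma=GAMMA):
    """Worst-case disturbance d* = (1/(2*gamma^2)) k^T grad_V."""
    return (k.T @ grad_V) / (2.0 * gamma ** 2)

def irl_bellman_residual(phi, Wc, x_t, x_tT, integral_reward):
    """Residual of the IRL Bellman equation
       V(x(t)) = int_t^{t+T} [Q(x) + W(u) - gamma^2 d^T d] dtau + V(x(t+T)),
    with the critic approximation V(x) ~= Wc^T phi(x). The reward integral
    is measured along the trajectory, so no system model is needed."""
    return Wc @ phi(x_t) - (integral_reward + Wc @ phi(x_tT))

# Illustrative single-input use: the computed control respects the bound.
R = np.array([[1.0]])
g = np.array([[1.0]])
grad_V = np.array([0.5])
u = constrained_policy(grad_V, g, R)
print(u, input_penalty(u, R))
```

Driving the residual to zero over many trajectory segments (for example by least squares on the critic weights Wc) plays the role of policy evaluation; in the off-policy setting, the data in the reward integral may be generated by arbitrary admissible inputs rather than the policies being evaluated, which is what makes a completely model-free solution possible.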
