Data-driven adaptive dynamic programming for two-player nonzero-sum game

Qichao Zhang, Dongbin Zhao, Yafei Zhou

doi:10.1109/ccdc.2017.7979102

Abstract

In this paper, we propose a data-driven adaptive dynamic programming (ADP) approach to solve the Hamilton-Jacobi (HJ) equations for the two-player nonzero-sum (NZS) game with completely unknown dynamics. First, the model-based policy iteration (PI) algorithm is given, which requires knowledge of the system dynamics. To relax this requirement, a data-driven ADP method is proposed to solve the unknown nonlinear NZS game using only online data. Neural network (NN) approximators are constructed to approximate the solution of the HJ equations. The online data are collected under two initial admissible control policies. The NN weights are then updated by the least-squares method using the collected online data repeatedly, which makes the method an off-policy learning scheme. Finally, a simulation example is provided to demonstrate the effectiveness of the proposed control scheme.
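To make the least-squares weight update concrete, the following is a minimal sketch on a toy problem and not the two-player algorithm of the paper. It assumes a single scalar system x' = -x under a fixed policy, a running cost x^2, and a linear-in-the-weights approximator V(x) = w1 x^2 + w2 x^4; the system, the features, and the function names collect_data and fit_weights are all illustrative assumptions. The data set is collected once and the weights are fitted from it in one batch, using the relation V(x_k) - V(x_{k+1}) = integral of the running cost over one sampling interval.

```python
import numpy as np

def collect_data(x0_values, dt, steps):
    """Simulate the toy system x' = -x from several initial states.

    Returns the regression matrix (feature differences) and the targets
    (integrated running cost) for a batch least-squares fit.
    """
    rows, targets = [], []
    decay = np.exp(-dt)  # exact one-step solution of x' = -x
    for x in x0_values:
        for _ in range(steps):
            x_next = decay * x
            # phi(x_k) - phi(x_{k+1}) with assumed features phi(x) = [x^2, x^4]
            rows.append([x**2 - x_next**2, x**4 - x_next**4])
            # Running cost x^2 integrated over one interval; a closed form is
            # used here, whereas measured data would need numerical quadrature.
            targets.append(x**2 * (1.0 - np.exp(-2.0 * dt)) / 2.0)
            x = x_next
    return np.array(rows), np.array(targets)

def fit_weights(rows, targets):
    """Batch least-squares solution of rows @ w = targets."""
    w, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return w

# Collect the data once, then fit the weights from the stored batch.
rows, targets = collect_data([0.5, 1.0, 2.0], dt=0.1, steps=20)
weights = fit_weights(rows, targets)
print(weights)
```

For this toy system the exact value function is V(x) = x^2 / 2, so the fitted weights should be close to 0.5 for the x^2 feature and 0 for the x^4 feature. In an iterative scheme the same stored batch could be reused at every iteration with updated targets, which is the sense in which the data are used repeatedly.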

Full Text