Abstract

To solve two-player nonzero-sum (NZS) differential games with completely unknown linear discrete-time dynamics, we develop a data-driven algorithm that learns the Nash equilibrium via off-policy reinforcement learning (RL). The algorithm is fully model-free: it solves the coupled algebraic Riccati equations (CAREs) forward in time using measured data along the system trajectories. We first show that solving the two-player NZS differential game reduces to solving the CAREs. We then present a model-based on-policy RL algorithm and a model-free off-policy RL algorithm for solving the CAREs. Compared with on-policy RL, the off-policy RL algorithm eliminates the influence of probing noise and thus guarantees unbiased solutions. Finally, a simulation example demonstrates the efficacy of the proposed approach.
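
For reference, a standard form of the CAREs for this setting is sketched below; the notation is illustrative and not taken from the paper's text. Assume dynamics $x_{k+1} = A x_k + B_1 u_k + B_2 w_k$, quadratic costs $J_i = \sum_{k=0}^{\infty}\big(x_k^\top Q_i x_k + u_k^\top R_{i1} u_k + w_k^\top R_{i2} w_k\big)$ for $i = 1, 2$, and linear feedback Nash policies $u_k = -K_1 x_k$, $w_k = -K_2 x_k$. The value matrices $P_1, P_2$ then satisfy
\[
P_i = A_c^\top P_i A_c + Q_i + K_1^\top R_{i1} K_1 + K_2^\top R_{i2} K_2,
\qquad A_c = A - B_1 K_1 - B_2 K_2,
\]
with the Nash gains coupled through
\[
K_1 = \big(R_{11} + B_1^\top P_1 B_1\big)^{-1} B_1^\top P_1 \big(A - B_2 K_2\big),
\qquad
K_2 = \big(R_{22} + B_2^\top P_2 B_2\big)^{-1} B_2^\top P_2 \big(A - B_1 K_1\big).
\]
Because $K_1$ appears in the equation for $K_2$ and vice versa, the two Riccati equations cannot be solved independently, which is what motivates iterative (policy-iteration-style) RL solutions.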
