Abstract

In this paper, we develop a data-driven algorithm, based on off-policy reinforcement learning (RL), to learn the Nash equilibrium solution of a two-player non-zero-sum (NZS) game with completely unknown linear discrete-time dynamics. The algorithm solves the coupled algebraic Riccati equations (CARE) forward in time in a model-free manner, using online measured data. We first derive the CARE for the two-player NZS game. Then, a model-free off-policy RL method is developed to obviate the requirement of complete knowledge of the system dynamics. In addition, the on- and off-policy RL algorithms are compared in terms of robustness against probing noise. Finally, a simulation example is presented to show the efficacy of the presented approach.
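For reference, one standard form of the CARE for a two-player linear-quadratic NZS game is sketched below; the symbols (state-feedback gains K and L, weights Q_i and R_ij) are illustrative assumptions and may not match the paper's equations (16) and (17) exactly:

$$\begin{aligned} P_1 &= Q_1 + K^{T} R_{11} K + L^{T} R_{12} L + (A + B_1 K + B_2 L)^{T} P_1 (A + B_1 K + B_2 L),\\ P_2 &= Q_2 + K^{T} R_{21} K + L^{T} R_{22} L + (A + B_1 K + B_2 L)^{T} P_2 (A + B_1 K + B_2 L), \end{aligned}$$

with Nash gains

$$K = -\bigl(R_{11} + B_1^{T} P_1 B_1\bigr)^{-1} B_1^{T} P_1 (A + B_2 L), \qquad L = -\bigl(R_{22} + B_2^{T} P_2 B_2\bigr)^{-1} B_2^{T} P_2 (A + B_1 K).$$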

Highlights

  • Game theory is widely used in complex decision-making problems where the collective behavior depends on the compilation of local interactions [1], [2]

  • In this paper, we develop on- and off-policy variants of a reinforcement learning algorithm to learn online the Nash equilibrium solution of the two-player NZS game with linear discrete-time (DT) dynamics

  • The off-policy variant is robust to probing noise, i.e., no bias arises from adding a probing noise to the control input to satisfy the persistence-of-excitation condition (a sketch illustrating this follows the list)
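As a minimal single-player illustration of the probing-noise point (a simplification, not the paper's two-player algorithm; the matrices, gain, and noise level below are made up), the sketch evaluates a fixed gain K from data generated under u_k = K x_k + e_k. The off-policy regressor carries explicit correction terms in e_k, so the recovered P matches the exact Lyapunov solution, whereas a naive evaluation that ignores e_k is biased:

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    rng = np.random.default_rng(0)

    # Made-up stable system and target gain (for illustration only).
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K = np.array([[-0.1, -0.5]])          # target policy u = K x to be evaluated

    def svec(x):
        # coordinates of x x^T against a symmetric basis: [x1^2, 2 x1 x2, x2^2]
        return np.array([x[0]**2, 2*x[0]*x[1], x[1]**2])

    # Behavior policy: u = K x + e with probing noise e.
    N, x = 200, np.array([1.0, -1.0])
    rows_off, rows_on, rhs = [], [], []
    for _ in range(N):
        e = 0.5 * rng.standard_normal(1)
        u = K @ x + e
        x_next = A @ x + B @ u
        # Off-policy unknowns: vech(P), M1 = B^T P (A+BK), M2 = B^T P B.
        rows_off.append(np.concatenate([svec(x) - svec(x_next),
                                        2.0 * e[0] * x,        # 2 e^T M1 x column
                                        [e[0]**2]]))           # e^T M2 e column
        rows_on.append(svec(x) - svec(x_next))                 # naive regressor, noise ignored
        rhs.append(x @ (Q + K.T @ R @ K) @ x)
        x = x_next

    theta_off, *_ = np.linalg.lstsq(np.array(rows_off), np.array(rhs), rcond=None)
    theta_on,  *_ = np.linalg.lstsq(np.array(rows_on),  np.array(rhs), rcond=None)

    unvech = lambda p: np.array([[p[0], p[1]], [p[1], p[2]]])
    P_true = solve_discrete_lyapunov((A + B @ K).T, Q + K.T @ R @ K)
    print("off-policy error:", np.linalg.norm(unvech(theta_off[:3]) - P_true))  # ~ 0
    print("on-policy  error:", np.linalg.norm(unvech(theta_on[:3])  - P_true))  # biased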

Summary

INTRODUCTION

Game theory is widely used in complex decision-making problems where the collective behavior depends on the compilation of local interactions [1], [2]. A novel model-free algorithm is developed for discrete-time systems to solve the NZS game, obviating the requirement of complete knowledge of the system dynamics.

MODEL-BASED ADAPTIVE DYNAMIC PROGRAMMING

In Section II, an off-line algorithm is developed to solve the CARE (16) and (17), which are necessary and sufficient conditions for the Nash equilibrium. It is shown that one can approximate the solution to the CARE (16) and (17) by iteratively solving the off-policy Bellman equations (48) and (49). The off-policy Bellman equation (49) can be rewritten as

$$x_k^T P_2^i x_k = x_k^T Q_1 x_k + x_k^T (K^i)^T R_{21} K^i x_k + x_k^T (L^i)^T R_{22} L^i x_k + x_k^T (A + B_1 K^i + B_2 L^i)^T P_2^i (A + B_1 K^i + B_2 L^i) x_k.$$

As discussed in [52], the stopping criterion with …
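A model-based sketch of such an iterative solver follows (illustrative: the system, weights, and the pairing of Q_2 with R_21 and R_22 are assumptions, and convergence of this best-response iteration is not guaranteed in general). Policy evaluation solves the two Lyapunov-type Bellman equations for the current gains; policy improvement lets each player best-respond; the loop stops when successive gains agree:

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    # Made-up two-player system and weights (for illustration only).
    A  = np.array([[0.9, 0.2], [0.0, 0.8]])
    B1 = np.array([[0.0], [1.0]])
    B2 = np.array([[1.0], [0.0]])
    Q1, Q2 = np.eye(2), 2 * np.eye(2)
    R11, R12, R21, R22 = np.eye(1), np.eye(1), np.eye(1), np.eye(1)

    K, L = np.zeros((1, 2)), np.zeros((1, 2))   # initial gains; A itself is stable here
    for i in range(200):
        Ac = A + B1 @ K + B2 @ L
        # Policy evaluation: Lyapunov (Bellman) equations for the current gains.
        P1 = solve_discrete_lyapunov(Ac.T, Q1 + K.T @ R11 @ K + L.T @ R12 @ L)
        P2 = solve_discrete_lyapunov(Ac.T, Q2 + K.T @ R21 @ K + L.T @ R22 @ L)
        # Policy improvement: each player best-responds to the other's gain.
        K_new = -np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A + B2 @ L))
        L_new = -np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A + B1 @ K))
        done = max(np.linalg.norm(K_new - K), np.linalg.norm(L_new - L)) < 1e-10
        K, L = K_new, L_new
        if done:                                # stopping criterion on successive gains
            break

    print("P1 =\n", P1, "\nP2 =\n", P2)         # approximate CARE solution at convergence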

Data Collection Phase
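One plausible reading of this phase, sketched below (the function name, signature, and noise model are hypothetical): run the system once under fixed behavior gains plus probing noise and log the (x_k, u_k, w_k, x_{k+1}) tuples that the off-policy learner then reuses; the dynamics matrices appear only to generate the data, never inside the learner:

    import numpy as np

    rng = np.random.default_rng(1)

    def collect_data(A, B1, B2, K0, L0, x0, N, noise_std=0.1):
        # Behavior policies u = K0 x + noise, w = L0 x + noise; the probing
        # noise keeps the logged data persistently exciting.
        x, data = np.asarray(x0, dtype=float), []
        for _ in range(N):
            u = K0 @ x + noise_std * rng.standard_normal(K0.shape[0])
            w = L0 @ x + noise_std * rng.standard_normal(L0.shape[0])
            x_next = A @ x + B1 @ u + B2 @ w
            data.append((x.copy(), u, w, x_next))
            x = x_next
        return data

    # Example with made-up dynamics (used only to simulate the plant):
    A  = np.array([[0.9, 0.2], [0.0, 0.8]])
    B1 = np.array([[0.0], [1.0]])
    B2 = np.array([[1.0], [0.0]])
    data = collect_data(A, B1, B2, np.zeros((1, 2)), np.zeros((1, 2)), [1.0, -1.0], N=200)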
SIMULATION
CASE 1
CASE 2
CASE 3
CONCLUSION