Abstract

In this paper, an off-policy game Q-learning algorithm is proposed for solving linear discrete-time non-zero-sum multi-player game problems. Unlike existing Q-learning methods that solve the Riccati equation via on-policy learning for multi-player games, an off-policy game Q-learning method is developed to achieve the Nash equilibrium of multiple players. To this end, a non-zero-sum game problem is first formulated, and the value function and the Q-function defined for each player's individual performance index are rigorously proved to be linear quadratic forms. Then, based on dynamic programming and Q-learning, an off-policy game Q-learning algorithm is developed to find the control policies of the multi-player game such that the Nash equilibrium is reached under the learned control policies. The merit of this paper lies in that the proposed algorithm does not require the system model parameters to be known a priori and fully utilizes measurable data to learn the Nash equilibrium solution. Moreover, the proposed off-policy game Q-learning algorithm produces no bias in the Nash equilibrium solution even though probing noises are added to the control policies to maintain the persistent excitation condition, whereas on-policy game Q-learning could produce a biased Nash equilibrium solution. This is another contribution of this paper.
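
For orientation, the sketch below (Python/NumPy, with hypothetical two-player system matrices and cost weights) illustrates the structure the algorithm targets: each player's quadratic value kernel and the coupled Nash feedback gains. It uses a model-based best-response iteration purely for illustration; the paper's contribution is learning the same gains off-policy from measured data, without knowing A, B1, B2.

```python
import numpy as np

# Hypothetical two-player linear DT system x_{k+1} = A x_k + B1 u1_k + B2 u2_k
# with quadratic costs; all matrices below are illustrative assumptions.
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1, R11, R12 = np.eye(2), np.eye(1), 0.5 * np.eye(1)
Q2, R21, R22 = np.eye(2), 0.5 * np.eye(1), np.eye(1)

n, m1, m2 = 2, 1, 1
K1 = np.zeros((m1, n))   # player-1 feedback gain, u1 = -K1 x
K2 = np.zeros((m2, n))   # player-2 feedback gain, u2 = -K2 x

for _ in range(50):                       # one common best-response iteration
    Acl = A - B1 @ K1 - B2 @ K2
    P1 = np.zeros((n, n))
    P2 = np.zeros((n, n))
    for _ in range(500):                  # evaluate each player's cost-to-go kernel
        P1 = Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2 + Acl.T @ P1 @ Acl
        P2 = Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2 + Acl.T @ P2 @ Acl
    # Policy improvement from the quadratic (Q-function) blocks:
    # K_i = (R_ii + B_i' P_i B_i)^{-1} B_i' P_i (A - B_j K_j)
    K1 = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2 = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))

print("approximate Nash feedback gains:")
print(K1)
print(K2)
```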

Highlights

  • Reinforcement learning (RL), as one of the machine learning methods, has been widely used to solve optimal control problems [1]–[4] for systems with partially or completely unknown dynamics [5]–[12].

  • Approximate optimal control strategies for a variety of control problems and system classes have been reported in the last decade, such as [3] for MIMO systems, [5] for multi-agent graphical games, [8], [10], [12] for H∞ control, [13]–[17] for optimal tracking control, and [18], [19] for Q-learning-based controller design.

  • When conducting on-policy RL, the data used for learning the optimal control policies are generated by the control policies that are themselves being evaluated and updated.


Summary

INTRODUCTION

Reinforcement learning (RL), as one of the machine learning methods, has been widely used to solve optimal control problems [1]–[4] for systems with partially or completely unknown dynamics [5]–[12]. Since an off-policy RL algorithm has already been proposed in [30] for linear DT multi-player systems, a natural question is whether an off-policy Q-learning method can be used to solve the optimal control problem of completely unknown linear DT multi-player games. If it can, the key point is how to design the off-policy Q-learning algorithm so that the Nash equilibrium of linear DT multi-player games is achieved using only measured data. Notation: R^{p×q} is the set of all real p-by-q matrices, ⊗ stands for the Kronecker product, and vec(L) stacks the columns of a matrix L into a single column vector.
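
As a quick illustration of this notation (a generic sketch, not the paper's derivation), the identity vec(AXB) = (B^T ⊗ A) vec(X) is what allows quadratic Bellman/Q-function equations to be rewritten as linear least-squares problems in the unknown kernel parameters:

```python
import numpy as np

# vec(.) stacks the columns of a matrix into a single column vector.
def vec(M):
    return M.flatten(order="F").reshape(-1, 1)   # column-major (Fortran) order

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

# Kronecker-product identity: vec(A X B) = (B^T kron A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))   # True
```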

PROBLEM STATEMENT
DERIVATION OF OFF-POLICY GAME Q-LEARNING ALGORITHM
NO BIAS ANALYSIS OF SOLUTION FOR THE OFF-POLICY Q-LEARNING ALGORITHM
SIMULATION RESULTS
CONCLUSION
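
Consistent with the abstract, the off-policy scheme separates data generation from learning: behavior policies with added probing noise (for persistent excitation) drive the system, and the logged transitions are reused afterwards to estimate each player's Q-function kernel. The short sketch below only illustrates such data collection under assumed system matrices and behavior gains; the actual learning equations are given in the paper.

```python
import numpy as np

# Illustrative off-policy data collection: fixed behavior gains plus probing noise
# excite the (hypothetical) system, and every transition is logged for later reuse
# by the learner.  A, B1, B2 and the behavior gains Kb1, Kb2 are assumptions.
rng = np.random.default_rng(1)
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Kb1 = np.array([[0.2, 0.0]])       # behavior gain, player 1
Kb2 = np.array([[0.0, 0.2]])       # behavior gain, player 2

x = np.array([[1.0], [-1.0]])
dataset = []                        # tuples (x_k, u1_k, u2_k, x_{k+1})
for k in range(200):
    u1 = -Kb1 @ x + 0.1 * rng.standard_normal((1, 1))   # probing noise
    u2 = -Kb2 @ x + 0.1 * rng.standard_normal((1, 1))
    x_next = A @ x + B1 @ u1 + B2 @ u2
    dataset.append((x.copy(), u1, u2, x_next.copy()))
    x = x_next

print(len(dataset), "transitions collected for off-policy learning")
```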