Abstract

In this paper, an off-policy game Q-learning algorithm is proposed for solving linear discrete-time non-zero-sum multi-player game problems. Unlike existing Q-learning methods that solve the Riccati equation via on-policy learning for multi-player games, an off-policy game Q-learning method is developed to achieve the Nash equilibrium of multiple players. To this end, a non-zero-sum game problem is first formulated, and the value function and the Q-function defined for each player's individual performance index are rigorously proved to be linear quadratic forms. Then, based on dynamic programming and Q-learning, an off-policy game Q-learning algorithm is developed to find the control policies of the multi-player game such that the Nash equilibrium is reached under the learned control policies. The merit of this paper lies in that the proposed algorithm does not require the system model parameters to be known a priori and fully utilizes measurable data to learn the Nash equilibrium solution. Moreover, the proposed off-policy game Q-learning algorithm produces no bias in the Nash equilibrium solution even though probing noises are added to the control policies to maintain the persistent excitation condition, whereas on-policy game Q-learning could produce a biased Nash equilibrium solution. This is another contribution of this paper.
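
For orientation, the sketch below (Python/NumPy, with hypothetical two-player system matrices and cost weights) illustrates the structure the algorithm targets: each player's quadratic value kernel and the coupled Nash feedback gains. It uses a model-based best-response iteration purely for illustration; the paper's contribution is learning the same gains off-policy from measured data, without knowing A, B1, B2.

```python
import numpy as np

# Hypothetical two-player linear DT system x_{k+1} = A x_k + B1 u1_k + B2 u2_k
# with quadratic costs; all matrices below are illustrative assumptions.
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1, R11, R12 = np.eye(2), np.eye(1), 0.5 * np.eye(1)
Q2, R21, R22 = np.eye(2), 0.5 * np.eye(1), np.eye(1)

n, m1, m2 = 2, 1, 1
K1 = np.zeros((m1, n))   # player-1 feedback gain, u1 = -K1 x
K2 = np.zeros((m2, n))   # player-2 feedback gain, u2 = -K2 x

for _ in range(50):                       # one common best-response iteration
    Acl = A - B1 @ K1 - B2 @ K2
    P1 = np.zeros((n, n))
    P2 = np.zeros((n, n))
    for _ in range(500):                  # evaluate each player's cost-to-go kernel
        P1 = Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2 + Acl.T @ P1 @ Acl
        P2 = Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2 + Acl.T @ P2 @ Acl
    # Policy improvement from the quadratic (Q-function) blocks:
    # K_i = (R_ii + B_i' P_i B_i)^{-1} B_i' P_i (A - B_j K_j)
    K1 = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2 = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))

print("approximate Nash feedback gains:")
print(K1)
print(K2)
```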

Highlights

  • Reinforcement learning (RL), as one of the machine learning methods, has been widely used to solve optimal control problems [1]–[4] for systems with partially or completely unknown dynamics [5]–[12].

  • Approximate optimal control strategies for a variety of control problems and system classes have been reported in the last decade, such as [3] for MIMO systems, [5] for multi-agent graphical games, [8], [10], [12] for H∞ control, [13]–[17] for optimal tracking control, and [18], [19] for Q-learning-based controller design.

  • When conducting on-policy RL, the data used for learning the optimal control policies are generated by the control policies that are themselves being evaluated and updated.


Summary

INTRODUCTION

Reinforcement learning (RL), as one of the machine learning methods, has been widely used to solve optimal control problems [1]–[4] for systems with partially or completely unknown dynamics [5]–[12]. Since an off-policy RL algorithm has already been proposed in [30] for linear DT multi-player systems, a natural question is whether an off-policy Q-learning method can be used to solve the optimal control problem of completely unknown linear DT multi-player games. If it can, the key point is how to design the off-policy Q-learning algorithm so that the Nash equilibrium of linear DT multi-player games is achieved using only measured data. Notation: R^{p×q} is the set of all real p-by-q matrices, ⊗ stands for the Kronecker product, and vec(L) stacks the columns of a matrix L into a single column vector.
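
As a quick illustration of this notation (a generic sketch, not the paper's derivation), the identity vec(AXB) = (B^T ⊗ A) vec(X) is what allows quadratic Bellman/Q-function equations to be rewritten as linear least-squares problems in the unknown kernel parameters:

```python
import numpy as np

# vec(.) stacks the columns of a matrix into a single column vector.
def vec(M):
    return M.flatten(order="F").reshape(-1, 1)   # column-major (Fortran) order

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

# Kronecker-product identity: vec(A X B) = (B^T kron A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))   # True
```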

PROBLEM STATEMENT
DERIVATION OF OFF-POLICY GAME Q-LEARNING ALGORITHM
NO BIAS ANALYSIS OF SOLUTION FOR THE OFF-POLICY Q-LEARNING ALGORITHM
SIMULATION RESULTS
CONCLUSION
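
Consistent with the abstract, the off-policy scheme separates data generation from learning: behavior policies with added probing noise (for persistent excitation) drive the system, and the logged transitions are reused afterwards to estimate each player's Q-function kernel. The short sketch below only illustrates such data collection under assumed system matrices and behavior gains; the actual learning equations are given in the paper.

```python
import numpy as np

# Illustrative off-policy data collection: fixed behavior gains plus probing noise
# excite the (hypothetical) system, and every transition is logged for later reuse
# by the learner.  A, B1, B2 and the behavior gains Kb1, Kb2 are assumptions.
rng = np.random.default_rng(1)
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Kb1 = np.array([[0.2, 0.0]])       # behavior gain, player 1
Kb2 = np.array([[0.0, 0.2]])       # behavior gain, player 2

x = np.array([[1.0], [-1.0]])
dataset = []                        # tuples (x_k, u1_k, u2_k, x_{k+1})
for k in range(200):
    u1 = -Kb1 @ x + 0.1 * rng.standard_normal((1, 1))   # probing noise
    u2 = -Kb2 @ x + 0.1 * rng.standard_normal((1, 1))
    x_next = A @ x + B1 @ u1 + B2 @ u2
    dataset.append((x.copy(), u1, u2, x_next.copy()))
    x = x_next

print(len(dataset), "transitions collected for off-policy learning")
```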