Abstract
Solving for the Nash equilibrium is important in multi-agent game systems, and the speed at which the equilibrium is reached is critical for agents that must make real-time decisions. A typical scheme is model-free reinforcement learning based on policy iteration, which is slow because each iteration must be computed from the start state to the end state. In this paper, we propose a faster scheme based on value iteration that uses a Q-function in an online manner to solve for the Nash equilibrium of the system. Because each calculation builds on the value from the previous iteration, the proposed scheme converges much faster than policy iteration. The rationality and convergence of the scheme are analyzed and proven theoretically, and an actor-critic network structure is used to implement it in simulation. The simulation results show that the proposed scheme converges about 10 times faster than the policy iteration algorithm.
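To make the contrast concrete, below is a minimal, hypothetical sketch of Q-function value iteration for a single linear-quadratic agent: each sweep bootstraps from the value kernel of the previous iteration rather than re-evaluating a policy along whole trajectories. The dynamics, the cost weights, and the name q_value_iteration are illustrative assumptions, not the paper's multi-agent formulation.

    import numpy as np

    # Minimal, hypothetical sketch of Q-function value iteration for one
    # linear-quadratic agent; matrices below are illustrative assumptions,
    # not the paper's multi-agent game formulation.

    def q_value_iteration(A, B, Q, R, n_iter=200):
        """Iterate the Q-function kernel directly. Q and R are the state and
        input cost weights. Each sweep bootstraps from the previous value
        kernel P instead of re-solving whole trajectories, which is what makes
        value iteration cheaper per iteration than policy iteration."""
        n, m = B.shape
        P = np.zeros((n, n))                    # value kernel V(x) = x' P x, start from zero
        K = np.zeros((m, n))
        for _ in range(n_iter):
            # Q-function kernel blocks: Q(x, u) = [x; u]' H [x; u]
            Hxx = Q + A.T @ P @ A
            Hxu = A.T @ P @ B
            Huu = R + B.T @ P @ B
            K = np.linalg.solve(Huu, Hxu.T)     # greedy control u = -K x
            P = Hxx - Hxu @ K                   # value-iteration update of the kernel
        return K, P

    if __name__ == "__main__":
        A = np.array([[1.0, 0.1], [0.0, 1.0]])  # illustrative double-integrator dynamics
        B = np.array([[0.0], [0.1]])
        Q, R = np.eye(2), np.eye(1)
        K, P = q_value_iteration(A, B, Q, R)
        print("converged feedback gain K:", K)

Policy iteration, by contrast, would fully evaluate the cost of the current feedback gain before improving it; the sketch above folds the greedy improvement into every sweep, which is the source of the per-iteration speedup the abstract describes.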
Highlights
Multi-agent consensus research concerns the knowledge, goals, skills, and planning needed to enable agents to take coordinated actions to solve problems.
We propose a value iteration algorithm that solves for the Nash equilibrium of multi-agent game systems by designing a cooperative RL algorithm in which the agents jointly use a Q-function in an online manner.
C) All agents are in Nash equilibrium, with $J_i(u_i^*, u_{-i}^*) \le J_i(u_i, u_{-i}^*)$ [6], [25] (spelled out below).
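Written with the quantifiers made explicit, the Nash condition in that highlight reads as follows (our rendering; the costs $J_i$ and inputs $u_i$ follow the bullet above):

    % Nash equilibrium: no agent i can lower its cost J_i by deviating
    % unilaterally from u_i^* while the other agents keep playing u_{-i}^*.
    J_i\!\left(u_i^{*}, u_{-i}^{*}\right) \le J_i\!\left(u_i, u_{-i}^{*}\right),
    \qquad \text{for all admissible } u_i \text{ and all agents } i.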
Summary
Multi-agent consensus research concerns the knowledge, goals, skills, and planning needed to enable agents to take coordinated actions to solve problems, and model-free algorithms are an important research direction for multi-agent systems operating in unknown environments [20], [21]. We propose a value iteration algorithm that solves for the Nash equilibrium of multi-agent game systems by designing a cooperative RL algorithm in which the agents jointly use a Q-function in an online manner.

The paper's notation includes:
- the system matrix and the input matrix of agent i;
- the control input of agent i and the control inputs of its neighbors;
- the local neighborhood tracking error ε_i of agent i and the tracking errors ε_{-i} of its neighbors;
- the vector collecting ε_i and ε_{-i};
- the synchronization error vector.
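The abstract mentions an actor-critic network structure for the simulation. The following is a hypothetical per-agent skeleton consistent with that description, in which the critic approximates the agent's Q-function over the local neighborhood tracking error and the controls, and the actor is improved greedily against it. Layer sizes, the update rule, and all names (Critic, Actor, online_update) are our assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        """Approximates agent i's Q-function over the tracking error and controls."""
        def __init__(self, eps_dim, u_dim, u_neigh_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(eps_dim + u_dim + u_neigh_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, eps, u_i, u_neigh):
            return self.net(torch.cat([eps, u_i, u_neigh], dim=-1))

    class Actor(nn.Module):
        """Maps the local neighborhood tracking error to agent i's control input."""
        def __init__(self, eps_dim, u_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(eps_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, u_dim),
            )

        def forward(self, eps):
            return self.net(eps)

    def online_update(actor, critic, opt_actor, opt_critic, batch, gamma=1.0):
        """One online value-iteration step: the critic target bootstraps from the
        current critic (the value of the last iteration), then the actor is
        improved greedily against the updated critic."""
        eps, u_i, u_neigh, cost, eps_next, u_neigh_next = batch
        with torch.no_grad():
            q_next = critic(eps_next, actor(eps_next), u_neigh_next)
            target = cost + gamma * q_next                     # bootstrap, no rollout to the end state
        critic_loss = nn.functional.mse_loss(critic(eps, u_i, u_neigh), target)
        opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

        actor_loss = critic(eps, actor(eps), u_neigh).mean()   # minimise predicted cost-to-go
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

Because the target is built from the current critic rather than from a full policy evaluation over a start-to-end trajectory, each update is cheap, which mirrors the per-iteration advantage of value iteration over policy iteration claimed in the abstract.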