A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential.

Zhen Zhang, Binqiang Xue, Dongqing Wang, Yew-Soon Ong

doi:10.1109/tcyb.2019.2932203

Abstract

Gradient-based methods have been extensively used in today's multiagent reinforcement learning (MARL). In a gradient-based MARL algorithm, each agent updates its parameterized strategy in the direction of the gradient of some performance index. However, few studies have examined the convergence of existing gradient-based MARL algorithms in identical interest games. In this article, we propose a policy gradient potential (PGP) algorithm that uses the PGP, rather than the gradient itself, as the source of information for guiding the strategy update, in order to learn the optimal joint strategy with maximal global reward. Since the payoff matrix and the joint strategy are often unavailable to the learning agents in practice, we take the probability of obtaining the maximal reward as the performance index. Theoretical analysis of the PGP algorithm on the continuous model of an identical interest repeated game shows that if the component action of every optimal joint action is unique, the critical points corresponding to all optimal joint actions are asymptotically stable. The PGP algorithm is studied experimentally and compared against other MARL algorithms on two commonly used collaborative tasks: the robots leaving a room task and the distributed sensor network task. It is also evaluated on a real-world minefield navigation problem in which only local state and local reward information are available. The results show that the PGP algorithm outperforms the other algorithms in terms of the cumulative reward and the number of time steps used in an episode.
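To make the performance index concrete, the following is a minimal sketch of the probability of obtaining the maximal reward in a two-agent identical interest matrix game, together with its partial derivatives with respect to each agent's action probabilities. It is not the PGP update rule, which the abstract does not specify. The payoff values, the strategies `x` and `y`, and all function names are hypothetical and chosen only for illustration. The sketch also assumes full knowledge of the payoff matrix and both strategies, which the abstract notes is usually unavailable to the learning agents.

```python
def optimal_mask(payoff):
    """Mark joint actions whose shared reward equals the maximal reward."""
    best = max(max(row) for row in payoff)
    return [[1.0 if r == best else 0.0 for r in row] for row in payoff]


def prob_max_reward(payoff, x, y):
    """Probability that independent strategies x and y select an optimal joint action."""
    mask = optimal_mask(payoff)
    return sum(x[i] * y[j] * mask[i][j]
               for i in range(len(x)) for j in range(len(y)))


def index_gradients(payoff, x, y):
    """Partial derivatives of the index w.r.t. each agent's action probabilities."""
    mask = optimal_mask(payoff)
    grad_x = [sum(mask[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    grad_y = [sum(mask[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    return grad_x, grad_y


# Hypothetical 2x2 identical interest game: both agents receive payoff[i][j].
payoff = [[10.0, 0.0],
          [0.0, 5.0]]
x = [0.5, 0.5]    # agent 1's mixed strategy (assumed)
y = [0.25, 0.75]  # agent 2's mixed strategy (assumed)

p = prob_max_reward(payoff, x, y)
gx, gy = index_gradients(payoff, x, y)
print(p, gx, gy)
```

In this toy game the only optimal joint action is (0, 0), so the index reduces to `x[0] * y[0]`, and each agent's gradient is nonzero only for its component of that joint action.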

Journal: IEEE Transactions on Cybernetics
Publication Date: Jan 15, 2021
Citations: 52

Similar Papers

A Cooperative Multi-Agent Reinforcement Learning Method Based on Coordination Degree
Haoyan Cui ... Zhen Zhang
IEEE Access | VOL. 9
01 Jan 2020

A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics
S Abdallah ... V Lesser
Journal of Artificial Intelligence Research | VOL. 33
17 Dec 2008

A Multiagent Fuzzy Policy Reinforcement Learning Algorithm with Application to Leader-Follower Robotic Systems
Erfu Yang ... Dongbing Gu
01 Oct 2006

Rules-PPO-QMIX: Multi-Agent Reinforcement Learning with Mixed Rules for Large Scene Tasks
Zi-Zhen Shen ... Rui Yu
22 Oct 2021
