Credible Negotiation for Multi-agent Reinforcement Learning in Long-term Coordination

Tianlong Gu, Taihang Zhi, Xuguang Bao, Liang Chang

doi:10.1145/3706110

Abstract

Coordination among agents is one of the critical problems in Multi-Agent Reinforcement Learning (MARL). Traditional MARL methods focus on finding a stochastically acceptable solution, called a Nash Equilibrium (NE), for all agents in a Markov game in which multiple equilibria exist. However, learning a fair equilibrium is crucial for the sustainability and stability of collaboration in long-term coordination games, especially when agents compete for leadership. In this paper, we propose N-Bi-AC, a bi-level reinforcement learning method whose solution is a Pareto improvement over the traditional NE, to select a fair equilibrium. The method has two parts: first, a Negotiator determines the leader in the stage game; second, the agents' Q-values are updated by a bi-level actor-critic learning method based on the Joint Mixed Strategy Equilibrium Q-learning algorithm (JMSE Q-learning). We give a convergence proof and compare the learning algorithm with state-of-the-art algorithms. We found that the proposed N-Bi-AC method successfully converged to a fair Nash Equilibrium, which guarantees the fairness of agents in different matrix game environments.
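To make the notions of multiple equilibria and leadership competition concrete, the following is a minimal sketch in plain Python. It is not the paper's N-Bi-AC method or its Negotiator: the payoff matrices, the function names (`pure_nash`, `stackelberg`, `average_payoffs`), and the simple leader-alternation schedule are all assumptions chosen for illustration. It shows only that, in a small matrix game with two pure Nash equilibria, the choice of leader decides which equilibrium is reached, and that sharing leadership over repeated stages can even out long-run payoffs.

```python
# Hypothetical 2x2 coordination game; the payoffs are illustrative only
# and are not taken from the paper.
# A[i][j] is agent 1's payoff and B[i][j] is agent 2's payoff when
# agent 1 plays row i and agent 2 plays column j.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]


def pure_nash(A, B):
    """Return every pure-strategy Nash equilibrium as a (row, col) pair."""
    rows, cols = len(A), len(A[0])
    eqs = []
    for i in range(rows):
        for j in range(cols):
            # Neither agent can gain by deviating unilaterally.
            row_best = A[i][j] >= max(A[k][j] for k in range(rows))
            col_best = B[i][j] >= max(B[i][k] for k in range(cols))
            if row_best and col_best:
                eqs.append((i, j))
    return eqs


def stackelberg(A, B, leader):
    """Pure leader-follower outcome: the leader commits to an action and
    the follower best-responds (ties go to the lowest index)."""
    rows, cols = range(len(A)), range(len(A[0]))
    if leader == 1:
        reply = {i: max(cols, key=lambda j: B[i][j]) for i in rows}
        i = max(rows, key=lambda r: A[r][reply[r]])
        return i, reply[i]
    reply = {j: max(rows, key=lambda i: A[i][j]) for j in cols}
    j = max(cols, key=lambda c: B[reply[c]][c])
    return reply[j], j


def average_payoffs(A, B, leaders):
    """Mean payoff of each agent over stages with the given leader sequence."""
    outcomes = [stackelberg(A, B, leader) for leader in leaders]
    p1 = sum(A[i][j] for i, j in outcomes) / len(outcomes)
    p2 = sum(B[i][j] for i, j in outcomes) / len(outcomes)
    return p1, p2


print(pure_nash(A, B))                # two pure equilibria
print(average_payoffs(A, B, [1, 1]))  # agent 1 always leads
print(average_payoffs(A, B, [1, 2]))  # leadership alternates
```

In this toy game, a fixed leader always obtains its preferred equilibrium, whereas alternating the leader gives both agents the same average payoff. How the paper's Negotiator actually assigns leadership is not specified in the abstract, so the alternation above should be read only as one simple stand-in.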

More From: ACM Transactions on Autonomous and Adaptive Systems

Similar Papers

Bi-Level Actor-Critic for Multi-Agent Coordination
Haifeng Zhang ... Minne Li
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34
03 Apr 2020

Policy Adaptive Multi-agent Deep Deterministic Policy Gradient
Yixiang Wang ... Feng Wu
01 Jan 2020

Learning with Opponent-Learning Awareness
09 Jul 2018

Inferring Passengers’ Interactive Choices on Public Transits via MA-AL: Multi-Agent Apprenticeship Learning
Mingzhou Yang ... Jun Luo
20 Apr 2020
