Abstract

In cooperative multi-agent games, agents must learn effective cooperative behaviors within complex action spaces. An effective approach is to use the Individual-Global-Max (IGM) principle to decompose the joint value function into individual contributions from each agent. However, existing methods face two significant challenges: ensuring the accuracy of the value function decomposition and guaranteeing monotonic improvement of the joint policy. These challenges are exacerbated in scenarios with non-monotonic payoff matrices. To address them, we introduce a novel and flexible value factorization method called Difference Value Factorization (DVF). The key idea is to transform the IGM principle into a new form, DVF-IGM, which handles non-monotonic cases by ensuring consistency of the IGM condition between the joint difference value and a complex non-linear sum of individual difference values. A centralized evaluator is employed to estimate global Q-values, which not only enhances expressiveness but also provides the difference values used to update the individual value functions. We show that DVF-IGM is an equivalent transformation of IGM and that DVF has the monotonic improvement property. Empirically, our method maintains and recovers the optimal policy in non-monotonic matrix games and achieves state-of-the-art performance on cooperative tasks in the StarCraft Multi-Agent Challenge (SMAC).
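As background, the standard IGM condition from the value factorization literature (not the paper-specific DVF-IGM form, which is stated over difference values) requires the joint greedy action to coincide with the tuple of per-agent greedy actions; here Q_tot, Q_i, the joint observation history \boldsymbol{\tau}, and actions \mathbf{a} follow the usual notation and are assumptions of this sketch:

\[
\arg\max_{\mathbf{a}} Q_{tot}(\boldsymbol{\tau}, \mathbf{a}) =
\left( \arg\max_{a_1} Q_1(\tau_1, a_1), \ldots, \arg\max_{a_n} Q_n(\tau_n, a_n) \right)
\]

DVF-IGM replaces the individual utilities Q_i in this condition with difference values, and the abstract above states that this transformation is equivalent to IGM.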
