Abstract

In cooperative multi-agent games, agents must learn effective cooperative behaviors within complex action spaces. An effective approach is to use the Individual-Global-Max (IGM) principle to decompose the joint value function into individual contributions from each agent. However, existing methods face two significant challenges: ensuring the accuracy of the value function decomposition and guaranteeing monotonic improvement of the joint policy. These challenges are exacerbated in scenarios with non-monotonic payoff matrices. To address them, we introduce a novel and flexible value factorization method called Difference Value Factorization (DVF). The key idea is to transform the IGM principle into a new form, DVF-IGM, which handles non-monotonic cases by ensuring consistency of the IGM condition between the joint difference value and a complex non-linear sum of individual difference values. A centralized evaluator is employed to estimate global Q-values, which not only enhances expressiveness but also provides the difference values used to update the individual value functions. We show that DVF-IGM is an equivalent transformation of IGM and that DVF has the monotonic improvement property. Empirically, our method maintains and recovers the optimal policy in non-monotonic matrix games and achieves state-of-the-art performance on cooperative tasks in the StarCraft Multi-Agent Challenge (SMAC).
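As background, the standard IGM condition from the value factorization literature (not the paper-specific DVF-IGM form, which is stated over difference values) requires the joint greedy action to coincide with the tuple of per-agent greedy actions; here Q_tot, Q_i, the joint observation history \boldsymbol{\tau}, and actions \mathbf{a} follow the usual notation and are assumptions of this sketch:

\[
\arg\max_{\mathbf{a}} Q_{tot}(\boldsymbol{\tau}, \mathbf{a}) =
\left( \arg\max_{a_1} Q_1(\tau_1, a_1), \ldots, \arg\max_{a_n} Q_n(\tau_n, a_n) \right)
\]

DVF-IGM replaces the individual utilities Q_i in this condition with difference values, and the abstract above states that this transformation is equivalent to IGM.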
