Abstract

AlphaZero and MuZero have achieved state-of-the-art (SOTA) performance in a wide range of domains, including board games and robotics, with discrete and continuous action spaces. However, to obtain an improved policy, they often require an excessively large number of simulations, especially for domains with large action spaces. As the simulation budget decreases, their performance drops significantly. In addition, many important real-world applications have combinatorial (or exponential) action spaces, making it infeasible to search directly over all possible actions. In this paper, we extend AlphaZero and MuZero to learn and plan in more complex multiagent (MA) Markov decision processes, where the action spaces increase exponentially with the number of agents. Our new algorithms, MA Gumbel AlphaZero and MA Gumbel MuZero, without and with model learning respectively, achieve superior performance on cooperative multiagent control problems, while reducing the number of environmental interactions by up to an order of magnitude compared to model-free approaches. In particular, we significantly improve on prior performance when planning with much smaller simulation budgets. The code and appendix are available at https://github.com/tjuHaoXiaotian/MA-MuZero.
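
As a rough illustration of why searching directly over joint actions becomes infeasible (a minimal sketch, not taken from the paper), the snippet below counts the joint action space of N agents that each choose from A discrete actions; the names and numbers are purely hypothetical.

```python
# Sketch: the joint action space of a cooperative multiagent problem
# grows as A**N with the number of agents N, which is why exhaustive
# search over joint actions quickly becomes intractable.
from itertools import product

def joint_action_space_size(num_agents: int, actions_per_agent: int) -> int:
    """Number of joint actions: actions_per_agent ** num_agents."""
    return actions_per_agent ** num_agents

# Hypothetical example: 8 agents, each with 5 discrete actions.
num_agents, actions_per_agent = 8, 5
print(joint_action_space_size(num_agents, actions_per_agent))  # 390625 joint actions

# Enumerating all joint actions is only feasible for tiny problems.
all_joint_actions = list(product(range(actions_per_agent), repeat=num_agents))
assert len(all_joint_actions) == joint_action_space_size(num_agents, actions_per_agent)
```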
