Model-Based Multi-agent Policy Optimization with Dynamic Dependence Modeling

Biyang Hu, Zifan Wu, Chao Yu

doi:10.1007/978-3-030-96772-7_36

Abstract

This paper explores combining model-based methods with multi-agent reinforcement learning (MARL) to achieve more efficient coordination among multiple agents. We propose a decentralized model-based MARL method, Policy Optimization with Dynamic Dependence Modeling (POD2M). It dynamically determines how much each agent should rely on other agents' information while building its model. In POD2M, each agent adapts its dependence on the others while building its own dynamics model, trading off an individual-learning process against a coordinated-learning process. Once the dynamics models have been built, the policies are trained on one-step model predictive rollouts. Experiments in both cooperative and competitive scenarios indicate that our method achieves higher sample efficiency than the compared model-free MARL algorithms and outperforms the compared centralized method in large domains.

Keywords: Multi-agent reinforcement learning; Model-based policy optimization; Dynamic dependence
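The abstract describes its two mechanisms only at a high level. The first is per-agent dynamics models that weight other agents' information. The second is policy training on one-step model predictive rollouts. The following is a minimal sketch of both ideas under loudly stated assumptions, not the authors' implementation:

- Each agent's dependence is modeled as a softmax over one learnable score per agent, including itself (the hypothetical `dep_scores`).
- The dynamics model is a toy linear mix of those weighted signals (the hypothetical `AgentModel`).
- Rollouts branch one step from real joint observations in the style of MBPO (the hypothetical `one_step_rollouts`).

All names here are illustrative.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class AgentModel:
    """Hypothetical per-agent dynamics model (toy linear form, not the paper's network).

    dep_scores holds one learnable score per agent, the agent itself included.
    """
    agent_id: int
    dep_scores: List[float]
    coef: float = 1.0

    def dependence_weights(self) -> List[float]:
        # A weight near 1 on agent_id means near-individual modeling;
        # weights spread over agents mean more coordinated modeling.
        return softmax(self.dep_scores)

    def predict(self, obs: Sequence[float], acts: Sequence[float]) -> float:
        # Predict this agent's next observation from a dependence-weighted
        # mix of every agent's (observation + action) signal.
        w = self.dependence_weights()
        mixed = sum(wj * (oj + aj) for wj, oj, aj in zip(w, obs, acts))
        return self.coef * mixed


Transition = Tuple[List[float], List[float], List[float]]


def one_step_rollouts(
    models: List[AgentModel],
    policies: List[Callable[[float], float]],
    real_buffer: List[List[float]],
    n_samples: int,
    rng: random.Random,
) -> List[Transition]:
    """Branch one-step model rollouts from real joint observations (MBPO-style assumption)."""
    synthetic: List[Transition] = []
    for _ in range(n_samples):
        obs = rng.choice(real_buffer)  # start from a real joint observation
        acts = [pi(o) for pi, o in zip(policies, obs)]  # each agent acts on its own observation
        next_obs = [m.predict(obs, acts) for m in models]  # one model step per agent
        synthetic.append((list(obs), acts, next_obs))
    return synthetic


# Usage with two agents: agent 0 leans on itself, agent 1 weights both equally.
rng = random.Random(0)
models = [AgentModel(0, [2.0, 0.0]), AgentModel(1, [0.0, 0.0])]
policies = [lambda o: -0.5 * o, lambda o: 0.1 * o]
real_buffer = [[1.0, 2.0], [0.5, -1.0]]
batch = one_step_rollouts(models, policies, real_buffer, n_samples=4, rng=rng)
print(len(batch))                       # 4
print(models[1].dependence_weights())   # [0.5, 0.5]
```

Reward prediction, model fitting and the policy update are omitted. In a full method, the dependence scores and model parameters would presumably be fit to real transitions. The synthetic batch would then feed an actor-critic or policy-gradient update in place of additional environment samples.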
