Decentralized Learning of Finite-Memory Policies in Dec-POMDPs

Weichao Mao, Kaiqing Zhang, Zhuoran Yang, Tamer Başar

doi:10.1016/j.ifacol.2023.10.1346

Abstract

Multi-agent reinforcement learning (MARL) under partial observability is notoriously challenging as the agents only have asymmetric partial observations of the system. In this paper, we study MARL in decentralized partially observable Markov decision processes (Dec-POMDPs) with partial history sharing. In search of decentralized and tractable MARL solutions, we identify the appropriate conditions under which we can adopt the common information approach to naturally extend existing single-agent policy learners to Dec-POMDPs. In particular, under the conditions of bounded local memories and an efficient representation of the common information, we present a MARL algorithm that learns a near-optimal finite-memory policy in Dec-POMDPs. We establish the iteration complexity of the algorithm, which depends only linearly on the number of agents. Simulations on classic Dec-POMDP tasks show that our approach significantly outperforms existing decentralized solutions, and nearly matches the centralized ones that require stronger informational assumptions.
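To make the "bounded local memories" condition concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that an agent's finite memory is a sliding window over its most recent local observations; the abstract does not specify the memory structure, and the names `FiniteMemory` and `num_memory_states` are hypothetical.

```python
from collections import deque
from typing import Deque, Hashable, Tuple


class FiniteMemory:
    """Sliding window over an agent's most recent local observations.

    Hypothetical helper: the sliding-window form is an assumption made
    for illustration, not a detail taken from the paper.
    """

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("length must be at least 1")
        # deque(maxlen=...) drops the oldest entry once the window is full.
        self._window: Deque[Hashable] = deque(maxlen=length)

    def update(self, observation: Hashable) -> Tuple[Hashable, ...]:
        """Append a new observation and return the current memory state."""
        self._window.append(observation)
        return tuple(self._window)


def num_memory_states(num_observations: int, length: int) -> int:
    """Count distinct windows holding 0..length observations."""
    return sum(num_observations ** k for k in range(length + 1))
```

Under this assumption, a policy that conditions on the window rather than on the full history only has to cover `num_memory_states(...)` local inputs per agent, a number that is fixed by the window length and does not grow with the episode horizon.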

More From: IFAC PapersOnLine

Similar Papers

Multi-agent reinforcement learning as a rehearsal for decentralized planning
Landon Kraemer ... Bikramjit Banerjee
Neurocomputing | VOL. 190
03 Feb 2016

Bayesian-Game-Based Fuzzy Reinforcement Learning Control for Decentralized POMDPs
Rajneesh Sharma ... Matthijs T J Spaan
IEEE Transactions on Computational Intelligence and AI in Games | VOL. 4
01 Dec 2012

Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions
Shayegan Omidshafiei ... Shih–Yuan Liu
The International Journal of Robotics Research | VOL. 36
01 Feb 2017

Multi-agent reinforcement learning based on local communication
Wenxu Zhang ... Lei Ma
Cluster Computing | VOL. 22
26 Mar 2018
