Informed Initial Policies for Learning in Dec-POMDPs

Landon Kraemer, Bikramjit Banerjee

doi:10.1609/aaai.v26i1.8426

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators and local information. Prevalent Dec-POMDP solution techniques have mostly been centralized and have assumed knowledge of the model. In real-world scenarios, however, solving centrally may not be an option and model parameters may be unknown. To address this, we propose a distributed, model-free algorithm for learning Dec-POMDP policies, in which agents take turns learning while every agent that is not currently learning follows a static policy. For agents that have not yet learned a policy, this static policy must be initialized. We propose a principled method for learning such initial policies through interaction with the environment. We show that by using such informed initial policies, our alternate learning algorithm can find near-optimal policies for two benchmark problems.
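
The turn-taking scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a hypothetical one-step cooperative game with no observations, a made-up reward table `REWARD`, and illustrative helper names (`step`, `learn_best_response`, `alternate_learning`). The learner only ever sees sampled rewards, so the update is model-free, and the other agent's policy is held static during each turn.

```python
# Hypothetical one-step cooperative game (an assumption, not from the paper).
# REWARD[a0][a1] is the shared team reward; agents never read it directly.
REWARD = [[10.0, 0.0],
          [0.0, 5.0]]
N_ACTIONS = 2


def step(a0, a1):
    """Environment call: returns only the shared reward for a joint action."""
    return REWARD[a0][a1]


def learn_best_response(learner, policies, episodes=20, alpha=0.5):
    """One learning turn: `learner` updates Q-values from sampled rewards
    while the other agent keeps following its static policy."""
    q = [0.0] * N_ACTIONS
    for t in range(episodes):
        # Round-robin exploration keeps this sketch deterministic.
        action = t % N_ACTIONS
        joint = list(policies)
        joint[learner] = action
        reward = step(joint[0], joint[1])
        # Standard incremental Q-update for a one-step problem.
        q[action] += alpha * (reward - q[action])
    # The learner commits to its greedy action as its new static policy.
    return max(range(N_ACTIONS), key=lambda a: q[a])


def alternate_learning(initial_policies, rounds=3):
    """Agents take turns learning; returns final policies and team reward."""
    policies = list(initial_policies)
    for turn in range(2 * rounds):
        learner = turn % 2
        policies[learner] = learn_best_response(learner, policies)
    return policies, step(policies[0], policies[1])


if __name__ == "__main__":
    print(alternate_learning([0, 0]))  # agent 1 initially plays action 0
    print(alternate_learning([0, 1]))  # agent 1 initially plays action 1
```

In this toy game, starting agent 1 at action 0 ends with a team reward of 10, whereas starting it at action 1 leaves both agents settled on a joint action worth 5. This suggests one reason the initial static policy can matter for turn-taking learners.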

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Sep 20, 2021
Citations: 4

Similar Papers

Reinforcement Learning of Informed Initial Policies for Decentralized Planning
Landon Kraemer ... Bikramjit Banerjee
ACM Transactions on Autonomous and Adaptive Systems | VOL. 9
08 Dec 2014

Bayesian-Game-Based Fuzzy Reinforcement Learning Control for Decentralized POMDPs
Rajneesh Sharma ... Matthijs T J Spaan
IEEE Transactions on Computational Intelligence and AI in Games | VOL. 4
01 Dec 2012

Fuzzy reinforcement learning control for decentralized partially observable Markov decision processes
Rajneesh Sharma ... Matthijs T J Spaan
01 Jun 2011

Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion
Xiaofeng Jiang ... Xiaodong Wang
IEEE Transactions on Automatic Control | VOL. 62
01 Nov 2017
