Abstract
In this paper, decentralized partially observable Markov decision processes (Dec-POMDPs) with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in a lack of shared belief state, which makes it impossible for a decision maker to estimate the system state from local information alone. In contrast to belief-based policies, this paper focuses on optimizing decentralized observation-based policies, which are easy to apply and do not require belief sharing. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that finds the optimal policy from gradient estimates obtained along a single sample path. The algorithm requires no restrictive assumptions and can be applied to most practical Dec-POMDP problems. A numerical example is provided to demonstrate the effectiveness of the algorithm.
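To illustrate the idea of estimating a policy gradient for decentralized observation-based policies from a single sample path, the sketch below uses a REINFORCE-style score-function estimator with per-agent tabular softmax policies. This is not the paper's algorithm: the toy two-agent environment, observation noise, reward, horizon, and step size are all hypothetical choices made only for the example.

```python
import numpy as np

# Minimal sketch: single-sample-path policy-gradient estimation for a toy
# Dec-POMDP with decentralized observation-based policies.  The environment,
# reward, and hyperparameters below are hypothetical; this is a REINFORCE-style
# illustration, not the paper's exact policy iteration algorithm.

rng = np.random.default_rng(0)

N_AGENTS, N_STATES, N_OBS, N_ACTIONS = 2, 3, 2, 2
HORIZON, EPISODES, LR = 20, 2000, 0.05

# Each agent i keeps its own table theta[i][o, a]; its stochastic policy is a
# softmax over actions given only its local observation o (no shared belief).
theta = [np.zeros((N_OBS, N_ACTIONS)) for _ in range(N_AGENTS)]

def policy(i, o):
    logits = theta[i][o]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def observe(s):
    # Hypothetical noisy local observation of the state parity.
    o = s % N_OBS
    return o if rng.random() < 0.8 else 1 - o

def step(s, actions):
    # Hypothetical joint dynamics and reward: agents are rewarded for
    # coordinating their actions, and the state drifts with the joint action.
    r = 1.0 if actions[0] == actions[1] else 0.0
    s_next = (s + sum(actions)) % N_STATES
    return s_next, r

for _ in range(EPISODES):
    s = int(rng.integers(N_STATES))
    grads = [np.zeros_like(t) for t in theta]
    ret = 0.0
    for _ in range(HORIZON):
        obs = [observe(s) for _ in range(N_AGENTS)]
        acts = []
        for i, o in enumerate(obs):
            p = policy(i, o)
            a = int(rng.choice(N_ACTIONS, p=p))
            acts.append(a)
            # Accumulate the score function grad log pi_i(a | o).
            grads[i][o] -= p
            grads[i][o, a] += 1.0
        s, r = step(s, acts)
        ret += r
    # Centralized update from a single sample path: scale the accumulated
    # score by the episode return (REINFORCE).
    for i in range(N_AGENTS):
        theta[i] += LR * ret * grads[i] / HORIZON

print("average per-step reward of the learned decentralized policies:",
      ret / HORIZON)
```

In this sketch the gradient estimate is built centrally from one trajectory, while execution remains fully decentralized: each agent acts only on its own local observation, mirroring the observation-based policy class the abstract describes.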