Abstract

In this paper, the decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in the partially observable stochastic environment. However, the decentralized nature of the Dec-POMDP framework results in a lack of shared belief state, which makes the decision maker impossible to estimate the system state based on local information. In contrast to the belief-based policy, this paper focuses on optimizing the decentralized observation-based policy, which is easily to be applied and does not have the sharing problem. By analyzing the gradient of the objective function, we have developed a centralized stochastic gradient policy iteration algorithm to find the optimal policy on the basis of gradient estimates from a single sample path. This algorithm does not need any specific assumption and can be applied to most practical Dec-POMDP problems. One numerical example is provided to demonstrate the effectiveness of the algorithm.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.