Distributed randomized multiagent policy iteration in reinforcement learning

Weipeng Zhang

doi:10.1016/j.rico.2023.100255

Abstract

We propose a distributed randomized policy iteration algorithm for infinite-horizon dynamic programming problems in which the control at each stage is m-dimensional. The traditional policy iteration algorithm performs a minimization over an m-dimensional constraint set, so its computational complexity increases exponentially in m and the search becomes an intractable combinatorial problem. In each iteration, our algorithm performs a series of sequential minimizations, followed by policy evaluation and policy improvement using the policy that attains the minimum cost over the sequential minimizations. Our algorithm is well suited to parallel computation, has a complexity that increases linearly in m, and converges to an agent-by-agent optimal policy. We characterize sufficient conditions under which our algorithm generates a globally optimal policy that coincides with the one obtained from standard policy iteration.
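
The contrast between joint and agent-by-agent minimization can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified coordinate-wise policy iteration on a randomly generated discounted tabular problem, and the discount factor `alpha`, the binary component choices, the random agent order, and all function names are assumptions made for illustration only. It shows that each improvement step needs n * m * |choices| one-step lookahead evaluations rather than n * |choices|^m, and that the loop stops at a policy no single agent can improve on its own.

```python
import itertools
import random

def q_value(x, u, J, P, g, alpha):
    """One-step lookahead cost of applying joint control u at state x."""
    return g[x][u] + alpha * sum(p * J[y] for y, p in enumerate(P[x][u]))

def evaluate(policy, P, g, alpha, sweeps=400):
    """Approximate policy evaluation by repeated backups under a fixed policy."""
    J = [0.0] * len(policy)
    for _ in range(sweeps):
        J = [q_value(x, policy[x], J, P, g, alpha) for x in range(len(policy))]
    return J

def improve_agent_by_agent(policy, J, P, g, alpha, choices, order, tol=1e-9):
    """Minimize over one control component at a time, in the given agent order."""
    new_policy, evals = [], 0
    for x in range(len(policy)):
        u = list(policy[x])
        for agent in order:
            current = u[agent]
            best_c, best_q = current, q_value(x, tuple(u), J, P, g, alpha)
            evals += 1
            for c in choices:
                if c == current:
                    continue
                u[agent] = c
                q = q_value(x, tuple(u), J, P, g, alpha)
                evals += 1
                if q < best_q - tol:  # switch only on strict improvement
                    best_c, best_q = c, q
            u[agent] = best_c  # later agents see this agent's updated choice
        new_policy.append(tuple(u))
    return new_policy, evals

def multiagent_policy_iteration(P, g, alpha, m, choices, rng, max_iters=100):
    """Alternate evaluation and agent-by-agent improvement until no change."""
    policy = [tuple(choices[0] for _ in range(m))] * len(P)
    evals = 0
    for _ in range(max_iters):
        J = evaluate(policy, P, g, alpha)
        order = list(range(m))
        rng.shuffle(order)  # random agent order: an assumption of this sketch
        new_policy, evals = improve_agent_by_agent(
            policy, J, P, g, alpha, choices, order)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, evaluate(policy, P, g, alpha), evals

def random_mdp(n, m, choices, rng):
    """Hypothetical test problem: random stage costs and transition rows."""
    P, g = [], []
    for _ in range(n):
        Px, gx = {}, {}
        for u in itertools.product(choices, repeat=m):
            w = [rng.random() for _ in range(n)]
            Px[u] = [wi / sum(w) for wi in w]
            gx[u] = rng.random()
        P.append(Px)
        g.append(gx)
    return P, g

rng = random.Random(0)
n, m, choices, alpha = 4, 3, (0, 1), 0.9
P, g = random_mdp(n, m, choices, rng)
policy, J, evals = multiagent_policy_iteration(P, g, alpha, m, choices, rng)
print("policy:", policy)
print("lookahead evaluations per improvement step:", evals)
print("joint minimization would need:", n * len(choices) ** m)
```

The policy returned here is only guaranteed to be agent-by-agent optimal in the sense checked by the stopping test; whether it is also globally optimal depends on the problem, which is the question the abstract's sufficient conditions address.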

Journal: Results in Control and Optimization
Publication Date: Jun 19, 2023
License type: CC BY-NC-ND

Similar Papers

Multiagent Reinforcement Learning: Rollout and Policy Iteration
Dimitri Bertsekas
IEEE/CAA Journal of Automatica Sinica | VOL. 8
01 Feb 2021

Least Square Policy Iteration in Reinforcement Learning
Bin Zhao ... Ying Hong
01 Jan 2015

Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V Denardo
SIAM Review | VOL. 9
01 Apr 1967

Traffic Signal Control based on Markov Decision Process
Yunwen Xu ... Zhao Zhou
IFAC-PapersOnLine | VOL. 49
01 Jan 2015
