Multiagent value iteration algorithms in dynamic programming and reinforcement learning

Dimitri Bertsekas

doi:10.1016/j.rico.2020.100003

Open Access

Journal: Results in Control and Optimization
Publication Date: Nov 10, 2020
Citations: 17
License type: cc-by-nc-nd

Abstract

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all-agents-at-once method. For the case of a finite-state discounted problem, we showed convergence to an agent-by-agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.
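The one-agent-at-a-time idea can be illustrated with a minimal sketch. It assumes a small, randomly generated finite-state discounted problem, and it is not the paper's exact algorithm or its convergence construction. All names (`q_value`, `agent_by_agent_sweep`, `all_agents_sweep`, the cost array `g`, the transition array `P`) are hypothetical choices made for this illustration. Each sweep updates the agents' decisions in a fixed order at every state, with each agent minimizing over its own decision only, given the choices already made by the preceding agents and the current policy of the following agents.

```python
import itertools
import numpy as np


def q_value(x, u, J, g, P, alpha):
    """One-step lookahead cost of the joint control u (one entry per agent) at state x."""
    return g[(x, *u)] + alpha * (P[(x, *u)] @ J)


def agent_by_agent_sweep(J, mu, g, P, alpha):
    """One sweep in which agents minimize one at a time, in a fixed order.

    Illustrative assumption: agents following the current one keep their
    decisions from the current policy mu.
    """
    n, m = mu.shape
    J_new = np.empty(n)
    mu_new = mu.copy()
    evals = 0  # number of Q-value evaluations performed
    for x in range(n):
        u = [int(c) for c in mu[x]]
        best_q = q_value(x, u, J, g, P, alpha)
        for agent in range(m):
            best_q, best_c = np.inf, u[agent]
            for c in range(g.shape[1 + agent]):
                u[agent] = c
                q = q_value(x, u, J, g, P, alpha)
                evals += 1
                if q < best_q:
                    best_q, best_c = q, c
            u[agent] = best_c  # later agents see this choice
        mu_new[x] = u
        J_new[x] = best_q  # Q-value of the joint control after the last agent
    return J_new, mu_new, evals


def all_agents_sweep(J, g, P, alpha):
    """Standard value iteration sweep: minimize over all joint controls at once."""
    n = g.shape[0]
    choices = [range(k) for k in g.shape[1:]]
    J_new = np.empty(n)
    evals = 0
    for x in range(n):
        qs = [q_value(x, u, J, g, P, alpha) for u in itertools.product(*choices)]
        evals += len(qs)
        J_new[x] = min(qs)
    return J_new, evals


# Hypothetical test problem: n states, m agents, k choices per agent.
n, m, k, alpha = 4, 3, 2, 0.9
rng = np.random.default_rng(0)
g = rng.random((n,) + (k,) * m)            # stage costs g[x, u_1, ..., u_m]
P = rng.random((n,) + (k,) * m + (n,))     # unnormalized transition weights
P /= P.sum(axis=-1, keepdims=True)         # each P[x, u_1, ..., u_m, :] sums to 1

J = np.zeros(n)
mu = np.zeros((n, m), dtype=int)
J_seq, mu_seq, evals_seq = agent_by_agent_sweep(J, mu, g, P, alpha)
J_all, evals_all = all_agents_sweep(J, g, P, alpha)
print("Q evaluations per sweep:", evals_seq, "(agent-by-agent) vs", evals_all, "(all-at-once)")
```

With these sizes, a sweep costs n·m·k = 24 lookahead evaluations for the agent-by-agent version and n·k^m = 32 for the all-agents-at-once version; the gap widens quickly as m grows. The agent-by-agent value at each state is never below the all-at-once minimum, which is consistent with the abstract's statement that the limit is an agent-by-agent optimal policy rather than necessarily a globally optimal one.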

Similar Papers

Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V Denardo
SIAM Review | VOL. 9
01 Apr 1967

Heuristic Search for Generalized Stochastic Shortest Path MDPs
Andrey Kolobov ... Mausam Mausam
Proceedings of the International Conference on Automated Planning and Scheduling | VOL. 21
22 Mar 2011

Q-learning and policy iteration algorithms for stochastic shortest path problems
Huizhen Yu ... Dimitri P Bertsekas
Annals of Operations Research | VOL. 208
18 Apr 2012

Approximate Dynamic Programming and Reinforcement Learning
Lucian Buşoniu ... Robert Babuška
01 Jan 2009
