On the convergence of techniques that improve value iteration

Marek Grzes, Jesse Hoey

doi:10.1109/ijcnn.2013.6706982

Abstract

Prioritising Bellman backups and updating only a small subset of actions are important techniques for speeding up planning in MDPs. Recent literature has introduced efficient approaches that exploit these directions: backward value iteration and backing up only the best actions were shown to reduce planning time significantly. This paper conducts a theoretical and empirical analysis of these techniques and presents new proofs. In particular, it (1) identifies weaker requirements for the convergence of backups based on best actions only, (2) presents a new method for evaluating the Bellman error for the update that updates one best action once, (3) presents the theoretical proof of backward value iteration and establishes the required initialisation, and (4) shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) the existing literature did not compare these methods, either empirically or analytically, against policy iteration. The rigorous empirical and novel theoretical parts of the paper reveal important associations and allow guidelines to be drawn on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by the simple modifications shown in the paper.
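
Point (4) of the abstract, that the order in which states are backed up can strongly affect standard value iteration, can be illustrated with a small sketch. The example below is not taken from the paper: the chain MDP, the function name `value_iteration`, the reward of 1 for entering the goal, the discount factor 0.9, and the zero initialisation are all assumptions chosen for illustration. It runs in-place (Gauss-Seidel style) value iteration with two different sweep orders and counts the sweeps needed until the Bellman residual vanishes.

```python
def value_iteration(n_states, order, gamma=0.9, tol=1e-12):
    """In-place value iteration on a hypothetical deterministic chain MDP.

    States 0..n_states-1 form a line; the last state is an absorbing goal.
    Actions: move left (state 0 stays put) or move right.
    Reward: 1 for entering the goal, 0 otherwise (an assumption).
    Returns the value function and the number of sweeps performed.
    """
    goal = n_states - 1
    values = [0.0] * n_states  # zero initialisation (an assumption)
    sweeps = 0
    while True:
        sweeps += 1
        residual = 0.0
        for s in order:
            if s == goal:
                continue  # the goal's value stays fixed at 0
            right = s + 1
            left = max(s - 1, 0)
            q_right = (1.0 if right == goal else 0.0) + gamma * values[right]
            q_left = gamma * values[left]
            new_value = max(q_right, q_left)  # Bellman backup
            residual = max(residual, abs(new_value - values[s]))
            values[s] = new_value  # in-place update, reused within the sweep
        if residual <= tol:
            return values, sweeps


n = 10
forward_order = list(range(n))             # start state first, goal last
backward_order = list(reversed(range(n)))  # goal first, start state last

forward_values, forward_sweeps = value_iteration(n, forward_order)
backward_values, backward_sweeps = value_iteration(n, backward_order)

print("sweeps, forward order: ", forward_sweeps)
print("sweeps, backward order:", backward_sweeps)
```

In this toy problem both orderings reach the same value function, but sweeping from the goal backwards propagates the reward through the whole chain in a single sweep, whereas sweeping forwards moves it by only one state per sweep. This is only meant to convey the intuition behind ordering effects; it is not the backward value iteration algorithm analysed in the paper.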

Similar Papers

A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
Huizhen Yu ... Dimitri P Bertsekas
Mathematics of Operations Research | VOL. 40
01 Oct 2015

Generalized Second-Order Value Iteration in Markov Decision Processes
Chandramouli Kamanchi ... Shalabh Bhatnagar
IEEE Transactions on Automatic Control | VOL. 67
01 Aug 2022

Optimistic Value Iteration
Arnd Hartmanns ... Benjamin Lucien Kaminski
01 Jan 2020

Approximate Dynamic Programming
Warren B Powell
04 Aug 2011
