Abstract
Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
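To make the guess-and-verify idea concrete, the following is a minimal Python sketch for maximum reachability probabilities. It is an illustration only, not the mcsta implementation: the MDP encoding (`trans`), the helper names, the simple retry policy, and the toy example at the end are assumptions made for this sketch, and the paper's actual verification phase is more refined.

```python
# Minimal sketch of the optimistic value iteration idea for maximum
# reachability probabilities. trans[state][action] = list of (probability,
# successor); `goal` is a set of target states. All names are illustrative.

def bellman(trans, goal, values):
    """One Bellman backup: best action's expected successor value."""
    return {s: 1.0 if s in goal else
               max(sum(p * values[t] for p, t in succs)
                   for succs in actions.values())
            for s, actions in trans.items()}

def value_iteration(trans, goal, eps):
    """Standard value iteration from below; stops on a small absolute change
    (which gives a lower bound, but no error guarantee by itself)."""
    values = {s: 1.0 if s in goal else 0.0 for s in trans}
    while True:
        new = bellman(trans, goal, values)
        if max(abs(new[s] - values[s]) for s in trans) < eps:
            return new
        values = new

def optimistic_value_iteration(trans, goal, eps=1e-6, retries=10):
    """Lower bound via VI, optimistic upper-bound guess, then verification."""
    for _ in range(retries):
        lower = value_iteration(trans, goal, eps)
        # Optimistic guess: lift the lower bound by eps (capped at 1).
        upper = {s: min(1.0, lower[s] + eps) for s in trans}
        # Verification: back up the candidate without ever letting it grow.
        # If one backup does not increase any component, the candidate is a
        # pre-fixed point of the Bellman operator and hence a sound upper
        # bound on the true (least-fixed-point) values. Whether this check
        # succeeds quickly may depend on preprocessing of the MDP.
        for _ in range(10000):
            backed = bellman(trans, goal, upper)
            if all(backed[s] <= upper[s] for s in trans):
                return lower, upper
            upper = {s: min(upper[s], backed[s]) for s in trans}
            if any(upper[s] < lower[s] for s in trans):
                break                      # guess was too optimistic
        eps /= 10                          # iterate more precisely, then retry
    raise RuntimeError("no sound upper bound found within the retry budget")

# Toy example: from state "s", action "a" reaches goal "g" with probability 0.9.
trans = {"s": {"a": [(0.9, "g"), (0.1, "s")]}, "g": {"done": [(1.0, "g")]}}
print(optimistic_value_iteration(trans, {"g"}))
```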
Introduction
Markov decision processes (MDP, [30]) are a widely-used formalism to represent discrete-state and -time systems in which probabilistic effects meet controllable nondeterministic decisions. The former may arise from an environment or agent whose behaviour is only known statistically, or it may be intentional as part of a randomised algorithm. The latter may be under the control of the system: we are then in a planning setting and typically look for a scheduler that minimises the probability of unsafe behaviour or maximises a reward. Alternatively, the nondeterminism may be considered adversarial, which is the standard assumption in verification: we want to establish that the maximum probability of unsafe behaviour is below, or that the minimum reward is above, a specified threshold.
Summary
Interval iteration (II) performs two iterations concurrently, one starting from 0 as in standard value iteration and one starting from 1. The latter improves an overapproximation of the true values, and the process can be stopped once the (relative or absolute) difference between the two values for the initial state is below the specified error tolerance ε, or at any earlier time with a correspondingly larger but known error. We found sound value iteration (SVI) tricky to implement correctly: some edge cases not considered by the algorithm as presented in [31] initially caused our implementation to deliver incorrect results or diverge on a few benchmarks. Both II and SVI fundamentally depend on the MDP being contracting; this must be ensured by appropriate structural transformations, e.g. by collapsing end components, a priori.
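For contrast with the guess-and-verify scheme above, here is a minimal sketch of the interval iteration idea just described: two value vectors are iterated concurrently, from 0 and from 1, until they are close enough. The encoding and names match the earlier sketch and are likewise assumptions for illustration; correctness of the iteration from above relies on the MDP having been made contracting, e.g. by collapsing end components beforehand.

```python
# Minimal sketch of interval iteration (II) for maximum reachability
# probabilities, assuming end components have already been collapsed so that
# the iteration from above converges. trans[state][action] = [(prob, succ)].

def interval_iteration(trans, goal, eps=1e-6):
    """Iterate a lower vector from 0 and an upper vector from 1 until the
    difference is below eps for every state."""
    def backup(values):
        # One Bellman backup of a whole value vector.
        return {s: 1.0 if s in goal else
                   max(sum(p * values[t] for p, t in succs)
                       for succs in actions.values())
                for s, actions in trans.items()}

    lower = {s: 1.0 if s in goal else 0.0 for s in trans}
    upper = {s: 1.0 for s in trans}
    while max(upper[s] - lower[s] for s in trans) > eps:
        lower, upper = backup(lower), backup(upper)
    return lower, upper    # sound enclosure of the true probabilities
```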