Abstract

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
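
The following is a minimal, hedged sketch of this optimistic scheme for maximal reachability probabilities. The dictionary encoding of the MDP, the function names (bellman, ovi), and the retry strategy of halving the value-iteration stopping threshold are illustrative assumptions, not the exact procedure of the paper.

```python
# Sketch of optimistic value iteration (OVI) for maximal reachability
# probabilities.  Assumed encoding: an MDP is a dict mapping each state to a
# list of actions, where an action is a list of (probability, successor)
# pairs; 'goal' is the set of target states.  All names and the retry
# strategy below are illustrative, not taken from the paper.

def bellman(mdp, goal, values):
    """One application of the Bellman max-operator; goal states have value 1."""
    return {s: 1.0 if s in goal else
               max(sum(p * values[t] for p, t in act) for act in actions)
            for s, actions in mdp.items()}

def ovi(mdp, goal, eps=1e-6):
    lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
    alpha = eps  # stopping threshold for the value iteration phase
    while True:
        # Phase 1: standard value iteration from below.
        while True:
            nxt = bellman(mdp, goal, lower)
            diff = max(abs(nxt[s] - lower[s]) for s in mdp)
            lower = nxt
            if diff <= alpha:
                break
        # Phase 2: optimistically guess an upper bound and try to verify it.
        upper = {s: min(1.0, lower[s] + eps) for s in mdp}
        candidate = bellman(mdp, goal, upper)
        if all(candidate[s] <= upper[s] for s in mdp):
            # Bellman(upper) <= upper pointwise, so upper is an inductive
            # upper bound on the true (least fixed point) values: they lie
            # in the interval [lower, upper].
            return lower, upper
        alpha /= 2  # guess could not be verified: iterate more precisely

# Tiny example: from state 0, the better action reaches the goal state 1 with
# probability 0.9 (state 2 is a sink), so both bounds end up around 0.9.
example = {0: [[(0.9, 1), (0.1, 2)], [(0.5, 1), (0.5, 2)]],
           1: [[(1.0, 1)]],
           2: [[(1.0, 2)]]}
lo, up = ovi(example, {1})
```

The paper's actual verification phase is more involved (it iterates the candidate bounds for a number of steps rather than applying a single check), but the underlying inductive argument is the same.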

Highlights

  • Markov decision processes (MDP, [30]) are a widely-used formalism to represent discrete-state and -time systems in which probabilistic effects meet controllable nondeterministic decisions. The former may arise from an environment or agent whose behaviour is only known statistically, or it may be intentional as part of a randomised algorithm. The latter may be under the control of the system—we are in a planning setting and typically look for a scheduler that minimises the probability of unsafe behaviour or maximises a reward—or it may be considered adversarial, which is the standard assumption in verification: we want to establish that the maximum probability of unsafe behaviour is below, or that the minimum reward is above, a specified threshold.

  • Both interval iteration (II) and sound value iteration (SVI) fundamentally depend on the MDP being contracting; this must be ensured a priori by appropriate structural transformations, e.g. by collapsing end components.

Summary

Introduction

Markov decision processes (MDP, [30]) are a widely-used formalism to represent discrete-state and -time systems in which probabilistic effects meet controllable nondeterministic decisions. Standard value iteration only delivers lower bounds; two sound variants have been proposed, interval iteration (II) and sound value iteration (SVI). The idea of II is to perform two iterations concurrently, one starting from 0 as before and one starting from 1. The latter improves an over-approximation of the true values, and the process can be stopped once the (relative or absolute) difference between the two values for the initial state is below the specified error tolerance, or at any earlier time with a correspondingly larger but known error. We found SVI tricky to implement correctly; some edge cases not considered by the algorithm as presented in [31] initially caused our implementation to deliver incorrect results or to diverge on very few benchmarks. Both II and SVI fundamentally depend on the MDP being contracting; this must be ensured a priori by appropriate structural transformations, e.g. by collapsing end components.
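
For comparison, here is a correspondingly hedged sketch of interval iteration, again for maximal reachability probabilities; the helper name _step, the absolute-error stopping test on the initial state, and the MDP encoding are assumptions made for brevity.

```python
# Sketch of interval iteration (II): iterate an under-approximation (started
# at 0) and an over-approximation (started at 1) in lockstep, and stop once
# they are eps-close at the initial state.  As stated above, this is only
# correct on a contracting MDP, i.e. after end components have been collapsed.
# The MDP encoding matches the OVI sketch; all names are illustrative.

def _step(mdp, goal, values):
    # One Bellman max-step; goal states keep value 1.
    return {s: 1.0 if s in goal else
               max(sum(p * values[t] for p, t in act) for act in actions)
            for s, actions in mdp.items()}

def interval_iteration(mdp, goal, init, eps=1e-6):
    lower = {s: 1.0 if s in goal else 0.0 for s in mdp}  # under-approximation
    upper = {s: 1.0 for s in mdp}                         # over-approximation
    while upper[init] - lower[init] > eps:                # absolute error
        lower = _step(mdp, goal, lower)
        upper = _step(mdp, goal, upper)
    return lower[init], upper[init]
```

On the small example used with the OVI sketch above, the sink state's self-loop forms an end component, so the over-approximation would stay at 1 and II would not terminate without first collapsing that component; this is exactly the preprocessing requirement mentioned in the highlights.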

Preliminaries
Value Iteration
Theoretical Foundations
Uniqueness of Fixed Points
Convergence
Termination of OVI
Variants of OVI
Experimental Evaluation
Comparison with VI
Comparison with II and SVI
Comparing Relative and Absolute Error
Verification Phases
Findings
Conclusion