Abstract
Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
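To make the guess-and-verify idea concrete, the following is a minimal Python sketch for maximum reachability probabilities. It is an illustration only, not the mcsta implementation: the MDP encoding (`trans`), the helper names, the simple retry policy, and the toy example at the end are assumptions made for this sketch, and the paper's actual verification phase is more refined.

```python
# Minimal sketch of the optimistic value iteration idea for maximum
# reachability probabilities. trans[state][action] = list of (probability,
# successor); `goal` is a set of target states. All names are illustrative.

def bellman(trans, goal, values):
    """One Bellman backup: best action's expected successor value."""
    return {s: 1.0 if s in goal else
               max(sum(p * values[t] for p, t in succs)
                   for succs in actions.values())
            for s, actions in trans.items()}

def value_iteration(trans, goal, eps):
    """Standard value iteration from below; stops on a small absolute change
    (which gives a lower bound, but no error guarantee by itself)."""
    values = {s: 1.0 if s in goal else 0.0 for s in trans}
    while True:
        new = bellman(trans, goal, values)
        if max(abs(new[s] - values[s]) for s in trans) < eps:
            return new
        values = new

def optimistic_value_iteration(trans, goal, eps=1e-6, retries=10):
    """Lower bound via VI, optimistic upper-bound guess, then verification."""
    for _ in range(retries):
        lower = value_iteration(trans, goal, eps)
        # Optimistic guess: lift the lower bound by eps (capped at 1).
        upper = {s: min(1.0, lower[s] + eps) for s in trans}
        # Verification: back up the candidate without ever letting it grow.
        # If one backup does not increase any component, the candidate is a
        # pre-fixed point of the Bellman operator and hence a sound upper
        # bound on the true (least-fixed-point) values. Whether this check
        # succeeds quickly may depend on preprocessing of the MDP.
        for _ in range(10000):
            backed = bellman(trans, goal, upper)
            if all(backed[s] <= upper[s] for s in trans):
                return lower, upper
            upper = {s: min(upper[s], backed[s]) for s in trans}
            if any(upper[s] < lower[s] for s in trans):
                break                      # guess was too optimistic
        eps /= 10                          # iterate more precisely, then retry
    raise RuntimeError("no sound upper bound found within the retry budget")

# Toy example: from state "s", action "a" reaches goal "g" with probability 0.9.
trans = {"s": {"a": [(0.9, "g"), (0.1, "s")]}, "g": {"done": [(1.0, "g")]}}
print(optimistic_value_iteration(trans, {"g"}))
```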
Introduction
Markov decision processes (MDP, [30]) are a widely-used formalism to represent discrete-state and -time systems in which probabilistic effects meet controllable nondeterministic decisions. The former may arise from an environment or agent whose behaviour is only known statistically, or it may be intentional as part of a randomised algorithm. The latter may be under the control of the system: we are then in a planning setting and typically look for a scheduler that minimises the probability of unsafe behaviour or maximises a reward. Alternatively, the nondeterminism may be considered adversarial, which is the standard assumption in verification: we want to establish that the maximum probability of unsafe behaviour is below, or that the minimum reward is above, a specified threshold.
Summary
Interval iteration (II) performs two iterations concurrently, one starting from 0 as in standard value iteration and one starting from 1. The latter improves an overapproximation of the true values, and the process can be stopped once the (relative or absolute) difference between the two values for the initial state is below the specified error tolerance ε, or at any earlier time with a correspondingly larger but known error. We found sound value iteration (SVI) tricky to implement correctly: some edge cases not considered by the algorithm as presented in [31] initially caused our implementation to deliver incorrect results or diverge on a few benchmarks. Both II and SVI fundamentally depend on the MDP being contracting; this must be ensured by appropriate structural transformations, e.g. by collapsing end components, a priori.
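For contrast with the guess-and-verify scheme above, here is a minimal sketch of the interval iteration idea just described: two value vectors are iterated concurrently, from 0 and from 1, until they are close enough. The encoding and names match the earlier sketch and are likewise assumptions for illustration; correctness of the iteration from above relies on the MDP having been made contracting, e.g. by collapsing end components beforehand.

```python
# Minimal sketch of interval iteration (II) for maximum reachability
# probabilities, assuming end components have already been collapsed so that
# the iteration from above converges. trans[state][action] = [(prob, succ)].

def interval_iteration(trans, goal, eps=1e-6):
    """Iterate a lower vector from 0 and an upper vector from 1 until the
    difference is below eps for every state."""
    def backup(values):
        # One Bellman backup of a whole value vector.
        return {s: 1.0 if s in goal else
                   max(sum(p * values[t] for p, t in succs)
                       for succs in actions.values())
                for s, actions in trans.items()}

    lower = {s: 1.0 if s in goal else 0.0 for s in trans}
    upper = {s: 1.0 for s in trans}
    while max(upper[s] - lower[s] for s in trans) > eps:
        lower, upper = backup(lower), backup(upper)
    return lower, upper    # sound enclosure of the true probabilities
```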