Abstract

This paper presents an efficient procedure for multi-objective model checking of long-run average reward (also known as mean pay-off) and total reward objectives, as well as their combination. We consider this for Markov automata, a compositional model that captures both traditional Markov decision processes (MDPs) and a continuous-time variant thereof. The crux of our procedure is a generalization of Forejt et al.'s approach for total rewards on MDPs to arbitrary combinations of long-run and total reward objectives on Markov automata. Experiments with a prototypical implementation on top of the Storm model checker show encouraging results for both model types and indicate substantially improved performance over existing multi-objective long-run MDP model checking based on linear programming.
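To give a rough impression of the style of computation behind Forejt et al.'s approach, the following is a minimal sketch rather than the algorithm of this paper: for a fixed weight vector, a multi-objective total-reward query collapses to a single-objective one that can be solved by value iteration, and sweeping over weight vectors yields points that approximate the Pareto curve. The toy MDP, its rewards, and the helper weighted_total_reward are hypothetical illustrations, not taken from the paper or from Storm.

```python
import numpy as np

# Hypothetical toy MDP with 3 states; P[s][a] is the successor distribution of
# action a in state s, R[(s, a)] a 2-dimensional reward vector (two total-reward
# objectives). State 2 is absorbing and reward-free, so total rewards stay finite.
P = {
    0: {0: {1: 1.0}, 1: {2: 1.0}},
    1: {0: {2: 1.0}},
    2: {0: {2: 1.0}},
}
R = {
    (0, 0): np.array([1.0, 0.0]),
    (0, 1): np.array([0.0, 2.0]),
    (1, 0): np.array([0.5, 0.5]),
    (2, 0): np.array([0.0, 0.0]),
}

def weighted_total_reward(weights, max_iters=1000, eps=1e-9):
    """Value iteration for the weighted sum of the two total-reward objectives."""
    w = np.asarray(weights)
    v = np.zeros(len(P))
    for _ in range(max_iters):
        new_v = np.array([
            max(w @ R[(s, a)] + sum(p * v[t] for t, p in succ.items())
                for a, succ in P[s].items())
            for s in P
        ])
        if np.max(np.abs(new_v - v)) < eps:
            break
        v = new_v
    return v[0]  # value in the initial state 0

# Each weight vector yields one achievable point of the two objectives.
for w in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(w, weighted_total_reward(w))
```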

Highlights

  • In various applications of Markov decision process (MDP) model checking, multiple decision criteria and uncertainty frequently co-occur.

  • Various types of objectives known from conventional (single-objective) model checking have been lifted to the multi-objective MDP case.

  • According to MultiGain's log files, the majority of its runtime is spent on solving linear programs (LPs), suggesting that Storm's better performance is likely due to the iterative approach presented in this work.


Summary

Introduction

MDP model checking. In various applications, multiple decision criteria and uncertainty frequently co-occur. Stochastic decision processes in which the aim is to achieve multiple, possibly partly conflicting, objectives occur in various fields. These include operations research, economics, planning in AI, and game theory, to mention a few.

Multi-objective MDP. Various types of objectives known from conventional (single-objective) model checking have been lifted to the multi-objective case. These objectives range over ω-regular specifications including LTL [26,27], expected (discounted and non-discounted) total rewards [21,27,28,52,22], step-bounded and reward-bounded reachability probabilities [28,35], and, most relevant for this work, expected long-run average (LRA) rewards [18,11,20], also known as mean pay-offs. For the latter, all current approaches build upon linear programming (LP), which yields a theoretical time complexity polynomial in the model size. The LP formulation of [11,20] is implemented in MultiGain [12], an extension of PRISM for multi-objective LRA rewards.
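For orientation, the LP-based treatment of (single-objective) long-run average rewards on an MDP can be phrased over long-run state-action frequencies; the multi-objective formulations of [11,20] build on constraints of this kind. The sketch below is an illustration under a unichain assumption, using a hypothetical toy MDP (the names P, r, and the scipy.optimize.linprog call are part of this illustration); it is neither the multi-objective LP of MultiGain nor the iterative method of this paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state MDP; P[s][a] is the successor distribution of action a
# in state s, r[s][a] the reward earned per step (single objective, unichain).
n_states = 2
P = {0: {0: {0: 0.9, 1: 0.1}, 1: {1: 1.0}},
     1: {0: {0: 1.0},         1: {0: 0.5, 1: 0.5}}}
r = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}

# LP variables: long-run frequencies x_{s,a} of the state-action pairs.
pairs = [(s, a) for s in range(n_states) for a in P[s]]

# Objective: maximize sum_{s,a} x_{s,a} * r(s,a); linprog minimizes, hence the minus.
c = np.array([-r[s][a] for s, a in pairs])

# Flow conservation (outflow of each state equals its inflow) plus normalization to 1.
A_eq = np.zeros((n_states + 1, len(pairs)))
for i, (s2, a) in enumerate(pairs):
    for s in range(n_states):
        A_eq[s, i] = (1.0 if s2 == s else 0.0) - P[s2][a].get(s, 0.0)
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("maximal long-run average reward:", -res.fun)  # 4/3 for this toy MDP
```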
