Abstract

This paper presents an efficient procedure for multi-objective model checking of long-run average reward (also known as mean pay-off) and total reward objectives, as well as their combination. We consider this for Markov automata, a compositional model that captures both traditional Markov decision processes (MDPs) and a continuous-time variant thereof. The crux of our procedure is a generalization of Forejt et al.'s approach for total rewards on MDPs to arbitrary combinations of long-run and total reward objectives on Markov automata. Experiments with a prototypical implementation on top of the Storm model checker show encouraging results for both model types and indicate substantially improved performance over existing multi-objective long-run MDP model checking based on linear programming.
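To give a rough impression of the style of computation behind Forejt et al.'s approach, the following is a minimal sketch rather than the algorithm of this paper: for a fixed weight vector, a multi-objective total-reward query collapses to a single-objective one that can be solved by value iteration, and sweeping over weight vectors yields points that approximate the Pareto curve. The toy MDP, its rewards, and the helper weighted_total_reward are hypothetical illustrations, not taken from the paper or from Storm.

```python
import numpy as np

# Hypothetical toy MDP with 3 states; P[s][a] is the successor distribution of
# action a in state s, R[(s, a)] a 2-dimensional reward vector (two total-reward
# objectives). State 2 is absorbing and reward-free, so total rewards stay finite.
P = {
    0: {0: {1: 1.0}, 1: {2: 1.0}},
    1: {0: {2: 1.0}},
    2: {0: {2: 1.0}},
}
R = {
    (0, 0): np.array([1.0, 0.0]),
    (0, 1): np.array([0.0, 2.0]),
    (1, 0): np.array([0.5, 0.5]),
    (2, 0): np.array([0.0, 0.0]),
}

def weighted_total_reward(weights, max_iters=1000, eps=1e-9):
    """Value iteration for the weighted sum of the two total-reward objectives."""
    w = np.asarray(weights)
    v = np.zeros(len(P))
    for _ in range(max_iters):
        new_v = np.array([
            max(w @ R[(s, a)] + sum(p * v[t] for t, p in succ.items())
                for a, succ in P[s].items())
            for s in P
        ])
        if np.max(np.abs(new_v - v)) < eps:
            break
        v = new_v
    return v[0]  # value in the initial state 0

# Each weight vector yields one achievable point of the two objectives.
for w in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(w, weighted_total_reward(w))
```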

Highlights

  • In various applications of Markov decision process (MDP) model checking, multiple decision criteria and uncertainty frequently co-occur.

  • Various types of objectives known from conventional (single-objective) model checking have been lifted to the multi-objective MDP case.

  • According to MultiGain's log files, the majority of its runtime is spent on solving linear programs (LPs), suggesting that Storm's better performance is likely due to the iterative approach presented in this work.


Summary

Introduction

MDP model checking. In various applications, multiple decision criteria and uncertainty frequently co-occur. Stochastic decision processes in which the aim is to achieve multiple, possibly partly conflicting, objectives occur in various fields. These include operations research, economics, planning in AI, and game theory, to mention a few.

Multi-objective MDP. Various types of objectives known from conventional (single-objective) model checking have been lifted to the multi-objective case. These objectives range over ω-regular specifications including LTL [26,27], expected (discounted and non-discounted) total rewards [21,27,28,52,22], step-bounded and reward-bounded reachability probabilities [28,35], and, most relevant for this work, expected long-run average (LRA) rewards [18,11,20], also known as mean pay-offs. For the latter, all current approaches build upon linear programming (LP), which yields a theoretical time complexity polynomial in the model size. The LP formulation of [11,20] is implemented in MultiGain [12], an extension of PRISM for multi-objective LRA rewards.
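For orientation, the LP-based treatment of (single-objective) long-run average rewards on an MDP can be phrased over long-run state-action frequencies; the multi-objective formulations of [11,20] build on constraints of this kind. The sketch below is an illustration under a unichain assumption, using a hypothetical toy MDP (the names P, r, and the scipy.optimize.linprog call are part of this illustration); it is neither the multi-objective LP of MultiGain nor the iterative method of this paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state MDP; P[s][a] is the successor distribution of action a
# in state s, r[s][a] the reward earned per step (single objective, unichain).
n_states = 2
P = {0: {0: {0: 0.9, 1: 0.1}, 1: {1: 1.0}},
     1: {0: {0: 1.0},         1: {0: 0.5, 1: 0.5}}}
r = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}

# LP variables: long-run frequencies x_{s,a} of the state-action pairs.
pairs = [(s, a) for s in range(n_states) for a in P[s]]

# Objective: maximize sum_{s,a} x_{s,a} * r(s,a); linprog minimizes, hence the minus.
c = np.array([-r[s][a] for s, a in pairs])

# Flow conservation (outflow of each state equals its inflow) plus normalization to 1.
A_eq = np.zeros((n_states + 1, len(pairs)))
for i, (s2, a) in enumerate(pairs):
    for s in range(n_states):
        A_eq[s, i] = (1.0 if s2 == s else 0.0) - P[s2][a].get(s, 0.0)
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("maximal long-run average reward:", -res.fun)  # 4/3 for this toy MDP
```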
