Abstract

This paper addresses the problem of approximating the set of all solutions for Multi-objective Markov Decision Processes. We show that in the vast majority of interesting cases, the number of solutions is exponential or even infinite. In order to overcome this difficulty we propose to approximate the set of all solutions by means of a limited precision approach based on White’s multi-objective value-iteration dynamic programming algorithm. We prove that the number of calculated solutions is tractable and show experimentally that the solutions obtained are a good approximation of the true Pareto front.

Highlights

  • Markov decision processes (MDPs) are a well-known conceptual tool useful for modelling the operation of systems as sequential decision processes

  • This paper analyzes some practical difficulties that arise in the solution of Multi-objective Markov decision processes (MOMDPs)

  • We show that the number of nondominated policy values is tractable only under a number of limiting assumptions


Summary

Introduction

Markov decision processes (MDPs) are a well-known conceptual tool for modelling the operation of systems as sequential decision processes. When the decision maker's preferences can be stated explicitly, prior to problem solving, as a scalar function to be optimized, we are led to single-policy approaches (Perny and Weng [9], Wray et al. [19]). With more general preference models where mixture policies are not acceptable (e.g. for ethical reasons, see Lizotte et al. [7]), Pareto-optimal non-stationary policies need to be taken into consideration. This case was theoretically solved by White [18], but the exact solution is not practical because the Pareto front can be infeasibly large, or even infinite. The solution to an MOMDP is given by the V(s) sets of all states, i.e. the sets of Pareto-nondominated value vectors attainable from each state s.
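The central operation behind these V(s) sets is pruning dominated value vectors. As an illustration (a minimal sketch, not the paper's implementation; names and the example vectors are our own), Pareto-dominance filtering over a set of value vectors under maximization can be written as:

```python
# Hypothetical sketch: keeping only Pareto-nondominated value vectors,
# the kind of filtering a MOMDP solver applies to each V(s) set.
# Vector u dominates v (maximization) when u >= v componentwise
# and u > v in at least one component.

def dominates(u, v):
    """True if u Pareto-dominates v (maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def nondominated(vectors):
    """Filter a list of value vectors down to its Pareto front."""
    front = []
    for v in vectors:
        if any(dominates(u, v) for u in vectors if u != v):
            continue  # v is dominated by some other vector
        front.append(v)
    return front

# Two-objective example: (2, 1) and (1, 1) are dominated by (3, 1).
values = [(3, 1), (2, 2), (1, 3), (2, 1), (1, 1)]
print(nondominated(values))  # -> [(3, 1), (2, 2), (1, 3)]
```

Since every vector is compared against every other, this naive filter is quadratic in the size of the set, which is exactly why an exponentially or infinitely large Pareto front makes exact solution impractical and motivates the limited-precision approximation.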

Combinatorial explosion
Recursive backwards algorithm
Vector value iteration algorithm
Vector value iteration with limited precision
Comparing Pareto front approximations
Stochastic deep sea treasure
Findings
Conclusions and future work