Abstract
This paper addresses the problem of approximating the set of all solutions for Multi-objective Markov Decision Processes. We show that in the vast majority of interesting cases, the number of solutions is exponential or even infinite. To overcome this difficulty, we propose to approximate the set of all solutions by means of a limited-precision approach based on White's multi-objective value-iteration dynamic programming algorithm. We prove that the number of calculated solutions is tractable, and we show experimentally that the solutions obtained are a good approximation of the true Pareto front.
Highlights
Markov decision processes (MDPs) are a well-known conceptual tool useful for modelling the operation of systems as sequential decision processes
This paper analyzes some practical difficulties that arise in the solution of Multi-objective Markov decision processes (MOMDPs)
We show that the number of nondominated policy values is tractable only under a number of limiting assumptions
Summary
Markov decision processes (MDPs) are a well-known conceptual tool for modelling the operation of systems as sequential decision processes. When the decision maker's preferences can be stated explicitly, prior to problem solving, as a scalar function to be optimized, we are led to single-policy approaches (Perny and Weng [9], Wray et al. [19]). With more general preference models where mixture policies are not acceptable (e.g. for ethical reasons, see Lizotte et al. [7]), Pareto-optimal non-stationary policies need to be taken into consideration. This case was theoretically solved by White [18], but the exact solution is not feasible in practice due to the intractable (or even infinite) size of the Pareto front. The solution to an MOMDP is given by the sets V(s) of nondominated value vectors over all states s.
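The idea behind White-style multi-objective value iteration and the limited-precision approximation can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function names (`mo_value_iteration`, `prune`, `snap`), the deterministic toy MDP, and the ε-grid rounding scheme are all assumptions made for the example. Each state's value V(s) is a *set* of value vectors, one per nondominated policy; Pareto pruning discards dominated vectors, and rounding each component to an ε-grid keeps the stored front finite.

```python
def dominates(u, v):
    """True if value vector u Pareto-dominates v (>= everywhere, > somewhere)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def prune(vectors):
    """Keep only the nondominated vectors (the Pareto front of the set)."""
    return {u for u in vectors
            if not any(dominates(v, u) for v in vectors if v != u)}

def snap(u, eps):
    """Limited-precision step: round each component onto a grid of width eps."""
    return tuple(round(x / eps) * eps for x in u)

def mo_value_iteration(states, actions, P, R, gamma=0.9, eps=0.1, iters=30):
    """Multi-objective value iteration: V[s] holds the set of nondominated
    value vectors achievable from state s, on an eps-precision grid."""
    n_obj = len(next(iter(R.values())))
    V = {s: {(0.0,) * n_obj} for s in states}
    for _ in range(iters):
        newV = {}
        for s in states:
            candidates = set()
            for a in actions:
                s2 = P[(s, a)]  # deterministic transitions, for brevity
                for w in V[s2]:
                    v = tuple(r + gamma * wi for r, wi in zip(R[(s, a)], w))
                    candidates.add(snap(v, eps))
            newV[s] = prune(candidates)
        V = newV
    return V

# Hypothetical 2-state, 2-objective MDP: action 'a' earns reward on
# objective 0, action 'b' on objective 1, so the two objectives conflict.
P = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 0}
R = {(0, 'a'): (1.0, 0.0), (0, 'b'): (0.0, 1.0),
     (1, 'a'): (0.0, 1.0), (1, 'b'): (1.0, 0.0)}
V = mo_value_iteration([0, 1], ['a', 'b'], P, R)
```

The rounding is what makes the approach tractable: each discounted component is bounded by r_max/(1−γ), so on an ε-grid each V(s) can hold at most a polynomial number of distinct nondominated vectors, instead of the exponential or infinite exact front.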