Abstract
We consider finite Markov decision processes with undiscounted total effective payoff. We show that there exist uniformly optimal pure and stationary strategies that can be computed by solving a polynomial number of linear programs. This implies that in a two-player zero-sum stochastic game with perfect information and total effective payoff, there exists a stationary best response to any stationary strategy of the opponent. From this, we derive the existence of a uniformly optimal pure and stationary saddle point. Finally, we show that mean payoff can be viewed as a special case of total payoff.
Highlights
We will consider Markov decision processes (MDPs) with total effective payoff.
If there are no random nodes in the MDP, a uniformly optimal stationary strategy can be found by a combinatorial algorithm that solves a polynomial number of minimum mean-cycle problems [18]; we omit the details from this version.
Total payoff MDPs/games considered in this paper can be thought of as a generalization of shortest path problems/games in which we do not assume that there is a single terminal.
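To make the minimum mean-cycle subproblem mentioned in the highlights concrete, here is a minimal sketch of Karp's classical algorithm for a weighted digraph. It is not the combinatorial algorithm whose details are omitted above, and we do not claim it is the method of [18]. The function name `min_mean_cycle` and the arc-list input format are assumptions made for this illustration.

```python
from fractions import Fraction
from math import inf


def min_mean_cycle(n, arcs):
    """Return the minimum mean weight of a directed cycle, or None if acyclic.

    n    -- number of vertices, labelled 0..n-1 (illustrative convention)
    arcs -- list of (u, v, w) triples with integer weight w;
            loops and multiple arcs are allowed
    """
    # d[k][v] = minimum weight of a walk with exactly k arcs ending at v,
    # where the walk may start at any vertex (zero-weight super source).
    d = [[inf] * n for _ in range(n + 1)]
    d[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in arcs:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w

    # Karp's formula: min over v of max over k of (d[n][v] - d[k][v]) / (n - k).
    best = None
    for v in range(n):
        if d[n][v] == inf:
            continue  # no walk with n arcs ends at v
        worst = max(
            Fraction(d[n][v] - d[k][v], n - k)
            for k in range(n)
            if d[k][v] < inf
        )
        if best is None or worst < best:
            best = worst
    return best


# Two cycles: 0->1->2->0 (mean 2) and 0->1->0 (mean 1).
example_arcs = [(0, 1, 1), (1, 2, 2), (2, 0, 3), (1, 0, 1)]
example_value = min_mean_cycle(3, example_arcs)
```

The sketch runs in O(|V| |E|) time and uses exact rational arithmetic, so the returned value can be compared without rounding concerns.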
Summary
We will consider Markov decision processes (MDPs) with total effective payoff. Let G = (V, E) be a finite directed graph (digraph) in which loops and multiple arcs are allowed. The vertices v ∈ V are called positions (or states) and the arcs e ∈ E are called moves (or transitions). The vertex set V is partitioned into two subsets, V = V_W ∪ V_R, that correspond to white and random positions, controlled, respectively, by a player (decision maker), who will be called Max, and by nature. Let us denote by E(u) the set of arcs leaving u, and assume that E(u) ≠ ∅ in every position u ∈ V. For all random positions u ∈ V_R we are given probabilities p(u, v) > 0 for all random moves (u, v) ∈ E(u) such that ∑_{(u,v)∈E(u)} p(u, v) = 1.
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
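The structural conditions in the summary (a partition of V into white and random positions, no position without an outgoing move, and positive probabilities summing to 1 at each random position) can be checked mechanically. The following is a minimal sketch under assumed conventions: the function name `check_mdp`, the dictionary layout of `moves`, and the string position labels are illustrative and do not come from the paper. Payoffs are left out because the summary does not define them.

```python
from fractions import Fraction


def check_mdp(white, random, moves):
    """Return a list of violated conditions (empty if the structure is valid).

    white, random -- sets of position labels (V_W and V_R)
    moves         -- dict: position u -> list of (v, p) pairs, one per arc
                     in E(u); p is ignored when u is a white position
    """
    problems = []
    if white & random:
        problems.append("V_W and V_R must be disjoint")
    positions = white | random
    for u in sorted(positions):
        out = moves.get(u, [])
        if not out:
            problems.append(f"E({u}) is empty")
            continue
        for v, _ in out:
            if v not in positions:
                problems.append(f"move ({u}, {v}) leaves V")
        if u in random:
            probs = [p for _, p in out]
            if any(p is None or p <= 0 for p in probs):
                problems.append(f"non-positive probability at {u}")
            elif sum(probs) != 1:
                problems.append(f"probabilities at {u} sum to {sum(probs)}, not 1")
    return problems


# One white position "a" (with a loop) and one random position "r".
good = {
    "a": [("a", None), ("r", None)],
    "r": [("a", Fraction(1, 3)), ("r", Fraction(2, 3))],
}
bad = {"a": [("r", None)], "r": [("a", Fraction(1, 2))]}

good_report = check_mdp({"a"}, {"r"}, good)
bad_report = check_mdp({"a"}, {"r"}, bad)
```

Storing one (v, p) pair per arc, rather than keying probabilities by (u, v), keeps multiple arcs between the same pair of positions distinguishable, and using `Fraction` makes the sum-to-1 test exact.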