Abstract

We consider finite Markov decision processes with undiscounted total effective payoff. We show that there exist uniformly optimal pure and stationary strategies that can be computed by solving a polynomial number of linear programs. This implies that in a two-player zero-sum stochastic game with perfect information and with total effective payoff there exists a stationary best response to any stationary strategy of the opponent. From this, we derive the existence of a uniformly optimal pure and stationary saddle point. Finally, we show that mean payoff can be viewed as a special case of total payoff.

Highlights

  • We will consider Markov decision processes (MDPs) with total effective payoff

  • If there are no random nodes in the MDP, a uniformly optimal stationary strategy can be found by a combinatorial algorithm that solves a polynomial number of minimum mean-cycle problems [18]; we omit the details from this version

  • Total payoff MDPs/games considered in this paper can be thought of as a generalization of shortest path problems/games, when we do not assume that there is a single terminal


Summary

Markov decision processes

We will consider Markov decision processes (MDPs) with total effective payoff. Let G = (V, E) be a finite directed graph (digraph) in which loops and multiple arcs are allowed. The vertices v ∈ V are called positions (or states) and the arcs e ∈ E are called moves (or transitions). The vertex set V is partitioned into two subsets, V = V_W ∪ V_R, that correspond to white and random positions, controlled, respectively, by a player (decision maker), who will be called Max, and by nature. Let us denote by E(u) the set of arcs leaving u and assume that E(u) ≠ ∅ in every position u ∈ V. For all random positions u ∈ V_R we are given probabilities p(u, v) > 0 for all random moves (u, v) ∈ E(u) such that Σ_{(u,v)∈E(u)} p(u, v) = 1.

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
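The definition of the underlying digraph can be made concrete with a minimal Python sketch. It is only an illustration of the stated assumptions, not the authors' implementation: the names `MDP` and `validate`, and the choice to key probabilities by arc index (so that parallel arcs stay distinguishable), are hypothetical. The sketch checks that V_W and V_R are disjoint, that E(u) is nonempty for every position, and that the probabilities on the arcs leaving each random position are positive and sum to 1.

```python
from dataclasses import dataclass, field
from math import isclose


@dataclass
class MDP:
    """Hypothetical container for the digraph G = (V, E) with V = V_W ∪ V_R."""
    white: set    # V_W: positions controlled by Max
    random: set   # V_R: positions controlled by nature
    arcs: list    # list of (u, v) pairs; loops and parallel arcs are allowed
    # Assumption: probabilities are keyed by arc index, not by the pair (u, v),
    # so that parallel arcs leaving a random position can carry separate values.
    prob: dict = field(default_factory=dict)


def validate(mdp):
    """Check the assumptions of the definition; return E(u) as lists of arc indices."""
    if mdp.white & mdp.random:
        raise ValueError("V_W and V_R must be disjoint")
    positions = mdp.white | mdp.random
    out = {u: [] for u in positions}
    for i, (u, v) in enumerate(mdp.arcs):
        if u not in positions or v not in positions:
            raise ValueError(f"arc {i} uses an unknown position")
        out[u].append(i)
    for u, idxs in out.items():
        if not idxs:
            raise ValueError(f"E({u!r}) is empty")
    for u in mdp.random:
        ps = [mdp.prob.get(i, 0.0) for i in out[u]]
        if any(p <= 0 for p in ps):
            raise ValueError(f"probabilities leaving {u!r} must be positive")
        if not isclose(sum(ps), 1.0):
            raise ValueError(f"probabilities leaving {u!r} must sum to 1")
    return out


# One white position "a" and one random position "r".
example = MDP(
    white={"a"},
    random={"r"},
    arcs=[("a", "r"), ("a", "a"), ("r", "a"), ("r", "r")],
    prob={2: 0.5, 3: 0.5},
)
outgoing = validate(example)
```

Here `outgoing` maps each position to the indices of its outgoing arcs, which is the set E(u) from the definition.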

Strategies
Effective payoffs
Stochastic games with perfect information
Main results
Applications of the total payoff
Characterization of pure stationary optima in total MDPs
Potential transformation
Characterization of pure and stationary optima
LP formulation
General MDPs
Discounted BWR-games
Existence of a saddle point in positional strategies
