Markov decision processes and stochastic games with total effective payoff

Endre Boros, Vladimir Gurvich, Kazuhisa Makino, Khaled Elbassioni

doi:10.1007/s10479-018-2898-8

Abstract

We consider finite Markov decision processes with undiscounted total effective payoff. We show that there exist uniformly optimal pure and stationary strategies that can be computed by solving a polynomial number of linear programs. This implies that in a two-player zero-sum stochastic game with perfect information and with total effective payoff there exists a stationary best response to any stationary strategy of the opponent. From this, we derive the existence of a uniformly optimal pure and stationary saddle point. Finally, we show that mean payoff can be viewed as a special case of total payoff.

Highlights

We will consider Markov decision processes (MDPs) with total effective payoff.
If there are no random nodes in the MDP, a uniformly optimal stationary strategy can be found by a combinatorial algorithm that solves a polynomial number of minimum mean-cycle problems [18]; we omit the details from this version.
Total payoff MDPs/games considered in this paper can be thought of as a generalization of shortest path problems/games, when we do not assume that there is a single terminal.

Summary

Markov decision processes

We will consider Markov decision processes (MDPs) with total effective payoff. Let G = (V, E) be a finite directed graph (digraph) in which loops and multiple arcs are allowed. The vertices v ∈ V are called positions (or states), and the arcs e ∈ E are called moves (or transitions). The vertex set V is partitioned into two subsets, V = V_W ∪ V_R, which correspond to white and random positions, controlled respectively by a player (the decision maker), who will be called Max, and by nature. We denote by E(u) the set of arcs leaving u and assume that E(u) ≠ ∅ in every position u ∈ V. For every random position u ∈ V_R we are given probabilities p(u, v) > 0 for all random moves (u, v) ∈ E(u) such that ∑_{(u,v)∈E(u)} p(u, v) = 1.
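The conditions in this definition can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `validate_mdp`, the dictionary-based encoding, and the example positions are all assumptions made for illustration. For simplicity it also assumes that no random position has parallel arcs, so that p(u, v) can be keyed by the pair (u, v).

```python
from math import isclose


def validate_mdp(white, random, moves, prob):
    """Return a list of violated conditions (empty if the input is well formed).

    white, random: sets of positions (V_W and V_R); hypothetical encoding.
    moves: dict mapping each position u to the list of successors in E(u).
    prob: dict mapping (u, v) to p(u, v) for random positions u.
    """
    errors = []
    # V_W and V_R must partition V, so they may not share a position.
    if white & random:
        errors.append("V_W and V_R overlap")
    # Every position must have at least one outgoing move: E(u) is nonempty.
    for u in sorted(white | random):
        if not moves.get(u):
            errors.append(f"E({u}) is empty")
    # At random positions the probabilities are positive and sum to 1.
    for u in sorted(random):
        ps = [prob.get((u, v), 0.0) for v in moves.get(u, [])]
        if any(p <= 0 for p in ps):
            errors.append(f"non-positive probability at {u}")
        if not isclose(sum(ps), 1.0):
            errors.append(f"probabilities at {u} do not sum to 1")
    return errors


# A two-position example: Max controls "a", nature controls "r".
example_moves = {"a": ["r", "a"], "r": ["a", "r"]}
example_prob = {("r", "a"): 0.5, ("r", "r"): 0.5}
example_errors = validate_mdp({"a"}, {"r"}, example_moves, example_prob)
```

In the example, both positions have outgoing moves (including loops, which the definition allows) and the probabilities at the random position sum to 1, so no condition is reported as violated.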

Strategies

Effective payoffs

Stochastic games with perfect information

Main results

Applications of the total payoff

Characterization of pure stationary optima in total MDPs

Potential transformation

Characterization of pure and stationary optima

LP formulation

General MDPs

Discounted BWR-games

Existence of a saddle point in positional strategies

Journal: Annals of Operations Research
Publication Date: May 28, 2018
Citations: 2
License type: cc-by
