Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor

Thomas Dueholm Hansen, Peter Bro Miltersen, Uri Zwick

doi:10.1145/2432622.2432623

Open Access

Abstract

Ye [2011] recently showed that the simplex method with Dantzig’s pivoting rule, as well as Howard’s policy iteration algorithm, solve discounted Markov decision processes (MDPs) with a constant discount factor in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most $O\left(\frac{mn}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations, where $n$ is the number of states, $m$ is the total number of actions in the MDP, and $0 < \gamma < 1$ is the discount factor. We improve Ye’s analysis in two respects. First, we improve the bound given by Ye and show that Howard’s policy iteration algorithm actually terminates after at most $O\left(\frac{m}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard’s policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, resolving a long-standing open problem. Combined with other recent results, this provides a complete characterization of the complexity of the standard strategy iteration algorithm for 2-player turn-based stochastic games: it is strongly polynomial for a fixed discount factor, and exponential otherwise.
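
The strategy iteration loop whose iteration count the abstract bounds can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the game encoding (each action is a reward plus a list of (next_state, probability) pairs), the helper names `solve_linear`, `evaluate`, `q_value`, `improve`, `strategy_iteration`, and the three-state toy game are all hypothetical. Player 0 (MAX) performs Howard-style switches at every improvable state, while player 1 (MIN) answers with a best response computed by Howard's policy iteration on the resulting one-player MDP. Values are obtained by solving $(I - \gamma P)v = r$ with plain floating-point Gauss-Jordan elimination, so no claim is made about exact-arithmetic behavior.

```python
from typing import List, Tuple

# Hypothetical encoding: an action is (immediate reward, [(next_state, prob), ...]).
Action = Tuple[float, List[Tuple[int, float]]]
EPS = 1e-12  # only strictly improving switches are made


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(actions: List[List[Action]], choice: List[int], gamma: float) -> List[float]:
    """Value of a positional strategy profile: solve (I - gamma * P) v = r."""
    n = len(actions)
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for s in range(n):
        reward, trans = actions[s][choice[s]]
        b[s] = reward
        for t, p in trans:
            a[s][t] -= gamma * p
    return solve_linear(a, b)


def q_value(action: Action, v: List[float], gamma: float) -> float:
    reward, trans = action
    return reward + gamma * sum(p * v[t] for t, p in trans)


def improve(actions, owner, player, choice, v, gamma) -> bool:
    """Howard-style switch at every state of `player` (0 = MAX, 1 = MIN)
    whose best action strictly beats its current one. Returns True on change."""
    sign = 1.0 if player == 0 else -1.0
    changed = False
    for s, acts in enumerate(actions):
        if owner[s] != player:
            continue
        qs = [sign * q_value(act, v, gamma) for act in acts]
        best = max(range(len(qs)), key=qs.__getitem__)
        if qs[best] > qs[choice[s]] + EPS:
            choice[s] = best
            changed = True
    return changed


def strategy_iteration(actions, owner, gamma):
    choice = [0] * len(actions)  # start from action 0 everywhere
    while True:
        # MIN's best response to MAX's current strategy, via Howard's
        # policy iteration on the resulting one-player MDP.
        while True:
            v = evaluate(actions, choice, gamma)
            if not improve(actions, owner, 1, choice, v, gamma):
                break
        # MAX switches every improvable state at once; stop when none remain.
        if not improve(actions, owner, 0, choice, v, gamma):
            return choice, v


# Toy game (hypothetical): state 0 is MAX, state 1 is MIN, state 2 is a sink.
ACTIONS = [
    [(1.0, [(1, 1.0)]), (0.0, [(2, 1.0)])],   # MAX: go to MIN (+1) or to sink (0)
    [(-5.0, [(0, 1.0)]), (2.0, [(2, 1.0)])],  # MIN: back to MAX (-5) or to sink (+2)
    [(0.0, [(2, 1.0)])],                      # sink: self-loop, reward 0
]
OWNER = [0, 1, 0]
GAMMA = 0.5

if __name__ == "__main__":
    choice, v = strategy_iteration(ACTIONS, OWNER, GAMMA)
    print(choice, [round(x, 6) for x in v])
```

With $\gamma = 0.5$, MAX first tries moving to MIN's state, sees that MIN would answer by returning with reward $-5$, and switches to the sink; the script should print `[1, 0, 0] [0.0, -5.0, 0.0]`. Computing MIN's best response with Howard's policy iteration is a choice made for this sketch (a linear program would also do); the toy example says nothing about the iteration bounds themselves.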

Journal: Journal of the ACM
Publication Date: Feb 1, 2013
Citations: 126

Similar Papers

Traffic Signal Control based on Markov Decision Process
Yunwen Xu ... Dewei Li
IFAC-PapersOnLine | VOL. 49
01 Jan 2015

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Bruno Scherrer
Mathematics of Operations Research | VOL. 41
10 Feb 2016

Linear programming considerations on Markovian Decision Processes with no discounting
Shunji Osaki ... Hisashi Mine
Journal of Mathematical Analysis and Applications | VOL. 26
01 Apr 1969

Continuous Average Control of Piecewise Deterministic Markov Processes
Oswaldo Luiz Do Valle Costa ... Francois Dufour
01 Jan 2013
