Abstract

We deal with zero-sum limiting average stochastic games. We show that the existence of arbitrary optimal strategies implies the existence of stationary $\varepsilon$-optimal strategies, for all $\varepsilon>0$, and the existence of Markov optimal strategies. We present such a construction, for which we do not even need to know these optimal strategies. Furthermore, an example demonstrates that the existence of stationary optimal strategies is not implied by the existence of optimal strategies, so the result is sharp. More generally, one can evaluate a strategy $\pi$ for the maximizing player, player 1, by the reward $\phi_s(\pi)$ that $\pi$ guarantees to him when starting in state $s$. A strategy $\pi$ is called nonimproving if $\phi_s(\pi)\geq \phi_s(\pi[h])$ for all $s$ and for all finite histories $h$ with final state $s$, where $\pi[h]$ is the strategy $\pi$ conditional on the history $h$. Using the evaluation $\phi$, we may define the relation "$\varepsilon$-better" between strategies. A strategy $\pi^1$ is called $\varepsilon$-better than $\pi^2$ if $\phi_s(\pi^1)\geq \phi_s(\pi^2)-\varepsilon$ for all $s$. We show that for any nonimproving strategy $\pi$ and for all $\varepsilon>0$, there exists an $\varepsilon$-better stationary strategy, as well as a (0-)better Markov strategy. Since all optimal strategies are nonimproving, this result can be regarded as a generalization of the above result for optimal strategies. Finally, we briefly discuss some other extensions. Among others, we indicate possible simplifications of strategies that are only optimal for particular initial states by almost stationary $\varepsilon$-optimal strategies, for all $\varepsilon>0$, and by almost Markov optimal strategies. We also discuss the validity of the above results for other reward functions. Several examples clarify these issues.
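
For concreteness, a minimal sketch of the evaluation used above, under one common convention for the limiting average reward (the exact convention, e.g. $\liminf$ versus $\limsup$ and where the infimum over player 2's strategies is taken, is an assumption here and not fixed by the text): if $r(S_n,A_n,B_n)$ denotes the stage payoff at stage $n$ and $\sigma$ ranges over the strategies of the minimizing player 2, then

$$\phi_s(\pi)\;=\;\inf_{\sigma}\;\liminf_{N\to\infty}\;\mathbb{E}^{\pi,\sigma}_{s}\!\left[\frac{1}{N}\sum_{n=1}^{N} r(S_n,A_n,B_n)\right],$$

and the conditional strategy $\pi[h]$ acts on a continuation history $h'$ by $\pi[h](h')=\pi(h\,h')$, i.e., it prescribes what $\pi$ would play after the concatenated history $h\,h'$.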
