Abstract

We consider a two-person zero-sum stochastic game with an infinite time horizon. The payoff is a linear combination of expected total discounted rewards with different discount factors. For a model with a countable state space and compact action sets, we characterize the set of persistently optimal (subgame-perfect) policies. For a model with finite state and action sets and with perfect information, we prove the existence of an optimal pure Markov policy that is stationary from some epoch onward, and we describe an algorithm to compute such a policy. We give an example showing that an optimal policy that is stationary after some step may not exist for weighted discounted sequential games with finite state and action sets but without the perfect-information assumption. We also give examples of similar nonstationary behavior for two further classes of problems with weighted discounted criteria: (i) models with a single controller, a finite state space, and compact action sets, and (ii) nonzero-sum games with perfect information and finite state and action sets.
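To make the weighted discounted criterion concrete, here is a minimal sketch (not taken from the paper) of how the payoff of a fixed stationary policy can be evaluated on a finite Markov chain. Each component value solves the standard fixed-point equation v_k = r + β_k P v_k, and the weighted criterion is the weighted sum of these component values; the transition matrix, rewards, discount factors, and weights below are illustrative assumptions.

```python
# Hypothetical illustration of the weighted discounted criterion:
# payoff(s) = sum_k w_k * v_{beta_k}(s), where v_beta solves v = r + beta * P v.
# All numerical data here is made up for the example.

def discounted_value(P, r, beta, iters=2000):
    """Approximate the discounted value v = r + beta * P v by value iteration."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + beta * sum(P[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v

def weighted_discounted_value(P, r, betas, weights, iters=2000):
    """Linear combination of discounted values with different discount factors."""
    comps = [discounted_value(P, r, b, iters) for b in betas]
    n = len(r)
    return [sum(w * comps[k][i] for k, w in enumerate(weights))
            for i in range(n)]

# Two-state chain under some fixed policy: from either state, move to each
# state with probability 1/2; reward 1 in state 0, reward 0 in state 1.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
v = weighted_discounted_value(P, r, betas=[0.9, 0.5], weights=[1.0, 1.0])
# Closed form here: v_beta(0) = 1 + 0.5*beta/(1-beta), v_beta(1) = 0.5*beta/(1-beta),
# so v = [7.0, 5.0].
```

The example only evaluates a fixed policy; the paper's subject is optimizing over (possibly nonstationary) policies under this criterion, where, as the abstract notes, eventually-stationary optimal policies can fail to exist without perfect information.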
