Abstract

We consider a class of discrete-time two-person zero-sum Markov games with Borel state and action spaces and possibly unbounded payoffs. The game evolves according to the recursive equation $x_{n+1} = F(x_n, a_n, b_n, \xi_n)$, $n = 0, 1, \ldots$, where the disturbance process $\{\xi_n\}$ is formed by independent and identically distributed $\mathbb{R}^k$-valued random vectors, which are observable but whose common density $\rho^*$ is unknown to both players. Combining suitable methods of statistical estimation of $\rho^*$ with optimization procedures, we construct a pair of average optimal strategies.
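To make the setting concrete, the following minimal sketch simulates a scalar instance of the recursion and estimates the disturbance density from the observed disturbances. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear map `F`, the uniformly random placeholder actions, the standard normal disturbance, and the Gaussian kernel estimator with a fixed bandwidth. The abstract does not say which estimator is used, and the optimization step that produces the strategies is not shown.

```python
import math
import random


def F(x, a, b, xi):
    # Hypothetical dynamics, chosen only for illustration.
    return 0.5 * x + a - b + xi


def simulate(n_steps, x0=0.0, seed=0):
    """Run x_{n+1} = F(x_n, a_n, b_n, xi_n) and record the observed disturbances."""
    rng = random.Random(seed)
    x = x0
    states, noises = [x0], []
    for _ in range(n_steps):
        a = rng.uniform(-1.0, 1.0)  # placeholder action of player 1
        b = rng.uniform(-1.0, 1.0)  # placeholder action of player 2
        xi = rng.gauss(0.0, 1.0)    # i.i.d. disturbance; its density is treated as unknown
        x = F(x, a, b, xi)
        states.append(x)
        noises.append(xi)
    return states, noises


def kernel_density_estimate(samples, bandwidth):
    """Return a Gaussian kernel estimate of the common density of the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(s):
        return norm * sum(
            math.exp(-0.5 * ((s - xi) / bandwidth) ** 2) for xi in samples
        )

    return rho_hat


states, noises = simulate(2000)
rho_hat = kernel_density_estimate(noises, bandwidth=0.3)
true_at_zero = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0
print(rho_hat(0.0), true_at_zero)
```

In an adaptive scheme of the kind the abstract describes, an estimate such as `rho_hat` would be refreshed as new disturbances are observed and substituted for $\rho^*$ in the optimization step; how that is done is specific to the paper and is not reproduced here.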

