Abstract
We consider a class of discrete-time two-person zero-sum Markov games with Borel state and action spaces and possibly unbounded payoffs. The game evolves according to the recursive equation $x_{n+1} = F(x_n, a_n, b_n, \xi_n)$, $n = 0, 1, \ldots$, where the disturbance process $\{\xi_n\}$ consists of independent and identically distributed $\mathbb{R}^k$-valued random vectors. The disturbances are observable, but their common density $\rho^*$ is unknown to both players. Combining suitable methods of statistical estimation of $\rho^*$ with optimization procedures, we construct a pair of average optimal strategies.
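As a rough illustration of the setting only, the following sketch simulates the recursion with observable disturbances and forms a density estimate from the observed samples. Everything specific here is an assumption made for the example and is not taken from the paper: the scalar case k = 1, the linear map `F`, the placeholder actions, the standard normal disturbance density, and the Gaussian kernel estimator with a fixed bandwidth. The estimator and the optimization step used in the paper may differ.

```python
import math
import random


def F(x, a, b, xi):
    # Hypothetical scalar dynamics, chosen only for illustration.
    return 0.5 * x + a - b + xi


def kernel_density_estimate(samples, bandwidth):
    """Return a Gaussian-kernel density estimate built from observed samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(s):
        return norm * sum(
            math.exp(-0.5 * ((s - xi) / bandwidth) ** 2) for xi in samples
        )

    return rho_hat


def simulate(n_steps, seed=0):
    """Run x_{n+1} = F(x_n, a_n, b_n, xi_n) and record the observed disturbances."""
    rng = random.Random(seed)
    x = 0.0
    observed = []
    for _ in range(n_steps):
        # Placeholder actions; real strategies would depend on the state
        # and on the current density estimate.
        a, b = 0.0, 0.0
        # Assumed true density: standard normal, treated as unknown to the players.
        xi = rng.gauss(0.0, 1.0)
        x = F(x, a, b, xi)
        observed.append(xi)
    return x, observed


x_final, xis = simulate(2000)
rho_hat = kernel_density_estimate(xis, bandwidth=0.3)
```

In this sketch `rho_hat` plays the role of the estimate of $\rho^*$ that each player could recompute as new disturbances are observed. The strategies themselves are left as placeholders because the abstract does not specify them.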