Abstract

This study is concerned with a game model involving repeated play of a matrix game with unknown entries; it is a two-person, zero-sum, infinite game of perfect recall. The entries of the matrix $((p_{ij}))$ are selected according to a joint probability distribution known by both players, and this unknown matrix is played repeatedly. If the pure strategy pair $(i, j)$ is employed on day $k$, $k = 1, 2, \ldots$, the maximizing player receives a discounted income of $\beta^{k-1} X_{ij}$, where $\beta$ is a constant, $0 \le \beta \le 1$, and $X_{ij}$ assumes the value one with probability $p_{ij}$ or the value zero with probability $1 - p_{ij}$. After each trial, the players are informed of the triple $(i, j, X_{ij})$ and retain this knowledge. The payoff to the maximizing player is the expected total discounted income.

It is shown that a solution exists, the value being characterized as the unique solution of a functional equation and optimal strategies consisting of locally optimal play in an auxiliary matrix determined by the past history.

A definition of an $\epsilon$-learning strategy pair is formulated and a theorem obtained exhibiting $\epsilon$-optimal strategies which are $\epsilon$-learning. The asymptotic behavior of the value is obtained as the discount tends to one.
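The income model described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the infinite game is truncated at a finite `horizon`, the function name `simulate_discounted_income` and the strategy callables are hypothetical, and the strategies passed in are arbitrary placeholders rather than the optimal strategies established in the study.

```python
import random


def simulate_discounted_income(p, row_strategy, col_strategy, beta, horizon, rng):
    """Play the matrix p repeatedly; return (total discounted income, history).

    p            -- matrix of success probabilities p[i][j], hidden from the players
    row_strategy -- hypothetical callable (history, rng) -> row index i
    col_strategy -- hypothetical callable (history, rng) -> column index j
    beta         -- discount constant
    horizon      -- number of days simulated (assumption: finite truncation)
    """
    history = []  # triples (i, j, X_ij) revealed to both players after each trial
    total = 0.0
    for k in range(1, horizon + 1):
        i = row_strategy(history, rng)
        j = col_strategy(history, rng)
        # X_ij is one with probability p[i][j] and zero otherwise.
        x = 1 if rng.random() < p[i][j] else 0
        # Income on day k is discounted by beta^(k-1).
        total += beta ** (k - 1) * x
        history.append((i, j, x))
    return total, history


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption for illustration: entries drawn independently and uniformly on [0, 1].
    p = [[rng.random() for _ in range(2)] for _ in range(2)]
    uniform_choice = lambda history, r: r.randrange(2)
    income, history = simulate_discounted_income(
        p, uniform_choice, uniform_choice, beta=0.9, horizon=50, rng=rng
    )
    print(income, history[:3])
```

Averaging the returned income over many draws of `p` and many plays would approximate the expected total discounted income for the chosen strategy pair, up to the truncation error.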

