Abstract

This paper studies Erev and Roth's reinforcement learning model with foregone payoff information in normal-form games: players observe not only their realised payoffs but also the foregone payoffs they could have obtained by choosing other actions. We provide conditions under which the reinforcement learning process almost surely converges to a regular quantal response equilibrium (Goeree et al. 2005). We also show that, under the linear choice rule of the reinforcement learning model, this model shares the same asymptotic behaviour as an adaptive learning model that nests experience-weighted attraction learning, payoff assessment learning, and stochastic fictitious play. In addition, we provide conditions under which the reinforcement learning process under the logit choice rule almost surely converges to a Nash equilibrium.
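The model described above can be illustrated with a minimal simulation sketch. The game, payoffs, initial propensities, and horizon below are all illustrative assumptions, not taken from the paper: each player keeps a propensity for each action, chooses with the linear rule (probabilities proportional to propensities), and, because foregone payoffs are observed, reinforces *every* action by the payoff it would have earned against the opponent's realised action.

```python
import numpy as np

# Hypothetical 2x2 common-interest game (both players receive the same payoff);
# the specific payoff matrix is an illustrative assumption.
PAYOFFS_ROW = np.array([[2.0, 0.0],
                        [0.0, 1.0]])
PAYOFFS_COL = PAYOFFS_ROW.T  # column player's payoffs, indexed (own action, opponent action)

def simulate(T=5000, seed=0):
    """Erev-Roth-style reinforcement with foregone payoffs and the linear choice rule."""
    rng = np.random.default_rng(seed)
    q_row = np.ones(2)  # initial propensities (assumed positive)
    q_col = np.ones(2)
    for _ in range(T):
        # Linear choice rule: choice probabilities proportional to propensities.
        a_row = rng.choice(2, p=q_row / q_row.sum())
        a_col = rng.choice(2, p=q_col / q_col.sum())
        # Foregone-payoff reinforcement: every action is reinforced by the payoff
        # it would have earned against the opponent's realised action, not just
        # the action actually played.
        q_row += PAYOFFS_ROW[:, a_col]
        q_col += PAYOFFS_COL[:, a_row]
    # Return the empirical (linear-rule) mixed strategies after T rounds.
    return q_row / q_row.sum(), q_col / q_col.sum()

p_row, p_col = simulate()
print(p_row, p_col)
```

A logit choice rule would replace the linear normalisation with probabilities proportional to `exp(lambda * q)`; the paper's convergence results distinguish these two cases, and this sketch only implements the linear one.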
