Abstract
This paper studies reinforcement learning in which players base their choice of action on valuations they hold for the actions. We identify two general conditions on valuation updating rules that together guarantee that the probability of playing a subgame perfect Nash equilibrium (SPNE) converges to one in games where no player is indifferent between two outcomes without every other player also being indifferent. The same conditions guarantee that the fraction of times an SPNE is played converges to one almost surely. We also show that for additively separable valuations, in which valuations are the sum of an empirical term and an error term, the conditions guaranteeing convergence can be made more intuitive. In addition, we give four examples of valuations that satisfy our conditions. These examples represent different degrees of sophistication in learning behavior and include well-known examples of reinforcement learning.
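To make the additively separable case concrete, the following is a minimal sketch of a valuation-based learner for a single player. The class name, the sample-average empirical term, and the noise term that decays with time are illustrative assumptions; the paper's actual updating rules and convergence conditions are not reproduced here.

```python
import random

class ValuationLearner:
    """Sketch of a player whose valuation of each action is
    additively separable: an empirical term plus an error term."""

    def __init__(self, actions, error_scale=1.0):
        self.actions = list(actions)
        self.counts = {a: 0 for a in self.actions}        # times each action was played
        self.empirical = {a: 0.0 for a in self.actions}   # running-average payoff (empirical term)
        self.error_scale = error_scale

    def valuation(self, action, t):
        # Additively separable valuation: empirical term plus an error term
        # whose influence shrinks over time (an assumed decay, not the paper's rule).
        error = random.uniform(-1.0, 1.0) * self.error_scale / (1 + t)
        return self.empirical[action] + error

    def choose(self, t):
        # Play the action with the highest current valuation.
        return max(self.actions, key=lambda a: self.valuation(a, t))

    def update(self, action, payoff):
        # Update the empirical term with the realized payoff (sample average).
        self.counts[action] += 1
        n = self.counts[action]
        self.empirical[action] += (payoff - self.empirical[action]) / n
```

In repeated play of an extensive-form game, one such learner could be attached to each decision node of each player, with the empirical terms updated from realized payoffs along the played path.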