Abstract

Given an arbitrary black-box strategy for the Iterated Prisoner's Dilemma, it is often difficult to gauge to what extent it can be exploited by other strategies. In the presence of imperfect public monitoring and the resulting observation errors, deriving a theoretical solution is even more time-consuming. However, for any strategy, the reinforcement learning algorithm Q-Learning can construct a best response in the limit. In this article I present and discuss several improvements to the Q-Learning algorithm that allow for a simple numerical measure of the exploitability of a given strategy. Additionally, I give a detailed introduction to reinforcement learning aimed at economists.
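To make the setup concrete, the following is a minimal illustrative sketch, not the article's implementation: tabular Q-Learning playing the Iterated Prisoner's Dilemma against a fixed black-box strategy (here Tit-for-Tat) under noisy public signals, with the learned policy's average payoff used as a rough numerical measure of exploitability. The payoff values, state encoding, noise model, and hyperparameters are assumptions chosen for illustration only.

```python
import random
from collections import defaultdict

# Stage-game payoffs for the row player (C = cooperate, D = defect); values are illustrative.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']

def observe(action, noise):
    """Public signal of an action, flipped with probability `noise` (imperfect monitoring)."""
    if random.random() < noise:
        return 'D' if action == 'C' else 'C'
    return action

def tit_for_tat(observed_opponent_action):
    """Fixed black-box strategy: repeat the opponent's last observed move, start with C."""
    return observed_opponent_action if observed_opponent_action else 'C'

def q_learn_best_response(opponent=tit_for_tat, noise=0.1, episodes=200,
                          rounds=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-Learning against a fixed opponent; the state is the pair of
    publicly observed actions from the previous round."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = ('start', 'start')
        opp_view = None          # what the opponent observed the learner play last round
        for _ in range(rounds):
            # epsilon-greedy action choice
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(state, x)])
            b = opponent(opp_view)
            r = PAYOFF[(a, b)]
            # both players only see noisy public signals of the realized actions
            obs_a, obs_b = observe(a, noise), observe(b, noise)
            next_state = (obs_a, obs_b)
            best_next = max(Q[(next_state, x)] for x in ACTIONS)
            Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
            state, opp_view = next_state, obs_a
    return Q

def exploitability(Q, opponent=tit_for_tat, noise=0.1, rounds=10_000):
    """Average per-round payoff of the greedy learned policy: a rough
    numerical measure of how far the fixed strategy can be exploited."""
    state, opp_view, total = ('start', 'start'), None, 0.0
    for _ in range(rounds):
        a = max(ACTIONS, key=lambda x: Q[(state, x)])
        b = opponent(opp_view)
        total += PAYOFF[(a, b)]
        obs_a, obs_b = observe(a, noise), observe(b, noise)
        state, opp_view = (obs_a, obs_b), obs_a
    return total / rounds

if __name__ == '__main__':
    Q = q_learn_best_response()
    print('Approximate best-response payoff vs. Tit-for-Tat:',
          round(exploitability(Q), 2))
```

In this sketch, exploitability is simply the average per-round payoff the learned policy achieves against the fixed strategy; the article's refinements to Q-Learning are not reproduced here.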
