Abstract
We focus on learning during development in a group of individuals that play a competitive game with each other. The game has two actions and there is negative frequency dependence. We define the distribution of actions by group members to be an equilibrium configuration if no individual can improve its payoff by unilaterally changing its action. We show that at this equilibrium, one action is preferred in the sense that those taking the preferred action have a higher payoff than those taking the other, more prosocial, action. We explore the consequences of a simple ‘unbiased’ reinforcement learning rule during development, showing that groups reach an approximate equilibrium distribution, so that some achieve a higher payoff than others. Because there is learning, an individual’s behaviour can influence the future behaviour of others. We show that, as a consequence, there is the potential for an individual to exploit others by influencing them to be the ones to take the non-preferred action. Using an evolutionary simulation, we show that population members can avoid being exploited by over-valuing rewards obtained from the preferred option during learning, an example of a bias that is ‘rational’.
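The equilibrium definition above can be made concrete with a small sketch. The payoff functions here are hypothetical and are not taken from the paper: they assume a Producer-Scrounger style game in which each individual taking the other (producing) action finds one unit of food, keeps a finder's share, and splits the remainder equally with everyone taking the preferred (scrounging) action. The names `N`, `FINDER_SHARE`, `payoff_preferred`, `payoff_other` and `is_equilibrium` are illustrative assumptions.

```python
# Hypothetical payoffs, NOT the paper's model: a Producer-Scrounger style
# game in a group of n, where k individuals take the preferred action.
N = 6               # group size (assumed)
FINDER_SHARE = 0.3  # fraction of each food item kept by its finder (assumed)


def payoff_preferred(k, n=N, a=FINDER_SHARE):
    """Payoff to each of the k individuals taking the preferred action.

    Decreases as k grows, i.e. negative frequency dependence.
    """
    return (n - k) * (1 - a) / (k + 1)


def payoff_other(k, n=N, a=FINDER_SHARE):
    """Payoff to each of the n - k individuals taking the other action."""
    return a + (1 - a) / (k + 1)


def is_equilibrium(k, n=N):
    """True if no individual gains by unilaterally changing its action."""
    # A preferred-action individual that switches leaves k - 1 behind.
    if k > 0 and payoff_other(k - 1, n) > payoff_preferred(k, n):
        return False
    # An other-action individual that switches raises the count to k + 1.
    if k < n and payoff_preferred(k + 1, n) > payoff_other(k, n):
        return False
    return True


def equilibrium_configurations(n=N):
    """All values of k that are equilibrium configurations."""
    return [k for k in range(n + 1) if is_equilibrium(k, n)]


if __name__ == "__main__":
    for k in equilibrium_configurations():
        print(k, payoff_preferred(k), payoff_other(k))
```

Under these assumed values the only equilibrium configuration is k = 2, and the two individuals taking the preferred action earn more (about 0.93) than the four taking the other action (about 0.53). This is consistent with the abstract's claim that payoffs differ between actions at equilibrium, although the specific numbers depend entirely on the assumed payoffs.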
Highlights
In this paper we are concerned with a group that stays together for some time, perhaps during development, with group members competing with each other for a resource such as food.
As a consequence, there is the potential for an individual to exploit others by influencing them to be the ones to take the non-preferred action.
For example, in the classic Producer-Scrounger game [1, 2], producers, who search for food, benefit scroungers, who exploit the food that has been found by others.
Summary
Open access. Citation: McNamara JM, Houston AI, Leimar O (2021) Learning, exploitation and bias in games.
We focus on learning during development in a group of individuals that play a competitive game with each other. We define the distribution of actions by group members to be an equilibrium configuration if no individual can improve its payoff by unilaterally changing its action. We explore the consequences of a simple ‘unbiased’ reinforcement learning rule during development, showing that groups reach an approximate equilibrium distribution, so that some achieve a higher payoff than others. As a consequence, there is the potential for an individual to exploit others by influencing them to be the ones to take the non-preferred action. We show that population members can avoid being exploited by over-valuing rewards obtained from the preferred option during learning, an example of a bias that is ‘rational’.