Learning models in interdependence situations

Willem Van Der Horst

doi:10.6100/ir709305

Abstract

Many approaches to learning in games fall into one of two broad classes: reinforcement learning and belief learning models. Reinforcement learning assumes that actions that were successful in the past are more likely to be played in the future. Belief learning assumes that players hold beliefs about which action the opponent(s) will choose, and that each player chooses the action with the highest payoff given those beliefs. Belief learning and (a specific type of) reinforcement learning are special cases of a hybrid learning model called Experience Weighted Attraction (EWA). Some previous studies explicitly state that, for several games, it is difficult to determine the underlying process (reinforcement learning, belief learning, or something else) that generated the data. This leads to the main question of this thesis: Can we distinguish between different types of EWA-based learning, with reinforcement and belief learning as special cases, in repeated 2 x 2 games? In Chapter 2 we use the EWA learning model to derive predictions for behavior in three types of games, based on the concept of stability: a state in which there is a large probability that all players make the same choice in round t + 1 as in round t. From these predictions we conclude that belief and reinforcement learning can be distinguished, even in 2 x 2 games. The behavior resulting from belief learning differs most from that resulting from reinforcement learning in games whose Nash equilibria yield negative payoffs and in which at least one other strategy combination yields only positive payoffs. Our results help researchers identify games in which belief and reinforcement learning can be discerned easily. Our theoretical results imply that the learning models can be distinguished once a sufficient number of rounds has been played, but it is not clear how large that number needs to be. It is also not clear how likely it is that stability actually occurs in game play.
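To make the special cases of EWA concrete, here is a minimal sketch of one round of the attraction update, assuming the standard form of Camerer and Ho's EWA model, together with a logit choice rule. The parameter names phi (decay of past attractions), delta (weight on forgone payoffs), rho (decay of the experience weight N) and lam (logit sensitivity) are assumptions for illustration; the exact parameterization used in the thesis may differ. With delta = 0, rho = 0 and N(0) = 1, only the chosen strategy is reinforced (cumulative reinforcement); with delta = 1 and rho = phi, forgone payoffs count fully, which corresponds to weighted fictitious play (a form of belief learning) given suitable initial attractions.

```python
import math


def ewa_update(attractions, experience, own_action, payoffs_if_played,
               phi, delta, rho):
    """One round of the EWA attraction update (Camerer-Ho form, assumed here).

    attractions:       A_j(t-1) for each of the player's own strategies j
    experience:        experience weight N(t-1)
    own_action:        index of the strategy actually chosen in round t
    payoffs_if_played: payoff strategy j would have earned against the
                       opponent's actual choice in round t
    """
    new_experience = rho * experience + 1.0  # N(t) = rho * N(t-1) + 1
    new_attractions = []
    for j, (old, payoff) in enumerate(zip(attractions, payoffs_if_played)):
        # The chosen strategy is reinforced with weight 1; the forgone
        # payoffs of unchosen strategies are reinforced with weight delta.
        weight = 1.0 if j == own_action else delta
        new_attractions.append(
            (phi * experience * old + weight * payoff) / new_experience)
    return new_attractions, new_experience


def logit_choice_probs(attractions, lam):
    """Logit choice rule; subtracting the max keeps exp() from overflowing."""
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# The player chose strategy 0 (payoff 3); strategy 1 would have paid 5.
reinf = ewa_update([0.0, 0.0], 1.0, 0, [3.0, 5.0], phi=1.0, delta=0.0, rho=0.0)
belief = ewa_update([0.0, 0.0], 1.0, 0, [3.0, 5.0], phi=1.0, delta=1.0, rho=1.0)
print(reinf)   # ([3.0, 0.0], 1.0)
print(belief)  # ([1.5, 2.5], 2.0)
```

After the same round, the reinforcement learner strengthens only the strategy it played, while the belief learner moves toward the better reply it did not play. This difference in how forgone payoffs are treated is what lets the two models produce distinguishable behavior.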
To this end, in Chapter 3 we also examine the main question by simulating data from learning models. We use the same three types of 2 x 2 games as before and investigate whether reinforcement and belief learning can be discerned in an experimental setup. We conclude that this is also possible, especially in games with positive payoffs and in the repeated Prisoner’s Dilemma, even when the repeated game has a relatively small number of rounds. We also show that other characteristics of the players’ behavior, such as the number of times a player changes strategy and the number of strategy combinations the player uses, can help differentiate between the two learning models. So far, we have only considered pure belief learning and pure reinforcement learning, and nothing in between. In Chapter 4 we therefore consider a broader class of learning models and try to determine under which conditions we can re-estimate three parameters of the EWA learning model from simulated data generated for different games and scenarios. The results show low rates of convergence of the estimation algorithm, and even when the algorithm converges, the parameter estimates are biased most of the time. Hence, we must conclude that re-estimating the exact parameters quantitatively is difficult in most experimental setups. Qualitatively, however, we can find patterns that point toward either belief or reinforcement learning. Finally, in the last chapter, we study the effect of a player’s social preferences on his own payoff in 2 x 2 games with only a mixed-strategy equilibrium, under the assumption that the other player has no social preferences. We model social preferences with the Fehr-Schmidt inequity aversion model, which contains parameters for envy and spite. Eighteen different mixed-equilibrium games are identified; they can be classified into Regret games, Risk games, and RiskRegret games, with six games in each class.
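The simulation approach can be illustrated with a small sketch: two EWA learners repeatedly play a Prisoner's Dilemma, and each history is summarized by the number of strategy changes per player and the number of distinct strategy combinations, two of the behavioral characteristics mentioned above. The payoff matrix, parameter values, number of rounds, and the initial values A(0) = 0 and N(0) = 1 are illustrative assumptions, not the settings used in the thesis; the update rule assumes the standard Camerer-Ho form of EWA.

```python
import math
import random

# Hypothetical Prisoner's Dilemma payoffs (illustrative, not from the thesis).
# PAYOFF[own][other]; strategy 0 = cooperate, strategy 1 = defect.
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def logit_probs(attractions, lam):
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(rounds, phi, delta, rho, lam, seed=0):
    """Simulate two EWA learners with identical parameters; return choice pairs."""
    rng = random.Random(seed)
    attractions = [[0.0, 0.0], [0.0, 0.0]]  # assumed A(0) = 0 for both players
    experience = [1.0, 1.0]                 # assumed N(0) = 1 for both players
    history = []
    for _ in range(rounds):
        choices = []
        for player in range(2):
            probs = logit_probs(attractions[player], lam)
            choices.append(0 if rng.random() < probs[0] else 1)
        history.append(tuple(choices))
        for player in range(2):
            own, other = choices[player], choices[1 - player]
            old_n = experience[player]
            new_n = rho * old_n + 1.0
            # Chosen strategy weighted by 1, forgone strategy by delta.
            attractions[player] = [
                (phi * old_n * attractions[player][j]
                 + (1.0 if j == own else delta) * PAYOFF[j][other]) / new_n
                for j in range(2)
            ]
            experience[player] = new_n
    return history


def summarize(history):
    """Strategy changes per player and number of distinct strategy combinations."""
    switches = [
        sum(1 for t in range(1, len(history))
            if history[t][player] != history[t - 1][player])
        for player in range(2)
    ]
    return switches, len(set(history))


for label, params in [("reinforcement", dict(phi=0.9, delta=0.0, rho=0.0)),
                      ("belief", dict(phi=0.9, delta=1.0, rho=0.9))]:
    print(label, summarize(simulate(rounds=30, lam=1.0, seed=1, **params)))
```

Because delta = 1 also updates the strategy that was not played, belief learners in this game are pushed toward the dominant strategy (defection), whereas reinforcement learners can stay with whatever happened to pay off early. Comparing such summaries across many seeds is one way to check whether the two models leave distinguishable traces in a short repeated game.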
The effects of envy and spite in these games are studied in five status scenarios, in which the player with social preferences receives much higher, mostly higher, about equal, mostly lower, or much lower payoffs than the other player. The theoretical and simulation results reveal that the effects of social preferences vary across scenarios and games, and even within scenario-game combinations. However, we can conclude that the effects of envy and spite are analogous, are on average beneficial to the player with the social preferences, and are most positive when payoffs are about equal and in Risk games.
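The mechanism behind the last chapter can be sketched for a single game. In a 2 x 2 game with only a mixed-strategy equilibrium, each player's mixing probability is the one that makes the other player indifferent, so the row player's social preferences change the column player's equilibrium mix, which in turn changes the row player's material payoff. The sketch assumes the standard two-player Fehr-Schmidt utility U = x_own - alpha * max(x_other - x_own, 0) - beta * max(x_own - x_other, 0), reading alpha as envy and a negative beta as spite; this mapping and the example payoff matrices are illustrative assumptions, not the games or scenarios analyzed in the thesis.

```python
def fehr_schmidt(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility of a material payoff pair."""
    return own - alpha * max(other - own, 0.0) - beta * max(own - other, 0.0)


def mixed_equilibrium(row_util, col_util):
    """Completely mixed equilibrium (p, q) of a 2 x 2 game, or None.

    p: probability the row player plays row 0 (makes the column player indifferent)
    q: probability the column player plays column 0 (makes the row player indifferent)
    """
    denom_q = row_util[0][0] - row_util[0][1] - row_util[1][0] + row_util[1][1]
    denom_p = col_util[0][0] - col_util[1][0] - col_util[0][1] + col_util[1][1]
    if denom_q == 0 or denom_p == 0:
        return None
    q = (row_util[1][1] - row_util[0][1]) / denom_q
    p = (col_util[1][1] - col_util[1][0]) / denom_p
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        return None
    return p, q


def row_material_payoff(row_pay, col_pay, alpha, beta):
    """Row player's expected material payoff when only the row player has
    Fehr-Schmidt preferences; None if no completely mixed equilibrium exists."""
    row_util = [[fehr_schmidt(row_pay[i][j], col_pay[i][j], alpha, beta)
                 for j in range(2)] for i in range(2)]
    eq = mixed_equilibrium(row_util, col_pay)  # column player is purely selfish
    if eq is None:
        return None
    p, q = eq
    row_probs, col_probs = (p, 1.0 - p), (q, 1.0 - q)
    return sum(row_probs[i] * col_probs[j] * row_pay[i][j]
               for i in range(2) for j in range(2))


# Hypothetical game with a unique, completely mixed equilibrium.
A = [[4, 1], [2, 3]]  # row player's material payoffs
B = [[1, 3], [2, 0]]  # column player's material payoffs
print(row_material_payoff(A, B, alpha=0.0, beta=0.0))  # 2.5
print(row_material_payoff(A, B, alpha=0.5, beta=0.0))  # about 2.6 (envy)
```

In this particular example, envy lowers the row player's utility for the outcome in which he earns 1 while the opponent earns 3. To keep the row player indifferent, the column player's mix shifts toward column 0, where the row player earns more, so the row player's expected material payoff rises from 2.5 to about 2.6.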

Similar Papers

Analyzing behavior implied by EWA learning: An emphasis on distinguishing reinforcement from belief learning
W Van Der Horst ... M.A.L.M Van Assen
Journal of Mathematical Psychology | VOL. 54
17 Dec 2009

Estimating the Experience-Weighted Attractions for the Migration-Emission Game
Michinori Uwasu
Theoretical Economics Letters | VOL. 02
01 Jan 2012

Heterogeneity in generalized reinforcement learning and its relation to cognitive ability
Shu-Heng Chen ... Ye-Rong Du
Cognitive Systems Research | VOL. 42
19 Nov 2016

The Strategy Evolution in Double Auction Based on the Experience-Weighted Attraction Learning Model
Qian Yu ... Yaqin Liu
IEEE Access | VOL. 7
01 Jan 2019
