Abstract

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change over time due to the co-adaptation of the other agents' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and a mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
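
To make the learning of a mixed (stochastic) strategy concrete, the sketch below lets two REINFORCE-style policy-gradient learners co-adapt in a 2x2 zero-sum game. It is a deliberately simplified, rate-based stand-in for the paper's spike-based population model: the matching-pennies payoffs, learning rate, and reward baselines are illustrative assumptions, not values from the study.

    # A minimal, hypothetical sketch -- not the paper's spiking population model.
    # Two REINFORCE-style policy-gradient learners co-adapt in a 2x2 zero-sum
    # game; the matching-pennies payoffs, learning rate and baselines below are
    # illustrative assumptions, not values from the study.
    import numpy as np

    rng = np.random.default_rng(0)

    payoff_row = np.array([[+1, -1],
                           [-1, +1]])   # row player wants to match
    payoff_col = -payoff_row            # column player wants to mismatch

    theta_row = theta_col = 0.0         # logits of playing action 0
    baseline_row = baseline_col = 0.0   # running reward baselines (variance reduction)
    lr = 0.01

    def p_action0(theta):
        return 1.0 / (1.0 + np.exp(-theta))

    history = []
    for t in range(50_000):
        p, q = p_action0(theta_row), p_action0(theta_col)
        a = 0 if rng.random() < p else 1
        b = 0 if rng.random() < q else 1
        r_row, r_col = payoff_row[a, b], payoff_col[a, b]

        # REINFORCE: theta += lr * (R - baseline) * d log pi(action) / d theta
        grad_row = (1.0 - p) if a == 0 else -p
        grad_col = (1.0 - q) if b == 0 else -q
        theta_row += lr * (r_row - baseline_row) * grad_row
        theta_col += lr * (r_col - baseline_col) * grad_col
        baseline_row += 0.01 * (r_row - baseline_row)
        baseline_col += 0.01 * (r_col - baseline_col)
        history.append(p)

    # With a small learning rate the two mixed strategies orbit the equilibrium;
    # their running average settles near the mixed Nash equilibrium p = q = 0.5.
    print("average P(row plays action 0):", np.mean(history[-10_000:]))

A greedy, value-based learner in the same setting keeps switching to whichever pure action currently looks best and can be exploited by the co-adapting opponent, which gives an intuition for why the abstract singles out policy-gradient learning for the mixed (stochastic) equilibrium.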

Highlights

  • Neuroeconomics is an interdisciplinary research field that tries to explain human decision making in neuronal terms

  • Multi-agent games are not Markovian, as the evolution of the environment typically depends not only on the current state but also on the history and on the adaptation of the other agents. Such games can be described as partially observable Markov decision processes (POMDPs, [6]) by embedding the sequences and the learning strategies of the other agents into a large state space

  • We have presented a policy gradient method for population reinforcement learning which, unlike temporal-difference (TD) learning, can cope with POMDPs and can be implemented in neuronal terms [7]


Introduction

Neuroeconomics is an interdisciplinary research field that tries to explain human decision making in neuronal terms. Classical models in neuroeconomics are based on temporal-difference (TD) learning [1], an algorithm to maximize the total expected reward [2] with potential neuronal implementations [3,4]. It assumes that the environment can be described as a Markov decision process (MDP), i.e. by a finite number of states with fixed transition probabilities [5]. Multi-agent games are not Markovian, as the evolution of the environment typically depends not only on the current state but also on the history and on the adaptation of the other agents. Such games can be described as partially observable Markov decision processes (POMDPs, [6]) by embedding the sequences and the learning strategies of the other agents into a large state space. Maximizing one's own payoff while assuming stationarity of the opponent's strategy is called fictitious play, and conditions have been studied under which this play effectively converges to a stationary (Nash) equilibrium [8].
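
As a concrete illustration of the fictitious-play idea in the last sentence, the following sketch (a hypothetical toy, not code from the paper) lets two players repeatedly best-respond to the empirical frequency of each other's past actions in a 2x2 zero-sum game with illustrative matching-pennies payoffs. For two-player zero-sum games these empirical frequencies are known to converge to a Nash equilibrium, here the mixed equilibrium at probability 1/2 for each action, even though the actual sequence of plays keeps cycling.

    # A hypothetical fictitious-play toy, not code from the paper: both players
    # best-respond to the empirical frequency of the opponent's past actions.
    # The matching-pennies payoffs are illustrative.
    import numpy as np

    payoff_row = np.array([[+1, -1],
                           [-1, +1]])
    payoff_col = -payoff_row

    row_action_counts = np.ones(2)   # column player's record of the row player's actions
    col_action_counts = np.ones(2)   # row player's record of the column player's actions

    for t in range(100_000):
        belief_about_col = col_action_counts / col_action_counts.sum()
        belief_about_row = row_action_counts / row_action_counts.sum()
        a = int(np.argmax(payoff_row @ belief_about_col))   # row best response
        b = int(np.argmax(belief_about_row @ payoff_col))   # column best response
        row_action_counts[a] += 1
        col_action_counts[b] += 1

    # For two-player zero-sum games the empirical frequencies converge to a
    # Nash equilibrium; here both approach the mixed equilibrium (1/2, 1/2).
    print("empirical row strategy:", row_action_counts / row_action_counts.sum())
    print("empirical col strategy:", col_action_counts / col_action_counts.sum())

The stationarity assumption behind this scheme is exactly what breaks down when the opponent is itself learning, which is why the text above reformulates such games as POMDPs.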


