Strategic best-response learning in multiagent systems

Bikramjit Banerjee, Jing Peng

doi:10.1080/13623079.2011.571819

Abstract

We present a novel and uniform formulation of the problem of reinforcement learning against bounded-memory adaptive adversaries in repeated games, together with methods for learning in this framework. First, we define a strategic notion of best response that optimises rewards over multiple steps, in contrast to the tactical best response of game theory. We show that the problem of learning a strategic best response reduces to that of learning an optimal policy in a Markov Decision Process (MDP). We address both the finite- and infinite-horizon versions of this problem. For the finite horizon, we adapt an existing Monte Carlo-based algorithm so that it learns optimal policies in such MDPs in polynomial time. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Although this improvement comes at the cost of increased domain knowledge, simple experiments in the Prisoner's Dilemma and coordination games show that the error can still be small even when no extra domain knowledge is assumed beyond an upper bound on the opponent's memory size. We also experiment with a general infinite-horizon learner, which uses function approximation to tackle the complexity of the history space, against a greedy bounded-memory opponent. We show that while it can create and exploit opportunities for mutual cooperation in the Prisoner's Dilemma, it is cautious enough to ensure minimax payoffs in the Rock–Scissors–Paper game.
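The reduction described in the abstract can be illustrated with a small sketch. If the opponent's next move depends only on its last k joint actions, those k joint actions can serve as the MDP state, and a strategic (multi-step) best response is an optimal finite-horizon policy in that MDP. The example below makes several assumptions that do not come from the paper: textbook Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0), tit-for-tat as the memory-1 opponent, and a fully known opponent model. Because the model is known, plain backward induction stands in for the paper's Monte Carlo learning algorithm. All function names are hypothetical.

```python
from itertools import product

C, D = "C", "D"
ACTIONS = (C, D)

# Learner's Prisoner's Dilemma payoffs keyed by (learner, opponent).
# ASSUMPTION: textbook values T=5, R=3, P=1, S=0; the paper's payoffs may differ.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# MDP states: the previous joint action (learner, opponent), which is the
# entire memory of a memory-1 opponent.
STATES = list(product(ACTIONS, ACTIONS))


def tit_for_tat(prev_joint):
    """Hypothetical memory-1 opponent: repeats the learner's previous action."""
    learner_prev, _ = prev_joint
    return learner_prev


def strategic_policy(opponent, horizon):
    """Backward induction on the MDP induced by a known, deterministic
    memory-1 opponent. policy[t][s] is the action at time step t in state s;
    value[s] is the optimal total payoff over the whole horizon from s."""
    value = {s: 0.0 for s in STATES}  # zero steps remaining
    backwards = []  # backwards[k] is the policy with k + 1 steps remaining
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in STATES:
            opp = opponent(s)  # opponent's move is fixed by its memory
            # The next state is the joint action played this round.
            best_a = max(ACTIONS, key=lambda a: PAYOFF[(a, opp)] + value[(a, opp)])
            new_value[s] = PAYOFF[(best_a, opp)] + value[(best_a, opp)]
            step_policy[s] = best_a
        value = new_value
        backwards.append(step_policy)
    return list(reversed(backwards)), value


def tactical_action(opponent, state):
    """One-step (tactical) best response to the opponent's predicted move."""
    opp = opponent(state)
    return max(ACTIONS, key=lambda a: PAYOFF[(a, opp)])


def play(opponent, choose, horizon, start=(C, C)):
    """Learner's total payoff over `horizon` rounds; `start` seeds the opponent's memory."""
    state, total = start, 0
    for t in range(horizon):
        opp = opponent(state)
        a = choose(t, state)
        total += PAYOFF[(a, opp)]
        state = (a, opp)
    return total


H = 10
policy, value = strategic_policy(tit_for_tat, H)
strategic = play(tit_for_tat, lambda t, s: policy[t][s], H)
tactical = play(tit_for_tat, lambda t, s: tactical_action(tit_for_tat, s), H)
print(strategic, tactical)  # 32 14
```

Against tit-for-tat, the tactical best response defects every round (defection dominates in a single round) and earns 5 + 9·1 = 14. The strategic policy cooperates until the final round and earns 9·3 + 5 = 32. In the setting the abstract describes, the opponent model is unknown and must be learned, which this sketch does not attempt.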


Journal: Journal of Experimental & Theoretical Artificial Intelligence
Publication Date: Jun 1, 2012
Citations: 26

Similar Papers

Efficient learning of multi-step best response
Bikramjit Banerjee ... Jing Peng
25 Jul 2005

Emergence of super cooperation of prisoner's dilemma games on scale-free networks.
Angsheng Li ... Xi Yong
PLOS ONE | VOL. 10
02 Feb 2015

Spatial evolutionary game theory: Hawks and Doves revisited
...
Proceedings of the Royal Society of London. Series B: Biological Sciences | VOL. 263
22 Sep 1996

Locus of control and learning to cooperate in a prisoner's dilemma game
Christophe Boone ... Arjen Van Witteloostuijn
Personality and Individual Differences | VOL. 32
08 Mar 2002
