Abstract

A dynamical systems perspective on multi-agent learning, based on the link between evolutionary game theory and reinforcement learning, provides an improved, qualitative understanding of the emerging collective learning dynamics. However, confusion remains about how this dynamical systems account of multi-agent learning should be interpreted. In this article, I propose to embed the dynamical systems description of multi-agent learning into different abstraction levels of cognitive analysis. The purpose of this work is to make the connections between these levels explicit in order to gain improved insight into multi-agent learning. I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning. I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty. I then propose an on-line sample-batch temporal-difference algorithm, characterized by the combination of a memory batch and separated state-action value estimation. I find that this algorithm serves as a micro-foundation of the deterministic learning equations: its learning trajectories approach those of the deterministic equations as the batch size grows. Ultimately, this framework of embedding a dynamical systems description into different abstraction levels gives guidance on how to unleash the full potential of the dynamical systems approach to multi-agent learning.
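
To make the flavor of such an algorithm concrete, the sketch below shows a minimal sample-batch TD(0) value update in Python. It is an illustration under my own assumptions (array layout, Q-learning-style bootstrap target, function and variable names), not the article's exact algorithm; what it shares with the text is that state-action value estimates are updated from a memory batch of transitions rather than from single samples.

    import numpy as np

    def sample_batch_td_update(Q, batch, alpha=0.1, gamma=0.9):
        # One sample-batch TD(0) step: average the bootstrapped targets
        # over a memory batch of (state, action, reward, next_state)
        # transitions collected under the current policy, then update
        # the state-action value table Q once.
        sums, counts = {}, {}
        for s, a, r, s_next in batch:
            target = r + gamma * np.max(Q[s_next])  # bootstrap target
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        for (s, a), total in sums.items():
            # With a large batch, the averaged target approaches its
            # expected value, so the stochastic update approaches the
            # corresponding deterministic learning equation.
            Q[s, a] += alpha * (total / counts[(s, a)] - Q[s, a])
        return Q

Here the batch average serves as a value estimate that is applied in a single update step, loosely mirroring the separation of value estimation and update described above.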

Highlights

  • I compare the deterministic learning equations (Sect. 3) with the sample-batch algorithm (Sect. 4) to show that their learning trajectories match under large batch sizes, so that the sample-batch algorithm can be seen as an algorithmic foundation of the deterministic learning dynamics (see the sketch after this list)

  • The first environment I use as a testbed is a one-agent stochastic game, i.e., a Markov decision process

  • I propose to regard the replicator reinforcement learning dynamics perspective on multi-agent learning as a level of cognitive analysis
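
As a purely illustrative check of the large-batch claim in the first highlight, the snippet below runs a degenerate single-state case (a two-armed bandit, where the TD target reduces to the immediate reward) for growing batch sizes. All reward values, the softmax policy, and the names are my own assumptions, not taken from the article.

    import numpy as np

    rng = np.random.default_rng(1)
    true_means = np.array([0.2, 0.8])  # assumed bandit arm means

    def run(batch_size, steps=30, alpha=0.2):
        q = np.zeros(2)
        for _ in range(steps):
            p = np.exp(q) / np.exp(q).sum()           # softmax policy
            actions = rng.choice(2, size=batch_size, p=p)
            rewards = rng.normal(true_means[actions], 0.5)
            for a in range(2):
                chosen = actions == a
                if chosen.any():
                    # batch-averaged target (single state: just reward)
                    q[a] += alpha * (rewards[chosen].mean() - q[a])
        return q

    for K in (1, 10, 1000):
        print(K, run(K))  # larger K: trajectory becomes near-deterministic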


Introduction

A sound understanding of multi-agent systems is relevant for other fields, such as biology [68], economics [28], sustainability [10], and the social sciences in general [19]. In such systems, the agents need to learn an appropriate course of action by themselves. Stochastic games are a formal model for multi-agent-environment systems. They generalize both repeated normal-form games and Markov decision processes (MDPs): MDPs are generalized by introducing multiple agents, and repeated games are generalized by introducing an environment with multiple states and transition probabilities between those states.
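
As a minimal illustration of this formal structure (the tensor shapes and names are my own choices, not notation from the article), a stochastic game can be stored as a transition tensor and per-agent reward tensors, with both special cases visible in the shapes:

    import numpy as np

    # Stochastic game with N agents, Z environment states, M actions each:
    #   T[z, a1, ..., aN, z']  -- transition probabilities between states
    #   R[i, z, a1, ..., aN]   -- reward of agent i
    N, Z, M = 2, 2, 2
    rng = np.random.default_rng(0)
    T = rng.random((Z, M, M, Z))
    T /= T.sum(axis=-1, keepdims=True)   # each next-state row sums to one
    R = rng.random((N, Z, M, M))

    # N = 1 recovers a Markov decision process: T[z, a, z'] and R[z, a].
    # Z = 1 recovers a repeated normal-form game: R[i, a1, ..., aN] is the
    # payoff tensor, and the single environment state never changes.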
