Abstract

We propose a method for learning multi-agent policies to compete against multiple opponents. The method combines recurrent actor-critic networks with deterministic policy gradients that promote cooperation between agents through communication. The learning process does not require access to opponents’ parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate via forward and backward paths, while the critic network helps train the actors by delivering gradient signals based on each agent’s contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving policies of other agents, we propose approximate model learning using auxiliary prediction networks that model the state transitions, the reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate, by comparison, the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement. The comparison results show that the proposed method outperforms the alternatives.
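
As a concrete illustration of the pipeline described above, here is a minimal PyTorch sketch (not the authors’ code) of a recurrent, communicating actor and a critic over the joint team action. All names (CommActor, CentralCritic, msg_dim, hidden) and the exact wiring are illustrative assumptions.

```python
# Hedged sketch of a recurrent actor-critic with inter-agent communication.
# Assumed names and shapes (obs_dim, act_dim, msg_dim) are not from the paper.
import torch
import torch.nn as nn

class CommActor(nn.Module):
    """Recurrent actor: consumes its observation plus an incoming message,
    emits a deterministic action and an outgoing message for teammates."""
    def __init__(self, obs_dim, act_dim, msg_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + msg_dim, hidden)
        self.action_head = nn.Linear(hidden, act_dim)
        self.msg_head = nn.Linear(hidden, msg_dim)

    def forward(self, obs, msg_in, h):
        h = self.rnn(torch.cat([obs, msg_in], dim=-1), h)
        action = torch.tanh(self.action_head(h))  # deterministic policy output
        msg_out = torch.tanh(self.msg_head(h))    # forward communication path
        return action, msg_out, h

class CentralCritic(nn.Module):
    """Critic over the joint observation and joint action; its gradient
    w.r.t. each agent's action is the signal delivered to that actor."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# DDPG-style actor update: ascend the critic's value, so gradients flow
# back through both the action heads and the exchanged messages
# (the "backward path" of the communication):
#   q = critic(joint_obs, torch.cat(actions, dim=-1))
#   actor_loss = -q.mean()
```

Because the messages are differentiable, maximizing the critic’s value in this setup would train the communication protocol end to end along with the actions.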

Highlights

  • Multi-agent reinforcement learning has garnered attention by addressing many challenges, including autonomous vehicles [1], network packet delivery [2], distributed logistics [3], multiple robot control [4], and multiplayer games [5, 6]

  • We hypothesized that approximate model learning using auxiliary prediction networks (AMLAPNs) would perform better because the agents can perceive the intrinsic mechanisms of the environment, including the adversarial team’s policies (a hedged sketch of such networks follows this list)

  • Although the multi-agent deep deterministic policy gradient (MADDPG) shows the highest relative scores, we can identify the effectiveness of the AMLAPN in consistently improving the performance of the competitive training framework using recurrent layers (CTRL)
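
To make the AMLAPN hypothesis above concrete, the following is a small, assumption-laden sketch of what such auxiliary prediction networks could look like: heads that predict the next observation, the reward, and the opponent’s action, trained with supervised losses added to the usual policy loss. The head names and the aux_weight coefficient are ours, not the paper’s.

```python
# Hypothetical auxiliary prediction heads for approximate model learning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryHeads(nn.Module):
    """Predict the next observation, reward, and opponent action from the
    agent's recurrent state h (plus its own action where appropriate)."""
    def __init__(self, hidden, act_dim, obs_dim, opp_act_dim):
        super().__init__()
        self.next_obs = nn.Linear(hidden + act_dim, obs_dim)  # transition model
        self.reward = nn.Linear(hidden + act_dim, 1)          # reward model
        self.opp_act = nn.Linear(hidden, opp_act_dim)         # opponent model

    def losses(self, h, action, next_obs, reward, opp_action):
        ha = torch.cat([h, action], dim=-1)
        return (F.mse_loss(self.next_obs(ha), next_obs)
                + F.mse_loss(self.reward(ha), reward.view(-1, 1))
                + F.mse_loss(self.opp_act(h), opp_action))

# Joint objective (aux_weight is an assumed hyperparameter):
#   total_loss = rl_loss + aux_weight * aux.losses(h, a, next_obs, r, a_opp)
```

Because the auxiliary targets (next observation, reward, opponent action) are all observed during play, these heads give the shared representation a supervised signal about the environment’s dynamics and the opponents, which is the mechanism the highlight appeals to.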

Introduction

Multi-agent reinforcement learning has garnered attention by addressing many challenges, including autonomous vehicles [1], network packet delivery [2], distributed logistics [3], multiple-robot control [4], and multiplayer games [5, 6]. Most recent work considers fully cooperative tasks and communication among agents [7, 8], yet multi-agent competition is one of the crucial domains for multi-agent reinforcement learning. This task aims to coevolve two or more agents that interact with each other in the same environment. Competitive multi-agent reinforcement learning was behind the recent success of mastering Go without human knowledge [9]. A competitive multi-agent environment provides agents with a customized curriculum that facilitates efficient learning and helps them avoid local optima [10].
