Abstract

Learning adversarial policies, which aims to learn behavioural strategies for agents with different goals, is one of the most significant tasks in multi-agent systems. Multi-agent reinforcement learning (MARL), as a state-of-the-art learning-based model, employs centralised or decentralised control methods to learn behavioural strategies by interacting with environments, but it suffers from instability and slowness in the training process. Considering that parallel simulation or computation is an effective way to improve training performance, we propose a novel MARL method called Multiple scenes multi-agent proximal Policy Optimisation (MPO) in this paper. In MPO, we first simulate multiple parallel scenes in the training environment: multiple policies control different agents within the same scene, and each policy also controls several identical agents across the parallel scenes. Then, we extend proximal policy optimisation (PPO) with an improved actor-critic network to ensure stable training in multi-agent tasks; the actor network uses only local information for decision making, while the critic network uses global information for training. Finally, effective training trajectories are computed with two criteria from the multiple parallel scenes rather than a single scene to accelerate the learning process. We evaluate our approach in two simulated 3D environments: Unity's official open-source soccer game and an unmanned surface vehicle (USV) environment built in Unity. Experiments demonstrate that MPO converges more stably and faster than benchmark methods during model training, and that it learns better adversarial policies than benchmark models.
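The abstract describes a centralised-training, decentralised-execution actor-critic extension of PPO: each actor acts only on its agent's local observation, while a critic trained on global information evaluates trajectories gathered from parallel scenes. The following is a minimal sketch of that structure, assuming PyTorch, discrete actions, and illustrative network sizes and dimensions; it is not the authors' implementation, and the two trajectory-selection criteria from the paper are not reproduced here.

```python
# Sketch (not the authors' code) of a decentralised actor / centralised critic
# with a PPO-style clipped objective, as outlined in the abstract.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralised actor: chooses actions from an agent's local observation only."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, local_obs):
        return torch.distributions.Categorical(logits=self.net(local_obs))


class Critic(nn.Module):
    """Centralised critic: trained on the global state of the scene."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, global_state):
        return self.net(global_state).squeeze(-1)


def ppo_actor_loss(actor, local_obs, actions, old_log_probs, advantages, clip=0.2):
    """Clipped surrogate loss over trajectories collected from parallel scenes."""
    dist = actor(local_obs)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    return -torch.min(ratio * advantages, clipped * advantages).mean()


if __name__ == "__main__":
    # Illustrative dimensions only: 8-dim local observation, 4 discrete actions,
    # 24-dim global state, batch of 32 transitions.
    actor, critic = Actor(obs_dim=8, act_dim=4), Critic(state_dim=24)
    obs = torch.randn(32, 8)
    dist = actor(obs)
    acts = dist.sample()
    loss = ppo_actor_loss(actor, obs, acts, dist.log_prob(acts).detach(),
                          advantages=torch.randn(32))
    print(loss.item())
```

In this setup only the critic sees the global state, so execution remains decentralised while training benefits from centralised information, which is the stability argument the abstract makes for multi-agent tasks.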
