A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System

Chunyang Hu

doi:10.3390/sym12040631

Abstract

This paper uses deep reinforcement learning (DRL) and knowledge transfer to control a learning agent effectively in multi-agent confrontation. First, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed for confrontation decision-making among multiple agents. During training, information about the other agents is fed into the critic network to improve the confrontation strategy. The parameter sharing mechanism can reduce the loss of experience storage. The DDPG algorithm uses four neural networks to generate the real-time actions and the Q-value function, and a momentum mechanism optimizes training to accelerate the convergence of the neural networks. Second, an auxiliary controller based on a policy-based reinforcement learning (RL) method assists the game agent's decision-making. In addition, an effective reward function helps the agents balance enemy losses against their own. Furthermore, knowledge transfer is used to extend the learning model to more complex scenes and to improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight its competitors and achieves a good winning rate. In large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.

Highlights

Reinforcement learning is a type of machine learning method for robot learning [2]
Reinforcement learning is mainly applied to interactive behaviors and decision-making problems, such as video games, robot control systems, and human-computer dialogue, which cannot be handled well by the well-known supervised and unsupervised learning methods [4,5,6,7]
Compared with learning from scratch, the proposed method can improve the learning performance of the reinforcement learning (RL) model in a new task

Summary

Introduction

Reinforcement learning is a type of machine learning method for robot learning [2]. The learning agent receives a reward from the environment and, through continuous interaction with the environment, can achieve its goal. Reinforcement learning is mainly applied to interactive behaviors and decision-making problems, such as video games, robot control systems, and human-computer dialogue, which cannot be handled well by the well-known supervised and unsupervised learning methods [4,5,6,7]. It is a learning process in which the agent constantly interacts with the environment to gain learning experience, and it has the characteristics of active learning and adaptive learning. The policy model allows an NPC to choose an action according to an action policy.
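The interaction loop described above (the agent acts, the environment returns a reward, and the agent updates from that experience) can be illustrated with a minimal sketch. It is not the paper's method: it uses tabular Q-learning with an epsilon-greedy action policy in place of DDPG. The `Corridor` environment, the reward values, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward: +1 at the goal, a small penalty for every other step.
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy action policy over the two actions."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    # Ties are broken in favour of moving right.
    return 0 if q[state][0] > q[state][1] else 1


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run the agent-environment loop and return the learned Q-table."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = env.step(action)
            # One-step Q-learning target; no bootstrapping at the goal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    # Greedy action per non-goal cell (0 = left, 1 = right).
    print([0 if left > right else 1 for left, right in q_table[:-1]])
```

A table suffices here because the toy problem has only five states and two actions. The continuous, multi-agent setting addressed in the paper is what motivates replacing the table with actor and critic networks.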

Methods

Results

Conclusion