Abstract

We describe and evaluate a neural network-based architecture aimed at imitating and improving the performance of a fully autonomous soccer team in the RoboCup Soccer 2D Simulation environment. The approach utilizes a deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during the training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution.

Highlights

  • Deep Learning provided a major breakthrough in Artificial Intelligence (AI)

  • We extend the problem of imitation learning and transform it into a reinforcement learning (RL) framework based on a Markov decision process (MDP), defined by the 5-tuple {state S, action A, reward R, transition probability P, discount rate γ}

  • We examine whether the structure of an expert-designed system is worth preserving, by comparing two alternatives: maintaining the structure with two replacement blocks processed in sequence, or replacing it entirely with a single neural network managing the selection of all defensive behaviors
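The MDP formulation in the highlights above can be illustrated with a minimal tabular Q-learning sketch. The states, actions, and rewards below are hypothetical toy placeholders, not the paper's actual RoboCup state and action spaces:

```python
import random

# Hypothetical toy MDP: two states and two defensive actions.
# These placeholders stand in for the RoboCup features used in the paper.
states = ["near_ball", "far_ball"]
actions = ["intercept", "hold"]
gamma = 0.9   # discount rate γ
alpha = 0.1   # learning rate

def step(state, action):
    """Reward R and transition P of the toy MDP."""
    if state == "near_ball" and action == "intercept":
        return "near_ball", 1.0   # successful defensive action
    return random.choice(states), 0.0

# Q-table over all (state, action) pairs, initialized to zero.
Q = {(s, a): 0.0 for s in states for a in actions}

state = "far_ball"
for _ in range(5000):
    action = random.choice(actions)   # fully random exploration
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    # Q-learning update: Q(s,a) ← Q(s,a) + α [R + γ max_a' Q(s',a') − Q(s,a)]
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state
```

After training, the learned values favor "intercept" in the rewarded state, which is the same update rule a deep Q-network approximates with a neural network in place of the table.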


Introduction

Deep Learning provided a major breakthrough in Artificial Intelligence (AI). Due to its high capability for generalization after training on a large number of samples, a neural network is able to learn complex multivariate functions, both linear and non-linear. One of the important problems in robotics is to determine situation-based and goal-directed actions for agents. This problem has been addressed by deep learning algorithms developed along two promising directions: Imitation Learning (IL) and Reinforcement Learning (RL). IL offers one way to address this problem: the agents try to mimic an existing expert-designed system using a large number of demonstrations, hoping to replicate the expert system's performance. The objective of the RoboCup Soccer Simulation (RCSS) is to create an environment, based on football game rules, which can benchmark the performance of different multi-agent solutions and architectures (Kitano et al., 1997, 1998; Noda et al., 1998).
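The IL scheme described above can be sketched as minimal behavioral cloning: learn a state-to-action mapping directly from expert demonstrations. The demonstration data below is a hypothetical toy set, and the nearest-neighbour policy is a stand-in for the trained neural network:

```python
# Behavioral cloning sketch: imitate an expert's state→action mapping.
# Toy 1-D "distance to ball" states with expert action labels (hypothetical,
# standing in for demonstrations recorded from an expert-designed team).
demonstrations = [
    (0.1, "intercept"), (0.3, "intercept"), (0.4, "intercept"),
    (0.7, "hold"), (0.8, "hold"), (0.95, "hold"),
]

def imitate(state):
    # 1-nearest-neighbour policy: copy the expert's action from the most
    # similar demonstrated state (a trained network would generalize instead).
    nearest = min(demonstrations, key=lambda d: abs(d[0] - state))
    return nearest[1]

print(imitate(0.2))   # near the ball → "intercept"
print(imitate(0.9))   # far from the ball → "hold"
```

The key design point is that IL needs no reward signal, only demonstrations; its performance is therefore bounded by the expert it mimics, which motivates the paper's combination with RL.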

