Abstract

Self-play has been shown to be effective in providing a proper training curriculum for reinforcement learning agents in competitive multi-agent environments without direct supervision. However, its performance remains unstable on problems with sparse rewards, e.g., the scoring-with-goalkeeper task for robots in RoboCup soccer. Such tasks are challenging for reinforcement learning, especially those that require combining high-level actions with flexible control. To address these challenges, we introduce a distributed self-play training framework for an extended proximal policy optimization (PPO) algorithm that learns to act in a parameterized action space and plays against a group of opponents, i.e., a league. Experiments in the simulated RoboCup soccer domain show that the approach is effective and learns policies that are more robust against various opponents than existing reinforcement learning methods. A demonstration video is available online at https://youtu.be/BuLli1vND4.
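To make the notion of a parameterized action space concrete, the sketch below shows one common way such a policy is structured: a discrete head chooses an action type (e.g., dash, turn, kick in RoboCup soccer), and continuous heads supply that type's parameters. This is a minimal NumPy illustration under assumed action names and dimensions, not the paper's actual network or interface.

```python
import numpy as np

# Hypothetical action set for illustration (not the paper's exact interface):
# each discrete action type carries its own continuous parameters.
ACTIONS = {0: ("dash", 2), 1: ("turn", 1), 2: ("kick", 2)}  # name, #params

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class ParamActionPolicy:
    """Toy linear policy over a parameterized action space."""
    def __init__(self, obs_dim):
        n_types = len(ACTIONS)
        n_params = sum(k for _, k in ACTIONS.values())
        # Random linear "networks" stand in for the learned policy heads.
        self.W_type = rng.normal(0, 0.1, (n_types, obs_dim))
        self.W_param = rng.normal(0, 0.1, (n_params, obs_dim))
        self.log_std = np.zeros(n_params)

    def act(self, obs):
        # Discrete head: categorical distribution over action types.
        probs = softmax(self.W_type @ obs)
        a = rng.choice(len(probs), p=probs)
        # Continuous heads: Gaussian parameters for every type; slice out
        # the ones belonging to the chosen action type.
        params = rng.normal(self.W_param @ obs, np.exp(self.log_std))
        start = sum(k for t, (_, k) in ACTIONS.items() if t < a)
        name, k = ACTIONS[a]
        return name, params[start:start + k]

policy = ParamActionPolicy(obs_dim=8)
name, params = policy.act(np.ones(8))
```

In PPO-style training, the log-probability of such a hybrid action factorizes into the categorical term plus the Gaussian term for the selected parameters, so the standard clipped surrogate objective applies with no structural change.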
