Abstract

Nowadays, Reinforcement Learning (RL) is applied to various real-world tasks and attracts much attention in the fields of games, robotics, and autonomous driving. However, directly applying RL to real-world environments is very challenging and places an overwhelming burden on devices. Due to the reality gap, a simulated environment does not match the real-world scenario perfectly, and additional learning cannot be performed. Therefore, an efficient approach is required for RL to find an optimal control policy and achieve better learning efficacy. In this paper, we propose federated reinforcement learning for a multi-agent environment that applies a new federation policy. The new federation policy allows multiple agents to perform learning and share their learning experiences, e.g., gradients and model parameters, with each other to increase their learning level. The Actor-Critic PPO algorithm is used with four RL simulation environments from OpenAI Gym: CartPole, MountainCar, Acrobot, and Pendulum. In addition, we conducted real experiments with multiple Rotary Inverted Pendulum (RIP) devices to evaluate and compare the learning efficiency of the proposed scheme in both environments.
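The abstract states that agents share gradients and model parameters under the federation policy, but does not spell out the aggregation rule here. A minimal sketch, assuming a FedAvg-style element-wise average of agent parameters (the function name and dict layout are illustrative, not from the paper):

```python
import numpy as np

def federate_parameters(agent_params):
    """Merge per-agent model parameters by element-wise averaging.

    agent_params: list of dicts mapping parameter name -> np.ndarray,
    one dict per agent. Returns a single averaged parameter dict that
    each agent would load before its next round of local learning.
    """
    merged = {}
    for name in agent_params[0]:
        merged[name] = np.mean([p[name] for p in agent_params], axis=0)
    return merged

# Example: two agents with a toy one-layer policy
agent_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
agent_b = {"w": np.array([3.0, 4.0]), "b": np.array([1.5])}
shared = federate_parameters([agent_a, agent_b])
# shared["w"] -> [2.0, 3.0], shared["b"] -> [1.0]
```

The same averaging step applies unchanged whether the shared quantities are parameters or accumulated gradients; only what each agent does with the merged result differs.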

Highlights

  • Reinforcement learning has been applied to games, robotics, and autonomous driving, which require precise control and accurate results [1]–[5]

  • Federated reinforcement learning is a type of multi-agent reinforcement learning [22] used for distributed agent systems, such as games, robotics systems, and autonomous driving [2]–[5]

  • With the proposed federation policy applied to MountainCarContinuous, Acrobot, and Pendulum, learning ended at episodes 455, 1222, and 3196, respectively

Summary

INTRODUCTION

Reinforcement learning has been applied to games, robotics, and autonomous driving, which require precise control and accurate results [1]–[5]. We are motivated by previous research trends to apply federated multi-agent reinforcement learning to multiple real devices and improve learning performance. For the simulation and real-device experiments we used Actor-Critic Proximal Policy Optimization (Actor-Critic PPO) [18]–[20], which shows the best performance among agent-based policy gradient methods, including Trust Region Policy Optimization (TRPO) [21], while exhibiting lower computation cost and high performance. The main contributions of the paper are summarized as follows: 1) We propose an extended federated reinforcement learning approach that allows multiple agents to control the simulation and a RIP system.
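PPO's key idea, referenced above, is to limit how far each policy update moves from the old policy by clipping the probability ratio in the surrogate objective. A minimal sketch of the standard clipped loss (the clipping range eps=0.2 is the common default, not a value stated in this summary):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (negated objective, to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s), one value per sample
    advantage: estimated advantage per sample
    eps:       clipping range; updates moving the ratio outside
               [1 - eps, 1 + eps] gain no extra objective value
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum makes the bound pessimistic, discouraging
    # large policy steps in either direction.
    return -np.mean(np.minimum(unclipped, clipped))

# Example: a ratio of 1.5 with positive advantage is clipped to 1.2
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))
# loss -> -1.2
```

This clipping is what lets PPO approximate TRPO's trust-region constraint with only first-order optimization, which is the low-computation property the summary attributes to it.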

RELATED WORK
FEDERATED REINFORCEMENT LEARNING FOR ACCELERATION
EXPERIMENTS
CONCLUSION