Adaptive Modular Reinforcement Learning for Robot Controlled in Multiple Environments

Teppei Iwata, Takeshi Shibuya

doi:10.1109/access.2021.3070704

Open Access

https://doi.org/10.1109/access.2021.3070704

Abstract

This paper proposes an adaptive modular reinforcement learning architecture and an algorithm for controlling robots that operate in multiple environments. Reinforcement learning autonomously acquires control rules through interaction between the agent and the controlled system. Consequently, reinforcement learning is expected to be applied to robot control where model identification is difficult. Such robots are often expected to operate in multiple environments. However, existing reinforcement learning algorithms require prior knowledge of changes in the environment. In this paper, we propose an architecture and algorithm that do not require prior knowledge of the environment. In this architecture, the policy is acquired by increasing the number of modules based on the interaction with the controlled system. Therefore, the proposed method can be applied to robots whose dynamics change, without losing the feature that the reinforcement learning algorithm does not require prior knowledge of the controlled system. In two numerical experiments conducted to evaluate the proposed method, it improved performance by approximately 25% compared with conventional methods.

Highlights

Reinforcement learning is a framework for machine learning that optimizes the operation of a system by solving multistep decision-making problems [1]
Reinforcement learning is applicable to robot control, where the state transition is deterministic but the dynamics of the controlled system are complex and model identification is difficult
We propose an adaptive modular linear quadratic tracker (AMLQT), which is an implementation of the adaptive modular reinforcement learning architecture for solving the problems formulated in Section III

Summary

Introduction

Reinforcement learning is a framework for machine learning that optimizes the operation of a system by solving multistep decision-making (action decision) problems [1]. Reinforcement learning consists of an agent (controller) and a controlled system. The agent receives states from the controlled system and a reward that indicates the control performance. The agent chooses the action (control input) that maximizes the accumulated reward by trial and error. A feature of reinforcement learning is that the policy can be acquired autonomously through the interaction between the agent and the controlled system. Because of this feature, reinforcement learning is applicable to robot control, where the state transition is deterministic but the dynamics of the controlled system are complex and model identification is difficult. Reinforcement learning has been applied to various robot control tasks [2]–[6], such as reconfigurable robots [7]–[9] and trajectory control for manipulators [10].
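The agent and controlled-system interaction described above can be illustrated with a minimal sketch. It is not the method proposed in the paper: it uses plain tabular Q-learning on a hypothetical toy system (a point moving along five cells with a deterministic transition), and every name and parameter in it (`step`, `train`, `greedy_policy`, the learning rate, the discount factor, the exploration rate) is an assumption chosen only for illustration.

```python
import random

# Hypothetical toy controlled system (an assumption, not from the paper):
# a point on a line of 5 cells, starting at cell 0, with the goal at cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # control inputs: move left / move right


def step(state, action):
    """Deterministic state transition; the reward is 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning: the agent learns by trial and error, without a model."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # The target is the reward plus the discounted value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-terminal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
            for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent never sees the transition rule inside `step`; it only observes states and rewards, which is the property that makes reinforcement learning attractive when model identification is difficult. The sketch assumes a single fixed environment, so it does not address the multiple-environment setting that the paper targets.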

Methods

Results

Conclusion