Abstract

Reinforcement learning refers to a machine learning paradigm in which an agent interacts with the environment to learn how to perform a task. The characteristics of the environment may change over time or be affected by uncontrolled disturbances, preventing the agent from finding a proper policy. Some approaches attempt to address these problems, such as interactive reinforcement learning, in which an external entity helps the agent learn through advice. Other approaches, such as robust reinforcement learning, allow the agent to learn the task while acting in a disturbed environment. In this paper, we propose an approach that addresses interactive reinforcement learning problems in a dynamic environment, where advice provides information about both the task and the dynamics of the environment. Thus, an agent learns a policy in a disturbed environment while receiving advice. We implement our approach in a dynamic version of the cart-pole balancing task and in a simulated robotic arm environment with dynamic characteristics in which objects must be organized. Our results show that the proposed approach allows an agent to complete the task satisfactorily in a dynamic, continuous state-action domain. Moreover, experimental results suggest that agents trained with our approach are less sensitive to changes in the characteristics of the environment than interactive reinforcement learning agents.
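
To make the setting concrete, the sketch below illustrates a generic interactive reinforcement learning loop in which an external advisor occasionally replaces the agent's chosen action with advice while the environment applies its own, possibly time-varying, disturbances. This is an illustrative sketch only, not the algorithm proposed in the paper; the env, agent, and advisor interfaces and the advice_prob parameter are assumptions introduced for the example.

    import random

    def interactive_episode(env, agent, advisor, advice_prob=0.3):
        # Hypothetical interfaces: env.reset()/env.step(), agent.act()/agent.update(),
        # and advisor.advise() stand in for any concrete implementation.
        state = env.reset()
        done, total_reward = False, 0.0
        while not done:
            action = agent.act(state)
            # With some probability the external entity gives advice, which here
            # simply overrides the agent's own action choice.
            if random.random() < advice_prob:
                action = advisor.advise(state, action)
            # The environment step may include disturbances or changing dynamics.
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state, done)
            state, total_reward = next_state, total_reward + reward
        return total_reward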

Highlights

  • Reinforcement Learning (RL) is a learning approach in which an agent interacts with the environment to learn a desired task autonomously

  • Our paper is organized as follows: In Section II, we present the basics of Reinforcement Learning, actor-critic and soft actor-critic, and in Section III, we describe the interactive feedback and the dynamic approach used in this paper

  • We present our results in two separate environments: cart-pole balancing and a simulated robotic arm

Introduction

Reinforcement Learning (RL) is a learning approach in which an agent interacts with the environment to learn a desired task autonomously. The agent must be able to sense a state from the environment and take actions that affect it to reach a new state. The agent receives a reward signal from the environment, which it tries to maximize throughout learning [1]. The agent learns from its own experience, taking actions and discovering which ones yield the greatest reward. The agent does not learn the policy directly; instead, it can approximate it by storing values for each state-action pair. Many RL algorithms approximate the value function, which maps states or state-action pairs to an expected amount of reward.
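
As a minimal illustration of value-based learning (not the actor-critic methods used later in the paper), the following tabular Q-learning sketch stores one value per state-action pair in a small chain environment and derives the agent's behaviour from those values; the environment and all constants are assumptions made up for this example.

    import random

    # Each state-action pair gets its own stored value Q(s, a); the policy is
    # derived from these values rather than represented explicitly.
    # The 5-state chain environment and all constants are assumptions for the example.
    N_STATES = 5                 # states 0..4; state 4 is the goal
    ACTIONS = (-1, +1)           # move left or right along the chain
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def step(state, action):
        # Simple deterministic dynamics with a reward only at the goal state.
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        return next_state, reward, next_state == N_STATES - 1

    for episode in range(200):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection from the current value estimates.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            # Temporal-difference update of the stored state-action value.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state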
