Abstract

Reinforcement learning, which acquires a policy that maximizes long-term rewards, has been actively studied. Unfortunately, this type of learning is too slow and difficult to use in practical situations because the state-action space becomes huge in real environments. Many studies have therefore incorporated human knowledge into reinforcement learning. Human knowledge about trajectories is often used, but it may require a human to control the AI agent, which can be difficult. Knowledge about subgoals may lessen this requirement because humans need only consider a few representative states on an optimal trajectory. The essential factor for learning efficiency is rewards. Potential-based reward shaping is a basic method for enriching rewards. However, it is often difficult to incorporate subgoals into potential-based reward shaping to accelerate learning, because the appropriate potentials are not intuitive for humans. We extend potential-based reward shaping and propose subgoal-based reward shaping, which makes it easier for human trainers to share their knowledge of subgoals. To evaluate our method, we obtained a series of subgoals from participants and conducted experiments in three domains: four-rooms (discrete states and discrete actions), pinball (continuous states and discrete actions), and picking (continuous states and continuous actions). We compared our method with a baseline reinforcement learning algorithm and other subgoal-based methods, including random subgoals and naive subgoal-based reward shaping. We found that our reward shaping outperformed all other methods in learning efficiency.
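For reference, potential-based reward shaping adds a shaping term F(s, s') = γΦ(s') − Φ(s) to the environment reward, which preserves the optimal policy. The sketch below shows one possible way to drive the potential from a human-provided subgoal series; the function names, the representation of subgoals as an ordered list, and the "progress counts achieved subgoals" potential are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of potential-based reward shaping driven by subgoals.
# Assumption (not from the paper): subgoals are an ordered list, and the
# potential grows with the number of subgoals the agent has already reached.

def make_subgoal_potential(subgoals, scale=1.0):
    """Return Phi(progress), where progress counts achieved subgoals."""
    def phi(num_achieved):
        return scale * num_achieved / max(len(subgoals), 1)
    return phi

def shaped_reward(env_reward, phi, progress, next_progress, gamma=0.99):
    # F(s, s') = gamma * Phi(s') - Phi(s); adding F to the environment
    # reward leaves the optimal policy unchanged (Ng et al., 1999).
    return env_reward + gamma * phi(next_progress) - phi(progress)
```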

Highlights

  • Reinforcement learning (RL) can acquire a policy that maximizes long-term rewards in an environment

  • The performance of random subgoal potential-based reward shaping (RSRS) approached, but remained worse than, that of human subgoal-based reward shaping (HSRS)

  • RSRS performed close to HSRS in the four-rooms and pinball domains


Summary

Introduction

Reinforcement learning (RL) can acquire a policy that maximizes long-term rewards in an environment. Designers do not need to specify how to achieve a goal; they only need to specify, via a reward function, what a learning agent should achieve. A reinforcement learning agent performs both exploration and exploitation to find out how to achieve the goal by itself. In a real environment such as robotics, the state-action space is typically quite large. Since a human may have knowledge that would be helpful to such an agent, a promising approach is to utilize human knowledge [1], [2], [3]. Wang and Taylor’s approach transfers a policy by requesting that a human select an action for a state during learning.
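For concreteness, the sketch below shows a tabular Q-learning agent with ε-greedy action selection, illustrating the exploration-exploitation loop this kind of baseline learner relies on. The environment interface (reset/step and a discrete action list) and the hyperparameters are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration/exploitation.

    Assumes (illustratively) that env.reset() returns a hashable state,
    env.step(a) returns (next_state, reward, done), and env.actions is a
    non-empty list of discrete actions.
    """
    q = defaultdict(float)  # q[(state, action)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Exploration vs. exploitation.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # One-step temporal-difference update toward the reward signal.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```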

