Abstract

Hierarchical reinforcement learning relies on temporally extended actions, or skills, to facilitate learning. Automatically forming such abstractions is challenging, and many efforts tackle this issue within the options framework. While various approaches exist to construct options from different perspectives, few of them concentrate on the options' adaptability during learning. This paper presents an algorithm that creates options and enhances their quality online. Both aspects operate on detected communities of the learning environment's state transition graph. We first construct options from initial samples as the basis of online learning. A rule-based community revision algorithm is then proposed to update graph partitions, based on which existing options can be continuously tuned. Experimental results on two problems indicate that options built from initial samples may perform poorly in more complex environments, and that the presented strategy can effectively improve options and achieve better results than flat reinforcement learning.
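To make the abstract's pipeline concrete, the sketch below shows one plausible way to derive options from communities detected in a state transition graph built from initial samples. It is a minimal illustration, not the paper's exact algorithm: the community detector (networkx's greedy modularity method), the undirected graph simplification, and the names `Option` and `build_options_from_communities` are assumptions made for this example.

```python
# Minimal sketch: partition a state transition graph into communities and
# derive one option per community. Names and the specific community detector
# are illustrative assumptions, not the paper's exact method.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from dataclasses import dataclass, field

@dataclass
class Option:
    initiation_set: set          # states where the option may be invoked
    target_states: set           # border states leading out of the community
    policy: dict = field(default_factory=dict)  # state -> action, learned later

def build_options_from_communities(transitions):
    """transitions: iterable of (state, next_state) pairs from initial samples."""
    g = nx.Graph()               # undirected simplification of the transition graph
    g.add_edges_from(transitions)

    communities = greedy_modularity_communities(g)  # one possible detector
    options = []
    for community in communities:
        community = set(community)
        # Border states: states inside the community with a neighbor outside it.
        border = {s for s in community
                  if any(n not in community for n in g.neighbors(s))}
        options.append(Option(initiation_set=community, target_states=border))
    return options

# Toy two-room transition log: two densely connected "rooms" joined by a doorway.
room_a = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3)]
room_b = [(4, 5), (4, 6), (5, 6), (6, 7), (5, 7)]
doorway = [(3, 4)]
for opt in build_options_from_communities(room_a + room_b + doorway):
    print(sorted(opt.initiation_set), "->", sorted(opt.target_states))
```

Under this reading, each option's initiation set is a detected community and its border states mark where the option should hand control back; the online revision step described in the abstract would then re-partition the graph as new transitions arrive and adjust these sets accordingly.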

Highlights

  • Reinforcement learning (RL) is a machine learning branch where an agent learns to optimize its behavior by trial-and-error interaction with its environment

  • As solving small-scale subproblems would be simpler than solving the entire problem, Hierarchical Reinforcement Learning (HRL) is expected to be more efficient than flat RL

  • The remainder of this paper is organized as follows: In Section 2 we describe some basic ideas of RL and the options framework


Introduction

Reinforcement learning (RL) is a machine learning branch where an agent learns to optimize its behavior by trial-and-error interaction with its environment. Traditional RL methods are difficult to apply to complex practical problems due to the so-called "curse of dimensionality," that is, the exponential growth of memory requirements with the number of state variables. Hierarchical Reinforcement Learning (HRL) aims to reduce the dimensionality by decomposing the RL problem into several subproblems. In the HRL research community, three main frameworks, HAM [1], the options framework [2], and MAXQ [3], provide different paradigms of problem hierarchies and learning methodologies. All of these make HRL operate on temporally extended actions, or skills. How to automatically form useful abstractions, known as skill acquisition, is an attractive research issue.
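As a concrete illustration of a temporally extended action in the options framework, the sketch below shows how an agent might execute a single option until its termination condition fires. The environment interface (`env.step`) and the option fields (`policy`, `beta`) are assumed interfaces for this example, not APIs defined in the paper.

```python
# Illustrative sketch of executing one option (temporally extended action).
# `env.step`, `option.policy`, and `option.beta` are assumed interfaces.
def execute_option(env, state, option, gamma=0.99):
    """Follow the option's internal policy until its termination condition fires.
    Returns the accumulated discounted reward, the resulting state,
    and the number of primitive steps taken (the SMDP-style transition)."""
    total_reward, discount, steps = 0.0, 1.0, 0
    while True:
        action = option.policy[state]           # option's internal policy
        state, reward, done = env.step(action)  # one primitive step
        total_reward += discount * reward
        discount *= gamma
        steps += 1
        if done or option.beta(state):          # beta: termination condition
            return total_reward, state, steps
```

From the higher level's perspective, one such call behaves like a single (multi-step) action, which is what allows HRL to learn over skills rather than primitive actions.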
