Abstract

In recent years, hierarchical concepts of temporal abstraction have been integrated into the reinforcement learning framework to improve scalability. However, existing approaches are limited to domains where a decomposition into subtasks is known a priori. In this article we propose the concept of explicitly selecting time-scale-related abstract actions when no subgoal-related abstract actions are available. This concept is realised with multi-step actions on different time scales that are combined in a single action set. We exploit the special structure of this action set in the MSA-Q-learning algorithm. The approach is suited to learning optimal policies in “unstructured” domains where a decomposition into subtasks is not known in advance or does not exist at all. By learning on several explicitly specified time scales simultaneously, we achieve a considerable improvement in learning speed, which we demonstrate on several benchmark problems.
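To make the idea concrete, below is a minimal, hypothetical sketch of tabular Q-learning over a combined action set of multi-step actions (MSAs), where each MSA repeats one primitive action for n steps and all time scales share one Q-table. Everything here is an illustrative assumption based on the abstract, not the authors' published implementation: the toy `ChainEnv` environment, the helper names (`select_action`, `execute_msa`, `update`), the chosen time scales {1, 2, 4, 8}, and the n-step backup rule Q(s, msa) += alpha * (R + gamma^n * max_m Q(s', m) - Q(s, msa)).

```python
import random
from collections import defaultdict

# Hypothetical sketch of Q-learning with multi-step actions (MSAs) on
# several explicit time scales; all names and interfaces are illustrative
# assumptions, not the paper's implementation.

GAMMA = 0.95
ALPHA = 0.1
EPSILON = 0.1
PRIMITIVE_ACTIONS = [0, 1]        # hypothetical: 0 = left, 1 = right
TIME_SCALES = [1, 2, 4, 8]        # explicitly specified time scales

# Combined action set: each MSA repeats one primitive action n times.
ACTIONS = [(a, n) for a in PRIMITIVE_ACTIONS for n in TIME_SCALES]

Q = defaultdict(float)            # Q[(state, (a, n))] -> value


class ChainEnv:
    """Tiny deterministic chain, for illustration only: start at 0,
    reach position `length`; small step cost on the way."""

    def __init__(self, length=20):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, a):
        self.pos = min(self.pos + 1, self.length) if a == 1 else max(self.pos - 1, 0)
        done = self.pos == self.length
        return self.pos, (1.0 if done else -0.01), done


def select_action(state):
    """Epsilon-greedy over the combined multi-step action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda msa: Q[(state, msa)])


def execute_msa(env, msa):
    """Repeat the primitive action for up to n steps, accumulating the
    discounted return; stop early if the episode terminates."""
    a, n = msa
    ret, discount, steps, done, state = 0.0, 1.0, 0, False, None
    for _ in range(n):
        state, reward, done = env.step(a)
        ret += discount * reward
        discount *= GAMMA
        steps += 1
        if done:
            break
    return state, ret, steps, done


def update(state, msa, ret, steps, next_state, done):
    """Q-learning backup over the n-step transition:
    Q(s,msa) += alpha * (R + gamma^steps * max_m Q(s',m) - Q(s,msa))."""
    target = ret
    if not done:
        target += (GAMMA ** steps) * max(Q[(next_state, m)] for m in ACTIONS)
    Q[(state, msa)] += ALPHA * (target - Q[(state, msa)])


env = ChainEnv()
for episode in range(300):
    state, done = env.reset(), False
    while not done:
        msa = select_action(state)
        next_state, ret, steps, done = execute_msa(env, msa)
        update(state, msa, ret, steps, next_state, done)
        state = next_state
```

Note that executing an n-step action also traverses every shorter prefix of the same primitive action, so the recorded rewards could in principle be reused to update the shorter time scales from the same trajectory; the abstract's reference to exploiting "the special structure of the action set" plausibly points to a reuse of this kind, but the precise mechanism is described in the paper itself.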
