Reinforcement Learning in POMDPs With Memoryless Options and Option-Observation Initiation Sets

Denis Steckelmacher, Peter Vrancx, Ann Nowé, Hélène Plisnier, Anna Harutyunyan, Diederik Roijers

doi:10.1609/aaai.v32i1.11606

Open Access

Abstract

Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
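
The core idea in the abstract, restricting which options may start based on the previously executed option, can be sketched as a simple mask over a memoryless top-level policy. The following is a minimal illustration and not the authors' implementation: the option names, the `OOI` table, and the helpers `available_options` and `select_option` are all hypothetical, and the sketch conditions only on the previous option while ignoring the observation component of an OOI.

```python
import random

# Hypothetical option names and OOI table, for illustration only.
# OOI[o] is the set of previously executed options after which option o
# may be initiated; None stands for "no option has been executed yet".
OOI = {
    "go_to_terminal": {None, "return_from_a", "return_from_b"},
    "go_to_a": {"go_to_terminal"},
    "go_to_b": {"go_to_terminal"},
    "return_from_a": {"go_to_a"},
    "return_from_b": {"go_to_b"},
}


def available_options(previous_option, ooi=OOI):
    """Return the options whose initiation set admits the previous option."""
    return sorted(o for o, allowed in ooi.items() if previous_option in allowed)


def select_option(previous_option, preferences, rng=random):
    """Sample the next option among those the OOIs allow.

    `preferences` maps option names to non-negative weights (assumed to come
    from some learned memoryless policy); missing options default to 1.0.
    """
    candidates = available_options(previous_option)
    if not candidates:
        raise ValueError(f"no option may follow {previous_option!r}")
    weights = [preferences.get(o, 1.0) for o in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

In this sketch the only memory the agent carries is the identity of the last option: after `go_to_a`, the mask leaves `return_from_a` as the sole candidate, so the top-level policy itself needs no recurrent state.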

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 29, 2018
Citations: 8
License type: cc-by-sa