Automatic landmark discovery for learning agents under partial observability

Alper Demir, Faruk Polat, Erkin Çilden

doi:10.1017/s026988891900002x

Abstract

In the reinforcement learning context, a landmark is a compact piece of information that is uniquely coupled with a state, for problems with hidden states. Landmarks have been shown to support finding good memoryless policies for Partially Observable Markov Decision Processes (POMDPs) that contain at least one landmark. SarsaLandmark, an adaptation of Sarsa(λ), is known to promise better learning performance under the assumption that all landmarks of the problem are known in advance. In this paper, we propose a framework built upon SarsaLandmark that automatically identifies landmarks within the problem during learning, without sacrificing quality and without requiring prior information about the problem structure. For this purpose, the framework fuses SarsaLandmark with a well-known multiple-instance learning algorithm, namely Diverse Density (DD). Through further experimentation, we also provide deeper insight into our concept filtering heuristic for accelerating DD, abbreviated as DDCF (Diverse Density with Concept Filtering), which proves suitable for POMDPs with landmarks. DDCF outperforms its antecedent in terms of computation speed and solution quality without loss of generality. The methods are empirically shown to be effective via extensive experimentation on a number of known and newly introduced problems with hidden state, and the results are discussed.
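To make the multiple-instance learning step more concrete, the following is a minimal sketch of the standard noisy-or Diverse Density score, not the paper's DDCF variant or its SarsaLandmark integration. It assumes that instances are numeric feature vectors, that bags have already been labelled positive or negative, and that candidate concepts are taken from the instances of positive bags. The abstract does not say how bags are built from learning episodes, and the function names `instance_prob`, `diverse_density`, and `best_concept` are hypothetical.

```python
import math


def instance_prob(instance, concept):
    """Assumed similarity model: exp(-squared Euclidean distance)."""
    return math.exp(-sum((x - c) ** 2 for x, c in zip(instance, concept)))


def diverse_density(concept, positive_bags, negative_bags):
    """Noisy-or Diverse Density of a candidate concept.

    High when every positive bag has an instance near the concept
    and no instance of any negative bag is near it.
    """
    dd = 1.0
    for bag in positive_bags:
        # Probability that no instance in this positive bag matches.
        miss = 1.0
        for inst in bag:
            miss *= 1.0 - instance_prob(inst, concept)
        dd *= 1.0 - miss
    for bag in negative_bags:
        # Every instance of a negative bag should fail to match.
        for inst in bag:
            dd *= 1.0 - instance_prob(inst, concept)
    return dd


def best_concept(positive_bags, negative_bags):
    """Pick the positive-bag instance with the highest DD (assumed search strategy)."""
    candidates = [inst for bag in positive_bags for inst in bag]
    return max(
        candidates,
        key=lambda c: diverse_density(c, positive_bags, negative_bags),
    )


if __name__ == "__main__":
    # Toy data: (5, 5) appears in both positive bags and in no negative bag.
    positive = [[(0.0, 0.0), (5.0, 5.0)], [(5.0, 5.0), (9.0, 1.0)]]
    negative = [[(0.0, 0.0), (9.0, 1.0)]]
    print(best_concept(positive, negative))
```

In this toy setup the instance shared by all positive bags and absent from the negative bag receives the highest score, which is the intuition behind using DD to single out candidate landmarks.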


Journal: The Knowledge Engineering Review
Publication Date: Jan 1, 2019
Citations: 4

