Apprenticeship learning with few examples

Abdeslam Boularias, Brahim Chaib-Draa

doi:10.1016/j.neucom.2012.11.002

Abstract

We consider the problem of imitation learning when the examples, provided by a human expert, are scarce. Apprenticeship learning via inverse reinforcement learning provides an efficient tool for generalizing the examples, based on the assumption that the expert's policy maximizes a value function, which is a linear combination of state and action features. Most apprenticeship learning algorithms use only simple empirical averages of the features in the demonstrations as statistics of the expert's policy. However, this method is efficient only when the number of examples is sufficiently large to cover most of the states, or when the dynamics of the system are nearly deterministic. In this paper, we show that the quality of the learned policies is sensitive to the error in estimating the averages of the features when the dynamics of the system are stochastic. To reduce this error, we introduce two new approaches for bootstrapping the demonstrations by assuming that the expert is near-optimal and the dynamics of the system are known. In the first approach, the expert's examples are used to learn a reward function and to generate further examples from the corresponding optimal policy. The second approach uses a transfer technique, known as graph homomorphism, to generalize the expert's actions to unvisited regions of the state space. Empirical results on simulated robot navigation problems show that our approach is able to learn sufficiently good policies from a very small number of examples.
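The "simple empirical averages of the features" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the discount factor `gamma`, the function names, and the toy one-hot features are all assumptions made for illustration. It averages discounted feature counts over demonstrated trajectories and evaluates a value that is linear in those averages.

```python
from typing import Callable, List, Sequence, Tuple

State = int
Action = int
Trajectory = Sequence[Tuple[State, Action]]


def empirical_feature_expectations(
    trajectories: Sequence[Trajectory],
    features: Callable[[State, Action], List[float]],
    gamma: float,
) -> List[float]:
    """Average the discounted feature counts over all demonstrated trajectories.

    Assumes every trajectory is non-empty and that `features` always
    returns a vector of the same length.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    dim = len(features(*trajectories[0][0]))
    totals = [0.0] * dim
    for trajectory in trajectories:
        for t, (state, action) in enumerate(trajectory):
            discount = gamma ** t
            for i, value in enumerate(features(state, action)):
                totals[i] += discount * value
    return [total / len(trajectories) for total in totals]


def linear_value(weights: Sequence[float], feature_expectations: Sequence[float]) -> float:
    """Value of a policy when the reward is linear in the features."""
    return sum(w * mu for w, mu in zip(weights, feature_expectations))


# Hypothetical toy problem: 3 states, one-hot state features, 2 short demos.
def one_hot_state(state: State, action: Action) -> List[float]:
    return [1.0 if state == s else 0.0 for s in range(3)]


demos = [[(0, 1), (1, 1), (2, 0)], [(0, 1), (0, 1), (2, 0)]]
mu_hat = empirical_feature_expectations(demos, one_hot_state, gamma=0.5)
value = linear_value([0.0, 0.0, 1.0], mu_hat)
```

With only two short demonstrations, `mu_hat` depends heavily on which states the expert happened to visit. This is the estimation error that the abstract says grows when the dynamics are stochastic and the examples are few.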


Journal: Neurocomputing
Publication Date: Nov 16, 2012
Citations: 27

Similar Papers

Apprenticeship learning via soft local homomorphisms
Abd Boularias ... Brahim Chaib-Draa
01 May 2010

Inverse reinforcement learning using Dynamic Policy Programming
Eiji Uchibe ... Kenji Doya
01 Oct 2014

Proposal and Evaluation of the Improved Penalty Avoiding Rational Policy Making Algorithm
01 Jan 2009

A survey of inverse reinforcement learning
Stephen Adams ... Tyler Cody
Artificial Intelligence Review | VOL. 55
08 Feb 2022
