Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Mingxuan Jing, Bin Fang, Huaping Liu, Chao Yang, Fuchun Sun, Wenbing Huang, Xiaojian Ma

doi:10.1609/aaai.v34i04.5953

Abstract

In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, a requirement that is rarely met in practice. To handle imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then point out that previous methods suffer from two issues, one concerning optimality and the other convergence. Building on our theoretical findings, we tackle these two issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further demonstrate that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.
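
The abstract gives no formulas, so the following is only a toy sketch of the general idea of treating expert guidance as a soft constraint and solving the resulting problem through a one-dimensional search on a dual variable. It is not the authors' algorithm. Everything here is an assumption made for illustration: a single-state bandit with rewards `r`, an imperfect expert action distribution `q`, a KL divergence as the discrepancy measure, a tolerance `d`, and bisection as the search routine. Under those assumptions, maximizing expected reward subject to KL(p || q) <= d has the closed-form Lagrangian maximizer p_i ∝ q_i exp(r_i / lam), and the multiplier `lam` can be found by a scalar search.

```python
import numpy as np

def tilted(q, r, lam):
    """Lagrangian maximizer for a fixed multiplier: p_i ∝ q_i * exp(r_i / lam)."""
    logits = np.log(q) + r / lam
    logits -= logits.max()          # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def kl(p, q):
    """KL(p || q), skipping entries where p is exactly zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def solve_soft_guidance(q, r, d, lo=1e-3, hi=1e3, iters=100):
    """Bisection on the scalar multiplier so that KL(p || q) is close to d.

    KL(tilted(q, r, lam) || q) decreases as lam grows, so the search keeps
    the invariant KL(lo) > d >= KL(hi). The bounds lo and hi are hypothetical
    defaults chosen for this toy problem.
    """
    if kl(tilted(q, r, lo), q) <= d:
        return tilted(q, r, lo), lo  # constraint is slack: near-greedy policy is allowed
    for _ in range(iters):
        mid = np.sqrt(lo * hi)       # geometric midpoint, since lam spans orders of magnitude
        if kl(tilted(q, r, mid), q) > d:
            lo = mid
        else:
            hi = mid
    return tilted(q, r, hi), hi

# Hypothetical toy data: the expert mostly picks a suboptimal action (index 1).
r = np.array([1.0, 0.5, 0.0])
q = np.array([0.1, 0.8, 0.1])
p, lam = solve_soft_guidance(q, r, d=0.1)
print("policy:", p, "multiplier:", lam)
print("expected reward:", p @ r, "expert reward:", q @ r, "KL:", kl(p, q))
```

In this sketch the tolerance `d` controls how soft the guidance is: a small `d` keeps the policy close to the imperfect expert, while a large `d` lets the policy move toward the reward-maximizing action.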

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 40
