Overcoming Exploration in Reinforcement Learning with Demonstrations

Ashvin Nair, Wojciech Zaremba, Pieter Abbeel, Bob McGrew, Marcin Andrychowicz

doi:10.1109/icra.2018.8463162

Abstract

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
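The claim that finding a non-zero reward becomes exponentially harder with task horizon or action dimensionality can be illustrated with a toy calculation. The sketch below is not from the paper: it assumes a hypothetical random policy that must land each action dimension in a "useful" range (with probability `per_dim_prob`) at every step of the episode, and that these events are independent. The function names and the numbers are illustrative assumptions only.

```python
def success_probability(per_dim_prob: float, action_dims: int, horizon: int) -> float:
    """Chance that a random policy reaches the sparse reward in one episode.

    Toy assumption (not from the paper): every action dimension must fall in a
    useful range, independently, at every one of `horizon` steps.
    """
    per_step_prob = per_dim_prob ** action_dims
    return per_step_prob ** horizon


def expected_episodes(per_dim_prob: float, action_dims: int, horizon: int) -> float:
    """Expected number of random episodes before the first non-zero reward."""
    return 1.0 / success_probability(per_dim_prob, action_dims, horizon)


if __name__ == "__main__":
    # Lengthening the horizon or widening the action space multiplies the cost.
    for action_dims, horizon in [(1, 10), (1, 20), (4, 10)]:
        episodes = expected_episodes(0.5, action_dims, horizon)
        print(f"dims={action_dims} horizon={horizon} expected episodes={episodes:.3g}")
```

Under these assumptions, doubling the horizon squares the expected number of episodes, which is the kind of growth that demonstrations are meant to sidestep.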

Similar Papers

Path planning of robotic arm based on deep reinforcement learning algorithm
Mostafa Al‐Gabalawy
Advanced Control for Applications | VOL. 4
01 Mar 2022

Intrinsically Motivated Multi-Goal Reinforcement Learning Using Robotics Environment Integrated with OpenAI Gym
Sivasubramanian Balasubramanian
Journal of Science & Technology | VOL. 4
17 Nov 2023

Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards
Guoyu Zuo ... Jiangeng Li
International Journal of Advanced Robotic Systems | VOL. 17
01 Jan 2020

Hindsight Balanced Reward Shaping
Mengxuan Shao ... Kun Han
01 Jan 2023
