Abstract

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impractical due to a lack of state coverage or to distribution mismatch, i.e., when the learner's goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
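The abstract gives no pseudocode, so the following is only a minimal sketch of the prioritized goal-sampling idea described above, assuming continuous actions and using L2 distance between expert and policy actions as the disagreement measure; `policy`, `expert`, and all other names are illustrative, not the authors' API.

```python
import numpy as np

def disagreement(policy_action, expert_action):
    # Assumed disagreement measure: L2 distance between continuous actions.
    return np.linalg.norm(np.asarray(policy_action) - np.asarray(expert_action))

def sample_goal(candidates, policy, expert, rng=None):
    """Draw a goal with probability proportional to expert-policy disagreement.

    `candidates` is a list of (state, goal) pairs; `policy` and `expert`
    map (state, goal) -> action. Goals on which the current policy deviates
    most from the demonstrated behavior are sampled more often.
    """
    rng = rng or np.random.default_rng()
    scores = np.array([disagreement(policy(s, g), expert(s, g))
                       for s, g in candidates])
    total = scores.sum()
    probs = scores / total if total > 0 else np.full(len(scores), 1.0 / len(scores))
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx][1]
```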

Highlights

  • Recent successes in deep reinforcement learning (DRL) have been achieved in domains with a well-specified reward function, such as game-playing [53] or robot control [49]

  • Very small numbers of queries let us outperform prior imitation-based approaches on Fetch and ShadowHand tasks, which involve sparse and delayed rewards. These results suggest that Goal-driven Active Learning (GoAL) can greatly improve exploration efficiency and could help to expand the possible applications of RL

  • In this paper we presented Goal-driven Active Learning (GoAL), a method that introduces interactive goal-driven demonstrations to learn both more effectively and more efficiently


Summary

Introduction

Recent successes in deep reinforcement learning (DRL) have been achieved in domains with a well-specified reward function, such as game-playing [53] or robot control [49]. However, many real-world tasks provide only sparse rewards, which makes exploration difficult. A line of work for overcoming this issue is goal-conditioned learning, a form of self-supervision that constructs a goal-conditioned policy to learn how to reach multiple goals [44, 68]. This idea was extended in Hindsight Experience Replay (HER) [4], which artificially generates new transitions by relabeling goals seen along the state trajectory: by recomputing its reward under different goals, a single transition can be converted into many valid training examples, driving the agent to learn how to achieve multiple goals without additional simulated interactions. Even so, HER may still require a large amount of data to capture complex policies
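For concreteness, here is a minimal sketch of the hindsight relabeling step described above, assuming a sparse goal-reaching reward and the common "future" goal-selection strategy; the tuple layout, `sparse_reward`, and all other names are illustrative rather than the authors' implementation.

```python
import numpy as np

def sparse_reward(achieved, goal, eps=0.05):
    # Typical sparse goal-reaching reward: 0 when close enough, else -1.
    return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < eps else -1.0

def relabel_with_hindsight(episode, reward_fn=sparse_reward, k=4, seed=0):
    """Duplicate each transition under goals achieved later in the episode.

    `episode` is a list of (state, action, next_state, goal) tuples. Each
    transition keeps its original goal and additionally gets k substitute
    goals drawn from later achieved states, with rewards recomputed, so one
    transition yields many valid training examples.
    """
    rng = np.random.default_rng(seed)
    out, T = [], len(episode)
    for t, (s, a, s_next, g) in enumerate(episode):
        out.append((s, a, s_next, g, reward_fn(s_next, g)))   # original goal
        for i in rng.integers(t, T, size=k):                  # "future" strategy
            g_new = episode[i][2]   # later achieved state reused as the goal
            out.append((s, a, s_next, g_new, reward_fn(s_next, g_new)))
    return out
```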

