Deep Q-learning From Demonstrations

Todd Hester, Andrew Sendonaris, Bilal Piot, Tom Schaul, Joel Leibo, Ian Osband, Audrunas Gruslys, Marc Lanctot, John Quan, Gabriel Dulac-Arnold, Dan Horgan, John Agapiou, Olivier Pietquin, Matej Vecerik

doi:10.1609/aaai.v32i1.11757

Abstract

Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance, and their performance during learning can be extremely poor. This may be acceptable in a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages relatively small sets of demonstration data to massively accelerate the learning process. Thanks to a prioritized replay mechanism, it automatically assesses the necessary ratio of demonstration data while learning. DQfD works by combining temporal difference updates with supervised classification of the demonstrator’s actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN): it starts with better scores over the first million steps on 41 of 42 games, and on average it takes PDD DQN 83 million steps to catch up to DQfD’s performance. DQfD learns to outperform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
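The abstract does not spell out how the temporal difference update and the supervised classification term are combined. As a rough illustration only, the sketch below assumes a one-step squared TD error plus a large-margin classification loss that is applied to demonstration transitions. The function names, the margin of 0.8, the weight `lam`, and the discount `gamma` are all hypothetical choices made for this example, not values taken from the abstract.

```python
import numpy as np

def td_loss(q_sa, reward, q_next_max, done, gamma=0.99):
    """One-step squared TD error per transition (gamma is an assumed discount)."""
    target = reward + gamma * (1.0 - done) * q_next_max
    return 0.5 * (target - q_sa) ** 2

def margin_loss(q_values, demo_action, margin=0.8):
    """Large-margin classification loss per transition.

    Computes max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is 0 for the
    demonstrator's action a_E and `margin` (an assumed constant) otherwise.
    """
    q_values = np.asarray(q_values, dtype=float)
    rows = np.arange(q_values.shape[0])
    penalties = np.full_like(q_values, margin)
    penalties[rows, demo_action] = 0.0
    return np.max(q_values + penalties, axis=1) - q_values[rows, demo_action]

def combined_loss(q_values, action, reward, q_next_max, done, is_demo, lam=1.0):
    """Mean of TD loss plus the margin loss on demonstration transitions only."""
    q_values = np.asarray(q_values, dtype=float)
    rows = np.arange(q_values.shape[0])
    td = td_loss(q_values[rows, action], reward, q_next_max, done)
    supervised = margin_loss(q_values, action) * is_demo
    return float(np.mean(td + lam * supervised))

# Two transitions with two actions each; both are terminal for simplicity.
q = np.array([[1.0, 2.0], [0.5, 0.0]])
a = np.array([1, 0])
r = np.array([1.0, 0.0])
q_next = np.zeros(2)
done = np.ones(2)

loss_demo = combined_loss(q, a, r, q_next, done, is_demo=np.ones(2))
loss_self = combined_loss(q, a, r, q_next, done, is_demo=np.zeros(2))
```

The margin term is zero only when the demonstrator's action beats every other action by at least the margin, so it pushes the greedy policy toward the demonstrated action while the TD term keeps the values consistent with observed rewards. Masking with `is_demo` leaves the agent's self-generated transitions governed by the TD term alone.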

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 29, 2018
Citations: 438

Similar Papers

Optimizing Agent Training with Deep Q-Learning on a Self-Driving Reinforcement Learning Environment
Pedro Rodrigues ... Susana Vieira
01 Dec 2020

Deep Q Learning in Stabilization of Inverted Pendulum
S. Suganthi Amudhan ... Dr. Bhavin Sedani
International Journal of Innovative Technology and Exploring Engineering | VOL. 9
30 Dec 2020

Deep Q-learning with Explainable and Transferable Domain Rules
Yichuan Zhang ... Junxiang Li
01 Jan 2020

Leveraging deep reinforcement learning for design space exploration with multi-fidelity surrogate model
Haokun Li ... Yan Yan
Journal of Engineering Design | VOL. ahead-of-print
25 Jun 2024
