Abstract

Sample efficiency is crucial for imitation learning methods to be applicable in real-world settings. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, despite the fact that these off-policy extensions can either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation achieves good sample efficiency, outperforming several off-policy extensions of adversarial imitation on many control tasks.
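To make insight (1) concrete, the following is a minimal sketch of the two fixed-point equations in their standard textbook forms; the paper's exact notation, normalization, and discounting convention may differ. The Bellman equation bootstraps values forward along transitions, while the discounted stationary state-action distribution satisfies an analogous recursion that propagates visitation mass backward, and it is this structural parallel that permits a TD-style update for the distribution:

\[
Q^{\pi}(s,a) \;=\; r(s,a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a),\; a' \sim \pi(\cdot \mid s')}\!\big[ Q^{\pi}(s',a') \big]
\]

\[
\rho^{\pi}(s,a) \;=\; (1-\gamma)\, p_0(s)\, \pi(a \mid s) \;+\; \gamma\, \pi(a \mid s) \sum_{s',a'} P(s \mid s',a')\, \rho^{\pi}(s',a')
\]

With a deterministic policy \(a = \mu(s)\), the expectation over the next action collapses to a single point, which is the simplification referred to in insight (2).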
