Abstract

While Generative Adversarial Imitation Learning (GAIL) shows remarkable performance on many high-dimensional imitation learning tasks, it requires a large number of sampled transitions, which can be infeasible for some real-world problems. In this paper, we demonstrate how exploiting the reward function in GAIL can improve sample efficiency. We design our algorithm to be end-to-end differentiable so that the learned reward function can directly participate in policy updates. End-to-end differentiability can be achieved by introducing a forward model of the environment, enabling direct calculation of the cumulative reward. However, using a forward model has two significant limitations: the approach relies heavily on the accuracy of the forward model, and it requires multi-step prediction, which causes severe error accumulation. The proposed end-to-end differentiable adversarial imitation learning algorithm alleviates these limitations. In addition, we apply several existing regularization techniques for robust training of the forward model. We call our algorithm, integrated with these regularization methods, fully Differentiable Regularized GAIL (DRGAIL), and evaluate DRGAIL on continuous control tasks.
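To make the end-to-end differentiable objective concrete, the following minimal sketch backpropagates a learned (GAIL-style) reward, summed over a multi-step rollout through a learned forward model, into the policy parameters. All architectures, dimensions, and the placeholder data are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: gradient of a cumulative learned reward w.r.t. policy parameters,
# obtained by differentiating through a learned forward model of the dynamics.
import torch
import torch.nn as nn

obs_dim, act_dim, horizon = 8, 2, 5  # assumed toy dimensions

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
forward_model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                              nn.Linear(64, obs_dim))   # predicts next state
reward_fn = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                          nn.Linear(64, 1))             # learned reward (e.g., from the discriminator)

optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

state = torch.randn(32, obs_dim)        # batch of start states (placeholder data)
total_reward = 0.0
for _ in range(horizon):                # multi-step rollout through the forward model
    action = policy(state)
    total_reward = total_reward + reward_fn(torch.cat([state, action], dim=-1)).mean()
    state = forward_model(torch.cat([state, action], dim=-1))  # differentiable transition

loss = -total_reward                    # ascend the cumulative learned reward
optimizer.zero_grad()
loss.backward()                         # gradients flow through reward and dynamics into the policy
optimizer.step()
```

Because every transition in the rollout is produced by the forward model, gradient quality hinges on that model's accuracy, and errors compound over the horizon, which is the error-accumulation issue the abstract describes.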
