Abstract

Imitation learning provides a family of promising frameworks that learn policies directly from expert demonstrations. However, most imitation learning methods assume that the demonstrations come from the same expert and therefore exhibit a single modality. In practice, demonstrations may be generated by different experts with different modalities. Auxiliary classifier generative adversarial imitation learning (AC-GAIL) uses an auxiliary classifier to classify samples according to their modalities, so that the generator can take different actions for different modalities and thereby learn a multi-modal policy. However, we find that AC-GAIL's objective function is missing a conditional entropy term, and this conditional entropy cannot be computed directly. Omitting the conditional entropy can degrade the performance of the learned policy. In this paper, we propose a method that addresses the missing conditional entropy in AC-GAIL, named twin auxiliary classifiers GAIL (TAC-GAIL). Specifically, we add a second auxiliary classifier to the AC-GAIL framework, which is used to classify the generated samples. We theoretically prove the effectiveness of this method, and experimental results on MuJoCo tasks show that TAC-GAIL effectively improves the performance of the learned multi-modal policy.
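The abstract's core idea, adding a twin classifier over generated samples as a tractable surrogate for the otherwise incomputable conditional entropy, can be illustrated with a minimal numpy sketch. All names, features, and weighting coefficients below are hypothetical stand-ins (the paper's actual networks and objective are not reproduced here); the toy linear classifiers merely show how the two cross-entropy terms would be combined.

```python
import numpy as np

rng = np.random.default_rng(0)
n_modes, feat, batch = 3, 8, 16  # hypothetical sizes: modalities, feature dim, batch

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the correct modality label
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

# Hypothetical state-action features and modality labels
expert_x = rng.normal(size=(batch, feat))   # features of expert samples
gen_x = rng.normal(size=(batch, feat))      # features of generated samples
labels = rng.integers(0, n_modes, size=batch)

# Toy linear stand-ins for the two auxiliary classifiers
W_ac = rng.normal(size=(feat, n_modes))     # AC-GAIL classifier (expert samples)
W_tac = rng.normal(size=(feat, n_modes))    # twin classifier (generated samples)

# AC-GAIL modality term: classify expert samples by modality
ce_expert = cross_entropy(softmax(expert_x @ W_ac), labels)

# TAC-GAIL extra term: cross-entropy of the twin classifier on generated
# samples, standing in for the conditional entropy term that the AC-GAIL
# objective omits because it cannot be computed directly
ce_generated = cross_entropy(softmax(gen_x @ W_tac), labels)

lam1, lam2 = 1.0, 1.0  # hypothetical weighting coefficients
modality_loss = lam1 * ce_expert + lam2 * ce_generated
```

The design point is that the second classifier is trained on generated rather than expert samples, so its cross-entropy is computable from the generator's own outputs and can be optimized alongside the usual adversarial objective.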
