Diverse Imitation Learning via Self-Organizing Generative Models.

Arash Vahabpour, Tianyi Wang, Vwani Roychowdhury, Omead Pooladzandi, Qiujing Lu

doi:10.1109/tnnls.2024.3401170

Abstract

Imitation learning (IL) is a well-known problem in the field of Markov decision processes (MDPs): given multiple demonstration trajectories generated by one or more experts, the goal is to recover the hidden expert policies so that, when the MDP is run independently, it generates trajectories close to the demonstrated ones. IL is one of the most useful tools for building versatile robots that can learn from examples. The task becomes particularly challenging when the expert exhibits a mixture of behavior modes. Prior work has introduced latent variables to model variations of the expert policy. However, our experiments show that existing methods do not imitate the individual modes appropriately. To tackle this problem, we first draw inspiration from the classical technique of self-organizing maps (SOMs) and introduce an encoder-free generative model, referred to as the self-organizing generative (SOG) model, for learning multimodal data distributions from samples. We then apply SOG to behavior cloning (BC), a framework that learns deterministic policies without considering the environment, to accurately distinguish and imitate different modes. Next, we integrate it with generative adversarial IL (GAIL), a framework that learns policies while considering the environment, to make the learning robust to compounding errors at unseen states. We show that our method significantly outperforms the state of the art across multiple experiments within the MuJoCo simulator, including locomotion and robotic manipulation tasks.

Full Text