Abstract

Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies. Compared to reinforcement learning, GAIL can learn such policies even when the reward function is unknown. Despite significant empirical progress, GAIL suffers from two main problems: training instability and a large number of required environment interactions. These problems are more prominent in multi-task environments and high-dimensional state spaces. To apply the algorithm to tasks such as autonomous driving and visual navigation, researchers typically rely on other imitation learning algorithms to pre-train GAIL. We instead focus on improving the training efficiency of GAIL so that the algorithm can complete training independently, without any pre-training step. Goal-conditioned settings, such as goal-conditioned reward functions, have recently shown promising results for policy training on navigation and manipulation tasks. Incorporating a goal-conditioned reward function, we propose a novel GAIL framework that replaces the PPO policy update rules with the off-policy algorithm Dueling-DQN. With only a small amount of expert data, our proposed method surpasses both GAIL and Dueling-DQN on UAV visual navigation and obstacle avoidance tasks in urban and wild environments, without the help of other imitation learning algorithms. Furthermore, our research fills the gap of generative adversarial imitation learning in the field of UAV visual obstacle avoidance and navigation.
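
To make the core idea concrete, the sketch below illustrates (under our own assumptions, not as the paper's released code) how a GAIL-style update could be driven by a goal-conditioned discriminator reward while the policy itself is trained with Dueling-DQN instead of PPO. All names (DuelingQNet, Discriminator, gail_reward, dqn_update) and hyperparameters are illustrative placeholders.

```python
# Minimal sketch: GAIL with a goal-conditioned discriminator reward and a
# Dueling-DQN policy update replacing PPO. Assumed structure, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingQNet(nn.Module):
    """Dueling architecture: Q(s,g,a) = V(s,g) + A(s,g,a) - mean_a A(s,g,a)."""
    def __init__(self, obs_dim, goal_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)         # state-value stream
        self.adv = nn.Linear(hidden, n_actions)   # advantage stream

    def forward(self, obs, goal):
        h = self.trunk(torch.cat([obs, goal], dim=-1))
        v, a = self.value(h), self.adv(h)
        return v + a - a.mean(dim=-1, keepdim=True)

class Discriminator(nn.Module):
    """Scores goal-conditioned (state, goal, action) pairs; expert-like pairs score high."""
    def __init__(self, obs_dim, goal_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, goal, act_onehot):
        return self.net(torch.cat([obs, goal, act_onehot], dim=-1))

def gail_reward(disc, obs, goal, act_onehot):
    """Goal-conditioned surrogate reward r = -log(1 - D(s, g, a))."""
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, goal, act_onehot))
        return -torch.log(1.0 - d + 1e-8).squeeze(-1)

def dqn_update(q_net, q_target, optimizer, batch, gamma=0.99):
    """One off-policy Dueling-DQN step on replay data labelled with the GAIL reward."""
    obs, goal, act, rew, next_obs, done = batch
    q = q_net(obs, goal).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = q_target(next_obs, goal).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the Q-network learns from a replay buffer rather than freshly collected on-policy rollouts, this kind of off-policy update is one plausible way to reduce the number of environment interactions that GAIL with PPO normally requires.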
