Abstract

Efficient exploration is a core challenge in deep reinforcement learning. Although state-of-the-art exploration methods have made considerable progress on many tasks, they often underperform in procedurally-generated environments, indicating that the agent generalizes poorly. To address this problem, a self-imitation exploration approach for procedurally-generated environments, referred to as Double Self-Imitation Learning (DSIL), is proposed. DSIL selects good exploration experiences from the agent's history using an episode scoring rule that combines local scores, global scores, and external rewards. DSIL then employs a cooperative strategy that combines generative adversarial imitation learning (GAIL) and behavioral cloning (BC) to reproduce the agent's past good exploration behaviors. Specifically, DSIL consists of a reinforcement learning module and a discriminator. The discriminator generates intrinsic rewards by judging how similar the current state–action pairs are to the past good exploration experiences. The agent's policy is optimized alternately by the BC task and by the reinforcement learning algorithm in the GAIL task; meanwhile, the reinforcement learning module and the discriminator are updated alternately within the GAIL task. Experiments on several procedurally-generated environments demonstrate that DSIL significantly outperforms existing exploration approaches in both sample efficiency and final performance; that is, DSIL gives the agent stronger generalization.
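The discriminator-driven intrinsic reward described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Discriminator` class, the logistic model, and all hyperparameters here are simplifying assumptions. It shows only the standard GAIL-style mechanism the abstract refers to, in which a discriminator trained to separate good past experiences from current policy samples yields a bonus that is high for state–action pairs resembling the good experiences.

```python
import numpy as np

class Discriminator:
    """Toy logistic model D(s, a): probability that a state-action
    feature vector comes from the buffer of good past experiences.
    (Illustrative stand-in for the paper's learned discriminator.)"""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def prob(self, x):
        # Sigmoid of a linear score over state-action features.
        return 1.0 / (1.0 + np.exp(-x @ self.w))

    def update(self, good_x, policy_x):
        # One gradient-ascent step on the GAIL discriminator objective:
        #   E_good[log D(x)] + E_policy[log(1 - D(x))]
        grad = good_x.T @ (1.0 - self.prob(good_x)) / len(good_x)
        grad -= policy_x.T @ self.prob(policy_x) / len(policy_x)
        self.w += self.lr * grad

def intrinsic_reward(disc, x, eps=1e-8):
    # GAIL-style bonus: large when (s, a) resembles good past behavior,
    # near zero when it looks like ordinary current-policy behavior.
    return -np.log(1.0 - disc.prob(x) + eps)

# Synthetic features standing in for state-action pairs: "good" episodes
# cluster at +1, current-policy samples at -1 (assumed toy data).
rng = np.random.default_rng(0)
good = rng.normal(1.0, 0.3, size=(64, 4))
policy = rng.normal(-1.0, 0.3, size=(64, 4))

disc = Discriminator(dim=4)
for _ in range(200):
    # In DSIL this discriminator update alternates with the policy
    # updates (RL on intrinsic + external reward, plus BC steps).
    disc.update(good, policy)

print(intrinsic_reward(disc, good).mean() > intrinsic_reward(disc, policy).mean())
```

In the full method, this reward would be added to the environment's external reward for the RL update, while separate BC steps regress the policy directly onto actions stored in the good-experience buffer.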
