Abstract

Efficient exploration in visually rich and complex environments remains a challenge for artificial intelligence (AI) agents. In this study, we formulate exploration as a reinforcement learning problem and rely on intrinsic motivation to guide exploration behavior. This intrinsic motivation is driven by curiosity and is computed from episode memory. To generate the intrinsic motivation, we combine a count-based method and a temporal-distance method, which produce their bonuses synchronously. We tested our approach in 3D maze-like environments and validated its performance on exploration tasks through extensive experiments. The results show that our agent learns to explore from raw sensory input and accomplishes autonomous exploration across different mazes. In addition, the learned policy is not biased by stochastic objects. We also analyze how different training methods and driving forces affect the exploration policy.
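
As a rough illustration of how such curiosity-driven shaping enters the learning loop, the minimal Python sketch below adds an intrinsic bonus to the (often sparse) environment reward during a rollout. The names env, policy, and intrinsic_bonus are hypothetical placeholders for this sketch, not the paper's implementation.

    # Minimal sketch, assuming a generic env with reset()/step() and a
    # placeholder intrinsic_bonus() callable (both hypothetical).
    def rollout(env, policy, intrinsic_bonus, max_steps=1000):
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action = policy(obs)
            next_obs, extrinsic_r, done, _ = env.step(action)
            # The agent is trained on the sum of the (often sparse) environment
            # reward and the curiosity-driven intrinsic bonus.
            shaped_r = extrinsic_r + intrinsic_bonus(next_obs)
            total_reward += shaped_r
            obs = next_obs
            if done:
                break
        return total_reward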

Highlights

  • Exploration behavior is fundamental to the survival and reproduction of organisms

  • To trade off the influence of the two bonuses, we test different parameter groups that set α + β = 1, sampled at an interval of 0.1, and report two results: the episode reward and the amount of interaction required to encode the environment. The results are averaged over the top 5 random hyperparameter settings and are summarized in Figure 9 (see the sketch after this list)

  • We propose an autonomous exploration method based on deep reinforcement learning and the concept of intrinsic motivation
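
The weighting study mentioned above can be sketched as a convex combination of the two bonuses. The loop below sweeps α in steps of 0.1 with β = 1 − α; the bonus values and function name are hypothetical and only illustrate the constraint α + β = 1, not the paper's actual experiment.

    import numpy as np

    def combined_intrinsic_reward(count_bonus, distance_bonus, alpha):
        # Convex combination of the two bonuses with alpha + beta = 1.
        beta = 1.0 - alpha
        return alpha * count_bonus + beta * distance_bonus

    # Sweep alpha at an interval of 0.1, mirroring the parameter-group study.
    for alpha in np.arange(0.0, 1.0 + 1e-9, 0.1):
        r_int = combined_intrinsic_reward(count_bonus=0.8, distance_bonus=0.3, alpha=alpha)
        print(f"alpha={alpha:.1f}  beta={1.0 - alpha:.1f}  r_int={r_int:.3f}")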


Summary

Introduction

Exploration behavior is fundamental to the survival and reproduction of organisms. For example, animals searching for food may have to travel long distances without receiving any reward from the environment [1, 2]. We propose a DRL method, augmented with intrinsic motivation, for training agents to accomplish autonomous exploration through vision only. The intrinsic motivation combines two bonuses. The first is a count-based method, which attends to the novelty of states that have already been explored and encourages the agent to reach rarely visited states. The second is determined by the temporal distance [22, 23] between the current observation and those in memory; it computes a novelty bonus for unexplored areas and pushes the agent toward distant places.
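
The two bonuses can be sketched as follows. This is a minimal, illustrative Python implementation assuming a learned embedding over raw observations; in particular, the temporal-distance bonus here uses an embedding-space distance to the episode memory as a stand-in for the reachability-style estimate of [22, 23], so the class and method names are assumptions rather than the paper's code.

    import numpy as np

    class EpisodicNoveltyBonus:
        # Illustrative sketch of the two intrinsic bonuses, not the paper's
        # exact implementation.

        def __init__(self, embed_fn, distance_threshold=0.5):
            self.embed_fn = embed_fn            # assumed learned embedding of raw pixels
            self.visit_counts = {}              # pseudo-counts over discretized states
            self.memory = []                    # episode memory of embeddings
            self.distance_threshold = distance_threshold

        def count_bonus(self, state_key):
            # Count-based bonus: rarely visited states yield a larger reward.
            n = self.visit_counts.get(state_key, 0) + 1
            self.visit_counts[state_key] = n
            return 1.0 / np.sqrt(n)

        def distance_bonus(self, obs):
            # Temporal-distance bonus: reward observations that are far (here,
            # in embedding space) from everything stored in the episode memory.
            e = self.embed_fn(obs)
            if not self.memory:
                self.memory.append(e)
                return 1.0
            d = min(np.linalg.norm(e - m) for m in self.memory)
            if d > self.distance_threshold:
                self.memory.append(e)
            return float(d > self.distance_threshold)

    # Example usage with a trivial embedding (flatten to a float vector):
    bonus = EpisodicNoveltyBonus(embed_fn=lambda o: np.asarray(o, dtype=np.float32).ravel())
    r_count = bonus.count_bonus(state_key=(3, 7))     # e.g. a discretized grid cell
    r_dist = bonus.distance_bonus(np.random.rand(8))  # e.g. an observation vector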

Background
Exploration Method
Experiment Setup
Method
Method Pretraining
Findings
Conclusion