Abstract

Deep reinforcement learning (RL) is an emerging research direction in machine learning for autonomous systems. In real-world scenarios, the extrinsic rewards that an agent receives from the environment are often missing or extremely sparse. Sparse rewards constrain the agent's learning capability because the policy is updated only when the goal state is successfully reached, which makes efficient exploration a persistent challenge in RL algorithms. To tackle sparse rewards and inefficient exploration, the agent needs additional information to update its policy even when the environment provides no feedback. This paper proposes stochastic curiosity maximizing exploration (SCME), a learning strategy that allows the agent to explore as a human would. We address the sparse reward problem by encouraging the agent to explore diverse future outcomes. To this end, a latent dynamic system is developed that acquires latent states and latent actions and predicts variations in future conditions. The mutual information and the prediction error over the predicted states and actions are computed as intrinsic rewards. The SCME agent is then trained by maximizing these rewards, improving the sample efficiency of exploration. Experiments on PyDial and Super Mario Bros demonstrate the benefits of the proposed SCME in dialogue systems and computer games, respectively.
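
The abstract does not give the exact formulation, but the idea of combining a sparse extrinsic reward with a curiosity bonus built from latent-space prediction error and a mutual-information term can be sketched as follows. This is a minimal illustration under assumed names (the coefficients, the mutual-information estimate, and the latent vectors are hypothetical placeholders, not the paper's definitions):

```python
import numpy as np

def intrinsic_reward(z_pred, z_next, mi_estimate, beta=1.0, eta=1.0):
    """Hypothetical curiosity bonus: prediction error in latent space
    plus a mutual-information estimate (illustrative assumption; the
    paper's exact formulation may differ)."""
    prediction_error = np.sum((z_pred - z_next) ** 2)   # surprise about the next latent state
    return beta * prediction_error + eta * mi_estimate  # weighted sum as the intrinsic reward

def shaped_reward(extrinsic, z_pred, z_next, mi_estimate, lam=0.1):
    """Total reward used for policy updates: the sparse extrinsic reward
    plus the scaled intrinsic bonus, so learning proceeds even when the
    extrinsic reward is zero."""
    return extrinsic + lam * intrinsic_reward(z_pred, z_next, mi_estimate)

# Toy usage: latent vectors from a hypothetical encoder/dynamics model
z_pred = np.array([0.2, -0.1, 0.5])   # latent state predicted by the dynamics model
z_next = np.array([0.3,  0.0, 0.4])   # latent state encoded from the observed next step
print(shaped_reward(extrinsic=0.0, z_pred=z_pred, z_next=z_next, mi_estimate=0.8))
```

In such a scheme the agent still receives a learning signal in reward-free regions of the state space, which is the sample-efficiency benefit the abstract claims.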
