Abstract

Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel approach to improve model-free reinforcement learning agents' performance with a three step approach. Specifically, we collect demonstration data, use the data to recover a linear function using inverse reinforcement learning and we use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high dimensional domains. To show the scalability of our approach we present two sets of experiments in a two dimensional Maze domain, and the 27 dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate and asymptotic performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call