Abstract

Transfer learning has shown great potential to accelerate reinforcement learning (RL) by utilizing prior knowledge of relevant tasks learned in the past. Policy Reuse Q-learning (PRQL) is a general policy transfer framework that speeds up learning on a target task by probabilistically reusing source policies from a policy library. In this paper, we propose an improved PRQL method that achieves faster probabilistic policy reuse in deep reinforcement learning (DRL). First, we extend the basic PRQL algorithm to DRL, proposing a probabilistic policy reuse algorithm built on DRL that can solve more complex problems. Second, PRQL algorithms usually measure the similarity between tasks with a metric based on the average gain; however, this metric carries very limited information and can only be updated at the end of an episode, which is inefficient. Instead, we propose a new metric based on fitting the reward function, which allows the agent to converge to the most suitable reuse policy more quickly and accurately. We evaluate the detection accuracy, cumulative reward, and convergence speed of our method on three complex Markov tasks. Experimental results show that our method consistently achieves efficient policy transfer on these tasks.
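For context, the sketch below illustrates the probabilistic policy-selection mechanism that PRQL-style methods build on: each candidate policy carries an average-gain score, and a softmax over those scores (with an annealed temperature) decides which policy to reuse for the next episode. The paper's contribution replaces the average-gain score with a metric based on fitting the reward function; the sketch shows only the baseline selection loop, and all names (`choose_policy`, `update_gain`, `W`, `tau`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def choose_policy(W, tau):
    """Sample a policy index with probability proportional to exp(tau * W[i])."""
    prefs = tau * (np.asarray(W) - np.max(W))     # subtract max for numerical stability
    probs = np.exp(prefs) / np.sum(np.exp(prefs))
    return np.random.choice(len(W), p=probs)

def update_gain(W, counts, k, episode_return):
    """Incrementally update the average gain of the policy k that was reused."""
    counts[k] += 1
    W[k] += (episode_return - W[k]) / counts[k]

# Usage: W[0] tracks the fresh target policy, W[1:] the source policies in the library.
W = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
tau, delta_tau = 0.0, 0.05
for episode in range(100):
    k = choose_policy(W, tau)
    episode_return = np.random.rand()             # placeholder for a real rollout with policy k
    update_gain(W, counts, k, episode_return)
    tau += delta_tau                              # anneal toward greedy selection over episodes
```

Because `update_gain` can only run once an episode's return is known, the selection scores improve slowly; this is the delayed, low-information update the abstract argues against when motivating the reward-function-fitting metric.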
