Abstract

Reinforcement learning algorithms usually require a large number of samples and converge slowly in practical applications. One solution is to introduce transfer learning: knowledge from well-learned source tasks can be reused to reduce the sample requirement and accelerate learning on target tasks. However, if a mismatched source task is selected, it can slow down or even disrupt the learning procedure. It is therefore essential for knowledge transfer to select source tasks that closely match the target tasks. In this paper, a novel task matching algorithm is proposed that derives the latent structures of the tasks' value functions and aligns these structures to estimate similarity. Through latent structure matching, highly matched source tasks are selected effectively; knowledge is then transferred from them to provide action advice and improve the exploration strategy of the target tasks. Experiments are conducted on a simulated navigation environment and the mountain car environment. The results show a significant performance gain of the improved exploration strategy over the traditional ϵ-greedy exploration strategy. A theoretical proof is also given to verify the improvement of the exploration strategy based on latent structure matching.
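To make the matching step concrete, the following is a minimal Python sketch of how latent structures could be extracted from tabular value functions and compared to rank candidate source tasks. It assumes value functions are stored as state-action matrices, uses a truncated SVD as the low-rank embedding, and scores alignment via the principal angles between the embedded subspaces; the function names and the choice of subspace similarity are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def latent_structure(q_table, rank=5):
    """Low-rank latent structure of a tabular value function.
    q_table: |S| x |A| array of state-action values."""
    u, _, _ = np.linalg.svd(q_table, full_matrices=False)
    return u[:, :rank]                     # top-`rank` left singular vectors

def structure_similarity(struct_a, struct_b):
    """Align two latent structures and score their similarity using the
    cosines of the principal angles between the spanned subspaces."""
    sigma = np.linalg.svd(struct_a.T @ struct_b, compute_uv=False)
    return float(np.mean(sigma))           # 1.0 means identical subspaces

def select_source_task(target_q, source_qs, rank=5):
    """Return the index of the source task whose latent structure
    best matches the target task's, along with all similarity scores."""
    target_struct = latent_structure(target_q, rank)
    scores = [structure_similarity(target_struct, latent_structure(q, rank))
              for q in source_qs]
    return int(np.argmax(scores)), scores
```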

Highlights

  • Reinforcement learning (RL) is a paradigm in which an agent guides its actions based on the rewards obtained from trial-and-error interaction with the environment [1,2]

  • In RL, the knowledge obtained from previous situations can be reused as heuristics to achieve effective knowledge transfer, speeding up learning in new situations and reducing the sample requirement [3]; knowledge transfer can thus largely mitigate the issues caused by a change in the problem configuration

  • Based on latent structure matching (LSM), we present an improved exploration strategy that is built on the knowledge obtained from the highly matched source task


Summary

Introduction

Reinforcement learning (RL) is a paradigm in which an agent guides its actions based on the rewards obtained from trial-and-error interaction with the environment [1,2]. Similarity estimation between tasks is the main way to select matched source tasks in existing work on knowledge transfer for RL. Some works used clustering algorithms [22] to handle large numbers of tasks; in these works, the policies, value functions, rewards, and dynamics of tasks were modeled as random processes to estimate similarity [23,24]. Based on LSM, we present an improved exploration strategy that is built on the knowledge obtained from the highly matched source task. This improved strategy reduces random exploration in the value function space of tasks, effectively improving the performance of RL agents.
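As a rough illustration of such an advice-based exploration rule, the sketch below modifies ϵ-greedy action selection so that exploratory steps are biased toward the action suggested by the matched source task's value function rather than being chosen uniformly at random. The tabular value functions, the `advice_prob` parameter, and the function name are hypothetical assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsm_guided_action(target_q, source_q, state, epsilon=0.1, advice_prob=0.8):
    """Epsilon-greedy action selection with source-task advice.

    target_q, source_q: |S| x |A| value tables for the target task and the
    matched source task; `advice_prob` is the chance that an exploratory
    step follows the source task's advice instead of acting randomly.
    """
    n_actions = target_q.shape[1]
    if rng.random() < epsilon:                       # exploration step
        if rng.random() < advice_prob:               # follow source-task advice
            return int(np.argmax(source_q[state]))
        return int(rng.integers(n_actions))          # plain random exploration
    return int(np.argmax(target_q[state]))           # greedy exploitation
```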

Knowledge Transfer in RL
Low Rank Embedding
Method
Value Function Transfer
Experiments
Experiments on Maze Navigation Problem
[Figure: LSM-based exploration vs. ϵ-greedy exploration]
Experiments on Mountain Car Problem
Findings
Conclusions