Abstract

Transfer Learning (TL) has received a great deal of attention because of its ability to speed up Reinforcement Learning (RL) by reusing knowledge learned in other tasks. This paper proposes a new transfer learning framework, referred to as Transfer Learning via Artificial Neural Network Approximator (TL-ANNA). It builds an Artificial Neural Network (ANN) transfer approximator to transfer the related knowledge from the source task to the target task and reuses the transferred knowledge with a Probabilistic Policy Reuse (PPR) scheme. Specifically, the transfer approximator maps the state of the target task symmetrically to states of the source task under a given mapping rule and activates the related knowledge (components of the action-value function) of the source task as the input of the ANNs; it then predicts the quality of the actions in the target task with the ANNs. The target learner uses the PPR scheme to bias the RL with the action suggested by the transfer approximator. In this way, the transfer approximator builds a symmetric knowledge path between the target task and the source task. In addition, two mapping rules for the transfer approximator are designed, namely, the Full Mapping Rule and the Group Mapping Rule. Experiments performed on the RoboCup soccer Keepaway task verify that the proposed transfer learning methods outperform two other transfer learning methods in both the jumpstart and time-to-threshold metrics and are more robust to the quality of the source knowledge. Moreover, TL-ANNA with the group mapping rule performs slightly worse than TL-ANNA with the full mapping rule, but at lower computation and space cost when an appropriate grouping method is used.
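As a concrete illustration of the pipeline described above, the following is a minimal sketch of how the transfer approximator could score target-task actions and suggest one. All interface names here are hypothetical, not from the paper: each entry of `mapping_rules` maps a target-task state to a source-task state for one target action, `source_q` returns the source task's action-value components for a state, and `ann` is a trained regressor with a scikit-learn-style `predict()` method.

```python
import numpy as np

def suggest_action(target_state, mapping_rules, source_q, ann):
    """Sketch of the TL-ANNA transfer approximator (hypothetical interfaces).

    For each target-task action: map the target state to a source-task
    state, evaluate the source task's action-value components on the
    mapped state, and feed them to the ANN, which predicts the quality
    of that target action.
    """
    q_hat = []
    for rule in mapping_rules:                    # one mapping per target action
        source_state = rule(target_state)         # Full or Group Mapping Rule
        features = source_q(source_state)         # activated source knowledge
        q_hat.append(ann.predict([features])[0])  # ANN predicts action quality
    return int(np.argmax(q_hat))                  # suggested action for PPR
```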

Highlights

  • Reinforcement learning (RL) is a popular learning paradigm for solving sequential decision-making problems

  • It builds an Artificial Neural Network (ANN) transfer approximator to transfer the related knowledge from the source task to the target task and reuses the transferred knowledge based on the Probabilistic Policy Reuse (PPR) scheme

  • Since there are two types of transfer approximators, one for each of the two mapping rules, two sets of experiments were performed for the proposed transfer learning (TL)-ANNA methods

Summary

Introduction

Reinforcement learning (RL) is a popular learning paradigm for solving sequential decision-making problems. The motivation of this paper is to reduce human involvement in transfer learning and to make better use of the related features between the source task and the target task. This paper contributes a new transfer learning framework that builds an Artificial Neural Network (ANN) transfer approximator to predict the quality of target-task actions from source-task knowledge, and biases the action selection of the target-task learner with the Probabilistic Policy Reuse (PPR) scheme [2,3].
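To make the PPR bias concrete, below is a minimal sketch of how the suggested action could be mixed into the target learner's own action selection, assuming a hypothetical learner object with an `epsilon_greedy_action()` method and a reuse probability `psi` that is decayed over episodes (these names are illustrative, not from the paper).

```python
import random

def ppr_select_action(state, learner, suggested_action, psi):
    """Sketch of PPR-biased action selection (hypothetical interfaces).

    With probability psi, follow the action suggested by the transfer
    approximator; otherwise fall back to the target learner's own policy.
    """
    if random.random() < psi:
        return suggested_action                   # reuse transferred knowledge
    return learner.epsilon_greedy_action(state)   # learner's own policy

# psi is typically decayed after each episode so the transfer bias
# fades as the target task is learned, e.g.:
# psi *= 0.95
```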

Literature Review
Reinforcement Learning
Transfer Learning via ANN Approximator
State Feature Mapping Rules
Full Mapping
Group Mapping
Construction of the ANNs
Generating Input Samples
Generating Output Samples
Biasing the Action Selection with PPR Scheme
Experiments and Results
Keepaway Task
TL-ANNA in Keepaway
Experiment Settings and Results
Conclusions