Abstract

Existing Deep Reinforcement Learning (DRL) methods still suffer from sample inefficiency, which makes learning from scratch difficult. Transfer Learning (TL) has shown great potential to accelerate DRL by leveraging prior knowledge from policies learned on related tasks. Existing transfer approaches either explicitly compute the similarity between tasks or select appropriate source policies to provide guided exploration for the target task. However, a way to directly optimize the target policy by alternately utilizing knowledge from appropriate source policies, without explicitly measuring task similarity, is currently missing. We propose a novel Policy Transfer Framework (PTF) based on this idea. PTF models multi-policy transfer as option learning, learning when and which source policy is best to reuse and when to terminate it, and directly optimizes the target policy using the transferred knowledge. An adaptive, heuristic mechanism avoids negative transfer. PTF can be easily combined with existing DRL methods, and experimental results show that it outperforms both existing DRL approaches and state-of-the-art policy transfer methods in discrete and continuous action spaces.

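As a rough illustration of the option-learning idea sketched in the abstract (not the paper's actual implementation), the snippet below shows one way the two ingredients could fit together in PyTorch: an option network that scores each source policy and predicts when to terminate the active one, and a transfer loss that pulls the target policy toward the currently selected source policy with an adaptive weight. All names (`OptionNet`, `transfer_loss`, `weight`) are hypothetical.

```python
# Hypothetical sketch of option-based multi-policy transfer; names and
# architecture are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptionNet(nn.Module):
    """Scores each source policy (option) for the current state and
    predicts a termination probability for each option."""
    def __init__(self, state_dim: int, n_sources: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.option_head = nn.Linear(64, n_sources)  # which source to reuse
        self.term_head = nn.Linear(64, n_sources)    # when to stop reusing it

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        option_scores = self.option_head(h)
        termination_probs = torch.sigmoid(self.term_head(h))
        return option_scores, termination_probs

def transfer_loss(target_logits: torch.Tensor,
                  source_logits: torch.Tensor,
                  weight: float) -> torch.Tensor:
    """Regularize the target policy toward the selected source policy.
    `weight` stands in for an adaptive coefficient that decays toward
    zero when the source stops helping, limiting negative transfer."""
    target_logp = F.log_softmax(target_logits, dim=-1)
    source_p = F.softmax(source_logits, dim=-1).detach()  # fixed teacher
    return weight * F.kl_div(target_logp, source_p, reduction="batchmean")
```

In this reading, a term of this shape would simply be added to the underlying DRL algorithm's policy loss, which is why the framework composes with existing methods in both discrete and continuous action spaces.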