Abstract

In this paper, we present a new approach to transfer in Reinforcement Learning (RL) for cross-domain tasks. Unlike available transfer approaches, where target task learning is accelerated by initializing learning from the source, we propose to adapt and reuse the optimal source policy directly in the related domains. We show that the optimal policy from a related source task can be near-optimal in the target domain, provided an adaptive policy accounts for the model error between the target and the projected source. A significant advantage of the proposed policy augmentation is that it generalizes policies across related domains without having to re-learn the new tasks. We demonstrate that this architecture leads to better sample efficiency in the transfer, reducing the sample complexity of target task learning to that of target apprentice learning.
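
The core idea, an augmented policy that reuses the source-optimal policy and adds an adaptive correction for the model error between the target and the projected source, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the names `pi_source`, `project_to_source`, and `adaptive_correction`, as well as the linear and identity forms used inside them, are assumptions made purely for demonstration.

```python
import numpy as np

def pi_source(source_state):
    """Placeholder for the optimal policy learned on the source task (assumed linear here)."""
    return -0.5 * source_state

def project_to_source(target_state):
    """Placeholder inter-task mapping projecting target states into the source domain
    (identity mapping assumed for illustration)."""
    return target_state

def adaptive_correction(model_error):
    """Adaptive term compensating for the estimated model error between the
    target dynamics and the projected source dynamics."""
    return -model_error

def pi_target(target_state, model_error):
    """Augmented policy: reuse the source-optimal policy on the projected state
    and add the adaptive correction, instead of re-learning the target task."""
    s_src = project_to_source(target_state)
    return pi_source(s_src) + adaptive_correction(model_error)

# Example: act in the target domain without re-learning the task.
state = np.array([1.0, -0.3])
error = np.array([0.05, 0.02])   # hypothetical estimated dynamics mismatch
action = pi_target(state, error)
print(action)
```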
