Abstract

Policy Reuse is a transfer learning approach that improves a reinforcement learner with guidance from previously learned, similar policies. The method uses past policies as a probabilistic bias: at each step the learner chooses among exploiting the policy it is currently learning, exploring random unexplored actions, and exploiting a past policy. In this work, we demonstrate that Policy Reuse further contributes to learning the structure of a domain. Interestingly, and almost as a side effect, Policy Reuse identifies classes of similar policies, revealing a basis of core-policies of the domain. We demonstrate theoretically that, under a set of conditions, reusing such a set of core-policies lower-bounds the expected gain received while learning a new policy. In general, Policy Reuse contributes to the overall goal of lifelong reinforcement learning, as (i) it incrementally builds a policy library; (ii) it provides a mechanism to reuse past policies; and (iii) it learns an abstract domain structure in terms of core-policies of the domain.
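To make the probabilistic bias concrete, the following is a minimal sketch of a reuse-style action-selection step. It is not the paper's exact algorithm; the names `psi` (probability of following the past policy), `epsilon` (random-exploration rate), `past_policy`, and `q_values` are illustrative assumptions used only to show how the three choices, reuse, exploration, and exploitation of the current policy, can be interleaved.

```python
import random


def reuse_action(q_values, past_policy, state, actions, psi=0.5, epsilon=0.1):
    """Choose an action by mixing a past policy with epsilon-greedy learning.

    Illustrative sketch: with probability `psi` the agent follows the reused
    past policy; otherwise it acts epsilon-greedily on the Q-values of the
    policy it is currently learning.
    """
    if random.random() < psi:
        # Exploit the past (reused) policy.
        return past_policy(state)
    if random.random() < epsilon:
        # Explore a random action.
        return random.choice(actions)
    # Exploit the policy currently being learned (greedy on Q).
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))
```

In typical reuse schemes the weight given to the past policy is decayed over the episode or over learning time, so guidance is strongest early on and the learner gradually takes over with its own policy.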
