Abstract

Improving the generalization ability of reinforcement learning (RL) agents is an open and challenging problem that has received increasing attention in recent years. Previous work attempts to address it in part by recombining several learned policies to achieve compositional generalization. However, these methods are usually limited to tasks composed of fixed subtasks, with no task-agnostic changes in the environment. In this paper, we propose a more flexible method, termed MER, that learns a compositionally generalizable policy depending on task-relevant elements that remain invariant across tasks, rather than on fixed subtasks. As a result, the learned policy can cope with task-agnostic changes in the environment and generalize to a wider range of compositional tasks. Theoretical analysis and experimental results show that our method outperforms traditional methods and exhibits superior generalization ability on unseen tasks.
