Abstract

The automatic subpolicy discovery approach in hierarchical reinforcement learning (HRL) has recently achieved promising performance on sparse-reward tasks. It facilitates transfer learning and the training of unsupervised intelligent agents while removing the need for domain-specific knowledge. However, most previously developed approaches suffer from collapse, in which a single subpolicy dominates the whole task, because they cannot ensure diversity among the subpolicies. In contrast, this article proposes a task-agnostic regularizer (TAR) for learning diverse subpolicies in HRL. Specifically, we first formulate the discovery of diverse subpolicies as a trajectory inference problem and then propose a corresponding information-theoretic objective that encourages diversity. For tractability, we then instantiate the objective as two simplified forms, one for discrete and one for continuous action spaces. We evaluate the proposed diversity-driven regularizer on three HRL task domains: 1) meta reinforcement learning; 2) hierarchical policy learning in the option framework; and 3) unsupervised subpolicy discovery. Extensive results show that TAR improves upon state-of-the-art performance in all three domains without modifying any existing hyperparameters, indicating the wide applicability and robustness of our approach.
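The abstract does not spell out the exact form of the TAR objective, so the following is only a minimal illustrative sketch of the general idea of a diversity term over subpolicy action distributions, with one variant for discrete and one for continuous action spaces. The function names (categorical_kl, gaussian_kl, pairwise_diversity) and the use of mean pairwise KL divergence are assumptions for illustration, not the paper's actual regularizer.

import numpy as np

def categorical_kl(p, q, eps=1e-8):
    """KL(p || q) between two categorical action distributions (discrete actions)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def gaussian_kl(p, q):
    """KL(p || q) between two diagonal-Gaussian action distributions
    (continuous actions); each argument is a (mean, std) pair."""
    mu_p, sd_p = (np.asarray(x, dtype=float) for x in p)
    mu_q, sd_q = (np.asarray(x, dtype=float) for x in q)
    return float(np.sum(
        np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q)**2) / (2.0 * sd_q**2) - 0.5
    ))

def pairwise_diversity(dists, kl_fn):
    """Mean pairwise KL over all ordered pairs of subpolicy action
    distributions at the same state; larger values mean more diverse subpolicies."""
    pairs = [(i, j) for i in range(len(dists)) for j in range(len(dists)) if i != j]
    return sum(kl_fn(dists[i], dists[j]) for i, j in pairs) / max(len(pairs), 1)

# Discrete case: three subpolicies over four actions.
discrete = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(pairwise_diversity(discrete, categorical_kl))

# Continuous case: three subpolicies with 2-D diagonal-Gaussian action distributions.
continuous = [([0.0, 0.0], [1.0, 1.0]), ([1.0, -1.0], [0.5, 0.5]), ([-1.0, 1.0], [1.0, 2.0])]
print(pairwise_diversity(continuous, gaussian_kl))

In such a scheme, the diversity score would be added (with a weighting coefficient) to the task reward or loss as a regularization bonus; the two KL variants correspond to the discrete and continuous instantiations mentioned in the abstract, though the paper's precise objective may differ.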
