Abstract

Language as an abstraction for hierarchical agents is a promising approach to compositional, long-horizon decision-making tasks. Training such an agent poses significant challenges, however, as it typically requires a large number of trajectories annotated with language. This paper addresses the challenge of learning such an agent when language annotations are scarce. One approach to leveraging unannotated data is to generate pseudo-labels for unannotated trajectories from a sparse set of seed annotations. However, because the scenes of the environment and the tasks assigned to the agent are diverse, the inferred language instructions are sometimes incorrect, causing the policy to ground incorrect instructions to actions. In this work, we propose a self-supervised language-conditioned hierarchical skill policy (SLHSP) that uses unannotated data to learn reusable, general task-related skills, facilitating learning from sparse annotations. We demonstrate that an SLHSP trained with less than 10% of the annotated trajectories performs comparably to one trained on fully annotated data. On the challenging ALFRED benchmark, our approach yields a notable improvement in success rate over a strong baseline also optimized for sparsely annotated data.
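For concreteness, the sketch below shows one plausible shape of the pseudo-labeling baseline the abstract contrasts against: fit an instruction labeler on the sparse seed annotations, then use it to label the unannotated trajectories before policy training. This is a minimal illustration under assumed interfaces; `Trajectory`, `train_labeler`, and `pseudo_label` are hypothetical names, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Trajectory:
    observations: list          # per-step observations (e.g., visual features)
    actions: list               # per-step low-level actions
    instruction: Optional[str]  # language annotation; None if unannotated

def pseudo_label(
    annotated: list[Trajectory],
    unannotated: list[Trajectory],
    train_labeler: Callable[[list[Trajectory]], Callable[[Trajectory], str]],
) -> list[Trajectory]:
    """Train an instruction labeler on the seed annotations, then
    attach pseudo-instructions to the unannotated trajectories."""
    labeler = train_labeler(annotated)
    pseudo = [
        Trajectory(t.observations, t.actions, labeler(t))
        for t in unannotated
    ]
    # The policy would then be trained on seed annotations plus
    # pseudo-labels. The abstract's point is that these inferred
    # instructions are sometimes wrong in diverse scenes, so the
    # policy grounds incorrect instructions; SLHSP instead learns
    # reusable skills from the unannotated data directly.
    return annotated + pseudo
```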
