Abstract

We demonstrate how a reinforcement learning agent can use compositional recurrent neural networks to learn to carry out commands specified in linear temporal logic (LTL). Our approach takes as input an LTL formula, structures a deep network according to the parse of the formula, and determines satisfying actions. This compositional structure of the network enables zero-shot generalization to significantly more complex unseen formulas. We demonstrate this ability in multiple problem domains with both discrete and continuous state-action spaces. In a symbolic domain, the agent finds a sequence of letters that satisfies a specification. In a Minecraft-like environment, the agent finds a sequence of actions that conforms to a formula. In the Fetch environment, the robot finds a sequence of arm configurations that move blocks on a table to fulfill the commands. While most prior work can learn to execute one formula reliably, we develop a novel form of multi-task learning for RL agents that allows them to learn from a diverse set of tasks and generalize to a new set of diverse tasks without any additional training. The compositional structures presented here are not specific to LTL, thus opening the path to RL agents that perform zero-shot generalization in other compositional domains.
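The core mechanism described above, structuring a network according to the parse of the formula, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the prefix-notation parser, the operator set, and the stand-in `modules` dictionary are hypothetical choices made for this sketch, not the paper's actual architecture (where each module would be a recurrent sub-network rather than a plain function).

```python
# Sketch: the parse tree of an LTL formula determines how sub-networks
# are composed.  All names here are illustrative assumptions.

def parse(tokens):
    """Parse a prefix-notation LTL formula into a nested tuple."""
    head = tokens.pop(0)
    if head in ("F", "G", "X", "!"):   # unary: eventually, always, next, not
        return (head, parse(tokens))
    if head in ("&", "|", "U"):        # binary: and, or, until
        return (head, parse(tokens), parse(tokens))
    return ("prop", head)              # atomic proposition

def build(tree, modules):
    """Recursively instantiate one module per operator in the parse tree."""
    op = tree[0]
    if op == "prop":
        return modules["prop"](tree[1])
    children = [build(child, modules) for child in tree[1:]]
    return modules[op](*children)

# Stand-in "modules" that just record the composition as a string,
# so the induced structure is easy to inspect.
modules = {
    "prop": lambda p: f"Prop({p})",
    "F": lambda c: f"F({c})",
    "&": lambda a, b: f"And({a}, {b})",
}

# F (a & b): "eventually, a and b hold simultaneously".
network = build(parse(["F", "&", "a", "b"]), modules)
print(network)  # F(And(Prop(a), Prop(b)))
```

Because the composition mirrors the formula's syntax tree, any unseen formula built from the same operators induces a valid network, which is what makes zero-shot generalization to longer formulas possible in principle.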

Highlights

  • To reliably interact with humans in the physical world, robots must learn to execute commands that are extended in time while remaining responsive to changes in their environments

  • While individual formulas can be learned by deep networks with extensive experience, we demonstrate how to compose tasks and skills to learn a general principle for encoding all linear temporal logic (LTL) formulas and following them without per-formula experience

  • An LTL formula and a map containing a robot are provided to an agent, which must immediately find a sequence of moves that results in robot behavior satisfying the LTL command

Summary

Introduction

To reliably interact with humans in the physical world, robots must learn to execute commands that are extended in time while remaining responsive to changes in their environments. This requires the robot to jointly represent the symbolic knowledge in language and the perceptual information from the environment, as well as generalize to different commands and maps. Commands expressed in LTL encode temporal constraints that must hold while the command is executed. Executing such commands is difficult in robotics because it requires integrating the complex symbolic reasoning that finds satisfying sequences of moves for an LTL command with the data-driven perceptual capabilities needed to sense the environment. We demonstrate how to integrate the learning abilities of neural networks with the symbolic structure of LTL commands to achieve a new capability: learning to perform end-to-end zero-shot execution of LTL commands.

