Abstract

Efficiently learning interpretable policies for complex tasks from demonstrations is a challenging problem. We present Hierarchical Inference with Logical Options (HILO), a novel learning algorithm that learns to imitate expert demonstrations by learning the rules that the expert is following. The rules are represented as linear temporal logic (LTL) formulas, which are interpretable and capable of encoding complex behaviors. Unlike previous works, which learn rules from high-level propositions, HILO learns rules by taking both propositions and low-level trajectories as input. It does this by defining a Bayesian model over LTL formulas, propositions, and low-level trajectories. The Bayesian model bridges the gap from formula to low-level trajectory by using a planner to find an optimal policy for a given LTL formula. Stochastic variational inference is then used to find a posterior distribution over formulas and policies given expert demonstrations. We show that by learning rules from both propositions and low-level states, HILO outperforms previous work on a rule-learning task and on four planning tasks while needing less data. We also validate HILO in the real world by teaching a robotic arm a complex packing task.
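
To make the model structure concrete, the following is a minimal, hypothetical sketch of the factorization the abstract describes: a prior over candidate LTL formulas, a planner that maps each formula to a policy, and a likelihood of the demonstration under that policy, so that p(formula, trajectory) = p(formula) * p(trajectory | policy(formula)). The three-formula set, the hand-coded stand-in planner, and all names are illustrative assumptions, not HILO's actual components; the posterior here is computed by exact enumeration, whereas HILO uses stochastic variational inference over a much larger formula space.

```python
import math

# Toy illustration (hypothetical names, not HILO's API): a uniform prior over
# three candidate LTL formulas, a stand-in "planner" mapping each formula to a
# stochastic policy, and a likelihood for an expert demonstration under it.

FORMULAS = ["F a", "F b", "F (a & F b)"]  # "eventually a", "eventually b", "a then b"
LOG_PRIOR = {f: -math.log(len(FORMULAS)) for f in FORMULAS}

def plan_policy(formula):
    """Stand-in for the planner: maps a formula to action probabilities."""
    if formula == "F a":
        return {"go_a": 0.9, "go_b": 0.1}
    if formula == "F b":
        return {"go_a": 0.1, "go_b": 0.9}
    return {"go_a": 0.5, "go_b": 0.5}  # must eventually reach both regions

def log_likelihood(formula, demo):
    """log p(demo | policy induced by formula)."""
    policy = plan_policy(formula)
    return sum(math.log(policy[a]) for a in demo)

# Expert demonstration as an action sequence; posterior by exact enumeration.
demo = ["go_a", "go_a", "go_b"]
log_post = {f: LOG_PRIOR[f] + log_likelihood(f, demo) for f in FORMULAS}
norm = math.log(sum(math.exp(v) for v in log_post.values()))
posterior = {f: math.exp(v - norm) for f, v in log_post.items()}
print(posterior)  # mass concentrates on formulas consistent with the demo
```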

Highlights

  • In the imitation learning (IL) problem, desired behaviors are learned by imitating expert demonstrations [1]

  • We introduced Hierarchical Inference with Logical Options (HILO), a method for inferring and planning with linear temporal logic (LTL) formulas given low-level trajectory demonstrations

  • We showed how HILO improves over prior work by incorporating planning in the inference loop

Summary

INTRODUCTION

In the imitation learning (IL) problem, desired behaviors are learned by imitating expert demonstrations [1]. We address this problem in an efficient manner by introducing Hierarchical Inference with Logical Options (HILO), an IL algorithm that learns a policy by learning the rules that the expert is following. Given a low-level environment that includes propositions, a set of pretrained low-level policies, and expert demonstrations, HILO learns a distribution over LTL formulas and policies that characterize the task the expert is performing. HILO achieves this by defining a hierarchical Bayesian model that relates LTL formulas to propositions and low-level trajectories. This paper makes the following contributions: 1) We introduce a hierarchical Bayesian model that incorporates the LOF-VI planner to relate LTL formulas to policies, thereby defining a joint distribution over LTL formulas, propositions, and low-level trajectories. 2) We use the Bayesian model to define a stochastic variational inference problem that infers a posterior distribution over interpretable LTL formulas and policies given a set of expert demonstrations. We validate HILO in a real-world setting by teaching a robotic arm a complex grocery-packing task.
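
As a rough illustration of the variational step described above (not the paper's implementation), the sketch below fits a categorical variational distribution q(formula) by gradient ascent on the ELBO, reusing the toy demonstration likelihoods from the earlier sketch. Because the toy formula space is enumerable, the expectation in the ELBO is computed exactly; in HILO the formula space is combinatorial, so the gradient would instead be estimated from sampled formulas.

```python
import torch

# Hypothetical sketch of the variational step: fit a categorical q(formula)
# by gradient ascent on the ELBO. The 3-formula space and the likelihood
# values are toy stand-ins taken from the enumeration sketch above.

log_prior = torch.log(torch.tensor([1/3, 1/3, 1/3]))
log_lik = torch.tensor([-2.51, -4.71, -2.08])  # toy log p(demos | formula)

logits = torch.zeros(3, requires_grad=True)  # parameters of q(formula)
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(300):
    q = torch.softmax(logits, dim=0)
    # ELBO = E_q[log p(formula) + log p(demos | formula) - log q(formula)]
    elbo = (q * (log_prior + log_lik - torch.log(q))).sum()
    opt.zero_grad()
    (-elbo).backward()
    opt.step()

print(torch.softmax(logits, dim=0).detach())  # approximates the posterior
```

At the optimum, q(formula) is proportional to p(formula) * p(demos | formula), recovering the same posterior that the first sketch computed by direct enumeration.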

RELATED WORK
PRELIMINARIES
PROBLEM STATEMENT
HIERARCHICAL INFERENCE WITH LOGICAL OPTIONS
Bayesian Model
Sampling LTL formulas
EXPERIMENTS & RESULTS
Inference over Low-level States
Planning
Real-world Packing
CONCLUSION & FUTURE WORK