Abstract

When robots operate in human environments, it is critical that humans can quickly teach them new concepts: object-centric properties of the environment that they care about (e.g., objects near, upright, etc.). However, teaching a new perceptual concept from high-dimensional robot sensor data (e.g., point clouds) is demanding, requiring an unrealistic amount of human labels. To address this, we propose a framework called Perceptual Concept Bootstrapping (PCB). First, we leverage the inherently lower-dimensional privileged information, e.g., object poses and bounding boxes, available from a simulator only at training time to rapidly learn a low-dimensional, geometric concept from minimal human input. Second, we treat this low-dimensional concept as an automatic labeler to synthesize a large-scale, high-dimensional dataset with the simulator. With these two key ideas, PCB alleviates the human labeling burden while still learning perceptual concepts that work with real sensor input, where no privileged information is available. We evaluate PCB for learning spatial concepts that describe object state or multi-object relationships, and show that it achieves superior performance compared to baseline methods. We also demonstrate the utility of the learned concepts in motion planning tasks on a 7-DoF Franka Panda robot.
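The two-stage bootstrapping idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the simulator hook sample_scene, and the choice of a small MLP classifier are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# --- Stage 1: learn a low-dimensional geometric concept from privileged state ---
# X_priv: simulator-privileged features (e.g., object poses / bounding boxes).
# y_human: a small number of human-provided concept labels (e.g., "near" or not).
def fit_geometric_concept(X_priv: np.ndarray, y_human: np.ndarray) -> MLPClassifier:
    clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000)
    clf.fit(X_priv, y_human)
    return clf

# --- Stage 2: use the learned concept as an automatic labeler in simulation ---
# sample_scene() is a hypothetical simulator hook that returns both the privileged
# state and the corresponding raw sensor observation (e.g., a point cloud).
def synthesize_dataset(concept: MLPClassifier, sample_scene, n_scenes: int = 10000):
    clouds, labels = [], []
    for _ in range(n_scenes):
        priv_state, point_cloud = sample_scene()
        labels.append(int(concept.predict(priv_state[None, :])[0]))
        clouds.append(point_cloud)
    return clouds, labels  # large-scale labeled data for a perceptual concept model
```

A perceptual model (operating directly on point clouds) would then be trained on the synthesized pairs, so that at test time no privileged information is required.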
