Abstract

Visual features can help predict whether a manipulation behavior will succeed at a given location. For example, the success of a behavior that flips light switches depends on the location of the switch. We present methods that enable a mobile manipulator to autonomously learn a function that takes an RGB image and a registered 3D point cloud as input and returns a 3D location at which a manipulation behavior is likely to succeed. With our methods, robots autonomously train a pair of support vector machine (SVM) classifiers by trying behaviors at locations in the world and observing the results. Our methods require a pair of manipulation behaviors that can change the state of the world between two sets (e.g., light switch up and light switch down), classifiers that detect when each behavior has been successful, and an initial hint as to where one of the behaviors will be successful. When given an image feature vector associated with a 3D location, a trained SVM predicts whether the associated manipulation behavior will be successful at that location. To evaluate our approach, we performed experiments with a PR2 robot from Willow Garage in a simulated home using behaviors that flip a light switch, push a rocker-type light switch, and operate a drawer. By using active learning, the robot efficiently learned SVMs that enabled it to consistently succeed at these tasks. After training, the robot also continued to learn, allowing it to adapt in the event of failure.
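
To make the learned function concrete, the following is a minimal sketch, assuming scikit-learn's SVC; the feature matrices, candidate 3D points, and labels here are hypothetical stand-ins for the paper's RGB image features computed at locations in the registered point cloud and for the robot's observed behavior outcomes.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical per-location features: each row describes the local RGB
# appearance around one 3D point in the registered point cloud; each
# label records whether the behavior succeeded there (1 = success).
X_train = np.random.rand(40, 64)   # stand-in for real image features
y_train = np.tile([0, 1], 20)      # stand-in for observed outcomes

# One SVM per behavior; probability estimates let the robot rank
# candidate locations by predicted chance of success.
clf = SVC(kernel="rbf", probability=True)
clf.fit(X_train, y_train)

def best_location(points_3d, features):
    """Return the 3D point whose features score highest for success."""
    p_success = clf.predict_proba(features)[:, 1]
    return points_3d[np.argmax(p_success)]

# Candidate 3D points from the point cloud and their image features.
candidates = np.random.rand(200, 3)
cand_feats = np.random.rand(200, 64)
print(best_location(candidates, cand_feats))
```

Fitting with probability estimates enabled lets the robot rank candidate locations rather than merely classify them, matching the goal of returning a single 3D location where the behavior is likely to succeed.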

Highlights

  • Informing robot manipulation with computer vision continues to be a challenging problem in human environments such as homes

  • Obtaining labeled examples can be difficult, since the robot must act in the real world and human labeling is labor intensive and prone to errors, ambiguity, and inconsistency (Barriuso and Torralba 2012). We address this issue by combining active learning, which reduces the number of examples needed, with autonomous learning methods that eliminate the need for human labeling beyond an initialization process; see the sketch after this list

  • We have previously demonstrated that behaviors of this form can perform a variety of useful mobile manipulation tasks when provided with a 3D location designated with a laser pointer (Nguyen et al. 2008)
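
The following is a minimal sketch of one active-learning round under the self-labeling scheme the highlights describe, assuming scikit-learn; try_behavior and detect_success are hypothetical hooks standing in for the robot's behavior execution and its success-detection classifiers.

```python
import numpy as np
from sklearn.svm import SVC

def active_learning_round(clf, X_labeled, y_labeled, cand_points, cand_feats,
                          try_behavior, detect_success):
    """One uncertainty-sampling round: act where the SVM is least certain."""
    # Magnitude of the signed distance to the decision boundary;
    # the smallest margin marks the most uncertain candidate location.
    margins = np.abs(clf.decision_function(cand_feats))
    i = int(np.argmin(margins))

    # The robot tries its behavior at that 3D location and labels the
    # outcome with its own success detector, so no human labels the data.
    try_behavior(cand_points[i])
    label = int(detect_success())

    # Fold the new self-labeled example in and retrain the classifier.
    X_labeled = np.vstack([X_labeled, cand_feats[i]])
    y_labeled = np.append(y_labeled, label)
    clf.fit(X_labeled, y_labeled)
    return clf, X_labeled, y_labeled

# Stand-in wiring: random features, deterministic labels, stub robot hooks.
X0, y0 = np.random.rand(10, 64), np.tile([0, 1], 5)
clf = SVC(kernel="rbf").fit(X0, y0)
clf, X0, y0 = active_learning_round(
    clf, X0, y0,
    cand_points=np.random.rand(50, 3), cand_feats=np.random.rand(50, 64),
    try_behavior=lambda p: None, detect_success=lambda: True)
```

Sampling the location nearest the decision boundary is one standard uncertainty criterion; it concentrates the robot's limited real-world trials on the examples the current SVM finds most ambiguous.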



Introduction

Informing robot manipulation with computer vision continues to be a challenging problem in human environments such as homes. The robot must handle wide variation in the appearance of task-relevant components of the world that can affect its ability to perform tasks successfully. Lighting can vary from home to home and from hour to hour due to indoor lighting and windows. Important components of household mechanisms used during manipulation, such as drawer handles and switches, can be distinctive or even unique. The perspective from which a mobile robot observes the component will vary.
