Evaluating Guided Policy Search for Human-Robot Handovers

Alap Kshirsagar, Guy Hoffman, Armin Biess

doi:10.1109/lra.2021.3067299

Open Access

https://doi.org/10.1109/lra.2021.3067299

Abstract

We evaluate the potential of Guided Policy Search (GPS), a model-based reinforcement learning (RL) method, to train a robot controller for human-robot object handovers. Handovers are a key competency for collaborative robots, and GPS could be a promising approach for this task, as it is data efficient and does not require prior knowledge of the robot and environment dynamics. However, existing uses of GPS did not consider important aspects of human-robot handovers, namely large spatial variations in reach locations, moving targets, and generalizing over mass changes induced by the object being handed over. In this work, we formulate the reach phase of handovers as an RL problem and then train a collaborative robot arm in a simulation environment. Our results indicate that GPS is limited in its spatial generalizability over variations in the target location, but that this issue can be mitigated by adding local controllers trained over target locations in the high-error regions. Moreover, learned policies generalize well over a large range of end-effector masses. Moving targets can be reached with comparable errors using a global policy trained on static targets, but this results in inefficient, high-torque trajectories. Training on moving targets improves trajectories, but degrades worst-case performance. Initial results suggest that lower-dimensional state representations are beneficial for GPS performance in handovers.

Highlights

In this work, we develop and evaluate a robot controller that uses Guided Policy Search (GPS) to perform reaching motions for object handovers
We evaluate the performance of the global policy learnt with GPS for large variations in target locations, moving targets, and changes in robot dynamics
We build upon the Bregman-Alternating Direction Method of Multipliers (BADMM)-GPS implementation by Finn et al. [34]

Summary

Introduction

In this work, we develop and evaluate a robot controller that uses Guided Policy Search (GPS) to perform reaching motions for object handovers.

Date of publication March 18, 2021; date of current version April 5, 2021. This letter was recommended for publication by Associate Editor G. (Guy Hoffman and Armin Biess contributed to this work.) (Corresponding authors: Alap Kshirsagar; Armin Biess.)

Methods

Results

Conclusion