Abstract
Direct policy search has been successful in learning challenging real-world robotic motor skills by learning open-loop movement primitives with high sample efficiency. These primitives can be generalized to different contexts with varying initial configurations and goals. However, current state-of-the-art contextual policy search algorithms cannot adapt to changing, noisy context measurements, even though these are common characteristics of real-world robotic tasks. Planning a trajectory ahead of time based on an inaccurate context that may change during the motion often results in poor accuracy, especially in highly dynamic tasks. To adapt to updated contexts, it is sensible to learn trajectory replanning strategies. We propose a framework for learning trajectory replanning policies via contextual policy search and demonstrate that the resulting policies are safe for the robot, can be learned efficiently, and outperform non-replanning policies on problems with partially observable or perturbed contexts.
Highlights
A popular approach to learning robotic skills is to combine direct policy search with structured trajectory representations that can be initialized from demonstrations via imitation learning [1], [2].
This demonstrates that contextual direct policy search algorithms, when coupled with smooth parameterized trajectory generators, are more powerful than they are given credit for in the literature.
We propose a combination of replanning policies with replannable dynamic movement primitives (DMPs) and show how they can be learned with policy search by introducing some mild assumptions (see the sketch below).
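As a concrete illustration of the replannable-DMP idea, the sketch below implements a minimal discrete dynamic movement primitive whose goal attractor can be shifted mid-rollout when a new context measurement arrives. All parameter names and defaults (alpha, beta, alpha_x, the basis-function layout) are generic textbook choices made for illustration, not the exact formulation used in the paper.

```python
import numpy as np

class ReplannableDMP:
    """Minimal discrete DMP whose goal g can be updated mid-rollout.
    Illustrative sketch only; gains and basis layout are generic
    assumptions, not the paper's formulation."""

    def __init__(self, w, y0, g, tau=1.0, alpha=25.0, beta=6.25, alpha_x=3.0):
        self.w = np.asarray(w)                 # basis-function weights (learned)
        self.y0, self.g, self.tau = y0, g, tau
        self.alpha, self.beta, self.alpha_x = alpha, beta, alpha_x
        # Basis centers spread in phase space, widths chosen to overlap.
        self.c = np.exp(-alpha_x * np.linspace(0, 1, len(self.w)))
        self.h = 1.0 / np.gradient(self.c) ** 2
        self.reset()

    def reset(self):
        self.x, self.y, self.z = 1.0, self.y0, 0.0   # phase, position, velocity

    def replan(self, g_new):
        """Adapt to an updated context measurement by shifting the goal."""
        self.g = g_new

    def step(self, dt):
        # Forcing term: phase-gated, scaled by the (current) goal offset.
        psi = np.exp(-self.h * (self.x - self.c) ** 2)
        f = self.x * (self.g - self.y0) * (psi @ self.w) / (psi.sum() + 1e-10)
        # Second-order attractor dynamics toward the goal, plus forcing.
        dz = (self.alpha * (self.beta * (self.g - self.y) - self.z) + f) / self.tau
        self.z += dz * dt
        self.x += -self.alpha_x * self.x / self.tau * dt
        self.y += self.z / self.tau * dt
        return self.y
```

A rollout with a mid-motion goal update might then look like:

```python
dmp = ReplannableDMP(w=np.random.randn(10), y0=0.0, g=1.0)
ys = []
for t in range(2000):
    if t == 800:                  # updated context arrives mid-motion
        dmp.replan(g_new=1.3)
    ys.append(dmp.step(dt=0.001))
```

Because the goal enters the attractor dynamics smoothly, shifting it mid-rollout perturbs the trajectory continuously rather than causing a jump, which is what makes replanning safe for the robot.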
Summary
A popular approach to learning robotic skills is to combine direct policy search with structured trajectory representations that can be initialized from demonstrations via imitation learning [1], [2]. One approach to robot reinforcement learning is to learn the Q-function of a time-dependent control policy [10], [11], but such a time-dependent policy typically requires a large number of parameters to produce complex trajectories, which makes this trajectory representation problematic. An alternative approach that requires learning far fewer parameters is to learn a simple policy that determines the parameters of a low-dimensional structured trajectory representation [3]; a variety of direct policy search algorithms have been suggested for this purpose [13]. Adapting such trajectories to updated context measurements is an important issue since, in reality, the frequency of context measurements can be orders of magnitude slower than the robot control frequency (e.g., with optical tracking). Our results demonstrate that contextual direct policy search algorithms, when coupled with smooth parameterized trajectory generators, are more powerful than they are given credit for in the literature.
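To make the contextual policy search loop concrete, here is a generic sketch of an upper-level Gaussian policy pi(w | s) over trajectory parameters w, conditioned on the context s and updated with exponentiated-reward-weighted regression. This is one common family of episodic policy search updates, not necessarily the specific algorithm evaluated in the paper; the `rollout(w, s)` interface and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def contextual_policy_search(rollout, context_dist, dim_w, n_iters=50,
                             n_samples=30, eta=5.0, seed=0):
    """Generic contextual policy search sketch: learn a linear-Gaussian
    upper-level policy pi(w | s) = N(K @ phi(s), Sigma) over trajectory
    parameters, updated by reward-weighted regression. `rollout(w, s)`
    executes the low-level trajectory and returns a scalar return;
    this interface is an assumption for illustration."""
    rng = np.random.default_rng(seed)
    dim_s = len(context_dist(rng))
    K = np.zeros((dim_w, dim_s + 1))           # linear gains incl. bias
    Sigma = np.eye(dim_w)                      # exploration covariance
    for _ in range(n_iters):
        S, W, R = [], [], []
        for _ in range(n_samples):
            s = context_dist(rng)              # sample a task context
            phi = np.append(s, 1.0)            # features: context + bias
            w = rng.multivariate_normal(K @ phi, Sigma)
            S.append(phi); W.append(w); R.append(rollout(w, s))
        S, W, R = np.array(S), np.array(W), np.array(R)
        d = np.exp(eta * (R - R.max()))        # soft-max reward weights
        # Weighted ridge regression: K <- argmin sum_i d_i ||w_i - K phi_i||^2
        A = (S * d[:, None]).T @ S + 1e-6 * np.eye(S.shape[1])
        K = np.linalg.solve(A, (S * d[:, None]).T @ W).T
        err = W - S @ K.T
        Sigma = (err * d[:, None]).T @ err / d.sum() + 1e-6 * np.eye(dim_w)
    return K, Sigma
```

In the replanning setting, w would parameterize a replannable DMP like the one sketched above, and the context s would be re-measured during execution and fed to a replanning policy that adjusts the remaining motion.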