Abstract
Direct policy search has been successful in learning challenging real-world robotic motor skills by learning open-loop movement primitives with high sample efficiency. These primitives can be generalized to different contexts with varying initial configurations and goals. However, current state-of-the-art contextual policy search algorithms cannot adapt to changing, noisy context measurements, even though these are common characteristics of real-world robotic tasks. Planning a trajectory ahead of time based on an inaccurate context that may change during the motion often results in poor accuracy, especially in highly dynamic tasks. To adapt to updated contexts, it is sensible to learn trajectory replanning strategies. We propose a framework for learning trajectory replanning policies via contextual policy search and demonstrate that the resulting policies are safe for the robot, can be learned efficiently, and outperform non-replanning policies on problems with partially observable or perturbed contexts.
Highlights
A popular approach to learning robotic skills is to combine direct policy search with structured trajectory representations that can be initialized from demonstrations via imitation learning [1], [2].
This demonstrates that contextual direct policy search algorithms, when coupled with smooth parameterized trajectory generators, are more powerful than they are given credit for in the literature.
We propose a combination of replanning policies with replannable dynamic movement primitives (DMPs) and show how they can be learned with policy search by introducing some mild assumptions (see the sketch below).
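As a concrete illustration of the replannable-DMP idea, the sketch below implements a minimal discrete dynamic movement primitive whose goal attractor can be shifted mid-rollout when a new context measurement arrives. All parameter names and defaults (alpha, beta, alpha_x, the basis-function layout) are generic textbook choices made for illustration, not the exact formulation used in the paper.

```python
import numpy as np

class ReplannableDMP:
    """Minimal discrete DMP whose goal g can be updated mid-rollout.
    Illustrative sketch only; gains and basis layout are generic
    assumptions, not the paper's formulation."""

    def __init__(self, w, y0, g, tau=1.0, alpha=25.0, beta=6.25, alpha_x=3.0):
        self.w = np.asarray(w)                 # basis-function weights (learned)
        self.y0, self.g, self.tau = y0, g, tau
        self.alpha, self.beta, self.alpha_x = alpha, beta, alpha_x
        # Basis centers spread in phase space, widths chosen to overlap.
        self.c = np.exp(-alpha_x * np.linspace(0, 1, len(self.w)))
        self.h = 1.0 / np.gradient(self.c) ** 2
        self.reset()

    def reset(self):
        self.x, self.y, self.z = 1.0, self.y0, 0.0   # phase, position, velocity

    def replan(self, g_new):
        """Adapt to an updated context measurement by shifting the goal."""
        self.g = g_new

    def step(self, dt):
        # Forcing term: phase-gated, scaled by the (current) goal offset.
        psi = np.exp(-self.h * (self.x - self.c) ** 2)
        f = self.x * (self.g - self.y0) * (psi @ self.w) / (psi.sum() + 1e-10)
        # Second-order attractor dynamics toward the goal, plus forcing.
        dz = (self.alpha * (self.beta * (self.g - self.y) - self.z) + f) / self.tau
        self.z += dz * dt
        self.x += -self.alpha_x * self.x / self.tau * dt
        self.y += self.z / self.tau * dt
        return self.y
```

A rollout with a mid-motion goal update might then look like:

```python
dmp = ReplannableDMP(w=np.random.randn(10), y0=0.0, g=1.0)
ys = []
for t in range(2000):
    if t == 800:                  # updated context arrives mid-motion
        dmp.replan(g_new=1.3)
    ys.append(dmp.step(dt=0.001))
```

Because the goal enters the attractor dynamics smoothly, shifting it mid-rollout perturbs the trajectory continuously rather than causing a jump, which is what makes replanning safe for the robot.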
Summary
A popular approach to learning robotic skills is to combine direct policy search with structured trajectory representations that can be initialized from demonstrations via imitation learning [1], [2]. One approach to robot reinforcement learning is to learn the Q-function of a time-dependent control policy [10], [11], but such a time-dependent policy typically requires a large number of parameters to produce complex trajectories, which makes this trajectory representation problematic. An alternative approach that requires learning far fewer parameters is to learn a simple policy that determines the parameters of a low-dimensional structured trajectory representation [3]; a variety of direct policy search algorithms have been suggested for this purpose [13]. Adapting such trajectories to updated context measurements is an important issue since, in reality, the frequency of context measurements can be orders of magnitude slower than the robot control frequency (e.g., with optical tracking). Our results demonstrate that contextual direct policy search algorithms, when coupled with smooth parameterized trajectory generators, are more powerful than they are given credit for in the literature.
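To make the contextual policy search loop concrete, here is a generic sketch of an upper-level Gaussian policy pi(w | s) over trajectory parameters w, conditioned on the context s and updated with exponentiated-reward-weighted regression. This is one common family of episodic policy search updates, not necessarily the specific algorithm evaluated in the paper; the `rollout(w, s)` interface and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def contextual_policy_search(rollout, context_dist, dim_w, n_iters=50,
                             n_samples=30, eta=5.0, seed=0):
    """Generic contextual policy search sketch: learn a linear-Gaussian
    upper-level policy pi(w | s) = N(K @ phi(s), Sigma) over trajectory
    parameters, updated by reward-weighted regression. `rollout(w, s)`
    executes the low-level trajectory and returns a scalar return;
    this interface is an assumption for illustration."""
    rng = np.random.default_rng(seed)
    dim_s = len(context_dist(rng))
    K = np.zeros((dim_w, dim_s + 1))           # linear gains incl. bias
    Sigma = np.eye(dim_w)                      # exploration covariance
    for _ in range(n_iters):
        S, W, R = [], [], []
        for _ in range(n_samples):
            s = context_dist(rng)              # sample a task context
            phi = np.append(s, 1.0)            # features: context + bias
            w = rng.multivariate_normal(K @ phi, Sigma)
            S.append(phi); W.append(w); R.append(rollout(w, s))
        S, W, R = np.array(S), np.array(W), np.array(R)
        d = np.exp(eta * (R - R.max()))        # soft-max reward weights
        # Weighted ridge regression: K <- argmin sum_i d_i ||w_i - K phi_i||^2
        A = (S * d[:, None]).T @ S + 1e-6 * np.eye(S.shape[1])
        K = np.linalg.solve(A, (S * d[:, None]).T @ W).T
        err = W - S @ K.T
        Sigma = (err * d[:, None]).T @ err / d.sum() + 1e-6 * np.eye(dim_w)
    return K, Sigma
```

In the replanning setting, w would parameterize a replannable DMP like the one sketched above, and the context s would be re-measured during execution and fed to a replanning policy that adjusts the remaining motion.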