Abstract

The ability to optimally control robotic systems offers significant performance advantages. While time-dependent optimal trajectories can be computed numerically for high-dimensional nonlinear system dynamics models, constraints, and objectives, finding optimal feedback control policies for such systems is hard. This is unfortunate, as without a policy, controlling real-world systems requires frequent correction or replanning to compensate for disturbances and model errors. In this paper, a feedback control policy is learned from a set of optimal reference trajectories using Gaussian processes. Information from the existing trajectories and the current policy is used to find promising start points for computing further optimal trajectories. This aspect is important because it avoids exhaustive sampling of the complete state space, which is impractical in high dimensions, and instead focuses the effort on the relevant region. The presented method has been applied in simulation to the swing-up of an underactuated pendulum and to an energy-minimal point-to-point movement of a 3-DOF industrial robot.
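
To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of learning a feedback policy u = pi(x) by Gaussian process regression on state-control pairs collected from optimal reference trajectories. The library choice (scikit-learn), the kernel, and the toy pendulum-like data are all assumptions for illustration; the paper's actual formulation may differ. The predictive uncertainty of the GP hints at where additional optimal trajectories would be most informative, which is the intuition behind selecting promising start points.

```python
# Sketch only: GP-based feedback policy learned from trajectory data.
# Data, kernel, and hyperparameters below are hypothetical.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical training data: states x = (angle, angular velocity) and
# controls u sampled along precomputed optimal reference trajectories.
X_train = rng.uniform(low=[-np.pi, -2.0], high=[np.pi, 2.0], size=(200, 2))
u_train = np.sin(X_train[:, 0]) - 0.5 * X_train[:, 1]  # stand-in for optimal u

# GP prior: anisotropic RBF kernel for a smooth policy, plus a noise term.
kernel = RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=1e-3)
policy = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
policy.fit(X_train, u_train)

# Feedback control: query the GP at the current state. The predictive
# standard deviation indicates regions where computing further optimal
# trajectories (i.e., gathering more training data) would help most.
x = np.array([[0.3, -0.4]])
u, sigma = policy.predict(x, return_std=True)
print(f"u = {u[0]:.3f}, predictive std = {sigma[0]:.3f}")
```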
