Abstract

This paper presents a user-friendly method for programming humanoid robots without the need for expert knowledge. We propose a combination of imitation learning and reinforcement learning to teach and optimize demonstrated trajectories. An initial trajectory for reinforcement learning is generated using a stable whole-body motion imitation system. The acquired motion is then refined using a stochastic optimal control-based reinforcement learning algorithm called Policy Improvement with Path Integrals with Covariance Matrix Adaptation ($\text{PI}^2$-CMA). We tested the approach on the task of programming humanoid robot reaching motions. Our experimental results show that the proposed approach is successful at learning reaching motions while preserving the postural balance of the robot. We also show how a stable humanoid robot trajectory learned in simulation can be effectively adapted to different dynamic environments, e.g., a different simulator or a real robot. The resulting learning methodology allows for quick and efficient optimization of the demonstrated trajectories while also taking into account the constraints of the desired task. The learning methodology was tested in a simulated environment and on the real humanoid robot TALOS.
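
To make the refinement step concrete, the following is a minimal sketch of one black-box $\text{PI}^2$-CMA update in the style of Stulp and Sigaud: perturbed policy parameters are weighted by their exponentiated rollout cost, which updates both the parameter mean and the exploration covariance. The function and parameter names (`pi2_cma_update`, `cost_fn`, `n_rollouts`, `h`) are illustrative assumptions, not the authors' implementation, and the paper's actual per-time-step formulation and cost function may differ.

```python
import numpy as np

def pi2_cma_update(theta, Sigma, cost_fn, n_rollouts=10, h=10.0):
    """One simplified PI^2-CMA update (sketch).

    theta      : current policy parameter vector
    Sigma      : exploration covariance matrix
    cost_fn    : maps a parameter vector to a scalar trajectory cost
    n_rollouts : number of perturbed rollouts per update
    h          : cost-sensitivity of the exponential weighting
    """
    dim = theta.shape[0]
    # Sample exploration noise and evaluate each perturbed rollout.
    eps = np.random.multivariate_normal(np.zeros(dim), Sigma, size=n_rollouts)
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Normalize costs to [0, 1]; low-cost rollouts get exponentially more weight.
    s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-12)
    weights = np.exp(-h * s)
    weights /= weights.sum()
    # Reward-weighted averaging of the noise updates the parameter mean ...
    theta_new = theta + weights @ eps
    # ... and the covariance matrix adaptation step reshapes exploration.
    Sigma_new = sum(w * np.outer(e, e) for w, e in zip(weights, eps))
    return theta_new, Sigma_new

# Usage on a toy quadratic cost standing in for a reaching/balance objective:
theta = np.zeros(5)
Sigma = 0.1 * np.eye(5)
for _ in range(100):
    theta, Sigma = pi2_cma_update(theta, Sigma,
                                  cost_fn=lambda p: np.sum((p - 1.0) ** 2))
```

In practice the covariance update is usually regularized (e.g., with a lower bound on its eigenvalues) so exploration does not collapse prematurely; that safeguard is omitted here for brevity.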
