Abstract

This paper investigates the influence of reference motion quality and other design choices on the performance of deep reinforcement learning for bipedal walking with Proximal Policy Optimization (PPO). We use parametrized Cartesian quintic splines to generate reference actions for an omnidirectional walk policy. By using parameter sets of different quality, we show that the performance of the trained policy correlates with the quality of the reference motion. We also show that a policy acting in Cartesian space outperforms a joint-space-based one if an advantageous representation of orientation is chosen. Additionally, we show that initializing the policy with a bias speeds up training and leads to higher performance for policies using position control. Finally, we show that a stable omnidirectional walk can be achieved on a wide variety of simulated humanoid robots.
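
To illustrate the kind of reference motion the abstract refers to, the sketch below shows how a single Cartesian coordinate (here a hypothetical swing-foot height) can be described by quintic spline segments that match position, velocity, and acceleration at their endpoints. This is a minimal illustration, not the authors' implementation; the segment durations, apex height, and function names are assumptions made for the example.

```python
import numpy as np

def quintic_coeffs(p0, v0, a0, p1, v1, a1, T):
    """Coefficients c[0..5] of p(t) = sum_k c[k] * t**k on [0, T]
    matching position, velocity and acceleration at both endpoints."""
    A = np.array([
        [1, 0, 0,    0,      0,       0      ],
        [0, 1, 0,    0,      0,       0      ],
        [0, 0, 2,    0,      0,       0      ],
        [1, T, T**2, T**3,   T**4,    T**5   ],
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4 ],
        [0, 0, 2,    6*T,    12*T**2, 20*T**3],
    ], dtype=float)
    b = np.array([p0, v0, a0, p1, v1, a1], dtype=float)
    return np.linalg.solve(A, b)

def evaluate(coeffs, t):
    """Evaluate the quintic polynomial at time t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

# Hypothetical swing-phase foot-height trajectory over 0.25 s: the foot lifts
# from the ground to a 5 cm apex and returns, with zero velocity and
# acceleration at lift-off, apex, and touchdown (values chosen for illustration).
T = 0.125
up   = quintic_coeffs(0.0,  0.0, 0.0, 0.05, 0.0, 0.0, T)  # ground -> apex
down = quintic_coeffs(0.05, 0.0, 0.0, 0.0,  0.0, 0.0, T)  # apex -> ground
ts = np.linspace(0.0, 2 * T, 50)
z_ref = [evaluate(up, t) if t <= T else evaluate(down, t - T) for t in ts]
```

In the paper's setting, such per-coordinate splines would be parametrized (e.g. by step height and timing) and evaluated each control step to provide the reference actions that the PPO policy refines.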
