Abstract

This paper investigates the influence of reference motion quality and other design choices on the performance of deep reinforcement learning for bipedal walking with Proximal Policy Optimization (PPO). We use parametrized Cartesian quintic splines to generate reference actions for an omnidirectional walk policy. By using parameter sets of different quality, we show that the performance of the trained policy correlates with the quality of the reference motion. We also show that a policy acting in Cartesian space outperforms a joint-space-based one if an advantageous representation of orientation is chosen. Additionally, we show that initializing the policy with a bias speeds up training and leads to higher performance for policies using position control. Finally, we show that a stable omnidirectional walk can be achieved on a wide variety of simulated humanoid robots.
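
To illustrate the kind of reference motion the abstract refers to, the sketch below shows how a single Cartesian coordinate (here a hypothetical swing-foot height) can be described by quintic spline segments that match position, velocity, and acceleration at their endpoints. This is a minimal illustration, not the authors' implementation; the segment durations, apex height, and function names are assumptions made for the example.

```python
import numpy as np

def quintic_coeffs(p0, v0, a0, p1, v1, a1, T):
    """Coefficients c[0..5] of p(t) = sum_k c[k] * t**k on [0, T]
    matching position, velocity and acceleration at both endpoints."""
    A = np.array([
        [1, 0, 0,    0,      0,       0      ],
        [0, 1, 0,    0,      0,       0      ],
        [0, 0, 2,    0,      0,       0      ],
        [1, T, T**2, T**3,   T**4,    T**5   ],
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4 ],
        [0, 0, 2,    6*T,    12*T**2, 20*T**3],
    ], dtype=float)
    b = np.array([p0, v0, a0, p1, v1, a1], dtype=float)
    return np.linalg.solve(A, b)

def evaluate(coeffs, t):
    """Evaluate the quintic polynomial at time t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

# Hypothetical swing-phase foot-height trajectory over 0.25 s: the foot lifts
# from the ground to a 5 cm apex and returns, with zero velocity and
# acceleration at lift-off, apex, and touchdown (values chosen for illustration).
T = 0.125
up   = quintic_coeffs(0.0,  0.0, 0.0, 0.05, 0.0, 0.0, T)  # ground -> apex
down = quintic_coeffs(0.05, 0.0, 0.0, 0.0,  0.0, 0.0, T)  # apex -> ground
ts = np.linspace(0.0, 2 * T, 50)
z_ref = [evaluate(up, t) if t <= T else evaluate(down, t - T) for t in ts]
```

In the paper's setting, such per-coordinate splines would be parametrized (e.g. by step height and timing) and evaluated each control step to provide the reference actions that the PPO policy refines.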
