Abstract

Balancing and push recovery are essential capabilities for humanoid robots to solve complex locomotion tasks. In this context, classical control systems tend to be based on simplified physical models and hard-coded strategies. Although successful in specific scenarios, this approach requires demanding parameter tuning and switching logic between specifically designed controllers to handle more general perturbations. We apply model-free Deep Reinforcement Learning to train a general and robust humanoid push-recovery policy in a simulation environment. Our method targets high-dimensional whole-body humanoid control and is validated on the iCub humanoid. Reward components incorporating expert knowledge of humanoid control enable the same policy to quickly learn several robust behaviors spanning the entire body. We validate our method with extensive quantitative analyses in simulation, including out-of-sample tasks that demonstrate policy robustness and generalization, both key requirements for real-world robot deployment.
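The reward shaping mentioned in the abstract can be pictured as a weighted combination of terms, each rewarding one aspect of balanced whole-body behavior. The sketch below is illustrative only, not the paper's exact formulation: the component names, kernel cutoffs, and weights are assumptions chosen for exposition.

```python
import numpy as np

def rbf(error, cutoff):
    """Squared-exponential kernel: 1 at zero error, decaying towards 0 beyond the cutoff."""
    return np.exp(-(np.linalg.norm(error) / cutoff) ** 2)

def composite_reward(obs, weights=None):
    """Weighted sum of bounded reward components (hypothetical keys and weights)."""
    weights = weights or {"posture": 0.3, "com_height": 0.3, "base_velocity": 0.4}
    components = {
        # Stay close to a nominal whole-body joint configuration.
        "posture": rbf(obs["joint_positions"] - obs["nominal_joint_positions"], cutoff=0.5),
        # Keep the centre of mass near its reference height.
        "com_height": rbf(obs["com_height"] - obs["com_height_ref"], cutoff=0.1),
        # Damp residual horizontal motion of the floating base after a push.
        "base_velocity": rbf(obs["base_linear_velocity"][:2], cutoff=0.5),
    }
    return sum(weights[name] * value for name, value in components.items())
```

Keeping every component in (0, 1] before weighting prevents any single objective from dominating the learning signal, which is one common way expert knowledge is folded into the reward.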

Highlights

  • Bipeds are creatures that use two legs to move while maintaining static or dynamic equilibrium

  • These methods target benchmarking model-free Deep Reinforcement Learning (DRL) for continuous control and realistic animation of simplified characters, rather than applicability to real humanoid robots

  • Whole-body humanoid control with DRL remains an open problem; such methods have the potential to learn high-dimensional locomotion policies, further improving humanoid capabilities to recover from external perturbations


Summary

INTRODUCTION

Bipeds are creatures that use two legs to move while maintaining static or dynamic equilibrium. Learned behaviors often display unnatural characteristics, such as asymmetric gaits, abrupt motions of the body and limbs, or even unrealistic motions that exploit imperfections and glitches in the physics simulator of choice. These issues significantly limit generalization and transferability to real-world robots. Control architectures are often organized as hierarchies composed of trajectory optimization [8], simplified-model control, and whole-body quadratic programming [9], [10]. While such approaches have achieved considerable results on both simulated and real humanoid robots, they: 1) rely on an accurate description of the robot dynamics; 2) require hand-crafted features for online execution [11]; 3) present challenges when simultaneously facing different tasks. Our approach is instead inspired by floating-base dynamics: the policy observations encode sufficient information for solving the task with no prior knowledge of the desired trajectories, as sketched below.
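As a concrete illustration of such floating-base observations, the sketch below assembles base and joint quantities into a single vector. The field names, the exact set of signals, and the 23-joint configuration are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def build_observation(robot_state: dict) -> np.ndarray:
    """Concatenate floating-base and joint-space signals into one observation vector."""
    return np.concatenate([
        robot_state["base_orientation"],       # base orientation, e.g. a quaternion (4)
        robot_state["base_linear_velocity"],   # linear velocity of the floating base (3)
        robot_state["base_angular_velocity"],  # angular velocity of the floating base (3)
        robot_state["joint_positions"],        # whole-body joint angles (n_joints)
        robot_state["joint_velocities"],       # whole-body joint velocities (n_joints)
    ])

# Example with placeholder values for a hypothetical 23-joint humanoid.
n_joints = 23
state = {
    "base_orientation": np.array([1.0, 0.0, 0.0, 0.0]),
    "base_linear_velocity": np.zeros(3),
    "base_angular_velocity": np.zeros(3),
    "joint_positions": np.zeros(n_joints),
    "joint_velocities": np.zeros(n_joints),
}
print(build_observation(state).shape)  # (56,)
```

No reference trajectory appears in the observation: the policy must infer a recovery strategy purely from the current state of the base and the joints.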

The full article is organized into the following sections:

  • Control-Theoretic Approaches
  • Deep Reinforcement Learning Approaches
  • BACKGROUND
  • ENVIRONMENT
  • Action
  • Reward
  • Other Specifications
  • Deterministic Planar Forces
  • Random Spherical Forces on the Base Links
  • Training Performance
  • Random Spherical Forces on the Chest and Elbow Links
  • DISCUSSION
  • CONCLUSION