Abstract

Energy efficiency is critical for the locomotion of quadruped robots. However, energy-efficiency results obtained in simulation do not transfer adequately to the real world. To address this issue, we present a novel method, named Policy Search Transfer Optimization (PSTO), which combines deep reinforcement learning and optimization to produce energy-efficient locomotion for quadruped robots in the real world. Deep reinforcement learning and policy search are performed with the TD3 algorithm. The learned policy is then transferred to an open-loop control trajectory, which is further optimized by numerical methods and executed on the robot in the real world. To ensure close agreement between the simulation results and the behavior of the hardware platform, we introduce and validate an accurate simulation model with consistent dimensions and fine-tuned parameters. We then validate these results with real-world experiments on the quadruped robot Ant by executing dynamic walking gaits with different leg lengths and numbers of amplifications. Our analysis shows that our method outperforms the control methods given by the state-of-the-art policy search algorithm TD3 and by a sinusoid function in both energy efficiency and speed.
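The transfer step described above (a learned policy turned into an open-loop trajectory that is then numerically optimized) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the PSTO implementation: the abstract does not specify the cost function or solver, so the sketch uses a hypothetical energy proxy (integrated squared joint velocity), a quadratic penalty that keeps waypoints close to the recorded trajectory, and plain gradient descent. A noisy sinusoid stands in for a joint trajectory recorded from a TD3 rollout. The names `sinusoid_gait`, `energy_proxy`, and `optimize_trajectory` are hypothetical.

```python
import math
import random


def sinusoid_gait(n_steps, dt, amplitude, freq, phase=0.0):
    """Open-loop joint-angle waypoints q_k = A * sin(2*pi*f*k*dt + phase) for one joint."""
    return [amplitude * math.sin(2.0 * math.pi * freq * k * dt + phase)
            for k in range(n_steps)]


def energy_proxy(q, dt):
    """Hypothetical effort proxy: sum((dq/dt)^2) * dt, using finite differences."""
    return sum((q[k + 1] - q[k]) ** 2 for k in range(len(q) - 1)) / dt


def optimize_trajectory(q_ref, dt, weight=50.0, iters=500):
    """Gradient descent on J(q) = energy_proxy(q) + weight * sum((q - q_ref)^2).

    The tracking term keeps the waypoints close to the recorded (learned) gait,
    while the energy term penalizes fast, jerky joint motion.
    """
    n = len(q_ref)
    # Safe step size: the Hessian of J has eigenvalues below 8/dt + 2*weight,
    # so this step guarantees J never increases.
    lr = 1.0 / (8.0 / dt + 2.0 * weight)
    q = list(q_ref)
    for _ in range(iters):
        # Gradient of the tracking term.
        grad = [2.0 * weight * (q[k] - q_ref[k]) for k in range(n)]
        # Gradient of the energy term: d/dq of (q[k+1] - q[k])^2 / dt.
        for k in range(n - 1):
            d = 2.0 * (q[k + 1] - q[k]) / dt
            grad[k + 1] += d
            grad[k] -= d
        q = [q[k] - lr * grad[k] for k in range(n)]
    return q


if __name__ == "__main__":
    dt = 0.01
    rng = random.Random(0)
    # Hypothetical stand-in for a trajectory recorded from a policy rollout.
    recorded = [x + rng.gauss(0.0, 0.02)
                for x in sinusoid_gait(200, dt, amplitude=0.4, freq=1.5)]
    smoothed = optimize_trajectory(recorded, dt)
    print(f"energy proxy before: {energy_proxy(recorded, dt):.3f}")
    print(f"energy proxy after:  {energy_proxy(smoothed, dt):.3f}")
```

The tracking weight trades energy savings against fidelity to the learned gait. A real pipeline would also check candidate trajectories in simulation for speed and stability rather than relying on a kinematic proxy alone.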

Highlights

  • Legged locomotion [1] is essential for robots to traverse difficult environments with agility and grace

  • Unlike training directly in the real world [29], our method avoids potential damage to the robot during training, which is critical for expensive robots

  • We propose a novel method to learn and transfer an energy-efficient control method for a quadruped robot in the real world by deep reinforcement learning and optimization


Summary

Introduction

Legged locomotion [1] is essential for robots to traverse difficult environments with agility and grace. However, the energy efficiency of mobile robots during dynamic locomotion still has room for improvement. Learning-based approaches, especially deep reinforcement learning methods, have achieved tremendous progress in controlling robots [4–7]. Policy search [8], a subfield of deep reinforcement learning, has been widely studied in recent years. Numerous policy search algorithms have been proposed to improve performance and sample efficiency while reducing the entropy in the learning process, e.g., DDPG [4], TRPO [5], PPO [9], SAC [10], and TD3 [11]. These algorithms automate the training process and produce feasible locomotion for robots with little human intervention.

Methods
Results
Discussion
Conclusion
