Abstract

To improve the robustness of biped walking, a model-parameter optimization method based on policy gradient descent learning is presented. For the optimization based on the linear inverted pendulum model, the input parameters of the inverted pendulum model and the torso attitude parameters of the robot are first selected as the correction variables, and the correction equation is established. A fitness function is then built from the tracking errors of the robot's center of mass (CoM) and the errors of the robot's posture relative to the upright state of the body. According to this fitness function, the gain coefficients in the model-parameter correction equation are optimized by policy gradient learning, and the optimized gains are substituted into the correction equation to obtain the correction amount. With this optimization strategy, the robot can adjust its body posture and walking pattern quickly and in real time under unknown disturbances, so walking robustness is enhanced. Simulations and experiments on a full-body humanoid NAO robot validate the effectiveness of the proposed method: the optimized model yields a more controlled, robust walk on the NAO robot across various surfaces without additional manual parameter tuning.
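The fitness function and correction amount described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weights w_com and w_tilt and the elementwise linear form of the correction are illustrative assumptions.

```python
import numpy as np

def fitness(com_error, tilt_error, w_com=1.0, w_tilt=1.0):
    # Weighted sum of CoM tracking error and torso-posture error.
    # The weights w_com and w_tilt are illustrative assumptions,
    # not values taken from the paper.
    return (w_com * float(np.linalg.norm(com_error))
            + w_tilt * float(np.linalg.norm(tilt_error)))

def correction(gains, errors):
    # Correction amount as an elementwise gain-times-error term;
    # the paper's exact correction equation may differ in form.
    return np.asarray(gains) * np.asarray(errors)
```

A perfectly tracked, upright walk (all errors zero) yields a fitness of zero, which is the target of the gradient search over the gains.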

Highlights

  • For humanoid robots, stable and robust biped walking gait generation is very important

  • Humanoid walking is commonly realized by planning the center of mass (CoM) trajectories, so that the resultant zero moment point (ZMP) trajectory follows a desired ZMP trajectory, which is normally determined by predefined foot positioning

Summary

Introduction

Stable and robust biped walking gait generation is very important. When the CoM has position x_tb and velocity ẋ_tb relative to the origin of Q at the beginning of a single-support phase, r_x, (x_0)_x, and (ẋ_0)_x can be computed from the following equations, so that all the pendulum parameters are determined. The policy gradient learning method does not optimize the input parameters of the inverted pendulum directly; instead, it optimizes the gain coefficients of the compensation amount, thereby indirectly adjusting the robot's gait. During gradient learning, if the value of the fitness function decreases, the CoM-tracking and body-inclination errors are reduced accordingly, and the robot's adaptability under the current parameter set increases. The iteration ends when the number of iterations reaches the preset value N_iter.
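The learning loop described above (optimize the gain coefficients against the fitness function, stop after N_iter iterations) can be sketched with a finite-difference gradient estimate. The fitness here is a toy quadratic stand-in, since the real fitness requires a walking rollout on the robot; the optimum GAINS_OPT, step sizes, and iteration count are all illustrative assumptions.

```python
import numpy as np

def fitness(gains):
    # Stand-in for a walking rollout: in the paper this would run the
    # robot with the given gains and return the weighted CoM-tracking
    # plus body-inclination error. Here, a toy quadratic with a known
    # minimum at GAINS_OPT (an illustrative assumption).
    GAINS_OPT = np.array([0.8, 0.3])
    return float(np.sum((np.asarray(gains) - GAINS_OPT) ** 2))

def optimize_gains(gains0, n_iter=50, eps=0.05, lr=0.2):
    # Finite-difference gradient descent on the fitness: perturb each
    # gain coefficient, estimate the partial derivative, step downhill.
    gains = np.asarray(gains0, dtype=float).copy()
    for _ in range(n_iter):
        grad = np.zeros_like(gains)
        for i in range(gains.size):
            e = np.zeros_like(gains)
            e[i] = eps
            # central difference along gain i
            grad[i] = (fitness(gains + e) - fitness(gains - e)) / (2 * eps)
        gains -= lr * grad  # lower fitness = smaller tracking/tilt error
    return gains
```

Because only the scalar fitness is needed, this scheme treats the walking controller as a black box, which matches the indirect optimization described above: the pendulum inputs are never differentiated, only the gains of the compensation term are moved.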

Experiments and results
Conclusions and discussion
