Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller

Yutaka Nakamura,Tomohiro Shibata,Yoichi Tokita,Shin Ishii,Takeshi Mori

doi:10.20965/jrm.2005.p0636

Journal: Journal of Robotics and Mechatronics
Publication Date: Dec 20, 2005
Citations: 3

#Central Pattern Generator Controller #Central Pattern Generator

Abstract

Motor control schemes using a central pattern generator (CPG) controller, inspired by the mechanism of animals’ rhythmic movements, have been studied. We previously proposed a reinforcement learning (RL) framework, the CPG-actor-critic model, for autonomous learning of a CPG controller. Here, we propose an off-policy natural policy gradient RL algorithm for the CPG-actor-critic model that addresses the “exploration-exploitation” problem by meta-controlling the “behavior policy.” We apply this RL algorithm to an automatic control problem using a biped robot simulator. Computer simulations demonstrated that a CPG controller trained with our new algorithm enables the biped robot to walk stably and efficiently.
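
As a rough illustration of what an off-policy natural policy gradient update can look like, the sketch below uses a toy one-dimensional problem rather than the CPG-actor-critic model or the biped simulator. It is not the authors' algorithm. The Gaussian target policy, the wider Gaussian behavior policy with a fixed `sigma_b` (the abstract describes meta-controlling the behavior policy, which is not modeled here), the quadratic reward, and all function and parameter names are assumptions made for illustration. Actions are sampled from the behavior policy, reweighted by importance ratios, and the gradient estimate is preconditioned by an estimate of the Fisher information.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def off_policy_npg_step(theta, rng, target=2.0, sigma=0.5, sigma_b=1.0,
                        n_samples=2000, lr=0.5):
    """One importance-weighted natural-gradient update of the policy mean.

    Hypothetical toy setup: the target policy is N(theta, sigma^2), the
    behavior policy is the wider N(theta, sigma_b^2), and the reward is
    -(a - target)^2.
    """
    samples = []
    for _ in range(n_samples):
        a = rng.gauss(theta, sigma_b)  # explore with the behavior policy
        # Importance ratio: target-policy density over behavior-policy density.
        w = gaussian_pdf(a, theta, sigma) / gaussian_pdf(a, theta, sigma_b)
        score = (a - theta) / sigma ** 2  # d/dtheta of log N(a; theta, sigma^2)
        reward = -(a - target) ** 2
        samples.append((w, score, reward))

    w_sum = sum(w for w, _, _ in samples)
    # Self-normalized estimates under the target policy.
    baseline = sum(w * r for w, _, r in samples) / w_sum
    grad = sum(w * (r - baseline) * s for w, s, r in samples) / w_sum
    fisher = sum(w * s * s for w, s, _ in samples) / w_sum
    # Natural gradient: the vanilla gradient preconditioned by the inverse
    # Fisher information (a scalar in this one-parameter example).
    return theta + lr * grad / fisher


def train(steps=40, seed=0):
    """Run repeated updates starting from theta = 0 and return the final mean."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        theta = off_policy_npg_step(theta, rng)
    return theta


if __name__ == "__main__":
    print(train())
```

Choosing `sigma_b` larger than `sigma` keeps the importance ratios bounded by `sigma_b / sigma`, which is one simple way to explore more widely than the target policy without the weights blowing up.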
