Abstract

We propose a reinforcement learning (RL) framework that improves policies for a high-dimensional system with fewer interactions with the real environment than standard RL methods require. In our framework, we first use offline simulations with an approximated environment model to improve the controller parameters, generating samples along locally optimized trajectories. We then use the approximated dynamics to improve performance on a tool-manipulation task within a path integral RL framework, which updates the policy from sampled trajectories of the state and action vectors and their costs. We apply the proposed method to a bimanual humanoid motor-learning task in which a closed-chain constraint must be explicitly considered. We show that a 51-DOF real humanoid robot can learn to manipulate a rod to hit via-points using both arms within 36 interactions in a real environment.
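The core policy-update step of path integral RL can be sketched as a cost-weighted average of exploration noise over sampled rollouts. The sketch below is an illustrative, PI²-style update under assumed conventions (the function name, the min-max cost normalization, and the temperature parameter `lam` are our assumptions, not the paper's implementation):

```python
import numpy as np

def pi2_update(theta, noise, costs, lam=1.0):
    """One PI^2-style parameter update from K sampled rollouts.

    theta : (d,) current policy parameters
    noise : (K, d) exploration noise added to theta in each rollout
    costs : (K,) total trajectory cost of each rollout
    lam   : temperature; smaller values weight low-cost rollouts more greedily
    """
    # Normalize costs to [0, 1] so the exponential weighting is scale-invariant.
    span = max(costs.max() - costs.min(), 1e-12)
    s = (costs - costs.min()) / span
    # Softmax over negative normalized costs: low-cost rollouts get high weight.
    w = np.exp(-s / lam)
    w /= w.sum()
    # Reward-weighted average of the exploration noise moves theta toward
    # the perturbations that produced cheap trajectories.
    return theta + w @ noise
```

In this scheme, no gradient of the dynamics is needed; only trajectory samples (states, actions) and their costs enter the update, which is what makes it usable with an approximated environment model.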
