Abstract

This study proposes a novel hybrid imitation learning (HIL) framework in which behavior cloning (BC) and state cloning (SC) are combined in a mutually complementary manner to enhance the efficiency of robotic manipulation task learning. The proposed HIL framework combines the BC and SC losses with an adaptive loss mixing method, uses pretrained dynamics networks to improve SC efficiency, and performs stochastic state recovery during SC, transforming the learner's task state into a demo state on the demonstration trajectory, to keep policy-network training stable. The training efficiency and policy flexibility of the framework are demonstrated in a series of experiments on major robotic manipulation tasks (pick-up, pick-and-place, and stack). In these experiments, the HIL framework achieved about a 2.6 times greater performance improvement than pure BC and trained about four times faster than the pure SC imitation learning method. It also achieved about a 1.6 times greater performance improvement and trained about 2.2 times faster than a hybrid learning method combining BC and reinforcement learning (BC + RL).
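
The abstract outlines the core update: an adaptive mixture of a BC loss (action matching) and an SC loss (next-state matching through a pretrained dynamics network). As a rough illustration only, the following PyTorch sketch shows how such a mixed update could look; the paper does not spell out its adaptive mixing rule or network architectures here, so `hil_step`, the MSE losses, and the scalar weight `alpha` are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def freeze(module: nn.Module) -> None:
    """Freeze the pretrained dynamics network; gradients still flow
    through it to the policy's predicted action."""
    for p in module.parameters():
        p.requires_grad_(False)

def hil_step(policy, dynamics, optimizer, batch, alpha):
    """One hypothetical HIL update mixing a BC loss (action matching)
    with an SC loss (next-state matching through the pretrained dynamics
    network). `alpha` in [0, 1] is the adaptive mixing weight."""
    s, a_demo, s_next_demo = batch        # demo state, action, next state
    a_pred = policy(s)

    # BC loss: imitate the demonstrated action directly.
    bc_loss = F.mse_loss(a_pred, a_demo)

    # SC loss: the predicted action, rolled through the frozen dynamics
    # network, should reproduce the demonstrated next state.
    s_next_pred = dynamics(torch.cat([s, a_pred], dim=-1))
    sc_loss = F.mse_loss(s_next_pred, s_next_demo)

    loss = alpha * bc_loss + (1.0 - alpha) * sc_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A schedule that shifts `alpha` between the BC and SC terms over training would play the role of the adaptive loss mixing; the exact rule used in the paper is not reproduced here.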

Highlights

  • An advanced service robot automates tasks previously performed by humans by correctly recognizing surrounding conditions, including human motion, object positions, and obstacles

  • This study proposes a hybrid imitation learning (HIL) framework, a novel imitation learning framework characterized by integrating behavior cloning (BC) and state cloning (SC) in a mutually complementary manner

  • This study proposes an HIL framework as an efficient method for learning robotic manipulation tasks


Summary

Introduction

A service robot must correctly recognize surrounding conditions, including human motion, object positions, and obstacles. The second hybrid learning method [19], which combines the imitation reward R_BC with the task reward R_RL, yields a highly flexible learned policy, since it allows extra experience data to be used for training. This paper presents the process and results of a series of object manipulation experiments (pick-up, pick-and-place, and stack tasks) using a 9-DOF (degrees of freedom) Jaco robotic hand, demonstrating the high training efficiency and policy flexibility of the proposed HIL framework.
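
For comparison, the BC + RL baseline's combined reward can be sketched as a weighted sum; the linear form and the weight `lam` below are illustrative assumptions, not the formulation from [19].

```python
def combined_reward(r_bc: float, r_rl: float, lam: float = 0.5) -> float:
    """Hypothetical combined reward for the BC + RL baseline: a weighted
    sum of the imitation reward R_BC and the task reward R_RL. The weight
    `lam` and the linear form are illustrative assumptions."""
    return lam * r_bc + (1.0 - lam) * r_rl
```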

Related Work
Problem Description
State Cloning with Dynamics Network
Pretraining the Dynamics Network
Manipulation Tasks
Model Training
Experiments in Simulated Environment
Conclusions

