Abstract

Visual imitation learning is a promising approach that enables robots to learn skills from visual demonstrations. However, current visual imitation learning approaches rely on the unrealistic assumption that the contexts of the visual demonstrations and the robot observations are consistent, which limits their flexibility and scalability. Learning from visual demonstrations with inconsistent contexts is therefore a key challenge for robots. Inconsistent contexts can cause large differences in the pixel distributions of the operator and the environment, which renders vision-based control policies largely ineffective. In this paper, we propose a novel imitation learning framework that enables robots to reproduce behavior by watching human demonstrations with inconsistent contexts, such as different viewpoints, operators, backgrounds, object appearances and positions. Specifically, our framework consists of three networks: a flow-based viewpoint transformation network (FVTrans), a robot2human alignment network (RANet) and an inverse dynamics network (IDNet). First, FVTrans transforms various third-person demonstrations into the fixed robot execution view. With a meta-learning strategy, FVTrans can quickly adapt to novel contexts from only a few samples. Then, RANet aligns the human and the robot at the feature level, so that the demonstration feature can serve as the subgoal for the current time step. Finally, IDNet predicts the joint angles of the robot. We collect a multi-context dataset on a real robot (UR5) for three tasks: grasping cups, sweeping garbage and placing objects. We empirically demonstrate that our framework performs all three tasks with a high success rate and generalizes effectively to different contexts.
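The data flow through the three networks can be pictured as a simple composition of stages. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `ImitationPipeline`, `step`, `rollout` and the toy stand-ins (`identity_view`, `row_means`, `proportional_joints`) are hypothetical names; real FVTrans, RANet and IDNet would be trained networks; and we assume RANet embeds the view-transformed demonstration frame and the current robot frame with the same encoder so that IDNet can compare them. The output has six entries because the UR5 is a six-joint arm.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical types: a frame is a grayscale image stored as nested lists of floats.
Image = List[List[float]]
Feature = List[float]
JointAngles = List[float]  # one angle per UR5 joint (six joints)


@dataclass
class ImitationPipeline:
    """Hypothetical wiring of the three stages; each field is a plain callable
    standing in for a trained network (FVTrans, RANet, IDNet)."""
    fvtrans: Callable[[Image], Image]                 # demo frame -> robot-view frame
    ranet: Callable[[Image], Feature]                 # frame -> aligned feature (assumed shared encoder)
    idnet: Callable[[Feature, Feature], JointAngles]  # (current, subgoal) -> joint angles

    def step(self, demo_frame: Image, robot_frame: Image) -> JointAngles:
        # 1. Re-render the third-person demonstration frame in the robot's execution view.
        demo_in_robot_view = self.fvtrans(demo_frame)
        # 2. Embed both frames in the aligned feature space; the demo feature acts as the subgoal.
        subgoal = self.ranet(demo_in_robot_view)
        current = self.ranet(robot_frame)
        # 3. Predict joint angles that move the robot from `current` toward `subgoal`.
        return self.idnet(current, subgoal)

    def rollout(self, demo: Sequence[Image], observe: Callable[[], Image]) -> List[JointAngles]:
        # One control step per demonstration frame, observing the robot before each step.
        return [self.step(frame, observe()) for frame in demo]


# Toy stand-ins so the sketch runs; they are NOT the paper's networks.
def identity_view(img: Image) -> Image:
    return img


def row_means(img: Image) -> Feature:
    return [sum(row) / len(row) for row in img]


def proportional_joints(current: Feature, subgoal: Feature, gain: float = 0.5) -> JointAngles:
    # Average feature error, scaled by a gain and broadcast to all six joints.
    error = sum(g - c for g, c in zip(subgoal, current)) / len(subgoal)
    return [gain * error] * 6


pipeline = ImitationPipeline(identity_view, row_means, proportional_joints)
demo = [[[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
robot_frame = [[0.0, 0.0], [0.0, 0.0]]
angles = pipeline.rollout(demo, lambda: robot_frame)
print(angles)  # six joints of 0.5 for the first step, six of 0.25 for the second
```

Keeping each stage as an injected callable makes the division of labor explicit: only FVTrans sees the raw third-person view, so swapping viewpoints, operators or backgrounds should in principle only require adapting that first stage.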
