Abstract
Robot manipulation tasks can be carried out effectively when the state representation is sufficiently detailed. Embodiment difference, viewpoint difference, and domain difference are among the challenges in learning from human demonstration. This work proposes a self-supervised, multi-viewpoint representation learning method that unifies spatial and temporal features. The algorithm consists of two components: (a) a spatial component, which learns the setting of the environment, i.e., which pixels to focus on to obtain the best representation of the image regardless of viewpoint, and (b) a temporal component, which learns that snapshots taken simultaneously from multiple viewpoints (i.e., at the same time-step but from different viewpoints) are similar, and that they differ from snapshots taken at a different time-step from the same viewpoint. These representations are then integrated into a Reinforcement Learning (RL) framework to learn accurate behaviors from videos of humans performing the manipulation task. The effectiveness of the approach is illustrated by training robots to learn several manipulation tasks from expert demonstrations provided by humans: (a) grabbing objects, (b) lifting objects, and (c) opening and closing drawers. The algorithm shows great promise and succeeds across all of these manipulation tasks. The robot learns to pick up objects of various shapes, sizes, and colors in different orientations and placements on the table, and it also learns to open and close drawers. The method is highly sample efficient and addresses the challenges of embodiment, viewpoint, and domain difference.
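The temporal component described above can be illustrated with a small sketch. The abstract does not state which loss function is used, so the triplet margin loss below is an assumption chosen only because it is a common way to express "same time-step, different viewpoint is similar; different time-step, same viewpoint is dissimilar". The function names, the margin value, and the two-dimensional embeddings are all hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def time_contrastive_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss (assumed form, not taken from the paper).

    anchor   -- embedding of viewpoint 1 at time-step t
    positive -- embedding of viewpoint 2 at the same time-step t
    negative -- embedding of viewpoint 1 at a different time-step
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    # Zero once the negative is at least `margin` farther away than the positive.
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, for illustration only.
view1_t = [1.0, 0.0]      # viewpoint 1, time-step t
view2_t = [0.9, 0.1]      # viewpoint 2, time-step t
view1_later = [0.0, 1.0]  # viewpoint 1, a later time-step

# Well-separated embeddings incur no loss.
good = time_contrastive_loss(view1_t, view2_t, view1_later)
# Swapping positive and negative simulates a poor embedding and is penalized.
bad = time_contrastive_loss(view1_t, view1_later, view2_t)
```

In this toy case the first call incurs no penalty because the two simultaneous views are already close, while the second call yields a positive loss. Minimizing such a loss over many triplets would push the encoder toward viewpoint-invariant, time-sensitive embeddings.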