Abstract

This paper addresses the problem of animating a person in static images, the core task of which is to infer future poses for the person. Existing approaches predict future poses in the 2D space, suffering from entanglement of pose action and shape. We propose a method that generates actions in the 3D space and then transfers them to the 2D person. We first lift the 2D pose of the person to a 3D skeleton, then propose a 3D action synthesis network predicting future skeletons, and finally devise a self-supervised action transfer network that transfers the actions of 3D skeletons to the 2D person. Actions generated in the 3D space look plausible and vivid. More importantly, self-supervised action transfer allows our method to be trained only on a 3D MoCap dataset while being able to process images in different domains. Experiments on three image datasets validate the effectiveness of our method.
