Abstract

Image animation creates visually compelling effects by animating still source images according to driving videos. Recent work performs animation on arbitrary objects using unsupervised methods and can transfer motion between human bodies relatively robustly. However, the complex representation of motion and the unknown correspondence between human bodies often lead to issues such as distorted limbs and missing semantics, which make human animation challenging. In this paper, we propose a semantically guided, unsupervised method of motion transfer that uses semantic information to model motion and appearance. Specifically, we first use a pre-trained human parsing network to encode the rich and diverse foreground semantic information, thus generating fine details. Second, we use a cross-modal attention layer to learn the correspondence between semantic regions of human bodies, guiding the network to select appropriate input features and produce accurate results. Experiments demonstrate that our method outperforms state-of-the-art methods on motion-related metrics while effectively addressing the problems of missing semantics and unclear limb structure prevalent in human motion transfer. These improvements can facilitate applications in various fields, such as education and entertainment.
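The abstract does not specify the implementation of the cross-modal attention layer, but the general mechanism it names (attending from driving-pose features over source semantic-region features) can be sketched as standard scaled dot-product cross-attention. The sketch below is a minimal NumPy illustration under that assumption; all function and parameter names are hypothetical and not taken from the paper.

```python
import numpy as np

def cross_modal_attention(query_feats, key_feats, value_feats):
    """Generic cross-attention sketch (hypothetical, not the paper's exact layer).

    query_feats: (Nq, d) features from the driving pose/motion stream.
    key_feats:   (Nk, d) features of source semantic regions (e.g. from a
                 human parsing network), one row per region.
    value_feats: (Nk, d) appearance features for the same source regions.
    Returns attended features (Nq, d) and the attention weights (Nq, Nk).
    """
    d = query_feats.shape[-1]
    # Similarity between each driving-feature query and each source region.
    scores = query_feats @ key_feats.T / np.sqrt(d)
    # Softmax over source regions (shift by max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of source-region appearance features,
    # so the network selects appropriate input features per query.
    return weights @ value_feats, weights
```

In this reading, the attention weights act as a learned soft correspondence between semantic regions of the source and driving bodies, which is what lets the generator pick the right source appearance for each part of the driven pose.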
